Expresiones Regulares en tiempo de matching

Sig: Expresiones Condicionales Sup: Algunas Extensiones Ant: Ejecución de Código dentro Err: Si hallas una errata ...

Subsecciones

Expresiones Regulares en tiempo de matching

Los paréntesis especiales:

                (??{ Código Perl })

hacen que el Código Perl sea evaluado durante el tiempo de matching. El resultado de la evaluación se trata como una expresión regular. El match continuará intentando casar con la expresión regular retornada.

Paréntesis con memoria dentro de una pattern code expression

Los paréntesis en la expresión regular retornada no cuentan en el patrón exterior. Véase el siguiente ejemplo:

pl@nereida:~/Lperltesting$ cat -n postponedregexp.pl
    1   #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1 -w
    2   use v5.10;
    3   use strict;
    4 
    5   my $r = qr{(?x)                # ignore spaces
    6               ([ab])             # save 'a' or 'b' in $1
    7               (??{ "($^N)"x3 })  # 3 more of the same as in $1
    8             };
    9   say "<$&> lastpar = $#-" if 'bbbb' =~ $r;
   10   say "<$&> lastpar = $#-" if 'aaaa' =~ $r;
   11   say "<abab> didn't match" unless 'abab' =~ $r;
   12   say "<aaab> didn't match" unless 'aaab' =~ $r;

Como se ve, hemos accedido desde el código interior al último paréntesis usando $^N. Sigue una ejecución:

pl@nereida:~/Lperltesting$ ./postponedregexp.pl
<bbbb> lastpar = 1
<aaaa> lastpar = 1
<abab> didn't match
<aaab> didn't match

Ejemplo: Secuencias de dígitos de longitud especificada por el primer dígito

Consideremos el problema de escribir una expresión regular que reconoce secuencias no vacías de dígitos tales que la longitud de la secuencia restante viene determinada por el primer dígito. Esta es una solución:

pl@nereida:~/Lperltesting$ cat -n intints.pl
    1 #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1 -w
    2 use v5.10;
    3 use strict;
    4 
    5 my $r = qr{(?x)                # ignore spaces
    6            (\d)                # a digit
    7            ( (??{
    8                "\\d{$^N}"      # as many as the former
    9              })                # digit says
   10            )
   11           };
   12 say "<$&> <$1> <$2>" if '3428' =~ $r;
   13 say "<$&> <$1> <$2>" if '228' =~ $r;
   14 say "<$&> <$1> <$2>" if '14' =~ $r;
   15 say "24 does not match" unless '24' =~ $r;
   16 say "4324 does not match" unless '4324' =~ $r;

Cuando se ejecuta se obtiene:

pl@nereida:~/Lperltesting$ ./intints.pl
<3428> <3> <428>
<228> <2> <28>
<14> <1> <4>
24 does not match
4324 does not match

Ejemplo: Secuencias de dígitos no repetidos

Otro ejemplo: queremos escribir una expresión regular que reconozca secuencias de $n dígitos en las que no todos los dígitos se repiten. Donde quizá $n es capturado de un paréntesis anterior en la expresión regular. Para simplificar la ilustración de la técnica supongamos que $n = 7:

pl@nereida:~$  perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1>  x join '', map { "(?!".$_."{7})" } 0..9
0  '(?!0{7})(?!1{7})(?!2{7})(?!3{7})(?!4{7})(?!5{7})(?!6{7})(?!7{7})(?!8{7})(?!9{7})'
  DB<2>  x '7777777' =~ /(??{join '', map { "(?!".$_."{7})" } 0..9})(\d{7})/
  empty array
  DB<3>  x '7777778' =~ /(??{join '', map { "(?!".$_."{7})" } 0..9})(\d{7})/
0  7777778
  DB<4>  x '4444444' =~ /(??{join '', map { "(?!".$_."{7})" } 0..9})(\d{7})/
  empty array
  DB<5>  x '4422444' =~ /(??{join '', map { "(?!".$_."{7})" } 0..9})(\d{7})/
0  4422444

Palíndromos con independencia del acento

Se trata en este ejercicio de generalizar la expresión regular introducida en la sección 3.2.5 para reconocer los palabra-palíndromos.

Se trata de encontrar una regexp que acepte que la lectura derecha e inversa de una frase en Español pueda diferir en la acentuación (como es el caso del clásico palíndromo dábale arroz a la zorra el abad). Una solución trivial es preprocesar la cadena eliminando los acentos. Supondremos sin embargo que se quiere trabajar sobre la cadena original. He aquí una solucion:

    1 pl@nereida:~/Lperltesting$ cat actionspanishpalin.pl 
    2 #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1 -w -CIOEioA
    3 use v5.10;
    4 use strict;
    5 use utf8;
    6 use re 'eval';
    7 use Switch;
    8 
    9 sub f {
   10   my $char = shift;
   11 
   12   switch($char) {
   13     case [ qw{a á} ] { return '[aá]' }
   14     case [ qw{e é} ] { return '[eé]' }
   15     case [ qw{i í} ] { return '[ií]' }
   16     case [ qw{o ó} ] { return '[oó]' }
   17     case [ qw{u ú} ] { return '[uú]' }
   18     else             { return $char  };
   19   }
   20 }
   21 
   22 my $regexp = qr/^(\W* (?: 
   23                             (\w) (?-2)(??{ f($^N) })
   24                           | \w? 
   25                       ) \W*
   26                   )
   27                 $
   28                /ix;
   29 
   30 my $input = <>; # Try: 'dábale arroz a la zorra el abad';
   31 chomp($input);
   32 if ($input =~ $regexp) {
   33   say "$input is a palindrome";
   34 }
   35 else {
   36   say "$input does not match";
   37 }

Sigue un ejemplo de ejecución:

pl@nereida:~/Lperltesting$ ./actionspanishpalin.pl 
dábale arroz a la zorra el abad
dábale arroz a la zorra el abad is a palindrome
pl@nereida:~/Lperltesting$ ./actionspanishpalin.pl 
éoíúaáuioé
éoíúaáuioé is a palindrome
pl@nereida:~/Lperltesting$ ./actionspanishpalin.pl 
dáed
dáed does not match

Postponiendo para conseguir recursividad

Véase el nodo Complex regex for maths formulas para la formulación del problema:

Hiya monks,

Im having trouble getting my head around a regular expression to match sequences. I need to catch all exceptions where a mathematical expression is illegal...

There must be either a letter or a digit either side of an operator parenthesis must open and close next to letters or digits, not next to operators, and do not have to exist variables must not be more than one letter Nothing other than a-z,A-Z,0-9,+,-,*,/,(,) can be used

Can anyone offer a hand on how best to tackle this problem?

many thanks

La respuesta dada por ikegami usa (?{{ ... }) para conseguir una conducta recursiva en versiones de perl anteriores a la 5.10:

pl@nereida:~/Lperltesting$ cat -n complexformula.pl
    1   #!/usr/bin/perl
    2   use strict;
    3   use warnings;
    4 
    5   sub is_valid_expr {
    6      use re 'eval'; # to allow Eval-group at runtime
    7 
    8      local our ($skip, $term, $expr);
    9      $skip = qr! \s* !x;
   10      $term = qr! $skip [a-zA-Z]+              # A term is an identifier
   11                | $skip [1-9][0-9]*            # or a number
   12                | $skip \( (??{ $expr }) $skip # or an expression
   13                        \)
   14                !x;
   15      $expr = qr! $term                         # A expr is a term
   16                  (?: $skip [-+*/] $term )*     # or a term + a term ...
   17                !x;
   18 
   19      return $_[0] =~ / ^ $expr $skip \z /x;
   20   }
   21 
   22   print(is_valid_expr($_) ? "$_ is valid\n" : "$_ is not valid\n") foreach (
   23    '(a + 3)',
   24    '(3 * 4)+(b + x)',
   25    '(5 - a)*z',
   26    '3 + 2',
   27 
   28    '!3 + 2',
   29    '3 + 2!',
   30 
   31    '3 a',
   32    '3 3',
   33    '3 * * 3',
   34 
   35    '2 - 3 * 4',
   36    '2 - 3 + 4',
   37   );

Sigue el resultado de la ejecución:

pl@nereida:~/Lperltesting$ perl complexformula.pl
(a + 3) is valid
(3 * 4)+(b + x) is valid
(5 - a)*z is valid
3 + 2 is valid
!3 + 2 is not valid
3 + 2! is not valid
3 a is not valid
3 3 is not valid
3 * * 3 is not valid
2 - 3 * 4 is valid
2 - 3 + 4 is valid

Caveats

Estos son algunos puntos a tener en cuenta cuando se usan patrones postpuestos. Véase la entrada (??{ code }) en la sección 'Extended-Patterns' en perlre:

WARNING: This extended regular expression feature is considered experimental, and may be changed without notice. Code executed that has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine.

This is a postponed regular subexpression. The code is evaluated at run time, at the moment this subexpression may match. The result of evaluation is considered as a regular expression and matched as if it were inserted instead of this construct.

The code is not interpolated.

As before, the rules to determine where the code ends are currently somewhat convoluted.

Because perl's regex engine is not currently re-entrant, delayed code may not invoke the regex engine either directly with m// or s///), or indirectly with functions such as split.

Recursing deeper than 50 times without consuming any input string will result in a fatal error. The maximum depth is compiled into perl, so changing it requires a custom build.

Sig: Expresiones Condicionales Sup: Algunas Extensiones Ant: Ejecución de Código dentro Err: Si hallas una errata ...

Casiano Rodríguez León
2012-05-22