Executing Code inside a Regular Expression

Perl code can be embedded in a regular expression using the notation (?{code}).

The following text is taken from the section 'A bit of magic: executing Perl code in a regular expression' in perlretut:

Normally, regexps are a part of Perl expressions. Code evaluation expressions turn that around by allowing arbitrary Perl code to be a part of a regexp. A code evaluation expression is denoted (?{code}), with code a string of Perl statements.

Be warned that this feature is considered experimental, and may be changed without notice.

Code expressions are zero-width assertions, and the value they return depends on their environment.

There are two possibilities: either the code expression is used as a conditional in a conditional expression (?(condition)...), or it is not.

If the code expression is a conditional, the code is evaluated and the result (i.e., the result of the last statement) is used to determine truth or falsehood.
If the code expression is not used as a conditional, the assertion always evaluates true and the result is put into the special variable $^R. The variable $^R can then be used in code expressions later in the regexp.
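
The conditional case can be illustrated with a minimal sketch. The helper name size_label_ok and the test strings are hypothetical, chosen only for this illustration: the code block inspects the digits captured so far, and its truth value selects which branch of (?(condition)yes|no) must match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from perlretut): the digits must be followed by
# "big" when the number exceeds 10, and by "small" otherwise.
sub size_label_ok {
    my ($text) = @_;
    # The code block is the condition; its result picks the branch.
    return $text =~ /^(\d+)(?(?{ $1 > 10 })big|small)$/ ? 1 : 0;
}

for my $s ('12big', '5small', '12small', '5big') {
    printf "%-8s %s\n", $s, size_label_ok($s) ? 'matches' : 'does not match';
}
```

Under these assumptions only the first two strings should match, since the label in the last two contradicts the number.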

Result of the Last Execution

Code expressions are zero-width assertions: they consume no input. The result of the execution is stored in the special variable $^R.

Let's look at an example:

pl@nereida:~/Lperltesting$ perl5.10.1 -wde 0
main::(-e:1):   0
  DB<1> $x = "abcdef"
  DB<2> $x =~ /abc(?{ "Hi mom\n" })def(?{ print $^R })$/
Hi mom
  DB<3> $x =~ /abc(?{ print "Hi mom\n"; 4 })def(?{ print "$^R\n" })/
Hi mom
4
  DB<4> $x =~ /abc(?{ print "Hi mom\n"; 4 })ddd(?{ print "$^R\n" })/ # does not match
  DB<5>

In the last example (line DB<4>) neither print runs: the literal ddd does not occur in the target string, so the regex optimizer can reject the match before executing any embedded code.
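
The same behaviour can be reproduced outside the debugger. The following is a minimal sketch; the variable name $seen is an assumption of this illustration, and a package variable is used to sidestep the lexical-variable caveats of older Perl versions.

```perl
use strict;
use warnings;

our $seen;              # package variable written from inside the regex
my $x = "abcdef";

# The first block's value (42) is stored in $^R;
# the second block copies it into $seen.
$x =~ /abc(?{ 42 })def(?{ $seen = $^R })/;

print "seen = $seen\n";   # prints "seen = 42"
```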

Embedded Code Is Not Interpolated

Taken from the section 'Extended Patterns' in perlre:

This zero-width assertion evaluates any embedded Perl code. It always succeeds, and its code is not interpolated. Currently, the rules to determine where the code ends are somewhat convoluted.

Contents of the Last Parenthesis and the Default Variable in Embedded Actions

Taken from the section 'Extended Patterns' in perlre:

... can be used with the special variable $^N to capture the results of submatches in variables without having to keep track of the number of nested parentheses. For example:

pl@nereida:~/Lperltesting$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> $x = "The brown fox jumps over the lazy dog"
  DB<2> x $x =~ /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i
0  'brown'
1  'fox'
  DB<3> p "color=$color animal=$animal\n"
color=brown animal=fox
  DB<4> $x =~ /the (\S+)(?{ print substr($_,0,pos($_)), "\n" }) (\S+)/i
The brown

Inside the (?{...}) block, $_ refers to the string the regular expression is matching against. You can also use pos() to know what is the current position of matching within this string.
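
As a script, the debugger session above might look like the following sketch. The variable name $prefix is an assumption of this illustration; it records the part of the target string matched so far, using $_ and pos() inside the block.

```perl
use strict;
use warnings;

our ($color, $animal, $prefix);   # package variables set inside the regex
my $x = "The brown fox jumps over the lazy dog";

$x =~ /the\s(\S+)(?{ $color = $^N; $prefix = substr($_, 0, pos($_)) })
       \s(\S+)(?{ $animal = $^N })/ix;

print "color=$color animal=$animal prefix='$prefix'\n";
```

Here $^N always holds the text of the most recently closed group, so the blocks do not need to know the group numbers.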

Quantifiers and Embedded Code

When a quantifier is applied to a group containing embedded code, the code behaves like the body of a loop:

pl@nereida:~/Lperltesting$ perl5.10.1 -wde 0
main::(-e:1):   0
  DB<1> $x = "aaaa"
  DB<2>  $x =~ /(a(?{ $c++ }))*/
  DB<3> p $c
4
  DB<4> $y = "abcd"
  DB<5> $y =~ /(?:(.)(?{ print "-$1-\n" }))*/
-a-
-b-
-c-
-d-
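
A small sketch of the same loop that collects values instead of printing them; the array name @letters is an assumption of this illustration. Because the block runs every time the engine passes through it, a pattern that backtracks could run it more often than the final match suggests; this pattern does not backtrack.

```perl
use strict;
use warnings;

our @letters;           # filled from inside the quantified group
my $y = "abcd";

# Each iteration of the starred group captures one character
# and pushes it onto @letters.
$y =~ /(?:(.)(?{ push @letters, $^N }))*/;

print "@letters\n";     # prints "a b c d"
```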

Scope

Taken from the section 'Extended Patterns' in perlre (with the example modified):

...The code is properly scoped in the following sense: If the assertion is backtracked (compare the section 'Backtracking' in perlre), all changes introduced after localization are undone, so that

pl@nereida:~/Lperltesting$ cat embededcodescope.pl
use strict;

our ($cnt, $res);

sub echo {
  local our $pre = substr($_,0,pos($_));
  local our $post = (pos($_) < length)? (substr($_,1+pos($_))) : '';

  print("$pre(count = $cnt)$post\n");
}

$_ = 'a' x 8;
m<
   (?{ $cnt = 0 })                 # Initialize $cnt.
   (
     a
     (?{
         local $cnt = $cnt + 1;    # Update $cnt, backtracking-safe.
         echo();
     })
   )*
   aaaa
   (?{ $res = $cnt })              # On success copy to non-localized
                                   # location.
 >x;

print "FINAL RESULT: cnt = $cnt res =$res\n";

will set $res = 4. Note that after the match, $cnt returns to the globally introduced value, because the scopes that restrict local operators are unwound.

pl@nereida:~/Lperltesting$ perl5.8.8 -w embededcodescope.pl
a(count = 1)aaaaaa
aa(count = 2)aaaaa
aaa(count = 3)aaaa
aaaa(count = 4)aaa
aaaaa(count = 5)aa
aaaaaa(count = 6)a
aaaaaaa(count = 7)
aaaaaaaa(count = 8)
FINAL RESULT: cnt = 0 res =4
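
To see what local buys here, the following sketch runs the same pattern twice, once with a plain increment and once with a localized one. The names $res_plain and $res_local are assumptions of this illustration.

```perl
use strict;
use warnings;

our ($cnt, $res_plain, $res_local);
my $s = 'a' x 8;

# Plain increment: the counts added by iterations that are later
# backtracked are never undone.
$s =~ /(?{ $cnt = 0 })(?:a(?{ $cnt++ }))*aaaa(?{ $res_plain = $cnt })/;

# Localized increment: each backtracked iteration restores the
# previous value of $cnt.
$s =~ /(?{ $cnt = 0 })(?:a(?{ local $cnt = $cnt + 1 }))*aaaa(?{ $res_local = $cnt })/;

print "plain = $res_plain, local = $res_local\n";
```

The starred group first consumes all eight characters and then gives back four so that aaaa can match. The plain version should therefore report 8, while the localized version reports 4, the number of iterations that belong to the successful match.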

Caveats

Due to an unfortunate implementation issue, the Perl code contained in these blocks is treated as a compile time closure that can have seemingly bizarre consequences when used with lexically scoped variables inside of subroutines or loops. There are various workarounds for this, including simply using global variables instead. If you are using this construct and strange results occur then check for the use of lexically scoped variables.
For reasons of security, this construct is forbidden if the regular expression involves run-time interpolation of variables, unless the perilous use re 'eval' pragma has been used (see re), or the variables contain results of qr// operator (see "qr/STRING/imosx" in perlop).

This restriction is due to the wide-spread and remarkably convenient custom of using run-time determined strings as patterns. For example:
```
$re = <>;
chomp $re;
$string =~ /$re/;
```
Before Perl knew how to execute interpolated code within a pattern, this operation was completely safe from a security point of view, although it could raise an exception from an illegal pattern. If you turn on the use re 'eval', though, it is no longer secure, so you should only do so if you are also using taint checking. Better yet, use the carefully constrained evaluation within a Safe compartment. See perlsec for details about both these mechanisms. (See the section 'Taint mode' in perlsec.)
Because Perl's regex engine is currently not re-entrant, interpolated code may not invoke the regex engine either directly with m// or s///, or indirectly with functions such as split.
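
The restriction on run-time interpolation can be observed with a short sketch. The variable names are assumptions of this illustration, and the pattern text stands in for a string that would normally arrive from outside the program.

```perl
use strict;
use warnings;

my $pattern = 'a(?{ 1 })b';   # pattern text held in a plain string

# Without the pragma, interpolating a string that contains a code
# block is a fatal error, which the eval catches here.
my $ok_plain = eval { 'ab' =~ /$pattern/; 1 } ? 1 : 0;

# With "use re 'eval'" limited to a small lexical scope, it compiles.
my $ok_pragma = do {
    use re 'eval';
    eval { 'ab' =~ /$pattern/ ? 1 : 0 } || 0;
};

# A qr// object compiled from a literal needs no pragma when interpolated.
my $compiled = qr/a(?{ 1 })b/;
my $ok_qr = 'ab' =~ /^${compiled}\z/ ? 1 : 0;

print "plain=$ok_plain pragma=$ok_pragma qr=$ok_qr\n";
```

Keeping the pragma inside the smallest possible block limits the patterns that are allowed to run interpolated code, in line with the security advice quoted above.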

Debugging with Embedded Code: Name Collisions in Regular Subexpressions

Embedded actions can be used as a mechanism for debugging and for discovering how our regular expressions behave.

In the following program the group names <i> and <j> are used both inside the pattern <expr> and in the main pattern, so they collide:

pl@nereida:~/Lperltesting$ cat clashofnamedofssets.pl
#!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
use v5.10;

my $input;

local $" = ", ";

my $parser = qr{
    ^ (?<i> (?&expr)) (?<j> (?&expr)) \z
      (?{
           # Runs once the whole input has matched.
           say "main \$+ hash:";
           say " ($_ => $+{$_}) " for sort keys %+;
       })

    (?(DEFINE)
        (?<expr>
            (?<i> . )
            (?<j> . )
              (?{
                  # Runs at the end of every call to expr.
                  say "expr \$+ hash:";
                  say " ($_ => $+{$_}) " for sort keys %+;
              })
        )
    )
}x;

$input = <>;
chomp($input);
if ($input =~ $parser) {
  say "matches: ($&)";
}

Because of the collision, the output is:

pl@nereida:~/Lperltesting$ ./clashofnamedofssets.pl
abab
expr $+ hash:
 (i => a)
 (j => b)
expr $+ hash:
 (i => ab)
 (j => b)
main $+ hash:
 (i => ab)
 (j => ab)
matches: (abab)

Avoiding the collisions avoids the loss of information:

pl@nereida:~/Lperltesting$ cat namedoffsets.pl
#!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
use v5.10;

my $input;

local $" = ", ";

my $parser = qr{
    ^ (?<i> (?&expr)) (?<j> (?&expr)) \z
      (?{
           # Runs once the whole input has matched.
           say "main \$+ hash:";
           say " ($_ => $+{$_}) " for sort keys %+;
       })

    (?(DEFINE)
        (?<expr>
            (?<i_e> . )
            (?<j_e> . )
              (?{
                  # Runs at the end of every call to expr.
                  say "expr \$+ hash:";
                  say " ($_ => $+{$_}) " for sort keys %+;
              })
        )
    )
}x;

$input = <>;
chomp($input);
if ($input =~ $parser) {
  say "matches: ($&)";
}

which, when run, produces:

pl@nereida:~/Lperltesting$ ./namedoffsets.pl
abab
expr $+ hash:
 (i_e => a)
 (j_e => b)
expr $+ hash:
 (i => ab)
 (i_e => a)
 (j_e => b)
main $+ hash:
 (i => ab)
 (j => ab)
matches: (abab)

Casiano Rodríguez León
2012-05-22