Perl code can be embedded inside a regular expression using the notation (?{code}).
The following text is taken from the section 'A bit of magic: executing Perl code in a regular expression' of perlretut:
Normally, regexps are a part of Perl expressions. Code evaluation expressions turn that around by allowing arbitrary Perl code to be a part of a regexp. A code evaluation expression is denoted (?{code}), with code a string of Perl statements.
Be warned that this feature is considered experimental, and may be changed without notice.
Code expressions are zero-width assertions, and the value they return depends on their environment.
There are two possibilities: either the code expression is used as a conditional in a conditional expression (?(condition)...), or it is not.
If the code expression is a conditional, the code is evaluated and the result (i.e., the result of the last statement) is used to determine truth or falsehood.
If the code expression is not used as a conditional, the assertion always evaluates true and the result is put into the special variable $^R. The variable $^R can then be used in code expressions later in the regexp.
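The conditional case described above can be sketched as follows. This is a minimal illustration, not taken from perlretut: the helper name is_octet is hypothetical, and the yes-pattern (?!) is simply an assertion that always fails, so the match is rejected whenever the code expression returns true.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s is a decimal number in the range 0..255.
sub is_octet {
    my ($s) = @_;
    # The code expression is the condition of (?(condition)yes-pattern).
    # When $1 > 255 the yes-pattern (?!) is tried, and it always fails.
    return $s =~ /^(\d{1,3})(?(?{ $1 > 255 })(?!))$/ ? 1 : 0;
}

printf "%-4s => %d\n", $_, is_octet($_) for qw(0 255 256 1234);
```

Here the value of the code expression is used only as a truth value; it is the yes-pattern that decides the outcome of the match.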
Code expressions are zero-width assertions: they consume no input.
The result of executing the code is saved in the special variable $^R.
Let us look at an example:
    pl@nereida:~/Lperltesting$ perl5.10.1 -wde 0
    main::(-e:1):   0
      DB<1> $x = "abcdef"
      DB<2> $x =~ /abc(?{ "Hi mom\n" })def(?{ print $^R })$/
    Hi mom
      DB<3> $x =~ /abc(?{ print "Hi mom\n"; 4 })def(?{ print "$^R\n" })/
    Hi mom
    4
      DB<4> $x =~ /abc(?{ print "Hi mom\n"; 4 })ddd(?{ print "$^R\n" })/ # does not match
      DB<5>

In the last example (line DB<4>) neither print is executed, since there is no match.
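The same behaviour can be reproduced outside the debugger. The following is a minimal sketch: the variable name $seen and the value 42 are illustrative assumptions, and a package variable is used rather than a lexical one.

```perl
use strict;
use warnings;

our $seen;   # package variable (illustrative name)

my $str = "abcdef";
# The first code expression leaves its value in $^R;
# the second one copies that value out of the regexp.
if ($str =~ /abc(?{ 42 })def(?{ $seen = $^R })/) {
    print "value passed through \$^R: $seen\n";
}
```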
Taken from the 'Extended Patterns' section of perlre:
This zero-width assertion evaluates any embedded Perl code. It always succeeds, and its code is not interpolated. Currently, the rules to determine where the code ends are somewhat convoluted.
Also from the 'Extended Patterns' section of perlre:
... can be used with the special variable $^N to capture the results of submatches in variables without having to keep track of the number of nested parentheses. For example:

    pl@nereida:~/Lperltesting$ perl5.10.1 -wdE 0
    main::(-e:1):   0
      DB<1> $x = "The brown fox jumps over the lazy dog"
      DB<2> x $x =~ /the (\S+)(?{ $color = $^N }) (\S+)(?{ $animal = $^N })/i
    0  'brown'
    1  'fox'
      DB<3> p "color=$color animal=$animal\n"
    color=brown animal=fox
      DB<4> $x =~ /the (\S+)(?{ print (substr($_,0,pos($_)))."\n" }) (\S+)/i
    The brown

Inside the (?{...}) block, $_ refers to the string the regular expression is matching against. You can also use pos() to know what is the current position of matching within this string.
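As a script, the combined use of $^N and pos() might look like the sketch below. The variable names $color, $animal and $prefix are illustrative, and package variables are assumed so that the code blocks do not depend on lexical scoping.

```perl
use strict;
use warnings;

our ($color, $animal, $prefix);

my $text = "The brown fox jumps over the lazy dog";
$text =~ /the \s (\S+) (?{ $color  = $^N })
              \s (\S+) (?{ $animal = $^N;
                           $prefix = substr($_, 0, pos()) })
         /xi;

print "color=$color animal=$animal prefix='$prefix'\n";
```

Each code block reads $^N immediately after the group it cares about, so adding or removing parentheses elsewhere in the pattern does not change which text is stored.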
If a quantifier is applied to embedded code, it behaves like a loop:

    pl@nereida:~/Lperltesting$ perl5.10.1 -wde 0
    main::(-e:1):   0
      DB<1> $x = "aaaa"
      DB<2> $x =~ /(a(?{ $c++ }))*/
      DB<3> p $c
    4
      DB<4> $y = "abcd"
      DB<5> $y =~ /(?:(.)(?{ print "-$1-\n" }))*/
    -a-
    -b-
    -c-
    -d-
Taken (with the example modified) from the 'Extended Patterns' section of perlre:
...The code is properly scoped in the following sense: If the assertion is backtracked (see the 'Backtracking' section of perlre), all changes introduced after localization are undone, so that
    pl@nereida:~/Lperltesting$ cat embededcodescope.pl
    use strict;

    our ($cnt, $res);

    sub echo {
      local our $pre = substr($_,0,pos($_));
      local our $post = (pos($_) < length)? (substr($_,1+pos($_))) : '';

      print("$pre(count = $cnt)$post\n");
    }

    $_ = 'a' x 8;
    m<
       (?{ $cnt = 0 }) # Initialize $cnt.
       (
         a
         (?{
             local $cnt = $cnt + 1; # Update $cnt, backtracking-safe.
             echo();
         })
       )*
       aaaa
       (?{ $res = $cnt }) # On success copy to non-localized
                          # location.
     >x;

    print "FINAL RESULT: cnt = $cnt res =$res\n";

will set $res = 4. Note that after the match, $cnt returns to the globally introduced value, because the scopes that restrict local operators are unwound.
    pl@nereida:~/Lperltesting$ perl5.8.8 -w embededcodescope.pl
    a(count = 1)aaaaaa
    aa(count = 2)aaaaa
    aaa(count = 3)aaaa
    aaaa(count = 4)aaa
    aaaaa(count = 5)aa
    aaaaaa(count = 6)a
    aaaaaaa(count = 7)
    aaaaaaaa(count = 8)
    FINAL RESULT: cnt = 0 res =4
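To make the role of local more visible, the sketch below runs a localized and a non-localized counter side by side on a shorter input. It is a reduced variant of the program above, not part of perlre; the extra counter $plain is an assumption added for contrast.

```perl
use strict;
use warnings;

our ($plain, $cnt, $res) = (0, 0, 0);

'aaab' =~ /
    (?:
        a
        (?{
            $plain++;                 # not localized, survives backtracking
            local $cnt = $cnt + 1;    # localized, undone on backtracking
        })
    )*
    ab
    (?{ $res = $cnt })                # copy out before the locals unwind
/x;

print "plain=$plain res=$res cnt=$cnt\n";
```

The starred group has to give back one a so that the trailing ab can match. The localized counter follows that backtracking, so $res ends up as 2 and $cnt is back to 0 after the match, while $plain also counts the iteration that was later abandoned.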
Due to an unfortunate implementation issue, the Perl code contained in these blocks is treated as a compile time closure that can have seemingly bizarre consequences when used with lexically scoped variables inside of subroutines or loops. There are various workarounds for this, including simply using global variables instead. If you are using this construct and strange results occur then check for the use of lexically scoped variables.
For reasons of security, this construct is forbidden if the regular expression involves run-time interpolation of variables, unless the perilous use re 'eval' pragma has been used (see re), or the variables contain results of qr// operator (see "qr/STRING/imosx" in perlop).
This restriction is due to the wide-spread and remarkably convenient custom of using run-time determined strings as patterns. For example:
    $re = <>;
    chomp $re;
    $string =~ /$re/;
Before Perl knew how to execute interpolated code within a pattern, this operation was completely safe from a security point of view, although it could raise an exception from an illegal pattern. If you turn on the use re 'eval', though, it is no longer secure, so you should only do so if you are also using taint checking. Better yet, use the carefully constrained evaluation within a Safe compartment. See perlsec for details about both these mechanisms. (See the 'Taint mode' section of perlsec.)
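The restriction can be observed directly. In this minimal sketch the string in $pattern is a stand-in for a pattern that arrives at run time, and the counter $main::hits is an illustrative name; neither comes from perlre.

```perl
use strict;
use warnings;

$main::hits = 0;

# Stand-in for a pattern that arrives as a string at run time.
my $pattern = 'a(?{ $main::hits++ })';

# Without the pragma, interpolating the string is refused at run time.
my $blocked = !eval { 'a' =~ /$pattern/; 1 };

# A precompiled qr// object may be interpolated freely.
my $re = qr/a(?{ $main::hits++ })/;
'a' =~ /^$re$/;

# Opting in lexically lets the interpolated string run its code.
{
    use re 'eval';
    'a' =~ /$pattern/;
}

print "blocked=", ($blocked ? 1 : 0), " hits=$main::hits\n";
```

Keeping use re 'eval' inside the smallest possible block limits the region of the program where an interpolated string is allowed to execute code.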
Because Perl's regex engine is currently not re-entrant, interpolated code may not invoke the regex engine either directly with m// or s///, or indirectly with functions such as split.
Embedded actions can be used as a debugging aid and as a way to discover how our regular expressions behave.
In the following program there is a clash between the group names <i> and <j>, which occur both in the <expr> pattern and in the main pattern:
    pl@nereida:~/Lperltesting$ cat clashofnamedofssets.pl
    #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
    use v5.10;

    my $input;

    local $" = ", ";

    my $parser = qr{
        ^ (?<i> (?&expr)) (?<j> (?&expr)) \z
          (?{
               say "main \$+ hash:";
               say " ($_ => $+{$_}) " for sort keys %+;
           })

        (?(DEFINE)
            (?<expr>
                (?<i> . )
                (?<j> . )
                (?{
                    say "expr \$+ hash:";
                    say " ($_ => $+{$_}) " for sort keys %+;
                })
            )
        )
    }x;

    $input = <>;
    chomp($input);
    if ($input =~ $parser) {
        say "matches: ($&)";
    }

The clash produces this output:
    pl@nereida:~/Lperltesting$ ./clashofnamedofssets.pl
    abab
    expr $+ hash:
     (i => a)
     (j => b)
    expr $+ hash:
     (i => ab)
     (j => b)
    main $+ hash:
     (i => ab)
     (j => ab)
    matches: (abab)

If the clashes are avoided, no information is lost:
    pl@nereida:~/Lperltesting$ cat namedoffsets.pl
    #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
    use v5.10;

    my $input;

    local $" = ", ";

    my $parser = qr{
        ^ (?<i> (?&expr)) (?<j> (?&expr)) \z
          (?{
               say "main \$+ hash:";
               say " ($_ => $+{$_}) " for sort keys %+;
           })

        (?(DEFINE)
            (?<expr>
                (?<i_e> . )
                (?<j_e> . )
                (?{
                    say "expr \$+ hash:";
                    say " ($_ => $+{$_}) " for sort keys %+;
                })
            )
        )
    }x;

    $input = <>;
    chomp($input);
    if ($input =~ $parser) {
        say "matches: ($&)";
    }
which, when run, produces:

    pl@nereida:~/Lperltesting$ ./namedoffsets.pl
    abab
    expr $+ hash:
     (i_e => a)
     (j_e => b)
    expr $+ hash:
     (i => ab)
     (i_e => a)
     (j_e => b)
    main $+ hash:
     (i => ab)
     (j => ab)
    matches: (abab)