Quoting the section 'Defining named patterns' of perlretut for Perl 5.10:
Some regular expressions use identical subpatterns in several places. Starting with Perl 5.10, it is possible to define named subpatterns in a section of the pattern so that they can be called up by name anywhere in the pattern. This syntactic pattern for this definition group is (?(DEFINE)(?<name>pattern)...). An insertion of a named pattern is written as (?&name).
Let us look at an example that defines the language of floating-point numbers:
    pl@nereida:~/Lperltesting$ cat definingnamedpatterns.pl
    #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1 -w
    use v5.10;

    my $regexp = qr{
      ^ (?<num>
           (?&osg)[\t\ ]* (?: (?&int)(?&dec)? | (?&dec) )
        )
        (?: [eE]
          (?<exp> (?&osg)(?&int)) )?
      $
      (?(DEFINE)
        (?<osg>[-+]?)         # optional sign
        (?<int>\d++)          # integer
        (?<dec>\.(?&int))     # decimal fraction
      )
    }x;

    my $input = <>;
    chomp($input);
    my @r;
    if (@r = $input =~ $regexp) {
      # '//' rather than '||' so that an exponent of '0' is not replaced by ''
      my $exp = $+{exp} // '';
      say "$input matches: (num => '$+{num}', exp => '$exp')";
    }
    else {
      say "does not match";
    }

perlretut comments on this example:
The example above illustrates this feature. The three subpatterns that are used more than once are the optional sign, the digit sequence for an integer and the decimal fraction. The DEFINE group at the end of the pattern contains their definition. Notice that the decimal fraction pattern is the first place where we can reuse the integer pattern.
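To see which inputs the pattern accepts, the same regular expression can be wrapped in a small helper and run over a few sample strings. This is only a sketch: the helper name parse_float and the sample inputs are assumptions made for illustration and do not come from perlretut.

```perl
use strict;
use warnings;
use v5.10;

# Same grammar as the example above: the reusable pieces live in the DEFINE group.
my $float_re = qr{
    ^ (?<num>
        (?&osg) [\t\ ]* (?: (?&int) (?&dec)? | (?&dec) )
      )
      (?: [eE] (?<exp> (?&osg) (?&int) ) )?
    $
    (?(DEFINE)
        (?<osg> [-+]? )        # optional sign
        (?<int> \d++ )         # integer
        (?<dec> \. (?&int) )   # decimal fraction
    )
}x;

# parse_float is a hypothetical helper: it returns a hash reference with the
# named captures on success and undef when the text does not match.
sub parse_float {
    my ($text) = @_;
    return unless $text =~ $float_re;
    return { num => $+{num}, exp => $+{exp} };
}

for my $text ('-3.14e+2', '.5', '12', '1e', '1.', 'abc') {
    my $r = parse_float($text);
    if ($r) {
        my $exp = $r->{exp} // '';   # exp is undef when there is no exponent part
        say "'$text' matches: (num => '$r->{num}', exp => '$exp')";
    }
    else {
        say "'$text' does not match";
    }
}
```

The first three inputs match; '1e' and '1.' are rejected because (?&int) requires at least one digit after the exponent marker and after the decimal point.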
Interestingly, (DEFINE) is treated as a special case of the conditional regular expressions of the form (?(condition)yes-pattern) (see section 3.2.10). This is what the 'Extended Patterns' section of perlre says about it:
A special form is the (DEFINE) predicate, which never executes directly its yes-pattern, and does not allow a no-pattern. This allows to define subpatterns which will be executed only by using the recursion mechanism. This way, you can define a set of regular expression rules that can be bundled into any pattern you choose.
It is recommended that for this usage you put the DEFINE block at the end of the pattern, and that you name any subpatterns defined within it.
Also, it's worth noting that patterns defined this way probably will not be as efficient, as the optimiser is not very clever about handling them.
An example of how this might be used is as follows:
    /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
     (?(DEFINE)
       (?<NAME_PAT>....)
       (?<ADDRESS_PAT>....)
     )/x
Note that capture buffers matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing buffers is necessary. Thus $+{NAME_PAT} would not be defined even though $+{NAME} would be.
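The point about the extra layer of capture buffers can be checked with a runnable variant of the example. In this sketch the bodies of NAME_PAT and ADDRESS_PAT and the sample string are invented for illustration; perlre leaves them elided as "....".

```perl
use strict;
use warnings;
use v5.10;

my $re = qr{
    (?<NAME> (?&NAME_PAT) ) \s+ (?<ADDR> (?&ADDRESS_PAT) )
    (?(DEFINE)
        (?<NAME_PAT>    [A-Z][a-z]+ )   # assumed body: a capitalised word
        (?<ADDRESS_PAT> \d+ \s+ \w+ )   # assumed body: a number and a word
    )
}x;

my %got;
if ('Alice 42 Elm' =~ $re) {
    %got = %+;   # copy the named captures that actually captured something
    say "NAME = $got{NAME}";
    say "ADDR = $got{ADDR}";
    say "NAME_PAT is ", (exists $got{NAME_PAT} ? "defined" : "not defined");
}
```

Only the outer groups NAME and ADDR show up in %+. The groups inside the DEFINE block are reached solely through (?&...) calls, so whatever they capture is discarded when the call returns.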
%+ and %-. Regarding the %+ hash:
%LAST_PAREN_MATCH, %+
Similar to @+, the %+ hash allows access to the named capture buffers, should they exist, in the last successful match in the currently active dynamic scope.
For example, $+{foo} is equivalent to $1 after the following match:

    'foo' =~ /(?<foo>foo)/;

The keys of the %+ hash list only the names of buffers that have captured (and that are thus associated to defined values).
The underlying behaviour of %+ is provided by the Tie::Hash::NamedCapture module.
Note: %- and %+ are tied views into a common internal hash associated with the last successful regular expression. Therefore mixing iterative access to them via each may have unpredictable results. Likewise, if the last successful match changes, then the results may be surprising.
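A short sketch of the statement that the keys of %+ list only the buffers that captured. The pattern and the input string are made up for illustration: only one branch of the alternation can take part in a match.

```perl
use strict;
use warnings;
use v5.10;

my (@captured, $number);
if ('2024' =~ /(?<word>[a-z]+)|(?<number>\d+)/) {
    @captured = sort keys %+;   # names of the buffers that captured
    $number   = $+{number};
    say "keys of %+: @captured";
    say "number: $number";
}
```

Here the word branch does not take part in the match, so word is absent from the keys of %+ and only number is listed.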
%-
Similar to %+, this variable allows access to the named capture buffers in the last successful match in the currently active dynamic scope. To each capture buffer name found in the regular expression, it associates a reference to an array containing the list of values captured by all buffers with that name (should there be several of them), in the order where they appear.
Here's an example:
    if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
        foreach my $bufname (sort keys %-) {
            my $ary = $-{$bufname};
            foreach my $idx (0..$#$ary) {
                print "\$-{$bufname}[$idx] : ",
                      (defined($ary->[$idx]) ? "'$ary->[$idx]'" : "undef"),
                      "\n";
            }
        }
    }
would print out:
    $-{A}[0] : '1'
    $-{A}[1] : '3'
    $-{B}[0] : '2'
    $-{B}[1] : '4'
The keys of the %- hash correspond to all buffer names found in the regular expression.
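The difference between the keys of %- and those of %+ shows up when a named buffer does not take part in the match. The following sketch uses an invented pattern in which the buffer y is optional and stays unmatched.

```perl
use strict;
use warnings;
use v5.10;

my (@minus_keys, @plus_keys, @x_values, @y_values);
if ('ab' =~ /(?<x>a)(?<y>z)?(?<x>b)/) {
    @minus_keys = sort keys %-;   # every buffer name found in the pattern
    @plus_keys  = sort keys %+;   # only the names that captured
    @x_values   = @{ $-{x} };     # one entry per buffer named x, in pattern order
    @y_values   = @{ $-{y} };     # one entry, undefined: y did not take part

    say "keys of %-: @minus_keys";
    say "keys of %+: @plus_keys";
    say "x captured: @x_values";
    say "y captured: ", (defined $y_values[0] ? $y_values[0] : 'undef');
}
```

The name y appears among the keys of %- with an undefined entry, while %+ omits it altogether.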