Defining Named Patterns


Perl 5.10 introduces the ability to define subpatterns in a dedicated section of the pattern.

What `perlretut` Says About Defining Named Patterns

Quoting the section "Defining named patterns" of perlretut for Perl 5.10:

Some regular expressions use identical subpatterns in several places. Starting with Perl 5.10, it is possible to define named subpatterns in a section of the pattern so that they can be called up by name anywhere in the pattern. This syntactic pattern for this definition group is "(?(DEFINE)(?<name>pattern)...)". An insertion of a named pattern is written as (?&name).

Let's look at an example that defines the language of floating-point numbers:

    pl@nereida:~/Lperltesting$ cat definingnamedpatterns.pl
    #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1 -w
    use v5.10;

    my $regexp = qr{
       ^ (?<num>
                 (?&osg)[\t\ ]* (?: (?&int)(?&dec)? | (?&dec) )
         )
         (?: [eE]
         (?<exp> (?&osg)(?&int)) )?
       $
          (?(DEFINE)
           (?<osg>[-+]?)         # optional sign
           (?<int>\d++)          # integer
           (?<dec>\.(?&int))     # decimal fraction
          )
    }x;

    my $input = <>;
    chomp($input);
    my @r;
    if (@r = $input =~ $regexp) {
      # // (defined-or) keeps an exponent of '0', which || would discard
      my $exp = $+{exp} // '';
      say "$input matches: (num => '$+{num}', exp => '$exp')";
    }
    else {
      say "does not match";
    }

perlretut comments on this example:

The example above illustrates this feature. The three subpatterns that are used more than once are the optional sign, the digit sequence for an integer and the decimal fraction. The DEFINE group at the end of the pattern contains their definition. Notice that the decimal fraction pattern is the first place where we can reuse the integer pattern.
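As a quick check of the pattern above, the following sketch wraps the same regular expression in a helper and runs it over a few inputs. The helper name `parse_float` and the sample strings are made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;
use v5.10;

# Same pattern as in definingnamedpatterns.pl.
my $float_re = qr{
   ^ (?<num>
        (?&osg)[\t\ ]* (?: (?&int)(?&dec)? | (?&dec) )
     )
     (?: [eE]
         (?<exp> (?&osg)(?&int)) )?
   $
   (?(DEFINE)
      (?<osg>[-+]?)         # optional sign
      (?<int>\d++)          # integer
      (?<dec>\.(?&int))     # decimal fraction
   )
}x;

# Hypothetical helper: returns (num, exp) on success, an empty list otherwise.
sub parse_float {
    my ($input) = @_;
    return unless $input =~ $float_re;
    return ($+{num}, $+{exp} // '');
}

for my $input ('-3.14e+2', '.5', '42', '1e0', 'abc', '1.') {
    my @r = parse_float($input);
    say @r ? "$input: num='$r[0]' exp='$r[1]'" : "$input: does not match";
}
```

The inputs `abc` and `1.` are expected to be rejected: the first contains no digits, and the second has a decimal point that is not followed by the digits the `dec` subpattern requires.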

What `perlre` Says About Defining Patterns

Curiously, (DEFINE) is treated as a special case of the conditional regular expressions of the form (?(condition)yes-pattern) (see section 3.2.10). This is what the "Extended Patterns" section of perlre says about it:

A special form is the (DEFINE) predicate, which never executes directly its yes-pattern, and does not allow a no-pattern. This allows to define subpatterns which will be executed only by using the recursion mechanism. This way, you can define a set of regular expression rules that can be bundled into any pattern you choose.

It is recommended that for this usage you put the DEFINE block at the end of the pattern, and that you name any subpatterns defined within it.

Also, it's worth noting that patterns defined this way probably will not be as efficient, as the optimiser is not very clever about handling them.

An example of how this might be used is as follows:

    /(?<NAME>(?&NAME_PAT))(?<ADDR>(?&ADDRESS_PAT))
       (?(DEFINE)
         (?<NAME_PAT>....)
         (?<ADDRESS_PAT>....)
       )/x

Note that capture buffers matched inside of recursion are not accessible after the recursion returns, so the extra layer of capturing buffers is necessary. Thus $+{NAME_PAT} would not be defined even though $+{NAME} would be.
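The note about captures made inside recursion can be tried out with a small sketch. The two subpatterns below are simplistic stand-ins chosen only for illustration (real name and address rules would be richer), and `parse_record` is a hypothetical helper name.

```perl
use strict;
use warnings;
use v5.10;

# Illustrative subpatterns only: a name is one or more words,
# an address is a number followed by a street name.
my $record_re = qr{
    ^ (?<NAME> (?&NAME_PAT) ) , \s* (?<ADDR> (?&ADDRESS_PAT) ) $
    (?(DEFINE)
        (?<NAME_PAT>    [A-Za-z]+ (?: \s [A-Za-z]+ )* )
        (?<ADDRESS_PAT> \d+ \s [A-Za-z\ ]+ )
    )
}x;

# Hypothetical helper: copies the relevant entries of %+ into a hash ref.
sub parse_record {
    my ($line) = @_;
    return unless $line =~ $record_re;
    return {
        NAME     => $+{NAME},
        ADDR     => $+{ADDR},
        NAME_PAT => $+{NAME_PAT},   # captured only inside the recursion
    };
}

my $rec = parse_record('Ada Lovelace, 12 Main Street');
say "NAME = $rec->{NAME}";
say "ADDR = $rec->{ADDR}";
say 'NAME_PAT is ', (defined $rec->{NAME_PAT} ? 'defined' : 'undefined');
```

In line with the perlre note, `NAME_PAT` should be reported as undefined, which is why the outer `NAME` and `ADDR` groups are needed to get at the matched text.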

What `perlvar` Says About Named Patterns

This is what perlvar says about the two variables involved, %+ and %-. Regarding the hash %+:

%LAST_PAREN_MATCH, %+
Similar to @+ , the %+ hash allows access to the named capture buffers, should they exist, in the last successful match in the currently active dynamic scope.
For example, $+{foo} is equivalent to $1 after the following match:
```
'foo' =~ /(?<foo>foo)/;
```
The keys of the %+ hash list only the names of buffers that have captured (and that are thus associated to defined values).
The underlying behaviour of %+ is provided by the Tie::Hash::NamedCapture module.
Note: %- and %+ are tied views into a common internal hash associated with the last successful regular expression. Therefore mixing iterative access to them via each may have unpredictable results. Likewise, if the last successful match changes, then the results may be surprising.
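The statement that the keys of %+ list only the buffers that actually captured can be illustrated with an alternation in which only one branch takes part in the match. This is a minimal sketch; the helper `captured_names` and the group names `letters` and `digits` are assumptions made for the example.

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper: names of the groups that captured in a successful match.
sub captured_names {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    return sort keys %+;
}

my $re = qr/(?<letters>[a-z]+)|(?<digits>\d+)/;

say "abc -> @{[ captured_names('abc', $re) ]}";
say "123 -> @{[ captured_names('123', $re) ]}";
```

Each call should list a single name, the one belonging to the branch that matched, even though the pattern declares both groups.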
%-
Similar to %+ , this variable allows access to the named capture buffers in the last successful match in the currently active dynamic scope. To each capture buffer name found in the regular expression, it associates a reference to an array containing the list of values captured by all buffers with that name (should there be several of them), in the order where they appear.
Here's an example:
```
if ('1234' =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
  foreach my $bufname (sort keys %-) {
    my $ary = $-{$bufname};
    foreach my $idx (0..$#$ary) {
      print "\$-{$bufname}[$idx] : ",
            (defined($ary->[$idx]) ? "'$ary->[$idx]'" : "undef"),
            "\n";
    }
  }
}
```
would print out:
```
$-{A}[0] : '1'
$-{A}[1] : '3'
$-{B}[0] : '2'
$-{B}[1] : '4'
```
The keys of the %- hash correspond to all buffer names found in the regular expression.
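To contrast the two hashes, the sketch below uses a pattern with a repeated group name and an optional group that does not take part in the match. The helper `capture_views` is hypothetical; it copies both tied hashes into plain data structures one after the other, so the two are never iterated at the same time.

```perl
use strict;
use warnings;
use v5.10;

# Hypothetical helper: plain copies of %+ and %- after a successful match.
sub capture_views {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    my %plus  = map { $_ => $+{$_} }          keys %+;
    my %minus = map { $_ => [ @{ $-{$_} } ] } keys %-;
    return (\%plus, \%minus);
}

# Group A appears twice; the optional group B does not capture on '13'.
my ($plus, $minus) = capture_views('13', qr/(?<A>1)(?<B>2)?(?<A>3)/);

say 'keys of %+: ', join(',', sort keys %$plus);
say 'keys of %-: ', join(',', sort keys %$minus);
say "\$+{A} = $plus->{A}";
say 'values in $-{A}: ', join(',', @{ $minus->{A} });
say '$-{B}[0] is ', (defined $minus->{B}[0] ? 'defined' : 'undefined');
```

Here %+ should contain only `A`, holding the leftmost defined capture, while %- should contain both `A` and `B`, with every capture of `A` in order and an undefined entry for `B`.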


Casiano Rodríguez León
2012-05-22
