If a subrule call is quantified with a repetition specifier:
<rule: file_sequence> <file>+
then each repeated match overwrites the corresponding entry in the
surrounding rule’s result-hash, so only the result of the final
repetition will be retained. That is, if the above example matched
the string foo.pl bar.py baz.php, then the result-hash would contain:

    file_sequence {
        ""   => 'foo.pl bar.py baz.php',
        file => 'baz.php',
    }
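The same last-one-wins effect can be reproduced in core Perl, without Regexp::Grammars, by quantifying a group that contains a named capture. The following is only an analogy, not the module's implementation; the variable names $input and %result are chosen for this sketch.

```perl
use strict;
use warnings;
use 5.010;

my $input = 'foo.pl bar.py baz.php';
my %result;

# A quantified group holding one named capture: the capture is
# overwritten on every iteration, so only the last file name survives.
if ($input =~ /\A (?: (?<file> \S+ ) \s* )+ \z/x) {
    # The pattern is anchored at both ends, so the whole match is $input.
    %result = ( '' => $input, file => $+{file} );
}

say "file => $result{file}";    # prints "file => baz.php"
```

As with `<file>+`, the earlier matches (foo.pl and bar.py) take part in the overall match but are not available afterwards.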
There is a caveat concerning the repetition operators and the handling of whitespace. Consider the following program:

    pl@nereida:~/Lregexpgrammars/demo$ cat numbers3.pl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    my $rbb = do {
        use Regexp::Grammars;

        qr{
          <numbers>

          <rule: numbers>
            (<number>)+

          <token: number> \s*\d+
        }xms;
    };

    while (my $input = <>) {
        if ($input =~ m{$rbb}) {
            say("matches: <$&>");
            say Dumper \%/;
        }
    }

Note the explicit whitespace \s*\d+ in the definition of number.
A sample run follows:

    pl@nereida:~/Lregexpgrammars/demo$ perl5_10_1 numbers3.pl
    1 2 3 4
    matches: <1 2 3 4>
    $VAR1 = {
              '' => '1 2 3 4',
              'numbers' => {
                             '' => '1 2 3 4',
                             'number' => ' 4'
                           }
            };
If the whitespace is removed from the definition of number:

    pl@nereida:~/Lregexpgrammars/demo$ cat numbers.pl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    my $rbb = do {
        use Regexp::Grammars;

        qr{
          <numbers>

          <rule: numbers>
            (<number>)+

          <token: number> \d+
        }xms;
    };

    while (my $input = <>) {
        if ($input =~ m{$rbb}) {
            say("matches: <$&>");
            say Dumper \%/;
        }
    }

the resulting behaviour may be surprising:
    pl@nereida:~/Lregexpgrammars/demo$ perl5.10.1 numbers.pl
    12 34 56
    matches: <12>
    $VAR1 = {
              '' => '12',
              'numbers' => {
                             '' => '12',
                             'number' => '12'
                           }
            };
The explanation is in the documentation; see the section Grammar Syntax:
<rule: IDENTIFIER>
Define a rule whose name is specified by the supplied identifier.
Everything following the <rule:...> directive (up to the next <rule:...> or <token:...> directive) is treated as part of the rule being defined.
Any whitespace in the rule is replaced by a call to the <.ws> subrule (which defaults to matching \s*, but may be explicitly redefined).
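In numbers.pl there is no whitespace between <number> and the closing parenthesis, so no <.ws> call is inserted inside the repetition; after the first number nothing can consume the blank that follows it, and the match stops at 12. We could also have solved the problem by placing an explicit blank inside the positive closure:

    <rule: numbers>
        (<number> )+

    <token: number> \d+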
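The effect of whitespace inside the repetition can be approximated with ordinary regexes. The sketch below is a simplification that assumes <.ws> behaves like \s* and ignores everything else Regexp::Grammars does; the variable names are chosen for this illustration.

```perl
use strict;
use warnings;
use 5.010;

my $input = '12 34 56';

# (<number>)+ with <token: number> \d+ : nothing inside the loop eats blanks
my ($no_ws) = $input =~ /((?:\d+)+)/;

# (<number>)+ with <token: number> \s*\d+ : the token skips leading blanks
my ($leading_ws) = $input =~ /((?:\s*\d+)+)/;

# (<number> )+ with <token: number> \d+ : the blank inside the
# parentheses stands for a <.ws> call, modelled here as \s*
my ($inner_ws) = $input =~ /((?:\d+\s*)+)/;

say "no whitespace:      <$no_ws>";        # <12>
say "whitespace in token: <$leading_ws>";  # <12 34 56>
say "whitespace in loop:  <$inner_ws>";    # <12 34 56>
```

Only the first variant stops after the first number, which mirrors the surprising output of numbers.pl.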
Usually, retaining only the final repetition is not the desired outcome, so Regexp::Grammars provides another mechanism by which to call a subrule: one that saves all repetitions of its results.
A regular subrule call consists of the rule’s name surrounded by angle
brackets. If, instead, you surround the rule’s name with <[...]>
(angle and square brackets) like so:
<rule: file_sequence> <[file]>+
then the rule is invoked in exactly the same way, but the result of that
submatch is pushed onto an array nested inside the appropriate result-hash
entry. In other words, if the above example matched the same
foo.pl bar.py baz.php
string, the result-hash would contain:
    file_sequence {
        ""   => 'foo.pl bar.py baz.php',
        file => [ 'foo.pl', 'bar.py', 'baz.php' ],
    }
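In plain Perl terms, the difference between the two kinds of call is the difference between assigning to a hash entry and pushing onto an array stored in that entry. The following is a minimal model of the two behaviours, not the module's actual code; %overwritten and %listified are names invented for this sketch.

```perl
use strict;
use warnings;
use 5.010;

my (%overwritten, %listified);

for my $file (qw(foo.pl bar.py baz.php)) {
    $overwritten{file} = $file;           # like <file>+   : replaces the entry
    push @{ $listified{file} }, $file;    # like <[file]>+ : appends to an array
}

say "overwritten: $overwritten{file}";        # overwritten: baz.php
say "listified:   @{ $listified{file} }";     # listified:   foo.pl bar.py baz.php
```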
Bearing in mind what was said above about whitespace inside quantifiers, whitespace must be placed inside the repetition operator:

    pl@nereida:~/Lregexpgrammars/demo$ cat numbers4.pl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    my $rbb = do {
        use Regexp::Grammars;

        qr{
          <numbers>

          <rule: numbers>
            (?: <[number]> )+

          <token: number> \d+
        }xms;
    };

    while (my $input = <>) {
        if ($input =~ m{$rbb}) {
            say("matches: <$&>");
            say Dumper \%/;
        }
    }

Running this program gives:
    pl@nereida:~/Lregexpgrammars/demo$ perl5_10_1 numbers4.pl
    1 2 3 4
    matches: <1 2 3 4 >
    $VAR1 = {
              '' => '1 2 3 4 ',
              'numbers' => {
                             '' => '1 2 3 4 ',
                             'number' => [
                                           '1',
                                           '2',
                                           '3',
                                           '4'
                                         ]
                           }
            };
This listifying subrule call can also be useful for non-repeated subrule calls, if the same subrule is invoked in several places in a grammar. For example if a cmdline option could be given either one or two values, you might parse it:
    <rule: size_option>
        -size <[size]> (?: x <[size]> )?
The result-hash entry for size
would then always contain an array,
with either one or two elements, depending on the input being parsed.
An example follows:

    pl@nereida:~/Lregexpgrammars/demo$ cat sizes.pl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    my $rbb = do {
        use Regexp::Grammars;

        qr{
          <command>

          <rule: command> ls <size_option>

          <rule: size_option>
              -size <[size]> (?: x <[size]> )?

          <token: size> \d+
        }x;
    };

    while (my $input = <>) {
        while ($input =~ m{$rbb}g) {
            say("matches: <$&>");
            say Dumper \%/;
        }
    }

Let us see how it behaves with different inputs:
    pl@nereida:~/Lregexpgrammars/demo$ perl5.10.1 sizes.pl
    ls -size 4
    matches: <ls -size 4 >
    $VAR1 = {
              '' => 'ls -size 4 ',
              'command' => {
                             'size_option' => {
                                                '' => '-size 4 ',
                                                'size' => [
                                                            '4'
                                                          ]
                                              },
                             '' => 'ls -size 4 '
                           }
            };
    ls -size 2x8
    matches: <ls -size 2x8 >
    $VAR1 = {
              '' => 'ls -size 2x8 ',
              'command' => {
                             'size_option' => {
                                                '' => '-size 2x8 ',
                                                'size' => [
                                                            '2',
                                                            '8'
                                                          ]
                                              },
                             '' => 'ls -size 2x8 '
                           }
            };
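Because the size entry is always an array reference, calling code can handle the one-value and two-value forms uniformly. The sketch below works on hand-written hashes shaped like the size_option entries in the dumps above; describe_size is a hypothetical helper, and treating a single value as a square size is an assumption made only for this illustration.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: formats a size_option result-hash as "WxH".
# A single value is treated as both width and height (an assumption).
sub describe_size {
    my ($size_option) = @_;
    my ($width, $height) = @{ $size_option->{size} };
    $height //= $width;
    return "${width}x${height}";
}

# Hashes shaped like the 'size_option' entries shown in the dumps.
my $one = { '' => '-size 4 ',   size => ['4'] };
my $two = { '' => '-size 2x8 ', size => ['2', '8'] };

say describe_size($one);    # 4x4
say describe_size($two);    # 2x8
```

Had the grammar used <size> rather than <[size]>, the entry would have been a plain string holding only the last value matched, and the dereference above would fail under strict.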
Listifying subrules can also be given aliases, just like ordinary subrules. The alias is always specified inside the square brackets:
    <rule: size_option>
        -size <[size=pos_integer]> (?: x <[size=pos_integer]> )?

Here, the sizes are parsed using the pos_integer rule, but saved in the result-hash in an array under the key size.
An example follows:

    pl@nereida:~/Lregexpgrammars/demo$ cat aliasedsizes.pl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    my $rbb = do {
        use Regexp::Grammars;

        qr{
          <command>

          <rule: command> ls <size_option>

          <rule: size_option>
              -size <[size=int]> (?: x <[size=int]> )?

          <token: int> \d+
        }x;
    };

    while (my $input = <>) {
        while ($input =~ m{$rbb}g) {
            say("matches: <$&>");
            say Dumper \%/;
        }
    }

Here is the result of a run:
    pl@nereida:~/Lregexpgrammars/demo$ perl5.10.1 aliasedsizes.pl
    ls -size 2x4
    matches: <ls -size 2x4 >
    $VAR1 = {
              '' => 'ls -size 2x4 ',
              'command' => {
                             'size_option' => {
                                                '' => '-size 2x4 ',
                                                'size' => [
                                                            '2',
                                                            '4'
                                                          ]
                                              },
                             '' => 'ls -size 2x4 '
                           }
            };
In this example <number>+ appears without square brackets or parentheses:

    pl@nereida:~/Lregexpgrammars/demo$ cat numbers5.pl
    use strict;
    use warnings;
    use 5.010;
    use Data::Dumper;

    my $rbb = do {
        use Regexp::Grammars;

        qr{
          <numbers>

          <rule: numbers>
            <number>+

          <token: number> \d+
        }xms;
    };

    while (my $input = <>) {
        if ($input =~ m{$rbb}) {
            say("matches: <$&>");
            say Dumper \%/;
        }
    }

This program produces a warning message:
    pl@nereida:~/Lregexpgrammars/demo$ perl5.10.1 numbers5.pl
    warn | Repeated subrule <number>+ will only capture its final match
         | (Did you mean <[number]>+ instead?)
         |
To avoid the message, and provided you are willing to lose the values associated with all but the last element of the list, place the operand inside parentheses (capturing or non-capturing).
This is what the documentation says about this warning:
Repeated subrule <rule> will only capture its final match
You specified a subrule call with a repetition qualifier, such as:
<ListElem>*
or:
<ListElem>+
Because each subrule call saves its result in a hash entry of the same name, each repeated match will overwrite the previous ones, so only the last match will ultimately be saved. If you want to save all the matches, you need to tell Regexp::Grammars to save the sequence of results as a nested array within the hash entry, like so:
<[ListElem]>*
or:
<[ListElem]>+
If you really did intend to throw away every result but the final one, you can silence the warning by placing the subrule call inside any kind of parentheses. For example:
(<ListElem>)*
or:
(?: <ListElem> )+