Regexp::Grammars also offers full manual control over the distillation process. If you use the reserved word MATCH as the alias for a subrule call:
<MATCH=filename>
or a subpattern match:
<MATCH=( \w+ )>
or a code block:
<MATCH=(?{ 42 })>
then the current rule will treat the return value of that subrule, pattern, or code block as its complete result, and return that value instead of the usual result-hash it constructs. This is the case even if the result has other entries that would normally also be returned.
For example, in a rule like:

    <rule: term>
          <MATCH=literal>
        | <left_paren> <MATCH=expr> <right_paren>

The use of MATCH aliases causes the rule to return either whatever <literal> returns, or whatever <expr> returns (provided it's between left and right parentheses).

Note that, in this second case, even though <left_paren> and <right_paren> are captured to the result-hash, they are not returned, because the MATCH alias overrides the normal "return the result-hash" semantics and returns only what its associated subrule (i.e. <expr>) produces.
The following example illustrates the use of the MATCH alias:

    $ cat demo_calc.pl
    #!/usr/local/lib/perl/5.10.1/bin/perl5.10.1
    use v5.10;
    use warnings;

    my $calculator = do{
        use Regexp::Grammars;
        qr{
            <Answer>

            <rule: Answer>
                <X=Mult> <Op=([+-])> <Y=Answer>
              | <MATCH=Mult>

            <rule: Mult>
                <X=Pow> <Op=([*/%])> <Y=Mult>
              | <MATCH=Pow>

            <rule: Pow>
                <X=Term> <Op=(\^)> <Y=Pow>
              | <MATCH=Term>

            <rule: Term>
                <MATCH=Literal>
              | \( <MATCH=Answer> \)

            <token: Literal>
                <MATCH=( [+-]? \d++ (?: \. \d++ )?+ )>
        }xms
    };

    while (my $input = <>) {
        if ($input =~ $calculator) {
            use Data::Dumper 'Dumper';
            warn Dumper \%/;
        }
    }

Here is a sample run:

    $ ./demo_calc.pl
    2+3*5
    $VAR1 = {
              '' => '2+3*5',
              'Answer' => {
                            '' => '2+3*5',
                            'Op' => '+',
                            'X' => '2',
                            'Y' => {
                                     '' => '3*5',
                                     'Op' => '*',
                                     'X' => '3',
                                     'Y' => '5'
                                   }
                          }
            };
    4-5-2
    $VAR1 = {
              '' => '4-5-2',
              'Answer' => {
                            '' => '4-5-2',
                            'Op' => '-',
                            'X' => '4',
                            'Y' => {
                                     '' => '5-2',
                                     'Op' => '-',
                                     'X' => '5',
                                     'Y' => '2'
                                   }
                          }
            };

Notice that the tree built for the expression 4-5-2 sinks to the right: it groups the input as 4-(5-2), which is the wrong hierarchy for subtraction. Fixing this would require left-associative rules. Since a naturally left-associative grammar is left-recursive, the left recursion in the corresponding rules would have to be eliminated.
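The effect of the two groupings can be checked without any grammar at all. The following is a minimal sketch in plain Perl (it does not use Regexp::Grammars, and the helper names fold_right and fold_left are hypothetical, chosen only for this illustration). It subtracts a flat list of operands first the way the right-recursive rule nests them, and then from left to right as ordinary arithmetic expects.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Right-nested subtraction: a - (b - (c - ...)), mirroring the tree
# produced by a right-recursive rule such as <X=Mult> - <Y=Answer>.
sub fold_right {
    my ($first, @rest) = @_;
    return $first unless @rest;
    return $first - fold_right(@rest);
}

# Left-nested subtraction: ((a - b) - c) - ..., the conventional reading.
sub fold_left {
    my ($acc, @rest) = @_;
    $acc -= $_ for @rest;
    return $acc;
}

print "right-nested: ", fold_right(4, 5, 2), "\n";   # 4 - (5 - 2) = 1
print "left-nested:  ", fold_left(4, 5, 2), "\n";    # (4 - 5) - 2 = -3
```

One common way to get the left-nested behaviour from a top-down grammar is to match the operands as a flat list and fold that list from the left afterwards, which is the role fold_left plays in this sketch.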
It's also possible to control what a rule returns from within a code block. Regexp::Grammars provides a set of reserved variables that give direct access to the result-hash.
The result-hash itself can be accessed as %MATCH within any code block inside a rule. For example:

    <rule: sum>
        <X=product> \+ <Y=product>
            <MATCH=(?{ $MATCH{X} + $MATCH{Y} })>

Here, the rule matches a product (aliased to 'X' in the result-hash), then a literal '+', then another product (aliased to 'Y' in the result-hash). The rule then executes the code block, which accesses the two saved values (as $MATCH{X} and $MATCH{Y}), adding them together. Because the block is itself aliased to MATCH, the sum produced by the block becomes the (only) result of the rule.
It is also possible to set the rule result from within a code block (instead of aliasing it). The special "override" return value is represented by the special variable $MATCH. So the previous example could be rewritten:

    <rule: sum>
        <X=product> \+ <Y=product>
            (?{ $MATCH = $MATCH{X} + $MATCH{Y} })

Both forms are identical in effect. Any assignment to $MATCH overrides the normal "return all subrule results" behaviour.
Assigning to $MATCH directly is particularly handy if the result may not always be distillable, for example:

    <rule: sum>
        <X=product> \+ <Y=product>
            (?{
                if (!ref $MATCH{X} && !ref $MATCH{Y}) {
                    # Reduce to sum, if both terms are simple scalars...
                    $MATCH = $MATCH{X} + $MATCH{Y};
                }
                else {
                    # Return full syntax tree for non-simple case...
                    $MATCH{op} = '+';
                }
            })

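To see what each branch hands back, the same decision can be written as an ordinary subroutine. This is only a sketch in plain Perl, run outside any grammar. The name distill_sum and the hash layout it returns are assumptions made for illustration, not part of Regexp::Grammars.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the code block above, written as an ordinary
# subroutine so that it can be run without a grammar.
sub distill_sum {
    my ($x, $y) = @_;

    # Both operands are already plain scalars: return their sum.
    return $x + $y if !ref $x && !ref $y;

    # Otherwise keep a small tree describing the unevaluated addition.
    return { X => $x, Y => $y, op => '+' };
}

my $simple = distill_sum(2, 3);
my $nested = distill_sum(2, { X => 1, Y => 4, op => '*' });

print "simple result: $simple\n";
print "nested result is a ", ref($nested), " reference with op '$nested->{op}'\n";
```

The caller therefore has to be prepared for either a plain value or a reference, which is the price of distilling only when it is safe to do so.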
Note that you can also partially override the subrule return behaviour. Normally, the subrule returns the complete text it matched under the empty key of its result-hash. That is, of course, $MATCH{""}, so you can override just that behaviour by directly assigning to that entry.
For example, if you have a rule that matches key/value pairs from a configuration file, you might prefer that any trailing comments not be included in the matched text entry of the rule's result-hash. You could hide such comments like so:

    <rule: config_line>
        <key> : <value> <comment>?
            (?{
                # Edit trailing comments out of "matched text" entry...
                $MATCH = "$MATCH{key} : $MATCH{value}";
            })

Some more examples of the uses of $MATCH:

    <rule: FuncDecl>
      # Keyword  Name               Return only the name (as a string)...
        func     <Identifier> ;     (?{ $MATCH = $MATCH{'Identifier'} })

    <rule: NumList>
      # Numbers in square brackets...
        \[
            ( \d+ (?: , \d+)* )
        \]

      # Return only the numbers...
        (?{ $MATCH = $CAPTURE })

    <token: Cmd>
      # Match standard variants then standardize the keyword...
        (?: mv | move | rename )    (?{ $MATCH = 'mv'; })

$CAPTURE and $CONTEXT are both aliases for the built-in read-only $^N variable, which always contains the substring matched by the nearest preceding (...) capture. $^N still works perfectly well, but these are provided to improve the readability of code blocks and error messages respectively.
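The behaviour of $^N itself can be tried in an ordinary Perl regex, without Regexp::Grammars. The sketch below is only an illustration: the subroutine name captured_numbers and the package variable @seen are hypothetical. It reuses the bracketed number list pattern from the NumList rule and records what $^N contains immediately after the closing bracket has matched.

```perl
use strict;
use warnings;

# Plain Perl regex, no Regexp::Grammars: inside a (?{...}) block,
# $^N holds the text matched by the most recently closed group.
our @seen;   # package variable, so the regex code block can reach it

# Hypothetical helper, for illustration only.
sub captured_numbers {
    my ($text) = @_;
    local @seen = ();

    # Match "[n,n,...]" and record the captured list via $^N.
    $text =~ m{ \[ ( \d+ (?: , \d+ )* ) \] (?{ push @seen, $^N }) }x;

    my @result = @seen;   # copy before the localized array is restored
    return @result;
}

print join('|', captured_numbers('list [1,2,3] end')), "\n";
```

Within a grammar, $CAPTURE would be written in place of $^N in a block like this.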
The following code implements a calculator that performs the distillation inside code blocks:

    pl@nereida:~/Lregexpgrammars/demo$ cat demo_calc_inline.pl
    use v5.10;
    use warnings;

    my $calculator = do{
        use Regexp::Grammars;
        qr{
            <Answer>

            <rule: Answer>
                <X=Mult> \+ <Y=Answer>
                    (?{ $MATCH = $MATCH{X} + $MATCH{Y}; })
              | <X=Mult> - <Y=Answer>
                    (?{ $MATCH = $MATCH{X} - $MATCH{Y}; })
              | <MATCH=Mult>

            <rule: Mult>
                <X=Pow> \* <Y=Mult>
                    (?{ $MATCH = $MATCH{X} * $MATCH{Y}; })
              | <X=Pow> / <Y=Mult>
                    (?{ $MATCH = $MATCH{X} / $MATCH{Y}; })
              | <X=Pow> % <Y=Mult>
                    (?{ $MATCH = $MATCH{X} % $MATCH{Y}; })
              | <MATCH=Pow>

            <rule: Pow>
                <X=Term> \^ <Y=Pow>
                    (?{ $MATCH = $MATCH{X} ** $MATCH{Y}; })
              | <MATCH=Term>

            <rule: Term>
                <MATCH=Literal>
              | \( <MATCH=Answer> \)

            <token: Literal>
                <MATCH=( [+-]? \d++ (?: \. \d++ )?+ )>
        }xms
    };

    while (my $input = <>) {
        if ($input =~ $calculator) {
            say '--> ', $/{Answer};
        }
    }

Inputs such as the following show how the right-recursive rules group repeated operators:

    4-2-2
    8/4/2
    2^2^3