Destilación del resultado

Sig: Llamadas privadas a subreglas Sup: Análisis Sintáctico con Regexp::Grammars Ant: Llamadas a subreglas desmemoriadas Err: Si hallas una errata ...

Subsecciones

- Destilación manual
- Destilación en el programa

Destilación del resultado

Destilación manual

Regexp::Grammars also offers full manual control over the distillation process. If you use the reserved word MATCH as the alias for a subrule call:

    <MATCH=filename>

or a subpattern match:

    <MATCH=( \w+ )>

or a code block:

    <MATCH=(?{ 42 })>

then the current rule will treat the return value of that subrule, pattern, or code block as its complete result, and return that value instead of the usual result-hash it constructs. This is the case even if the result has other entries that would normally also be returned.

For example, in a rule like:

    <rule: term>
          <MATCH=literal>
        | <left_paren> <MATCH=expr> <right_paren>

The use of MATCH aliases causes the rule to return either whatever <literal> returns, or whatever <expr> returns (provided it's between left and right parentheses).

Note that, in this second case, even though <left_paren> and <right_paren> are captured to the result-hash, they are not returned, because the MATCH alias overrides the normal return the result-hash semantics and returns only what its associated subrule (i.e. <expr>) produces.

El siguiente ejemplo ilustra el uso del alias MATCH:

$ cat -n demo_calc.pl
 1  #!/usr/local/lib/perl/5.10.1/bin/perl5.10.1
 2  use v5.10;
 3  use warnings;
 4
 5  my $calculator = do{
 6      use Regexp::Grammars;
 7      qr{
 8          <Answer>
 9
10          <rule: Answer>
11              <X=Mult> <Op=([+-])> <Y=Answer>
12            | <MATCH=Mult>
13
14          <rule: Mult>
15              <X=Pow> <Op=([*/%])> <Y=Mult>
16            | <MATCH=Pow>
17
18          <rule: Pow>
19              <X=Term> <Op=(\^)> <Y=Pow>
20            | <MATCH=Term>
21
22          <rule: Term>
23                 <MATCH=Literal>
24            | \( <MATCH=Answer> \)
25
26          <token: Literal>
27              <MATCH=( [+-]? \d++ (?: \. \d++ )?+ )>
28      }xms
29  };
30
31  while (my $input = <>) {
32      if ($input =~ $calculator) {
33          use Data::Dumper 'Dumper';
34          warn Dumper \%/;
35      }
36  }

Veamos una ejecución:

$ ./demo_calc.pl
2+3*5
$VAR1 = {
          '' => '2+3*5',
          'Answer' => {
                        '' => '2+3*5',
                        'Op' => '+',
                        'X' => '2',
                        'Y' => {
                                 '' => '3*5',
                                 'Op' => '*',
                                 'X' => '3',
                                 'Y' => '5'
                               }
                      }
        };
4-5-2
$VAR1 = {
          '' => '4-5-2',
          'Answer' => {
                        '' => '4-5-2',
                        'Op' => '-',
                        'X' => '4',
                        'Y' => {
                                 '' => '5-2',
                                 'Op' => '-',
                                 'X' => '5',
                                 'Y' => '2'
                               }
                      }
        };

Obsérvese como el árbol construido para la expresión 4-5-2 se hunde a derechas dando lugar a una jerarquía errónea. Para arreglar el problema sería necesario eliminar la recursividad por la izquierda en las reglas correspondientes.

Destilación en el programa

It's also possible to control what a rule returns from within a code block. Regexp::Grammars provides a set of reserved variables that give direct access to the result-hash.

The result-hash itself can be accessed as %MATCH within any code block inside a rule. For example:

    <rule: sum> 
        <X=product> \+ <Y=product>
            <MATCH=(?{ $MATCH{X} + $MATCH{Y} })>

Here, the rule matches a product (aliased 'X' in the result-hash), then a literal '+', then another product (aliased to 'Y' in the result-hash). The rule then executes the code block, which accesses the two saved values (as $MATCH{X} and $MATCH{Y}), adding them together. Because the block is itself aliased to MATCH, the sum produced by the block becomes the (only) result of the rule.

It is also possible to set the rule result from within a code block (instead of aliasing it). The special override return value is represented by the special variable $MATCH. So the previous example could be rewritten:

    <rule: sum> 
        <X=product> \+ <Y=product>
            (?{ $MATCH = $MATCH{X} + $MATCH{Y} })

Both forms are identical in effect. Any assignment to $MATCH overrides the normal return all subrule results behaviour.

Assigning to $MATCH directly is particularly handy if the result may not always be distillable, for example:

    <rule: sum> 
        <X=product> \+ <Y=product>
            (?{ if (!ref $MATCH{X} && !ref $MATCH{Y}) {
                    # Reduce to sum, if both terms are simple scalars...
                    $MATCH = $MATCH{X} + $MATCH{Y};
                }
                else {
                    # Return full syntax tree for non-simple case...
                    $MATCH{op} = '+';
                }
            })

Note that you can also partially override the subrule return behaviour. Normally, the subrule returns the complete text it matched under the empty key of its result-hash. That is, of course, $MATCH{""}, so you can override just that behaviour by directly assigning to that entry.

For example, if you have a rule that matches key/value pairs from a configuration file, you might prefer that any trailing comments not be included in the matched text entry of the rule's result-hash. You could hide such comments like so:

    <rule: config_line>
        <key> : <value>  <comment>?
            (?{
                # Edit trailing comments out of "matched text" entry...
                $MATCH = "$MATCH{key} : $MATCH{value}";
            })

Some more examples of the uses of $MATCH:

    <rule: FuncDecl>
      # Keyword  Name               Keep return the name (as a string)...
        func     <Identifier> ;     (?{ $MATCH = $MATCH{'Identifier'} })


    <rule: NumList>
      # Numbers in square brackets...
        \[ 
            ( \d+ (?: , \d+)* )
        \]

      # Return only the numbers...
        (?{ $MATCH = $CAPTURE })


    <token: Cmd>
      # Match standard variants then standardize the keyword...
        (?: mv | move | rename )      (?{ $MATCH = 'mv'; })

$CAPTURE and $CONTEXT are both aliases for the built-in read-only $^N variable, which always contains the substring matched by the nearest preceding (...) capture. $^N still works perfectly well, but these are provided to improve the readability of code blocks and error messages respectively.

El siguiente código implementa una calculadora usando destilación en el código:

pl@nereida:~/Lregexpgrammars/demo$ cat -n demo_calc_inline.pl
 1  use v5.10;
 2  use warnings;
 3
 4  my $calculator = do{
 5      use Regexp::Grammars;
 6      qr{
 7          <Answer>
 8
 9          <rule: Answer>
10              <X=Mult> \+ <Y=Answer>
11                  (?{ $MATCH = $MATCH{X} + $MATCH{Y}; })
12            | <X=Mult> - <Y=Answer>
13                  (?{ $MATCH = $MATCH{X} - $MATCH{Y}; })
14            | <MATCH=Mult>
15
16          <rule: Mult>
17              <X=Pow> \* <Y=Mult>
18                  (?{ $MATCH = $MATCH{X} * $MATCH{Y}; })
19            | <X=Pow>  / <Y=Mult>
20                  (?{ $MATCH = $MATCH{X} / $MATCH{Y}; })
21            | <X=Pow>  % <Y=Mult>
22                  (?{ $MATCH = $MATCH{X} % $MATCH{Y}; })
23            | <MATCH=Pow>
24
25          <rule: Pow>
26              <X=Term> \^ <Y=Pow>
27                  (?{ $MATCH = $MATCH{X} ** $MATCH{Y}; })
28            | <MATCH=Term>
29
30          <rule: Term>
31                 <MATCH=Literal>
32            | \( <MATCH=Answer> \)
33
34          <token: Literal>
35              <MATCH=( [+-]? \d++ (?: \. \d++ )?+ )>
36      }xms
37  };
38
39  while (my $input = <>) {
40      if ($input =~ $calculator) {
41          say '--> ', $/{Answer};
42      }
43  }

Ejercicio 3.11.1 Cual es la salida del programa anterior para las entradas:

4-2-2
8/4/2
2^2^3

Sig: Llamadas privadas a subreglas Sup: Análisis Sintáctico con Regexp::Grammars Ant: Llamadas a subreglas desmemoriadas Err: Si hallas una errata ...

Casiano Rodríguez León
2012-05-22