Introducción al Anaĺisis Sintáctico con Expresiones Regulares

Sig: Construyendo el AST con Sup: Analisis Sintáctico con Expresiones Ant: Analisis Sintáctico con Expresiones Err: Si hallas una errata ...

Subsecciones

Introducción al Anaĺisis Sintáctico con Expresiones Regulares

Como se ha comentado en la sección 3.2.5 Perl 5.10 permite el reconocimiento de expresiones definidas mediante gramáticas recursivas, siempre que estas puedan ser analizadas por un analizador recursivo descendente. Sin embargo, las expresiones regulares Perl 5.10 hace difícil construir una representación del árbol de análisis sintáctico abstracto. Además, la necesidad de explicitar en la regexp los blancos existentes entre los símbolos hace que la descripción sea menos robusta y menos legible.

Ejemplo: Traducción de expresiones aritméticas en infijo a postfijo

El siguiente ejemplo muestra una expresión regular que traduce expresiones de diferencias en infijo a postfijo.

Se usa una variable $tran para calcular la traducción de la subexpresión vista hasta el momento.

La gramática original que consideramos es recursiva a izquierdas:

 exp ->   exp '-' digits
        | digits

aplicando las técnicas explicadas en 4.8.1 y en el nodo de perlmonks 553889 transformamos la gramática en:

exp ->   digits rest
rest ->   '-' rest
        | # empty

Sigue el código:

pl@nereida:~/Lperltesting$ cat -n infixtopostfix.pl
    1   #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
    2   use v5.10;
    3 
    4   # Infix to postfix translator using 5.10 regexp
    5   # original grammar:
    6   #  exp ->   exp '-' digits
    7   #         | digits
    8   #
    9   # Applying left-recursion elimination we have:
   10   #  exp ->   digits rest
   11   #  rest ->   '-' rest
   12   #          | # empty
   13   #
   14   my $input;
   15   local our $tran = '';
   16 
   17   my $regexp = qr{
   18       (?&exp)
   19 
   20       (?(DEFINE)
   21           (?<exp>    ((?&digits)) \s* (?{ $tran .= "$^N "; say "tran=$tran"; }) (?&rest)
   22                         (?{
   23                            say "exp -> digits($^N) rest";
   24                         })
   25           )
   26 
   27           (?<rest>     \s* - ((?&digits)) (?{ $tran .= "$^N - "; say "tran=$tran"; }) (?&rest)
   28                           (?{
   29                              say "rest -> - digits($^N) rest";
   30                           })
   31                     |  # empty
   32                           (?{
   33                              say "rest -> empty";
   34                           })
   35           )
   36 
   37           (?<digits>  \s* (\d+)
   38           )
   39       )
   40   }xms;
   41 
   42   $input = <>;
   43   chomp($input);
   44   if ($input =~ $regexp) {
   45     say "matches: $&\ntran=$tran";
   46   }
   47   else {
   48     say "does not match";
   49   }

La variable $^N contiene el valor que casó con el último paréntesis. Al ejecutar el código anterior obtenemos:

Véase la ejecución:

pl@nereida:~/Lperltesting$ ./infixtopostfix.pl
ab 5 - 3 -2 cd;
tran= 5
tran= 5  3 -
tran= 5  3 - 2 -
rest -> empty
rest -> - digits(2) rest
rest -> - digits( 3) rest
exp -> digits( 5) rest
matches:  5 - 3 -2
tran= 5  3 - 2 -

Como se ve, el recorrido primero profundo se traduce en la reconstrucción de una derivación a derechas.

Accediendo a los atributos de paréntesis anteriores mediante acciones intermedias

Es difícil extender el ejemplo anterior a lenguajes mas complejos debido a la limitación de que sólo se dispone de acceso al último paréntesis vía $^N. En muchos casos es necesario poder acceder a paréntesis/atributos anteriores.

El siguiente código considera el caso de expresiones con sumas, restas, multiplicaciones y divisiones. Utiliza la variable op y una acción intermedia (líneas 51-53) para almacenar el segundo paréntesis necesitado:

pl@nereida:~/Lperltesting$ cat -n ./calc510withactions3.pl
    1   #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
    2   use v5.10;
    3 
    4   # Infix to postfix translator using 5.10 regexp
    5   # Original grammar:
    6 
    7   #  exp ->   exp [-+] term
    8   #         | term
    9   #  term ->   term [*/] digits
   10   #          | digits
   11 
   12   # Applying left-recursion elimination we have:
   13 
   14   #  exp  ->   term re
   15   #  re   ->   [+-] term re
   16   #          | # empty
   17   #  term ->   digits rt
   18   #  rt   ->   [*/] rt
   19   #          | # empty
   20 
   21 
   22   my $input;
   23   my @stack;
   24 
   25   local our $op = '';
   26   my $regexp = qr{
   27       (?&exp)
   28 
   29       (?(DEFINE)
   30           (?<exp>    (?&term) (?&re)
   31                        (?{ say "exp -> term re" })
   32           )
   33 
   34           (?<re>     \s* ([+-]) (?&term) \s* (?{ push @stack, $^N }) (?&re)
   35                        (?{ say "re -> [+-] term re" })
   36                    | # empty
   37                        (?{ say "re -> empty" })
   38           )
   39 
   40           (?<term>   ((?&digits))
   41                         (?{ # intermediate action
   42                             push @stack, $^N
   43                         })
   44                      (?&rt)
   45                         (?{
   46                             say "term-> digits($^N) rt";
   47                         })
   48           )
   49 
   50           (?<rt>     \s*([*/])
   51                              (?{ # intermediate action
   52                                 local $op = $^N;
   53                              })
   54                      ((?&digits)) \s*
   55                              (?{  # intermediate action
   56                                   push @stack, $^N, $op
   57                               })
   58                      (?&rt) # end of <rt> definition
   59                              (?{
   60                                   say "rt -> [*/] digits($^N) rt"
   61                               })
   62                     |  # empty
   63                        (?{ say "rt -> empty" })
   64           )
   65 
   66           (?<digits>  \s* \d+
   67           )
   68       )
   69   }xms;
   70 
   71   $input = <>;
   72   chomp($input);
   73   if ($input =~ $regexp) {
   74     say "matches: $&\nStack=(@stack)";
   75   }
   76   else {
   77     say "does not match";
   78   }

Sigue una ejecución:

pl@nereida:~/Lperltesting$ ./calc510withactions3.pl
5-8/4/2-1
rt -> empty
term-> digits(5) rt
rt -> empty
rt -> [*/] digits(2) rt
rt -> [*/] digits(4) rt
term-> digits(8) rt
rt -> empty
term-> digits(1) rt
re -> empty
re -> [+-] term re
re -> [+-] term re
exp -> term re
matches: 5-8/4/2-1
Stack=(5 8 4 / 2 / - 1 -)

Accediendo a los atributos de paréntesis anteriores mediante `@-`

Sigue una solución alternativa que obvia la necesidad de introducir incómodas acciones intermedias. Utilizamos las variables @- y @+:

Since Perl 5.6.1 the special variables @- and @+ can functionally replace $`, $& and $'. These arrays contain pointers to the beginning and end of each match (see perlvar for the full story), so they give you essentially the same information, but without the risk of excessive string copying.

Véanse los párrafos en las páginas , ) y para mas información sobre @- y @+.

Nótese la función rc en las líneas 21-28. rc(1) nos retorna lo que casó con el último paréntesis, rc(2) lo que casó con el penúltimo, etc.

pl@nereida:~/Lperltesting$ cat -n calc510withactions4.pl
    1   #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
    2   use v5.10;
    3 
    4   # Infix to postfix translator using 5.10 regexp
    5   # Original grammar:
    6 
    7   #  exp ->   exp [-+] term
    8   #         | term
    9   #  term ->   term [*/] digits
   10   #          | digits
   11 
   12   # Applying left-recursion elimination we have:
   13 
   14   #  exp  ->   term re
   15   #  re   ->   [+-] term re
   16   #          | # empty
   17   #  term ->   digits rt
   18   #  rt   ->   [*/] rt
   19   #          | # empty
   20 
   21   sub rc {
   22     my $ofs = - shift;
   23 
   24     # Number of parenthesis that matched
   25     my $np = @-;
   26     # $_ contains the string being matched
   27     substr($_, $-[$ofs], $+[$np+$ofs] - $-[$ofs])
   28   }
   29 
   30   my $input;
   31   my @stack;
   32 
   33   my $regexp = qr{
   34       (?&exp)
   35 
   36       (?(DEFINE)
   37           (?<exp>    (?&term) (?&re)
   38                        (?{ say "exp -> term re" })
   39           )
   40 
   41           (?<re>     \s* ([+-]) (?&term) \s* (?{ push @stack, rc(1) }) (?&re)
   42                        (?{ say "re -> [+-] term re" })
   43                    | # empty
   44                        (?{ say "re -> empty" })
   45           )
   46 
   47           (?<term>   ((?&digits))
   48                         (?{ # intermediate action
   49                             push @stack, rc(1)
   50                         })
   51                      (?&rt)
   52                         (?{
   53                             say "term-> digits(".rc(1).") rt";
   54                         })
   55           )
   56 
   57           (?<rt>     \s*([*/]) ((?&digits)) \s*
   58                              (?{  # intermediate action
   59                                   push @stack, rc(1), rc(2)
   60                               })
   61                      (?&rt) # end of <rt> definition
   62                              (?{
   63                                   say "rt -> [*/] digits(".rc(1).") rt"
   64                               })
   65                     |  # empty
   66                        (?{ say "rt -> empty" })
   67           )
   68 
   69           (?<digits>  \s* \d+
   70           )
   71       )
   72   }xms;
   73 
   74   $input = <>;
   75   chomp($input);
   76   if ($input =~ $regexp) {
   77     say "matches: $&\nStack=(@stack)";
   78   }
   79   else {
   80     say "does not match";
   81   }

Ahora accedemos a los atributos asociados con los dos paréntesis, en la regla de <rt> usando la función rc:

         (?<rt>     \s*([*/]) ((?&digits)) \s*
                            (?{  # intermediate action
                                 push @stack, rc(1), rc(2)
                             })

Sigue una ejecución del programa:

pl@nereida:~/Lperltesting$ ./calc510withactions4.pl
5-8/4/2-1
rt -> empty
term-> digits(5) rt
rt -> empty
rt -> [*/] digits(2) rt
rt -> [*/] digits(4) rt
term-> digits(8) rt
rt -> empty
term-> digits(1) rt
re -> empty
re -> [+-] term re
re -> [+-] term re
exp -> term re
matches: 5-8/4/2-1
Stack=(5 8 4 / 2 / - 1 -)
pl@nereida:~/Lperltesting$

Accediendo a los atributos de paréntesis anteriores mediante paréntesis con nombre

Una nueva solución: dar nombre a los paréntesis y acceder a los mismos:

47          (?<rt>     \s*(?<op>[*/]) (?<num>(?&digits)) \s*
48                             (?{  # intermediate action
49                                  push @stack, $+{num}, $+{op}
50                              })

Sigue el código completo:

pl@nereida:~/Lperltesting$ cat -n ./calc510withnamedpar.pl
    1   #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1
    2   use v5.10;
    3 
    4   # Infix to postfix translator using 5.10 regexp
    5   # Original grammar:
    6 
    7   #  exp ->   exp [-+] term
    8   #         | term
    9   #  term ->   term [*/] digits
   10   #          | digits
   11 
   12   # Applying left-recursion elimination we have:
   13 
   14   #  exp  ->   term re
   15   #  re   ->   [+-] term re
   16   #          | # empty
   17   #  term ->   digits rt
   18   #  rt   ->   [*/] rt
   19   #          | # empty
   20 
   21   my @stack;
   22 
   23   my $regexp = qr{
   24       (?&exp)
   25 
   26       (?(DEFINE)
   27           (?<exp>    (?&term) (?&re)
   28                        (?{ say "exp -> term re" })
   29           )
   30 
   31           (?<re>     \s* ([+-]) (?&term) \s* (?{ push @stack, $^N }) (?&re)
   32                        (?{ say "re -> [+-] term re" })
   33                    | # empty
   34                        (?{ say "re -> empty" })
   35           )
   36 
   37           (?<term>   ((?&digits))
   38                         (?{ # intermediate action
   39                             push @stack, $^N
   40                         })
   41                      (?&rt)
   42                         (?{
   43                             say "term-> digits($^N) rt";
   44                         })
   45           )
   46 
   47           (?<rt>     \s*(?<op>[*/]) (?<num>(?&digits)) \s*
   48                              (?{  # intermediate action
   49                                   push @stack, $+{num}, $+{op}
   50                               })
   51                      (?&rt) # end of <rt> definition
   52                              (?{
   53                                   say "rt -> [*/] digits($^N) rt"
   54                               })
   55                     |  # empty
   56                        (?{ say "rt -> empty" })
   57           )
   58 
   59           (?<digits>  \s* \d+
   60           )
   61       )
   62   }xms;
   63 
   64   my $input = <>;
   65   chomp($input);
   66   if ($input =~ $regexp) {
   67     say "matches: $&\nStack=(@stack)";
   68   }
   69   else {
   70     say "does not match";
   71   }

Ejecución:

pl@nereida:~/Lperltesting$ ./calc510withnamedpar.pl
5-8/4/2-1
rt -> empty
term-> digits(5) rt
rt -> empty
rt -> [*/] digits(2) rt
rt -> [*/] digits(4) rt
term-> digits(8) rt
rt -> empty
term-> digits(1) rt
re -> empty
re -> [+-] term re
re -> [+-] term re
exp -> term re
matches: 5-8/4/2-1
Stack=(5 8 4 / 2 / - 1 -)

Véase También

El nodo Backreference variables in code embedded inside Perl 5.10 regexps en PerlMonks
El nodo Strange behavior of @- and @+ in perl5.10 regexps en PerlMonks

Sig: Construyendo el AST con Sup: Analisis Sintáctico con Expresiones Ant: Analisis Sintáctico con Expresiones Err: Si hallas una errata ...

Casiano Rodríguez León
2012-05-22

Introducción al Anaĺisis Sintáctico con Expresiones Regulares

Ejemplo: Traducción de expresiones aritméticas en infijo a postfijo

Accediendo a los atributos de paréntesis anteriores mediante acciones intermedias

Accediendo a los atributos de paréntesis anteriores mediante @-

Accediendo a los atributos de paréntesis anteriores mediante paréntesis con nombre

Véase También

Accediendo a los atributos de paréntesis anteriores mediante `@-`