Depuración

Sig: Mensajes de log del Sup: Análisis Sintáctico con Regexp::Grammars Ant: Casando con las claves Err: Si hallas una errata ...

Subsecciones

- Debugging grammar creation
- Debugging grammar execution

Depuración

Regexp::Grammars provides a number of features specifically designed to help debug both grammars and the data they parse.

All debugging messages are written to a log file (which, by default, is just STDERR). However, you can specify a disk file explicitly by placing a "<logfile:...>" directive at the start of your grammar^3.10:

        $grammar = qr{

            <logfile: LaTeX_parser_log >

            \A <LaTeX_file> \Z    # Pattern to match

            <rule: LaTeX_file>
                # etc.
        }x;

You can also explicitly specify that messages go to the terminal:

            <logfile: - >

Whenever a log file has been directly specified, Regexp::Grammars automatically does verbose static analysis of your grammar. That is, whenever it compiles a grammar containing an explicit "<logfile:...>" directive it logs a series of messages explaining how it has interpreted the various components of that grammar. For example, the following grammar:

pl@nereida:~/Lregexpgrammars/demo$ cat -n log.pl
     1  #!/usr/bin/env perl5.10.1
     2  use strict;
     3  use warnings;
     4  use 5.010;
     5  use Data::Dumper;
     6
     7  my $rbb = do {
     8      use Regexp::Grammars;
     9
    10      qr{
    11        <logfile: ->
    12
    13        <numbers>
    14
    15        <rule: numbers>
    16          <number> ** <.comma>
    17
    18        <token: number> \d+
    19
    20        <token: comma>   ,
    21      }xms;
    22  };
    23
    24  while (my $input = <>) {
    25      if ($input =~ m{$rbb}) {
    26          say("matches: <$&>");
    27          say Dumper \%/;
    28      }
    29  }

would produce the following analysis in the terminal:

pl@nereida:~/Lregexpgrammars/demo$ ./log.pl
  warn | Repeated subrule <number>* will only capture its final match
       | (Did you mean <[number]>* instead?)
       |
  info | Processing the main regex before any rule definitions
       |    |
       |    |...Treating <numbers> as:
       |    |      |  match the subrule <numbers>
       |    |       \ saving the match in $MATCH{'numbers'}
       |    |
       |     \___End of main regex
       |
       | Defining a rule: <numbers>
       |    |...Returns: a hash
       |    |
       |    |...Treating <number> as:
       |    |      |  match the subrule <number>
       |    |       \ saving the match in $MATCH{'number'}
       |    |
       |    |...Treating <.comma> as:
       |    |      |  match the subrule <comma>
       |    |       \ but don't save anything
       |    |
       |    |...Treating <number> ** <.comma> as:
       |    |      |  repeatedly match the subrule <number>
       |    |       \ as long as the matches are separated by matches of <.comma>
       |    |
       |     \___End of rule definition
       |
       | Defining a rule: <number>
       |    |...Returns: a hash
       |    |
       |    |...Treating '\d' as:
       |    |       \ normal Perl regex syntax
       |    |
       |    |...Treating '+ ' as:
       |    |       \ normal Perl regex syntax
       |    |
       |     \___End of rule definition
       |
       | Defining a rule: <comma>
       |    |...Returns: a hash
       |    |
       |    |...Treating ', ' as:
       |    |       \ normal Perl regex syntax
       |    |
       |     \___End of rule definition
       |
2, 3, 4
matches: <2, 3, 4>
$VAR1 = {
          '' => '2, 3, 4',
          'numbers' => {
                         '' => '2, 3, 4',
                         'number' => '4'
                       }
        };

This kind of static analysis is a useful starting point in debugging a miscreant grammar^3.11, because it enables you to see what you actually specified (as opposed to what you thought you'd specified).

Debugging grammar execution

Regexp::Grammars also provides a simple interactive debugger, with which you can observe the process of parsing and the data being collected in any result-hash.

To initiate debugging, place a <debug:...> directive anywhere in your grammar. When parsing reaches that directive the debugger will be activated, and the command specified in the directive immediately executed. The available commands are:

        <debug: on>    - Enable debugging, stop when entire grammar matches
        <debug: match> - Enable debugging, stope when a rule matches
        <debug: try>   - Enable debugging, stope when a rule is tried
        <debug: off>   - Disable debugging and continue parsing silently

        <debug: continue> - Synonym for <debug: on>
        <debug: run>      - Synonym for <debug: on>
        <debug: step>     - Synonym for <debug: try>

These directives can be placed anywhere within a grammar and take effect when that point is reached in the parsing. Hence, adding a <debug:step> directive is very much like setting a breakpoint at that point in the grammar. Indeed, a common debugging strategy is to turn debugging on and off only around a suspect part of the grammar:

        <rule: tricky>   # This is where we think the problem is...
            <debug:step>
            <preamble> <text> <postscript>
            <debug:off>

Once the debugger is active, it steps through the parse, reporting rules that are tried, matches and failures, backtracking and restarts, and the parser's location within both the grammar and the text being matched. That report looks like this:

        ===============> Trying <grammar> from position 0
        > cp file1 file2 |...Trying <cmd>
                         |   |...Trying <cmd=(cp)>
                         |   |    \FAIL <cmd=(cp)>
                         |    \FAIL <cmd>
                          \FAIL <grammar>
        ===============> Trying <grammar> from position 1
         cp file1 file2  |...Trying <cmd>
                         |   |...Trying <cmd=(cp)>
         file1 file2     |   |    \_____<cmd=(cp)> matched 'cp'
        file1 file2      |   |...Trying <[file]>+
         file2           |   |    \_____<[file]>+ matched 'file1'
                         |   |...Trying <[file]>+
        [eos]            |   |    \_____<[file]>+ matched ' file2'
                         |   |...Trying <[file]>+
                         |   |    \FAIL <[file]>+
                         |   |...Trying <target>
                         |   |   |...Trying <file>
                         |   |   |    \FAIL <file>
                         |   |    \FAIL <target>
         <~~~~~~~~~~~~~~ |   |...Backtracking 5 chars and trying new match
        file2            |   |...Trying <target>
                         |   |   |...Trying <file>
                         |   |   |    \____ <file> matched 'file2'
        [eos]            |   |    \_____<target> matched 'file2'
                         |    \_____<cmd> matched ' cp file1 file2'
                          \_____<grammar> matched ' cp file1 file2'

The first column indicates the point in the input at which the parser is trying to match, as well as any backtracking or forward searching it may need to do. The remainder of the columns track the parser's hierarchical traversal of the grammar, indicating which rules are tried, which succeed, and what they match.

Provided the logfile is a terminal (as it is by default), the debugger also pauses at various points in the parsing process-before trying a rule, after a rule succeeds, or at the end of the parse-according to the most recent command issued. When it pauses, you can issue a new command by entering a single letter:

        m       - to continue until the next subrule matches
        t or s  - to continue until the next subrule is tried
        r or c  - to continue to the end of the grammar
        o       - to switch off debugging

Note that these are the first letters of the corresponding <debug:...> commands, listed earlier. Just hitting ENTER while the debugger is paused repeats the previous command.

While the debugger is paused you can also type a d, which will display the result-hash for the current rule. This can be useful for detecting which rule isn't returning the data you expected.

Veamos un ejemplo. El siguiente programa activa el depurador:

pl@nereida:~/Lregexpgrammars/demo$ cat -n demo_debug.pl
     1  #!/usr/bin/env perl5.10.1
     2  use 5.010;
     3  use warnings;
     4
     5      use Regexp::Grammars;
     6
     7      my $balanced_brackets = qr{
     8          <debug:on>
     9
    10          <left_delim=(  \( )>
    11          (?:
    12              <[escape=(  \\ )]>
    13          |   <recurse=( (?R) )>
    14          |   <[simple=(  .  )]>
    15          )*
    16          <right_delim=( \) )>
    17      }xms;
    18
    19      while (<>) {
    20          if (/$balanced_brackets/) {
    21              say 'matched:';
    22              use Data::Dumper 'Dumper';
    23              warn Dumper \%/;
    24          }
    25      }

Al ejecutar obtenemos

pl@nereida:~/Lregexpgrammars/demo$ ./demo_debug.pl
(a)
=====> Trying <grammar> from position 0
(a)\n  |...Trying <left_delim=(  \( )>

a)\n   |    _____<left_delim=(  \( )> matched '('      c
       |...Trying <[escape=(  \ )]>
       |    \FAIL <[escape=(  \ )]>
       |...Trying <recurse=( (?R) )>
=====> Trying <grammar> from position 1
a)\n   |   |...Trying <left_delim=(  \( )>

       |   |    \FAIL <left_delim=(  \( )>
        \FAIL <grammar>
       |...Trying <[simple=(  .  )]>
)\n    |    _____<[simple=(  .  )]> matched 'a'
       |...Trying <[escape=(  \ )]>

       |    \FAIL <[escape=(  \ )]>
       |...Trying <recurse=( (?R) )>
=====> Trying <grammar> from position 2
)\n    |   |...Trying <left_delim=(  \( )>
       |   |    \FAIL <left_delim=(  \( )>

        \FAIL <grammar>
       |...Trying <[simple=(  .  )]>
\n     |    _____<[simple=(  .  )]> matched ')'
       |...Trying <[escape=(  \ )]>
       |    \FAIL <[escape=(  \ )]>

       |...Trying <recurse=( (?R) )>
=====> Trying <grammar> from position 3
\n     |   |...Trying <left_delim=(  \( )>
       |   |    \FAIL <left_delim=(  \( )>
        \FAIL <grammar>

       |...Trying <[simple=(  .  )]>
[eos]  |    _____<[simple=(  .  )]> matched ''
       |...Trying <[escape=(  \ )]>
       |    \FAIL <[escape=(  \ )]>
       |...Trying <recurse=( (?R) )>

=====> Trying <grammar> from position 4
[eos]  |   |...Trying <left_delim=(  \( )>
       |   |    \FAIL <left_delim=(  \( )>
        \FAIL <grammar>
       |...Trying <[simple=(  .  )]>

       |    \FAIL <[simple=(  .  )]>
       |...Trying <right_delim=( \) )>
       |    \FAIL <right_delim=( \) )>
 <~~~~ |...Backtracking 1 char and trying new match
\n     |...Trying <right_delim=( \) )>
       |    \FAIL <right_delim=( \) )>

 <~~~~ |...Backtracking 1 char and trying new match
)\n    |...Trying <right_delim=( \) )>
\n     |    _____<right_delim=( \) )> matched ')'
        _____<grammar> matched '(a)'   d
              :         {
              :           '' => '(a)',
              :           'left_delim' => '(',
              :           'simple' => [
              :                         'a'
              :                       ],
              :           'right_delim' => ')'
              :         };      o
matched:
$VAR1 = {
          '' => '(a)',
          'left_delim' => '(',
          'simple' => [
                        'a'
                      ],
          'right_delim' => ')'
        };

Sig: Mensajes de log del Sup: Análisis Sintáctico con Regexp::Grammars Ant: Casando con las claves Err: Si hallas una errata ...

Casiano Rodríguez León
2012-05-22

Depuración

Debugging grammar creation

Debugging grammar execution