










Next: The Parse::Flex Module
Up: Building Lexical Analyzers
Previous: Start Conditions
Errata: If you find an erratum ...
Related to Parse::Lex is the Parse::CLex class,
which advances by consuming the string under analysis with the substitution
operator (s///). Analyzers produced with this second class
do not allow anchors in their regular expressions. They also have no access to the subclasses of
Parse::Token.
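The consuming strategy can be illustrated with a minimal sketch. It is not the Parse::CLex implementation: the @rules table and the tokenize function are hypothetical names chosen for this illustration, and the only point is to show how s/// shortens the buffer after every token.

```perl
use strict;
use warnings;

# Hypothetical rule table (token name => pattern). This is an assumption
# made for the sketch, not the internal representation used by Parse::CLex.
my @rules = (
    [ ADDOP   => qr/[-+]/        ],
    [ LEFTP   => qr/\(/          ],
    [ RIGHTP  => qr/\)/          ],
    [ INTEGER => qr/[1-9][0-9]*/ ],
    [ NEWLINE => qr/\n/          ],
);

# Return a list of [name, text] pairs, eating the buffer from the left.
sub tokenize {
    my ($buffer) = @_;
    my @tokens;
    TOKEN: while (length $buffer) {
        for my $rule (@rules) {
            my ($name, $re) = @$rule;
            # s/// deletes the matched prefix, so the buffer shrinks
            # every time a token is recognized.
            if ($buffer =~ s/^($re)//) {
                push @tokens, [ $name, $1 ];
                next TOKEN;
            }
        }
        die qq{can't analyze: "$buffer"};
    }
    return @tokens;
}

print "$_->[0]: $_->[1]\n" for tokenize("1+2-5");
```

In this sketch the text already consumed is gone, so a ^ written by the user inside a token pattern would match at the start of whatever remains rather than at the true start of a line. That is a plausible reason for the restriction on anchors, although the source does not state it.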
Here is the same example as before, this time using the Parse::CLex class:
> cat -n ctokenizer.pl
     1  #!/usr/local/bin/perl -w
     2
     3  require 5.000;
     4  BEGIN { unshift @INC, "../lib"; }
     5  use Parse::CLex;
     6
     7  @token = (
     8    qw(
     9       ADDOP    [-+]
    10       LEFTP    [\(]
    11       RIGHTP   [\)]
    12       INTEGER  [1-9][0-9]*
    13       NEWLINE  \n
    14      ),
    15    qw(STRING),   [qw(" (?:[^"]+|"")* ")],
    16    qw(ERROR  .*), sub {
    17      die qq!can\'t analyze: "$_[1]"!;
    18    }
    19  );
    20
    21  Parse::CLex->trace;
    22  $lexer = Parse::CLex->new(@token);
    23
    24  $lexer->from(\*DATA);
    25  print "Tokenization of DATA:\n";
    26
    27  TOKEN:while (1) {
    28    $token = $lexer->next;
    29    if (not $lexer->eoi) {
    30      print "Record number: ", $lexer->line, "\n";
    31      print "Type: ", $token->name, "\t";
    32      print "Content:->", $token->getText, "<-\n";
    33    } else {
    34      last TOKEN;
    35    }
    36  }
    37
    38  __END__
    39  1+2-5
    40  "This is a multiline
    41  string with an embedded "" in it"
    42  this is an invalid string with a "" in it"
    43
    44
> ctokenizer.pl
Trace is ON in class Parse::CLex
Tokenization of DATA:
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 1
Record number: 1
Type: INTEGER Content:->1<-
[main::lexer|Parse::CLex] Token read (ADDOP, [-+]): +
Record number: 1
Type: ADDOP Content:->+<-
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 2
Record number: 1
Type: INTEGER Content:->2<-
[main::lexer|Parse::CLex] Token read (ADDOP, [-+]): -
Record number: 1
Type: ADDOP Content:->-<-
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 5
Record number: 1
Type: INTEGER Content:->5<-
[main::lexer|Parse::CLex] Token read (NEWLINE, \n):
Record number: 1
Type: NEWLINE Content:->
<-
[main::lexer|Parse::CLex] Token read (STRING, \"(?:[^\"]+|\"\")*\"): "This is a multiline
string with an embedded "" in it"
Record number: 3
Type: STRING Content:->"This is a multiline
string with an embedded "" in it"<-
[main::lexer|Parse::CLex] Token read (NEWLINE, \n):
Record number: 3
Type: NEWLINE Content:->
<-
[main::lexer|Parse::CLex] Token read (ERROR, .*): this is an invalid string with a "" in it"
can't analyze: "this is an invalid string with a "" in it"" at ctokenizer.pl line 17, <DATA> line 4.
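The last two tokens of the trace can be checked by hand with the STRING pattern alone. The sketch below is independent of Parse::CLex: string_prefix is a hypothetical helper written for this illustration, and it assumes, in line with the trace, that a token must match at the current position of the input.

```perl
use strict;
use warnings;

# The STRING pattern as printed in the trace, without the backslashes.
my $string_re = qr/"(?:[^"]+|"")*"/;

# Hypothetical helper: return the text matched at the very start of
# $text, or undef when the pattern does not match there.
sub string_prefix {
    my ($text) = @_;
    return $text =~ /\A($string_re)/ ? $1 : undef;
}

my $valid   = qq{"This is a multiline\nstring with an embedded "" in it"\n};
my $invalid = qq{this is an invalid string with a "" in it"\n};

for my $input ($valid, $invalid) {
    my $hit = string_prefix($input);
    print defined $hit ? "STRING matched: $hit\n" : "no STRING at start\n";
}
```

The class [^"]+ also accepts newlines, so the first input is matched across both lines, with the doubled quote taken by the "" alternative. The second input does not begin with a quote, so the pattern fails at its first character; in the run above that is the point where the ERROR rule takes over.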











Casiano Rodríguez León
2012-05-22