Parse::Lex
is the Parse::CLex class, which advances by consuming the analyzed string with the substitution operator (s///). Lexers produced with this second class do not allow anchors in their regular expressions. They also have no access to the Parse::Token subclass.
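The consuming strategy described above can be sketched in plain Perl. This is only an illustration of the idea, not the code of Parse::CLex itself: the rule table `@rules` and the function `tokenize_consuming` are hypothetical names introduced here, and the token patterns are borrowed from the example in this section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule table for illustration: [name, pattern] pairs, tried in order.
my @rules = (
    [ INTEGER => qr/[1-9][0-9]*/ ],
    [ ADDOP   => qr/[-+]/ ],
    [ NEWLINE => qr/\n/ ],
);

# Return a list of [name, text] pairs, eating $input from the front.
sub tokenize_consuming {
    my ($input) = @_;
    my @tokens;
    TOKEN: while (length $input) {
        for my $rule (@rules) {
            my ($name, $re) = @$rule;
            # s/// deletes the matched prefix, so the buffer shrinks on every token.
            if ($input =~ s/^($re)//) {
                push @tokens, [ $name, $1 ];
                next TOKEN;
            }
        }
        # No rule matched at the front of the remaining text.
        die qq{can't analyze: "$input"};
    }
    return @tokens;
}

my @tokens = tokenize_consuming("1+2-5\n");
print join(" ", map { $_->[0] } @tokens), "\n";
# prints: INTEGER ADDOP INTEGER ADDOP INTEGER NEWLINE
```

Because the consumed text is discarded, `^` in this sketch always refers to the current scanning position rather than to the start of the original input, which is consistent with the restriction on anchors noted above.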
Here is the same example, using the Parse::CLex class:
> cat -n ctokenizer.pl
     1  #!/usr/local/bin/perl -w
     2
     3  require 5.000;
     4  BEGIN { unshift @INC, "../lib"; }
     5  use Parse::CLex;
     6
     7  @token = (
     8    qw(
     9       ADDOP    [-+]
    10       LEFTP    [\(]
    11       RIGHTP   [\)]
    12       INTEGER  [1-9][0-9]*
    13       NEWLINE  \n
    14      ),
    15    qw(STRING),   [qw(" (?:[^"]+|"")* ")],
    16    qw(ERROR  .*), sub {
    17      die qq!can\'t analyze: "$_[1]"!;
    18    }
    19  );
    20
    21  Parse::CLex->trace;
    22  $lexer = Parse::CLex->new(@token);
    23
    24  $lexer->from(\*DATA);
    25  print "Tokenization of DATA:\n";
    26
    27  TOKEN:while (1) {
    28    $token = $lexer->next;
    29    if (not $lexer->eoi) {
    30      print "Record number: ", $lexer->line, "\n";
    31      print "Type: ", $token->name, "\t";
    32      print "Content:->", $token->getText, "<-\n";
    33    } else {
    34      last TOKEN;
    35    }
    36  }
    37
    38  __END__
    39  1+2-5
    40  "This is a multiline
    41  string with an embedded "" in it"
    42  this is an invalid string with a "" in it"
    43
    44
> ctokenizer.pl
Trace is ON in class Parse::CLex
Tokenization of DATA:
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 1
Record number: 1
Type: INTEGER	Content:->1<-
[main::lexer|Parse::CLex] Token read (ADDOP, [-+]): +
Record number: 1
Type: ADDOP	Content:->+<-
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 2
Record number: 1
Type: INTEGER	Content:->2<-
[main::lexer|Parse::CLex] Token read (ADDOP, [-+]): -
Record number: 1
Type: ADDOP	Content:->-<-
[main::lexer|Parse::CLex] Token read (INTEGER, [1-9][0-9]*): 5
Record number: 1
Type: INTEGER	Content:->5<-
[main::lexer|Parse::CLex] Token read (NEWLINE, \n):
Record number: 1
Type: NEWLINE	Content:->
<-
[main::lexer|Parse::CLex] Token read (STRING, \"(?:[^\"]+|\"\")*\"): "This is a multiline
string with an embedded "" in it"
Record number: 3
Type: STRING	Content:->"This is a multiline
string with an embedded "" in it"<-
[main::lexer|Parse::CLex] Token read (NEWLINE, \n):
Record number: 3
Type: NEWLINE	Content:->
<-
[main::lexer|Parse::CLex] Token read (ERROR, .*): this is an invalid string with a "" in it"
can't analyze: "this is an invalid string with a "" in it"" at ctokenizer.pl line 17, <DATA> line 4.
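The trace shows the STRING token spanning two input records while the last line falls through to ERROR. The sketch below checks the pattern itself in isolation, under the assumption that the three parts of the STRING definition (opening quote, body, closing quote) behave like their plain concatenation once the whole text is in one buffer. How the module actually reads further records is not shown here, and `match_string` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: concatenating the three parts approximates the STRING token.
my @parts     = ('"', '(?:[^"]+|"")*', '"');
my $pattern   = join '', @parts;
my $string_re = qr/^($pattern)/;

# Sample inputs taken from the DATA section of the example.
my $valid   = qq{"This is a multiline\nstring with an embedded "" in it"\n};
my $invalid = qq{this is an invalid string with a "" in it"\n};

# Return the matched string literal, or undef if the text does not start with one.
sub match_string {
    my ($text) = @_;
    return $text =~ $string_re ? $1 : undef;
}

for my $sample ($valid, $invalid) {
    my $lexeme = match_string($sample);
    print defined $lexeme ? "STRING: $lexeme\n" : "no STRING at start of input\n";
}
```

The first sample matches up to the closing quote because a doubled quote is absorbed by the `""` alternative, and `[^"]+` accepts the embedded newline. The second sample does not begin with a quote, so no STRING is recognized at its start.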