Verbos que controlan el retroceso

Sig: Unicode Sup: Algunas Extensiones Ant: Expresiones Condicionales Err: Si hallas una errata ...

Subsecciones

Verbos que controlan el retroceso

El verbo de control `(*FAIL)`

Tomado de la sección 'Backtracking-control-verbs' en perlretut:

The control verb (*FAIL) may be abbreviated as (*F). If this is inserted in a regexp it will cause to fail, just like at some mismatch between the pattern and the string. Processing of the regexp continues like after any "normal" failure, so that the next position in the string or another alternative will be tried. As failing to match doesn't preserve capture buffers or produce results, it may be necessary to use this in combination with embedded code.

pl@nereida:~/Lperltesting$ cat -n vowelcount.pl
     1  #!/usr/local/lib/perl/5.10.1/bin//perl5.10.1  -w
     2  use strict;
     3
     4  my $input = shift() || <STDIN>;
     5  my %count = ();
     6  $input =~ /([aeiou])(?{ $count{$1}++; })(*FAIL)/i;
     7  printf("'%s' => %3d\n", $_, $count{$_})  for (sort keys %count);

Al ejecutarse con entrada supercalifragilistico produce la salida:

pl@nereida:~/Lperltesting$ ./vowelcount.pl
supercalifragilistico
'a' =>   2
'e' =>   1
'i' =>   4
'o' =>   1
'u' =>   1

Ejercicio 3.2.5 ¿Que queda en $1 depués de ejecutado el matching $input =~ /([aeiou])(?{ $count{$1}++; })(*FAIL)/i;?

Véase también:

El nodo en PerlMonks The Oldest Plays the Piano
Véase el ejercicio Las tres hijas en la sección 3.4.4

El verbo de control `(*ACCEPT)`

Tomado de perlretut:

This pattern matches nothing and causes the end of successful matching at the point at which the (*ACCEPT) pattern was encountered, regardless of whether there is actually more to match in the string. When inside of a nested pattern, such as recursion, or in a subpattern dynamically generated via (??{}), only the innermost pattern is ended immediately.

If the (*ACCEPT) is inside of capturing buffers then the buffers are marked as ended at the point at which the (*ACCEPT) was encountered. For instance:

  DB<1> x 'AB' =~ /(A (A|B(*ACCEPT)|C) D)(E)/x
0  'AB'
1  'B'
2  undef
  DB<2> x 'ACDE'  =~ /(A (A|B(*ACCEPT)|C) D)(E)/x
0  'ACD'
1  'C'
2  'E'

El verbo `SKIP`

This zero-width pattern prunes the backtracking tree at the current point when backtracked into on failure. Consider the pattern A (*SKIP) B, where A and B are complex patterns. Until the (*SKIP) verb is reached, A may backtrack as necessary to match. Once it is reached, matching continues in B, which may also backtrack as necessary; however, should B not match, then no further backtracking will take place, and the pattern will fail outright at the current starting position.

It also signifies that whatever text that was matched leading up to the (*SKIP) pattern being executed cannot be part of any match of this pattern. This effectively means that the regex engine skips forward to this position on failure and tries to match again, (assuming that there is sufficient room to match).

The name of the (*SKIP:NAME) pattern has special significance. If a (*MARK:NAME) was encountered while matching, then it is that position which is used as the "skip point". If no (*MARK) of that name was encountered, then the (*SKIP) operator has no effect. When used without a name the "skip point" is where the match point was when executing the (*SKIP) pattern.

Ejemplo:

pl@nereida:~/Lperltesting$ cat -n SKIP.pl
     1  #!/soft/perl5lib/bin/perl5.10.1 -w
     2  use strict;
     3  use v5.10;
     4
     5  say "NO SKIP: /a+b?(*FAIL)/";
     6  our $count = 0;
     7  'aaab' =~ /a+b?(?{print "$&\n"; $count++})(*FAIL)/;
     8  say "Count=$count\n";
     9
    10  say "WITH SKIP: a+b?(*SKIP)(*FAIL)/";
    11  $count = 0;
    12  'aaab' =~ /a+b?(*SKIP)(?{print "$&\n"; $count++})(*FAIL)/;
    13  say "WITH SKIP: Count=$count\n";
    14
    15  say "WITH SKIP /a+(*SKIP)b?(*FAIL)/:";
    16  $count = 0;
    17  'aaab' =~ /a+(*SKIP)b?(?{print "$&\n"; $count++})(*FAIL)/;
    18  say "Count=$count\n";
    19
    20  say "WITH SKIP /(*SKIP)a+b?(*FAIL): ";
    21  $count = 0;
    22  'aaab' =~ /(*SKIP)a+b?(?{print "$&\n"; $count++})(*FAIL)/;
    23  say "Count=$count\n";

Ejecución:

pl@nereida:~/Lperltesting$ perl5.10.1 SKIP.pl
NO SKIP: /a+b?(*FAIL)/
aaab
aaa
aa
a
aab
aa
a
ab
a
Count=9

WITH SKIP: a+b?(*SKIP)(*FAIL)/
aaab
WITH SKIP: Count=1

WITH SKIP /a+(*SKIP)b?(*FAIL)/:
aaab
aaa
Count=2

WITH SKIP /(*SKIP)a+b?(*FAIL):
aaab
aaa
aa
a
aab
aa
a
ab
a
Count=9

Marcas

Tomado de la sección 'Backtracking-control-verbs' en perlretut:

(*MARK:NAME) (*:NAME)

This zero-width pattern can be used to mark the point reached in a string when a certain part of the pattern has been successfully matched. This mark may be given a name. A later (*SKIP) pattern will then skip forward to that point if backtracked into on failure. Any number of (*MARK) patterns are allowed, and the NAME portion is optional and may be duplicated.

In addition to interacting with the (*SKIP) pattern, (*MARK:NAME) can be used to label a pattern branch, so that after matching, the program can determine which branches of the pattern were involved in the match.

When a match is successful, the $REGMARK variable will be set to the name of the most recently executed (*MARK:NAME) that was involved in the match.

This can be used to determine which branch of a pattern was matched without using a separate capture buffer for each branch, which in turn can result in a performance improvement.

When a match has failed, and unless another verb has been involved in failing the match and has provided its own name to use, the $REGERROR variable will be set to the name of the most recently executed (*MARK:NAME).

pl@nereida:~/Lperltesting$ cat -n mark.pl
 1  use v5.10;
 2  use strict;
 3
 4  our $REGMARK;
 5
 6  $_ = shift;
 7  say $REGMARK if /(?:x(*MARK:mx)|y(*MARK:my)|z(*MARK:mz))/;
 8  say $REGMARK if /(?:x(*:xx)|y(*:yy)|z(*:zz))/;

Cuando se ejecuta produce:

pl@nereida:~/Lperltesting$ perl5.10.1 mark.pl y
my
yy
pl@nereida:~/Lperltesting$ perl5.10.1 mark.pl z
mz
zz

Poniendo un espacio después de cada signo de puntuación

Se quiere poner un espacio en blanco después de la aparición de cada coma:

s/,/, /g;

pero se quiere que la sustitución no tenga lugar si la coma esta incrustada entre dos dígitos. Además se pide que si hay ya un espacio después de la coma, no se duplique. Sigue una solución que usa marcas:

pl@nereida:~/Lperltesting$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> $a = 'ab,cd, ef,12,34,efg,56,78,df, ef,'
  DB<2> x ($b = $a) =~ s/\d,\d(*:d)|,(?!\s)/($REGMARK eq 'd')? $& : ', '/ge
0  8
  DB<3> p "<$b>"
<ab, cd, ef, 12,34, efg, 56,78, df, ef, >

Sig: Unicode Sup: Algunas Extensiones Ant: Expresiones Condicionales Err: Si hallas una errata ...

Casiano Rodríguez León
2012-05-22

Verbos que controlan el retroceso

El verbo de control (*FAIL)

El verbo de control (*ACCEPT)

El verbo SKIP

Marcas

Poniendo un espacio después de cada signo de puntuación

El verbo de control `(*FAIL)`

El verbo de control `(*ACCEPT)`

El verbo `SKIP`