The following passage is taken almost verbatim from the section 'Looking-ahead-and-looking-behind' in perlretut:
In Perl regular expressions, most regexp elements 'eat up' a certain amount of string when they match. For instance, the regexp element [abc] eats up one character of the string when it matches, in the sense that Perl moves to the next character position in the string after the match. There are some elements, however, that don't eat up characters (advance the character position) if they match. The examples we have seen so far are the anchors. The anchor ^ matches the beginning of the line, but doesn't eat any characters. Similarly, the word boundary anchor \b matches wherever a character matching \w is next to a character that doesn't, but it doesn't eat up any characters itself.
Anchors are examples of zero-width assertions. Zero-width, because they consume no characters, and assertions, because they test some property of the string.
In the context of our walk in the woods analogy to regexp matching, most regexp elements move us along a trail, but anchors have us stop a moment and check our surroundings. If the local environment checks out, we can proceed forward. But if the local environment doesn't satisfy us, we must backtrack.
Checking the environment entails either looking ahead on the trail, looking behind, or both. ^ looks behind, to see that there are no characters before. $ looks ahead, to see that there are no characters after. \b looks both ahead and behind, to see if the characters on either side differ in their "word-ness".
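The zero-width nature of anchors can be observed directly. The following is a minimal sketch (the sample string and variable names are illustrative, not from perlretut) that uses the built-in offset arrays @- and @+ to compare the length of a \b match with that of an ordinary character class match.

```perl
use strict;
use warnings;

my $text = "walk in the woods";   # illustrative sample string

# \b is zero-width: the match succeeds, but start and end offsets coincide.
my ($start, $end);
if ($text =~ /\b/) {
    ($start, $end) = ($-[0], $+[0]);
    print "\\b matched at offset $start, length ", $end - $start, "\n";
}

# An ordinary element such as [a-z] consumes exactly one character.
my $len;
if ($text =~ /[a-z]/) {
    $len = $+[0] - $-[0];
    print "[a-z] matched '", substr($text, $-[0], $len), "', length $len\n";
}
```

The assertion reports a length of 0, whereas the character class reports a length of 1.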
The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for.
The lookahead assertion is denoted by (?=regexp) and the lookbehind assertion is denoted by (?<=fixed-regexp).
(?=regexp) is the positive lookahead (or "trailing") operator. For example, /\w+(?=\t)/ matches a word only if it is followed by a tab, but the tab does not become part of $&.
Example:
> cat lookahead.pl
#!/usr/bin/perl

$a = "bugs the rabbit";
$b = "bugs the frog";
if ($a =~ m{bugs(?= the cat| the rabbit)}i) { print "$a matches. \$& = $&\n"; }
else { print "$a does not match\n"; }
if ($b =~ m{bugs(?= the cat| the rabbit)}i) { print "$b matches. \$& = $&\n"; }
else { print "$b does not match\n"; }

Running the program gives:

> lookahead.pl
bugs the rabbit matches. $& = bugs
bugs the frog does not match
>
Some examples using the debugger:
  DB<1> #012345678901234567890
  DB<2> $x = "I catch the housecat 'Tom-cat' with catnip"
  DB<3> print "($&) (".pos($x).")\n" if $x =~ /cat(?=\s)/g
(cat) (20)    # matches 'cat' in 'housecat'
  DB<5> $x = "I catch the housecat 'Tom-cat' with catnip" # To reset pos
  DB<6> x @catwords = ($x =~ /(?<=\s)cat\w+/g)
0  'catch'
1  'catnip'
  DB<7> #012345678901234567890123456789
  DB<8> $x = "I catch the housecat 'Tom-cat' with catnip"
  DB<9> print "($&) (".pos($x).")\n" if $x =~ /\bcat\b/g
(cat) (29)    # matches 'cat' in 'Tom-cat'
  DB<10> $x = "I catch the housecat 'Tom-cat' with catnip"
  DB<11> x $x =~ /(?<=\s)cat(?=\s)/
  empty array
  DB<12> # doesn't match; no isolated 'cat' in middle of $x
See the node A hard RegEx problem on PerlMonks. A monk asks:
Hi Monks,
I wanna to match this issues:
Pls help me check. Thanks
Solution:
casiano@millo:~$ perl -wde 0
main::(-e:1):   0
  DB<1> x 'aaa2a1' =~ /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
0  1
  DB<2> x 'aaaaaa' =~ /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
  empty array
  DB<3> x '1111111' =~ /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
  empty array
  DB<4> x '1111111bbbbb' =~ /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
  empty array
  DB<5> x '111bbbbb' =~ /\A(?=.*[a-z])(?=.*\d)\w{3,10}\z/i
0  1
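Reading the pattern in the session above, it appears to require at least one letter, at least one digit, and between 3 and 10 word characters overall. This reading is inferred from the regexp itself, not from the original request. The sketch below spells the same pattern out with /x comments; the helper name looks_valid is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Both lookaheads are tested from the same position
# (right after \A), so the conditions combine like a logical AND.
sub looks_valid {
    my ($s) = @_;
    return $s =~ /\A
                  (?=.*[a-z])   # a letter appears somewhere ahead
                  (?=.*\d)      # a digit appears somewhere ahead
                  \w{3,10}      # the whole string is 3 to 10 word characters
                  \z/xi ? 1 : 0;
}

for my $s (qw(aaa2a1 aaaaaa 1111111 1111111bbbbb 111bbbbb)) {
    printf "%-14s %s\n", $s, looks_valid($s) ? "matches" : "does not match";
}
```

Because neither lookahead consumes characters, \w{3,10} still starts matching at the beginning of the string.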
Note that the parentheses in (?=regexp) and (?<=regexp) are non-capturing, since these are zero-width assertions. Lookahead (?=regexp) can match arbitrary regexps, but lookbehind (?<=fixed-regexp) only works for regexps of fixed width, i.e., a fixed number of characters long. Thus (?<=(ab|bc)) is fine, but (?<=(ab)*) is not.
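The fixed-width restriction can be checked at run time. This is a minimal sketch, assuming the patterns are held in strings so that compilation happens inside eval. The trailing x in each pattern is only there to give the lookbehind something to precede. Recent Perl releases relax the rule for some bounded cases, but an unbounded quantifier such as * inside a lookbehind is, as far as we know, still rejected.

```perl
use strict;
use warnings;

# Patterns are kept in strings so that they are compiled at run time,
# where eval can trap a compilation error.
my %compiles;
for my $pat ('(?<=(ab|bc))x', '(?<=(ab)*)x') {
    my $re = eval { qr/$pat/ };          # undef if the pattern is rejected
    $compiles{$pat} = defined $re ? 1 : 0;
    print "$pat: ", ($compiles{$pat} ? "compiles" : "rejected"), "\n";
}
```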
The negated versions of the lookahead and lookbehind assertions are denoted by (?!regexp) and (?<!fixed-regexp) respectively. They evaluate true if the regexps do not match:
$x = "foobar";
$x =~ /foo(?!bar)/;  # doesn't match, 'bar' follows 'foo'
$x =~ /foo(?!baz)/;  # matches, 'baz' doesn't follow 'foo'
$x =~ /(?<!\s)foo/;  # matches, there is no \s before 'foo'
Here is an example where a string containing blank-separated words, numbers and single dashes is to be split into its components.
Using /\s+/ alone won't work, because spaces are not required between dashes, or a word or a dash. Additional places for a split are established by looking ahead and behind:
casiano@tonga:~$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> $str = "one two - --6-8"
  DB<2> x @toks = split / \s+ | (?<=\S) (?=-) | (?<=-) (?=\S)/x, $str
0  'one'
1  'two'
2  '-'
3  '-'
4  '-'
5  6
6  '-'
7  8
The following passage is taken from the section 'Look-Around-Assertions' in perlre. Let us use it as review material:
Look-around assertions are zero width patterns which match a specific pattern without including it in $&. Positive assertions match when their subpattern matches, negative assertions match when their subpattern fails. Look-behind matches text up to the current match position, look-ahead matches text following the current match position.
(?=pattern)
A zero-width positive look-ahead assertion. For example, /\w+(?=\t)/ matches a word followed by a tab, without including the tab in $&.
(?!pattern)
A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of foo that isn't followed by bar.
Note however that look-ahead and look-behind are NOT the same thing. You cannot use this for look-behind.
If you are looking for a bar that isn't preceded by a foo, /(?!foo)bar/ will not do what you want. That's because the (?!foo) is just saying that the next thing cannot be foo (and it's not, it's a bar), so foobar will match. You would have to do something like /(?!foo)...bar/ for that. We say "like" because there's the case of your bar not having three characters before it. You could cover that this way: /(?:(?!foo)...|^.{0,2})bar/. Sometimes it's still easier just to say:
if (/bar/ && $` !~ /foo$/)
For look-behind see below.
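The difference between the patterns discussed in the perlre text above can be seen by running them side by side. The sketch below uses a handful of sample strings chosen for illustration; the hash keys lookbehind, naive and workaround are just labels. It only shows agreement on these inputs and is not a proof of equivalence.

```perl
use strict;
use warnings;

my %pattern = (
    lookbehind => qr/(?<!foo)bar/,                 # the real negative look-behind
    naive      => qr/(?!foo)bar/,                  # look-ahead misused as look-behind
    workaround => qr/(?:(?!foo)...|^.{0,2})bar/,   # look-ahead based emulation
);

my %result;
for my $s ('foobar', 'bar', 'xbar', 'afoobar', 'foobar bar') {
    for my $name (sort keys %pattern) {
        $result{$s}{$name} = $s =~ $pattern{$name} ? 1 : 0;
        printf "%-12s %-10s %s\n", "'$s'", $name,
            $result{$s}{$name} ? "matches" : "does not match";
    }
}
```

On these strings the workaround agrees with the look-behind, while the naive pattern matches every one of them, including foobar.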
(?<=pattern)
A zero-width positive look-behind assertion. For example, /(?<=\t)\w+/ matches a word that follows a tab, without including the tab in $&. Works only for fixed-width look-behind.
\K
There is a special form of this construct, called \K, which causes the regex engine to 'keep' everything it had matched prior to the \K and not include it in $&. This effectively provides variable length look-behind. The use of \K inside of another look-around assertion is allowed, but the behaviour is currently not well defined.
For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string. For instance
s/(foo)bar/$1/g;
can be rewritten as the much more efficient
s/foo\Kbar//g;
The following debugger session illustrates the semantics of the operator:
casiano@millo:~$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> print "& = <$&> 1 = <$1>\n" if "alphabet" =~ /([^aeiou][a-z][aeiou])[a-z]/
& = <phab> 1 = <pha>
  DB<2> print "& = <$&> 1 = <$1>\n" if "alphabet" =~ /\K([^aeiou][a-z][aeiou])[a-z]/
& = <phab> 1 = <pha>
  DB<3> print "& = <$&> 1 = <$1>\n" if "alphabet" =~ /([^aeiou]\K[a-z][aeiou])[a-z]/
& = <hab> 1 = <pha>
  DB<4> print "& = <$&> 1 = <$1>\n" if "alphabet" =~ /([^aeiou][a-z]\K[aeiou])[a-z]/
& = <ab> 1 = <pha>
  DB<5> print "& = <$&> 1 = <$1>\n" if "alphabet" =~ /([^aeiou][a-z][aeiou])\K[a-z]/
& = <b> 1 = <pha>
  DB<6> print "& = <$&> 1 = <$1>\n" if "alphabet" =~ /([^aeiou][a-z][aeiou])[a-z]\K/
& = <> 1 = <pha>
  DB<7> @a = "alphabet" =~ /([aeiou]\K[^aeiou])/g; print "$&\n"
t
  DB<8> x @a
0  'al'
1  'ab'
2  'et'
Another example: we remove the trailing whitespace from a string:
  DB<23> $x = ' cadena entre blancos '
  DB<24> ($y = $x) =~ s/.*\b\K.*//g
  DB<25> p "<$y>"
< cadena entre blancos>
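The claim that s/(foo)bar/$1/g and s/foo\Kbar//g produce the same result can be checked with a short script. The sample strings and variable names below are illustrative. The last substitution is a sketch of \K standing in for a variable-length look-behind, with a prefix whose width is not fixed.

```perl
use strict;
use warnings;

my $text = "foobar foobaz foobar";   # illustrative sample string

(my $with_capture = $text) =~ s/(foo)bar/$1/g;   # capture 'foo' and put it back
(my $with_keep    = $text) =~ s/foo\Kbar//g;     # keep 'foo', delete only 'bar'

print "$with_capture\n$with_keep\n";

# \K after a prefix of variable width (any amount of whitespace around '=').
(my $config = "name = old") =~ s/\w+\s*=\s*\Kold/new/;
print "$config\n";
```

Only the part of the match after \K is replaced, so everything before it survives untouched.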
(?<!pattern)
A zero-width negative look-behind assertion. For example /(?<!bar)foo/ matches any occurrence of foo that does not follow bar. Works only for fixed-width look-behind.
Let us look at a usage example. We want to replace the extension .something with .txt in strings that contain a file path:
casiano@millo:~$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> ($b = $a = 'abc/xyz.something') =~ s{\.[^.]*$}{.txt}
  DB<2> p $b
abc/xyz.txt
  DB<3> ($b = $a = 'abc/xyz.something') =~ s/\.\K[^.]*$/txt/;
  DB<4> p $b
abc/xyz.txt
  DB<5> p $a
abc/xyz.something
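Since the dot has a fixed width of one character, the same substitution can also be written with a positive look-behind. The sketch below places the three variants side by side; the variable names are illustrative.

```perl
use strict;
use warnings;

my $path = 'abc/xyz.something';

(my $plain      = $path) =~ s{\.[^.]*$}{.txt};       # match the dot and rewrite it
(my $keep       = $path) =~ s{\.\K[^.]*$}{txt};      # \K: the dot is kept
(my $lookbehind = $path) =~ s{(?<=\.)[^.]*$}{txt};   # look-behind: the dot is only tested

print "$_\n" for $plain, $keep, $lookbehind;
```

All three leave $path itself unchanged and produce abc/xyz.txt.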
Write a regular expression that finds the last occurrence of the string foo in a given string.
  DB<6> x ($a = 'foo foo bar bar foo bar bar') =~ /foo(?!.*foo)/g; print pos($a)."\n"
19
  DB<7> x ($a = 'foo foo bar bar foo bar bar') =~ s/foo(?!.*foo)/\U$&/
0  1
  DB<8> x $a
0  'foo foo bar bar FOO bar bar'
At first sight, negative lookahead looks similar to positive lookahead applied to a negated character class:
/regexp(?![abc])/
/regexp(?=[^abc])/
However, the two forms differ. A negated character class must match an actual character in order to succeed, whereas a negative lookahead also succeeds at the end of the string. Thus \d+(?!\.) matches all of $a = '452', whereas \d+(?=[^.]) also matches, but only because 452 is 45 followed by a character that is not a dot:
> cat lookaheadneg.pl
#!/usr/bin/perl

$a = "452";
if ($a =~ m{\d+(?=[^.])}i) { print "$a matches negated class. \$& = $&\n"; }
else { print "$a does not match\n"; }
if ($a =~ m{\d+(?!\.)}i) { print "$a matches negative lookahead. \$& = $&\n"; }
else { print "$a does not match\n"; }
nereida:~/perl/src> lookaheadneg.pl
452 matches negated class. $& = 45
452 matches negative lookahead. $& = 452
Some further examples:
^(?![A-Z]*$)[a-zA-Z]*$ matches lines made up of letters that are not all uppercase. (Note the use of the anchors.)
^(?=.*?esto)(?=.*?eso) matches any line in which both esto and eso appear. Example:
> cat estoyeso.pl
#!/usr/bin/perl

my $a = shift;

if ($a =~ m{^(?=.*?esto)(?=.*?eso)}i) { print "$a matches.\n"; }
else { print "$a does not match\n"; }
> estoyeso.pl 'hola eso y esto'
hola eso y esto matches.
> estoyeso.pl 'hola esto y eso'
hola esto y eso matches.
> estoyeso.pl 'hola aquello y eso'
hola aquello y eso does not match
> estoyeso.pl 'hola esto y aquello'
hola esto y aquello does not match
The example shows that each lookahead is always evaluated from the current search position. The regular expression above is essentially equivalent to (/esto/ && /eso/).
(?!000)(\d\d\d) matches any string of three digits other than 000.
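The first and last patterns in the list above can be exercised with a few sample inputs. This is a sketch with illustrative test strings and variable names. Note that (?!000)(\d\d\d) is not anchored, so on a longer run of digits it may find a match further along.

```perl
use strict;
use warnings;

my $not_all_upper = qr/^(?![A-Z]*$)[a-zA-Z]*$/;
my $not_000       = qr/(?!000)(\d\d\d)/;

# Letters only, and not all of them uppercase.
my %letters;
for my $s (qw(Perl PERL perl Perl5)) {
    $letters{$s} = $s =~ $not_all_upper ? 1 : 0;
    print "$s: ", ($letters{$s} ? "matches" : "does not match"), "\n";
}

# Three digits that are not 000; a list-context match returns the capture.
my %digits;
for my $s (qw(000 007 0001)) {
    my ($found) = $s =~ $not_000;
    $digits{$s} = $found;
    print "$s: ", (defined $found ? "captured $found" : "does not match"), "\n";
}
```

For 0001 the attempt at offset 0 is rejected by the lookahead, and the engine then succeeds one position later with 001.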
Note that negative lookahead cannot easily be used to imitate lookbehind; that is, the behaviour of (?<!foo)bar cannot be reproduced with something like /(?!foo)bar/. Bear in mind that:
What (?!foo) says is that the next three characters cannot be foo.
Thus foo does not match /(?!foo)bar/, but foobar does match /(?!foo)bar/, because bar is a string whose next three characters are bar and not foo.
To approximate (?<!foo)bar using a negative lookahead we would have to write something like /(?!foo)...bar/, which matches a string of three characters other than foo followed by bar (although this is not exactly equivalent either):
pl@nereida:~/Lperltesting$ cat foobar.pl
use v5.10;
use strict;

my $a = shift;

for my $r (q{(?<!foo)bar}, q{(?!foo)bar}, q{(?!foo)...bar}) {
  if ($a =~ /$r/) {
    say "$a matches $r"
  }
  else {
    say "$a does not match $r"
  }
}
q{(?!foo)...bar} comes closer to q{(?<!foo)bar}:
pl@nereida:~/Lperltesting$ perl5.10.1 foobar.pl foobar
foobar does not match (?<!foo)bar
foobar matches (?!foo)bar
foobar does not match (?!foo)...bar
pl@nereida:~/Lperltesting$ perl5.10.1 foobar.pl bar
bar matches (?<!foo)bar
bar matches (?!foo)bar
bar does not match (?!foo)...bar
Even so, bar matches (?<!foo)bar but not (?!foo)...bar.
Can you find a more suitable regular expression using negative lookahead?
if (/bar/ and $` !~ /foo$/)
or, better still (see section 3.1.4):
if (/bar/p && ${^PREMATCH} !~ /foo$/)
The following program can be used to illustrate the equivalence on simple inputs:
pl@nereida:~/Lperltesting$ cat foobarprematch.pl
use v5.10;
use strict;

$_ = shift;

if (/bar/p && ${^PREMATCH} !~ /foo$/) {
  say "$_ satisfies ".q{/bar/p && ${^PREMATCH} !~ /foo$/};
}
else {
  say "$_ does not satisfy ".q{/bar/p && ${^PREMATCH} !~ /foo$/};
}
if (/(?<!foo)bar/) {
  say "$_ matches (?<!foo)bar"
}
else {
  say "$_ does not match (?<!foo)bar"
}
Two runs follow:
pl@nereida:~/Lperltesting$ perl5.10.1 foobarprematch.pl bar
bar satisfies /bar/p && ${^PREMATCH} !~ /foo$/
bar matches (?<!foo)bar
pl@nereida:~/Lperltesting$ perl5.10.1 foobarprematch.pl foobar
foobar does not satisfy /bar/p && ${^PREMATCH} !~ /foo$/
foobar does not match (?<!foo)bar
foo with foo, using \K or lookbehind
lookahead with look-ahead, using lookaheads and lookbehinds
foo and bar, provided that the word baz is not included
DB<1> x 'abc' =~ /(?=(.)(.)(.))a(b)/
s/,/, /g; but the substitution must not take place when the comma is embedded between two digits.
s/,/, /g; but the substitution must not take place when the comma is embedded between two digits. In addition, if there is already a space after the comma, it must not be duplicated.
pl@nereida:~/Lperltesting$ cat ABC123.pl
use warnings;
use strict;

my $c = 0;
my @p = ('^(ABC)(?!123)', '^(\D*)(?!123)',);

for my $r (@p) {
  for my $s (qw{ABC123 ABC445}) {
    $c++;
    print "$c: '$s' =~ /$r/ : ";
    <>;
    if ($s =~ /$r/) {
      print " YES ($1)\n";
    }
    else {
      print " NO\n";
    }
  }
}