Looking Behind and Looking Ahead

The following passage is taken almost verbatim from the section 'Looking-ahead-and-looking-behind' in perlretut:

Zero-width assertions as a particular case of looking behind and ahead

In Perl regular expressions, most regexp elements 'eat up' a certain amount of string when they match. For instance, the regexp element [abc] eats up one character of the string when it matches, in the sense that Perl moves to the next character position in the string after the match. There are some elements, however, that don't eat up characters (advance the character position) if they match.

The examples we have seen so far are the anchors. The anchor ^ matches the beginning of the line, but doesn't eat any characters.

Similarly, the word boundary anchor \b matches wherever a character matching \w is next to a character that doesn't, but it doesn't eat up any characters itself.

Anchors are examples of zero-width assertions. Zero-width, because they consume no characters, and assertions, because they test some property of the string.

In the context of our walk in the woods analogy to regexp matching, most regexp elements move us along a trail, but anchors have us stop a moment and check our surroundings. If the local environment checks out, we can proceed forward. But if the local environment doesn't satisfy us, we must backtrack.

Checking the environment entails either looking ahead on the trail, looking behind, or both.

The lookahead and lookbehind assertions are generalizations of the anchor concept. Lookahead and lookbehind are zero-width assertions that let us specify which characters we want to test for.
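To make the "generalization of anchors" idea concrete, here is a minimal sketch that spells out the word boundary \b with look-around assertions and checks that both forms match at the same offsets. The sample string and the variable names are assumptions chosen only for illustration.

```perl
use strict;
use warnings;

my $text = "walk in the woods";

# Offsets where the built-in word-boundary anchor matches.
my @with_anchor;
push @with_anchor, $-[0] while $text =~ /\b/g;

# The same test written with look-around assertions:
# a word character on exactly one side of the current position.
my @with_lookaround;
push @with_lookaround, $-[0]
    while $text =~ /(?<=\w)(?!\w)|(?<!\w)(?=\w)/g;

print "anchor:      @with_anchor\n";       # 0 4 5 7 8 11 12 17
print "look-around: @with_lookaround\n";   # same offsets
```

Every match in both loops has length zero: the assertions only inspect the surroundings of a position, they never move past any character.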

Lookahead assertion

The lookahead assertion is denoted by (?=regexp) and the lookbehind assertion is denoted by (?<=fixed-regexp).
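The following sketch is modeled on the example that accompanies this passage in perlretut; the variable names that hold the results are assumptions added here so that the outcomes can be printed.

```perl
use strict;
use warnings;

my $x = "I catch the housecat 'Tom-cat' with catnip";

# 'cat' followed by whitespace: only the 'cat' that ends 'housecat'.
my $ahead_offset = $x =~ /cat(?=\s)/ ? $-[0] : -1;

# Words that start with 'cat' right after whitespace.
my @catwords = $x =~ /(?<=\s)cat\w+/g;

# An isolated 'cat' with whitespace on both sides: there is none.
my $isolated = $x =~ /(?<=\s)cat(?=\s)/ ? 1 : 0;

print "offset of cat(?=\\s): $ahead_offset\n";   # 17
print "catwords: @catwords\n";                   # catch catnip
print "isolated cat found: $isolated\n";         # 0
```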

This is also known as the positive ``trailing'' or ``look-ahead'' operator. For example, /\w+(?=\t)/ matches a word only if it is followed by a tab, but the tab does not become part of $&.
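A minimal sketch of the /\w+(?=\t)/ case, assuming a made-up tab-separated line; it shows that the tab is required for the match but is left out of $&.

```perl
use strict;
use warnings;

my $line = "name\tvalue";    # hypothetical input: two fields separated by a tab

# The word must be followed by a tab, but the tab is not consumed.
my $word = $line =~ /\w+(?=\t)/ ? $& : undef;

# Without a tab after any word there is no match at all.
my $no_tab = "name value" =~ /\w+(?=\t)/ ? 1 : 0;

print "matched: '$word' (length ", length($word), ")\n";   # 'name', length 4
print "match without tab: $no_tab\n";                      # 0
```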

A hard RegEx problem

Lookahead and lookbehind parentheses do not capture

Note that the parentheses in (?=regexp) and (?<=regexp) are non-capturing, since these are zero-width assertions.
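One way to observe this is to count the capture groups of the last successful match with $#+. The sketch below, on an assumed sample string, also shows that an ordinary pair of parentheses nested inside the assertion does capture.

```perl
use strict;
use warnings;

my $s = "foobar";

# The assertion's own parentheses create no capture group.
$s =~ /foo(?=bar)/ or die "expected a match";
my $groups_plain = $#+;              # number of groups in the last match

# Capturing parentheses nested inside the assertion do capture.
$s =~ /foo(?=(bar))/ or die "expected a match";
my ($groups_inner, $inner, $whole) = ($#+, $1, $&);

print "groups without inner parens: $groups_plain\n";   # 0
print "groups with inner parens:    $groups_inner\n";   # 1
print "\$1 = '$inner', \$& = '$whole'\n";                # $1 = 'bar', $& = 'foo'
```

Note that $& is still just foo in the second match: the captured bar was inspected, not consumed.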

Limitations of lookbehind

Lookahead (?=regexp) can match arbitrary regexps, but lookbehind (?<=fixed-regexp) only works for regexps of fixed width, i.e., a fixed number of characters long.
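The sketch below illustrates the restriction with an unbounded lookbehind, which Perl refuses to compile (the exact error message depends on the Perl version). The pattern is kept in a string so that the failure happens at run time and can be trapped with eval. As a possible workaround it uses \K (available since Perl 5.10), which discards what was matched so far from $&; this is offered as an alternative technique, not as something the quoted text prescribes.

```perl
use strict;
use warnings;

# An unbounded lookbehind: a+ has no fixed (or bounded) width.
my $unbounded = '(?<=a+)b';
my $compiled  = eval { qr/$unbounded/ };
my $error     = $@;

print "compile failed: ", (defined $compiled ? "no" : "yes"), "\n";

# Workaround sketch: match the a's normally, then forget them with \K,
# so that only the b is replaced.
(my $t = "caaab") =~ s/a+\Kb/X/;
print "$t\n";    # caaaX
```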

Negated lookahead and lookbehind operators

The negated versions of the lookahead and lookbehind assertions are denoted by (?!regexp) and (?<!fixed-regexp) respectively. They evaluate true if the regexps do not match:
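A short sketch of the negated forms, following the style of the perlretut examples; the result variables are assumptions added so that the outcomes can be printed.

```perl
use strict;
use warnings;

my $x = "foobar";

my $not_bar   = $x =~ /foo(?!bar)/ ? 1 : 0;   # fails: 'bar' follows 'foo'
my $not_baz   = $x =~ /foo(?!baz)/ ? 1 : 0;   # matches: 'baz' does not follow 'foo'
my $not_space = $x =~ /(?<!\s)foo/ ? 1 : 0;   # matches: no whitespace before 'foo'

print "foo(?!bar): $not_bar\n";     # 0
print "foo(?!baz): $not_baz\n";     # 1
print "(?<!\\s)foo: $not_space\n";  # 1
```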

Example: split with lookahead and lookbehind

Here is an example where a string containing blank-separated words, numbers and single dashes is to be split into its components.

Using /\s+/ alone won't work, because spaces are not required between dashes, or a word or a dash. Additional places for a split are established by looking ahead and behind:
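The split described above can be sketched as follows, closely following the perlretut example. The /x modifier lets the three alternatives be laid out one per line with comments.

```perl
use strict;
use warnings;

my $str = "one two - --6-8";

my @toks = split / \s+              # a run of spaces
                 | (?<=\S) (?=-)    # any non-space followed by '-'
                 | (?<=-)  (?=\S)   # a '-' followed by any non-space
                 /x, $str;

print join('|', @toks), "\n";    # one|two|-|-|-|6|-|8
```

The last two alternatives are zero-width: they mark a split point between two characters without removing either of them.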

Look-Around in perlre

The following paragraph was taken from the section 'Look-Around-Assertions' in perlre. Let us use it as review material:

Let us look at a usage example. We want to replace .something extensions with .txt in strings that contain a path to a file:
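One possible way to do this with a lookbehind is sketched below. The helper name to_txt and the sample paths are hypothetical, and the sketch assumes that the extension is whatever follows the last dot of the final path component (so a hidden file such as .bashrc would also be rewritten).

```perl
use strict;
use warnings;

# Hypothetical helper: replace the extension after the last dot with 'txt'.
sub to_txt {
    my ($path) = @_;
    # The dot is asserted by the lookbehind but not replaced;
    # [^./]+\z keeps the match inside the last path component.
    $path =~ s{(?<=\.)[^./]+\z}{txt};
    return $path;
}

print to_txt("/home/pl/notes.something"), "\n";   # /home/pl/notes.txt
print to_txt("/etc/conf.d/hosts"), "\n";          # unchanged: no extension
```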

Negative lookahead operator: last occurrence

Write a regular expression that finds the last occurrence of the string foo in a given string.
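One possible answer, given as a sketch: match a foo that is not followed, anywhere later in the string, by another foo. The sample text is an assumption, and the /s modifier is added so that .* can also cross newlines.

```perl
use strict;
use warnings;

my $text = "foo bar foo baz foo qux";

# A 'foo' such that no other 'foo' appears in the rest of the string.
my $last = $text =~ /foo(?!.*foo)/s ? $-[0] : -1;

print "last foo starts at offset $last\n";              # 16
print "rindex agrees: ", rindex($text, 'foo'), "\n";    # 16
```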

Differences between negative lookahead and lookahead with a negated class

At first sight, the negative ``look-ahead'' operator resembles the positive ``look-ahead'' operator applied to a negated character class.
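The two forms differ when nothing follows at all: the negative lookahead succeeds at the end of the string, while a positive lookahead on a negated class still demands that some character be present. A minimal sketch, with sample strings chosen for illustration:

```perl
use strict;
use warnings;

my %result;
for my $s (qw(foox foo1 foo)) {
    my $negative = $s =~ /foo(?![0-9])/  ? 1 : 0;   # "not followed by a digit"
    my $negated  = $s =~ /foo(?=[^0-9])/ ? 1 : 0;   # "followed by a non-digit"
    $result{$s} = [$negative, $negated];
    printf "%-5s (?![0-9]): %d   (?=[^0-9]): %d\n", $s, $negative, $negated;
}
# foox  -> 1 and 1
# foo1  -> 0 and 0
# foo   -> 1 and 0   (end of string: there is no character to look at)
```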

AND and AND NOT

Negative lookahead versus lookbehind

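Note that the negative ``look-ahead'' cannot easily be used to imitate a ``look-behind''; that is, the behavior of (?<!foo)bar cannot be imitated with something like (?!foo)bar. The assertion (?!foo) only states that the text starting at the current position is not foo; since that text is bar, the assertion always holds and (?!foo)bar matches the bar in foobar.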
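The difference is easy to check on a couple of assumed sample strings:

```perl
use strict;
use warnings;

# (?!foo) is tested where 'bar' starts, so it can never fail there.
my $ahead_foobar  = "foobar" =~ /(?!foo)bar/  ? 1 : 0;

# (?<!foo) inspects the three characters before 'bar'.
my $behind_foobar = "foobar" =~ /(?<!foo)bar/ ? 1 : 0;
my $behind_bazbar = "bazbar" =~ /(?<!foo)bar/ ? 1 : 0;

print "(?!foo)bar  on foobar: $ahead_foobar\n";    # 1
print "(?<!foo)bar on foobar: $behind_foobar\n";   # 0
print "(?<!foo)bar on bazbar: $behind_bazbar\n";   # 1
```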

Exercises

Exercise 3.2.2

Write a substitution that replaces every occurrence of foo with foo, using \K or lookbehind
Write a substitution that replaces every occurrence of lookahead with look-ahead using lookaheads and lookbehinds
Write a regular expression that captures everything between the strings foo and bar, provided that the word baz does not appear in between
What is the output?
```
  DB<1> x 'abc' =~ /(?=(.)(.)(.))a(b)/
```
We want to put a blank space after every comma:
```
s/,/, /g;
```
but the substitution must not take place when the comma is embedded between two digits.
We want to put a blank space after every comma:
```
s/,/, /g;
```
but the substitution must not take place when the comma is embedded between two digits. In addition, if there is already a space after the comma, it must not be duplicated

What is the output?

pl@nereida:~/Lperltesting$ cat ABC123.pl
use warnings;
use strict;

my $c = 0;
my @p = ('^(ABC)(?!123)', '^(\D*)(?!123)',);

for my $r (@p) {
  for my $s (qw{ABC123 ABC445}) {
    $c++;
    print "$c: '$s' =~ /$r/ : ";
    <>;    # pause: wait for a line of input before showing the answer
    if ($s =~ /$r/) {
      print " YES ($1)\n";
    }
    else {
      print " NO\n";
    }
  }
}