pl@nereida:~/Lperltesting$ cat -n c2f.pl
     1  #!/usr/bin/perl -w
     2  use strict;
     3
     4  print "Enter a temperature (i.e. 32F, 100C):\n";
     5  my $input = <STDIN>;
     6  chomp($input);
     7
     8  if ($input !~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i) {
     9    warn "Expecting a temperature, so don't understand \"$input\".\n";
    10  }
    11  else {
    12    my $InputNum = $1;
    13    my $type = $3;
    14    my ($celsius, $farenheit);
    15    if ($type eq "C" or $type eq "c") {
    16      $celsius = $InputNum;
    17      $farenheit = ($celsius * 9/5)+32;
    18    }
    19    else {
    20      $farenheit = $InputNum;
    21      $celsius = ($farenheit -32)*5/9;
    22    }
    23    printf "%.2f C = %.2f F\n", $celsius, $farenheit;
    24  }
See also: perldoc perlrequick, perldoc perlretut, perldoc perlre, perldoc perlreref.
Running the program under the debugger:
pl@nereida:~/Lperltesting$ perl -wd c2f.pl

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(c2f.pl:4):       print "Enter a temperature (i.e. 32F, 100C):\n";
  DB<1> c 8
Enter a temperature (i.e. 32F, 100C):
32F
main::(c2f.pl:8):       if ($input !~ m/^([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$/i) {
  DB<2> n
main::(c2f.pl:12):        my $InputNum = $1;
  DB<2> x ($1, $2, $3)
0  32
1  undef
2  'F'
  DB<3> use YAPE::Regex::Explain
  DB<4> p YAPE::Regex::Explain->new('([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$')->explain
The regular expression:

(?-imsx:([-+]?[0-9]+(\.[0-9]*)?)\s*([CF])$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [-+]?                    any character of: '-', '+' (optional
                             (matching the most amount possible))
----------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (                        group and capture to \2 (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
      [0-9]*                   any character of: '0' to '9' (0 or
                               more times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )?                       end of \2 (NOTE: because you're using a
                             quantifier on this capture, only the
                             LAST repetition of the captured pattern
                             will be stored in \2)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [CF]                     any character of: 'C', 'F'
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
Inside a regular expression, the text matched by the first, second, etc. pair of parentheses must be referred to as \1, \2, etc. The notation $1 refers to what the first pair of parentheses matched in the last successful match, not in the current one. Let us see an example:
pl@nereida:~/Lperltesting$ cat -n dollar1slash1.pl
     1  #!/usr/bin/perl -w
     2  use strict;
     3
     4  my $a = "hola juanito";
     5  my $b = "adios anita";
     6
     7  $a =~ /(ani)/;
     8  $b =~ s/(adios) *($1)/\U$1 $2/;
     9  print "$b\n";
Note how the $1 that appears in the replacement string (line 8) refers to the string adios, while the $1 in the pattern part of the substitution still holds ani, the value left by the match on line 7:
pl@nereida:~/Lperltesting$ ./dollar1slash1.pl
ADIOS ANIta
$b =~ s/(adios) *(\1)/\U$1 $2/;
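The contrast between \1 and $1 inside a pattern can be sketched as follows. This is a minimal illustration, not part of the original programs: the variable names $copy and $other and the sample strings are assumptions chosen for the example.

```perl
use strict;
use warnings;

# \1 inside the pattern refers to group 1 of the match in progress:
# the second "adios" must repeat what the first parentheses just matched.
(my $copy = "adios adios anita") =~ s/(adios) *(\1)/\U$1\E-$2/;
print "$copy\n";     # ADIOS-adios anita

# $1 inside the pattern is interpolated before matching starts,
# so it holds the value left by the previous successful match.
"hola juanito" =~ /(ani)/;                                  # $1 is now 'ani'
(my $other = "adios anita") =~ s/(adios) *($1)/\U$1 $2/;
print "$other\n";    # ADIOS ANIta
```

In the replacement part both substitutions use $1 and $2 normally, since by then the current match has completed.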
The number of capturing parentheses is not limited:
pl@nereida:~/Lperltesting$ perl -wde 0
main::(-e:1):   0
              123456789ABCDEF
  DB<1> $x = "123456789AAAAAA"
                    1  2  3  4  5  6  7  8  9 10 11 12
  DB<2> $r = $x =~ /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)\11/; print "$r\n$10\n$11\n"
1
A
A
See the following paragraph from perlre (section Capture buffers):
There is no limit to the number of captured substrings that you may use. However Perl also uses \10, \11, etc. as aliases for \010, \011, etc. (Recall that 0 means octal, so \011 is the character at number 9 in your coded character set; which would be the 10th character, a horizontal tab under ASCII.) Perl resolves this ambiguity by interpreting \10 as a backreference only if at least 10 left parentheses have opened before it. Likewise \11 is a backreference only if at least 11 left parentheses have opened before it. And so on. \1 through \9 are always interpreted as backreferences.
When used in a context that requires a list, the pattern match returns a list consisting of the subexpressions matched by the parentheses, that is, ($1, $2, $3, ...). If there is no match, the empty list is returned. If there is a match but the pattern has no parentheses, the list (1) is returned (see the description of m// in perlop).
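The three cases of a match in list context can be checked with a short sketch. The sample strings and variable names are assumptions made for illustration; the behavior shown is the one documented for m// without /g in perlop, where a successful match without parentheses yields the list (1).

```perl
use strict;
use warnings;

# Success with parentheses: the list of captured substrings.
my @captures  = ("2024-05" =~ /(\d+)-(\d+)/);   # ('2024', '05')

# Failure: the empty list, whether or not there are parentheses.
my @failed    = ("abc" =~ /(\d+)/);             # ()

# Success without parentheses (and without /g): the list (1).
my @no_parens = ("abc" =~ /b/);                 # (1)

print scalar(@captures), " ", scalar(@failed), " @no_parens\n";   # 2 0 1
```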
pl@nereida:~/src/perl/perltesting$ cat -n escapes.pl
     1  #!/usr/bin/perl -w
     2  use strict;
     3
     4  my $foo = "one two three four five\nsix seven";
     5  my ($F1, $F2, $Etc) = ($foo =~ /^\s*(\S+)\s+(\S+)\s*(.*)/);
     6  print "List Context: F1 = $F1, F2 = $F2, Etc = $Etc\n";
     7
     8  # This is 'almost' the same as:
     9  ($F1, $F2, $Etc) = split(/\s+/, $foo, 3);
    10  print "Split: F1 = $F1, F2 = $F2, Etc = $Etc\n";
Look at the result of running it:
pl@nereida:~/src/perl/perltesting$ ./escapes.pl
List Context: F1 = one, F2 = two, Etc = three four five
Split: F1 = one, F2 = two, Etc = three four five
six seven
The s option, used in a regexp, makes the dot '.' match the newline character as well:
pl@nereida:~/src/perl/perltesting$ perl -wd ./escapes.pl
main::(./escapes.pl:4): my $foo = "one two three four five\nsix seven";
  DB<1> c 9
List Context: F1 = one, F2 = two, Etc = three four five
main::(./escapes.pl:9): ($F1, $F2, $Etc) = split(' ',$foo, 3);
  DB<2> ($F1, $F2, $Etc) = ($foo =~ /^\s*(\S+)\s+(\S+)\s*(.*)/s)
  DB<3> p "List Context: F1 = $F1, F2 = $F2, Etc = $Etc\n"
List Context: F1 = one, F2 = two, Etc = three four five
six seven
With the /s option, . also matches a \n; that is, it matches any character at all.
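A minimal sketch of the effect of /s on the dot follows. The string and the variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $foo = "one two\nthree";

# Without /s the dot stops at the newline, so (.*) captures nothing here.
my ($plain)  = $foo =~ /two(.*)/;     # ''

# With /s the dot also matches "\n", so (.*) runs to the end of the string.
my ($single) = $foo =~ /two(.*)/s;    # "\nthree"

printf "plain=%d chars, single=%d chars\n", length $plain, length $single;
```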
Let us see another example, which prints the names of the files containing text that matches a given pattern, even when the match is spread over several lines:
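#!/usr/bin/perl -w
# usage: smodifier.pl 'expr' files
# prints the names of the files that match the given expr
use strict;

undef $/;    # no input record separator: each file is read in one gulp
my $what = shift @ARGV;
while (my $file = shift @ARGV) {
  # three-argument open with an error check; unreadable files are skipped
  open(my $fh, '<', $file) or do { warn "can't open $file: $!\n"; next };
  my $contents = <$fh>;
  close($fh);
  if ($contents =~ /$what/s) {
    print "$file\n";
  }
}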
Example of use:
> smodifier.pl 'three.*three' double.in split.pl doublee.pl
double.in
doublee.pl
See section 3.4.2 for the contents of the file double.in. In that file the pattern three.*three appears spread over several lines.
The s modifier is often used together with the m modifier. Here is what the section 'Using character classes' of perlretut says about it:

m modifier (//m): Treat string as a set of multiple lines. '.' matches any character except \n. ^ and $ are able to match at the start or end of any line within the string.

s and m modifiers (//sm): Treat string as a single long line, but detect multiple lines. '.' matches any character, even \n. ^ and $, however, are able to match at the start or end of any line within the string.
Here are examples of //s and //m in action:
$x = "There once was a girl\nWho programmed in Perl\n";

$x =~ /^Who/;   # doesn't match, "Who" not at start of string
$x =~ /^Who/s;  # doesn't match, "Who" not at start of string
$x =~ /^Who/m;  # matches, "Who" at start of second line
$x =~ /^Who/sm; # matches, "Who" at start of second line

$x =~ /girl.Who/;   # doesn't match, "." doesn't match "\n"
$x =~ /girl.Who/s;  # matches, "." matches "\n"
$x =~ /girl.Who/m;  # doesn't match, "." doesn't match "\n"
$x =~ /girl.Who/sm; # matches, "." matches "\n"
Most of the time, the default behavior is what is wanted, but //s and //m are occasionally very useful. If //m is being used, the start of the string can still be matched with \A and the end of the string can still be matched with the anchors \Z (matches both the end and the newline before, like $), and \z (matches only the end):
$x =~ /^Who/m;   # matches, "Who" at start of second line
$x =~ /\AWho/m;  # doesn't match, "Who" is not at start of string

$x =~ /girl$/m;  # matches, "girl" at end of first line
$x =~ /girl\Z/m; # doesn't match, "girl" is not at end of string

$x =~ /Perl\Z/m; # matches, "Perl" is at newline before end
$x =~ /Perl\z/m; # doesn't match, "Perl" is not at end of string
Normally the character ^ matches only at the beginning of the string and the character $ only at the end. Embedded \n characters are not matched by ^ or $. The /m modifier changes this behavior: ^ and $ then match at any internal line boundary, and the anchors \A and \Z are used to match the beginning and end of the string.
Here is an example:
nereida:~/perl/src> perl -de 0
  DB<1> $a = "hola\npedro"
  DB<2> p "$a"
hola
pedro
  DB<3> $a =~ s/.*/x/m
  DB<4> p $a
x
pedro
  DB<5> $a =~ s/^pedro$/juan/
  DB<6> p "$a"
x
pedro
  DB<7> $a =~ s/^pedro$/juan/m
  DB<8> p "$a"
x
juan
Let us rewrite the temperature conversion example using a list context:
casiano@millo:~/Lperltesting$ cat c2f_list.pl
#!/usr/bin/perl -w
use strict;

print "Enter a temperature (i.e. 32F, 100C):\n";
my $input = <STDIN>;
chomp($input);

my ($InputNum, $type);

($InputNum, $type) = $input =~ m/^
                                    ([-+]?[0-9]+(?:\.[0-9]*)?) # Temperature
                                    \s*
                                    ([cCfF]) # Celsius or Fahrenheit
                                 $/x;

die "Expecting a temperature, so don't understand \"$input\".\n" unless defined($InputNum);

my ($celsius, $fahrenheit);
if ($type eq "C" or $type eq "c") {
  $celsius = $InputNum;
  $fahrenheit = ($celsius * 9/5)+32;
}
else {
  $fahrenheit = $InputNum;
  $celsius = ($fahrenheit -32)*5/9;
}
printf "%.2f C = %.2f F\n", $celsius, $fahrenheit;
The /x option in a regexp allows comments and whitespace inside the regular expression: whitespace inside the pattern is no longer significant. If you need a significant space, use \s or escape it. See the section 'Modifiers' in perlre and the section 'Building-a-regexp' in perlretut.
The notation (?: ... ) introduces grouping parentheses without memory. (?: ...) groups expressions just as ordinary parentheses do. The difference is that they do not ``memorize'', that is, they store nothing in $1, $2, etc. The compiled regexp is more efficient as a result. Let us see an example:
> cat groupingpar.pl
#!/usr/bin/perl

my $a = shift;

$a =~ m/(?:hola )*(juan)/;
print "$1\n";
nereida:~/perl/src> groupingpar.pl 'hola juan'
juan
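The difference in numbering between capturing and non-capturing groups can be seen by comparing the lists the two forms return. This is a small sketch; the input string and the variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "hola hola juan";

# Ordinary parentheses: the repeated group takes $1 (keeping only its
# last repetition) and pushes 'juan' to $2.
my @with    = ($text =~ /(hola )*(juan)/);     # ('hola ', 'juan')

# (?: ... ) groups without capturing, so 'juan' stays in $1.
my @without = ($text =~ /(?:hola )*(juan)/);   # ('juan')

print scalar(@with), " captures vs ", scalar(@without), "\n";   # 2 captures vs 1
```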
The pattern may contain variables, which are interpolated (in which case the pattern may have to be recompiled). If you want such a pattern to be compiled only once, use the /o option.
pl@nereida:~/Lperltesting$ cat mygrep.pl
#!/usr/bin/perl -w
my $what = shift @ARGV || die "Usage $0 regexp files ...\n";
while (<>) {
  print "File $ARGV, rel. line $.: $_" if (/$what/o); # compile only once
}

An example run follows:
pl@nereida:~/Lperltesting$ ./mygrep.pl
Usage ./mygrep.pl regexp files ...
pl@nereida:~/Lperltesting$ ./mygrep.pl if labels.c
File labels.c, rel. line 7:        if (a < 10) goto LABEL;
The following text is from the section 'Using-regular-expressions-in-Perl' in perlretut:
If $pattern won't be changing over the lifetime of the script, we can add the //o modifier, which directs Perl to only perform variable substitutions once.
Another possibility is to precompile the pattern with the qr operator (see the section 'Regexp-Quote-Like-Operators' in perlop). The following variant of the previous program also compiles the pattern only once:
pl@nereida:~/Lperltesting$ cat mygrep2.pl
#!/usr/bin/perl -w
my $what = shift @ARGV || die "Usage $0 regexp files ...\n";
$what = qr{$what};
while (<>) {
  print "File $ARGV, rel. line $.: $_" if (/$what/);
}
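The qr approach can also be tried without reading files. The sketch below filters an in-memory list of lines; the sample lines and the names $what, @lines and @hits are assumptions made for the example.

```perl
use strict;
use warnings;

# qr// compiles the pattern once and returns a reusable regexp object.
my $what = qr/\bif\b/;

my @lines = (
  "if (a < 10) goto LABEL;",
  "x = diff + 1;",          # contains "if" only inside the word "diff"
  "} else if (b) {",
);

# The precompiled object can be used directly on the right of =~.
my @hits = grep { $_ =~ $what } @lines;
print "$_\n" for @hits;
```

A qr object can also be interpolated into a larger pattern, as in /^\s*$what/, which is one reason it is often preferred over /o.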
The following excerpt from the section 'Matching-repetitions' in perlretut illustrates the greedy semantics of the repetition operators *, +, ?, {}, etc.
For all of these quantifiers, Perl will try to match as much of the string as possible, while still allowing the regexp to succeed. Thus with /a?.../, Perl will first try to match the regexp with the a present; if that fails, Perl will try to match the regexp without the a present. For the quantifier *, we get the following:
$x = "the cat in the hat";
$x =~ /^(.*)(cat)(.*)$/; # matches,
                         # $1 = 'the '
                         # $2 = 'cat'
                         # $3 = ' in the hat'
Which is what we might expect, the match finds the only cat in the string and locks onto it. Consider, however, this regexp:
$x =~ /^(.*)(at)(.*)$/; # matches,
                        # $1 = 'the cat in the h'
                        # $2 = 'at'
                        # $3 = ''   (0 characters match)
One might initially guess that Perl would find the at in cat and stop there, but that wouldn't give the longest possible string to the first quantifier .*. Instead, the first quantifier .* grabs as much of the string as possible while still having the regexp match. In this example, that means having the at sequence with the final at in the string.
The other important principle illustrated here is that when there are two or more elements in a regexp, the leftmost quantifier, if there is one, gets to grab as much the string as possible, leaving the rest of the regexp to fight over scraps. Thus in our example, the first quantifier .* grabs most of the string, while the second quantifier .* gets the empty string. Quantifiers that grab as much of the string as possible are called maximal match or greedy quantifiers.
When a regexp can match a string in several different ways, we can use the principles above to predict which way the regexp will match:
Principle 0: Taken as a whole, any regexp will be matched at the earliest possible position in the string.
Principle 1: In an alternation a|b|c..., the leftmost alternative that allows a match for the whole regexp will be the one used.
Principle 2: The maximal matching quantifiers ?, *, + and {n,m} will in general match as much of the string as possible while still allowing the whole regexp to match.
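Each of the three principles above can be observed in isolation. The following is a minimal sketch; the test strings and the variable names $p0, $p1 and $p2 are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Principle 0: the earliest match wins, even if a longer one exists later.
my ($p0) = "xaay aaa" =~ /(a+)/;               # 'aa', not 'aaa'

# Principle 1: the leftmost alternative that lets the regexp succeed is used.
my ($p1) = "cat" =~ /(c|ca|cat)/;              # 'c'

# Principle 2: a greedy quantifier takes as much as it can while the
# whole regexp still matches, so .* runs up to the last "at".
my ($p2) = "the cat in the hat" =~ /^(.*)at/;  # 'the cat in the h'

print "$p0 | $p1 | $p2\n";
```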
The following passage is taken from the section 'Repeated-Patterns-Matching-a-Zero-length-Substring' in perlre:
Regular expressions provide a terse and powerful programming language. As with most other power tools, power comes together with the ability to wreak havoc.
A common abuse of this power stems from the ability to make infinite loops using regular expressions, with something as innocuous as:
'foo' =~ m{ ( o? )* }x;
The o? matches at the beginning of 'foo', and since the position in the string is not moved by the match, o? would match again and again because of the * quantifier.
Another common way to create a similar cycle is with the looping modifier //g:
@matches = ( 'foo' =~ m{ o? }xg );
or
print "match: <$&>\n" while 'foo' =~ m{ o? }xg;
or the loop implied by split().
... Perl allows such constructs, by forcefully breaking the infinite loop. The rules for this are different for lower-level loops given by the greedy quantifiers *+{}, and for higher-level ones like the /g modifier or split() operator.
The lower-level loops are interrupted (that is, the loop is broken) when Perl detects that a repeated expression matched a zero-length substring. Thus
m{ (?: NON_ZERO_LENGTH | ZERO_LENGTH )* }x;
is made equivalent to
m{   (?: NON_ZERO_LENGTH )*
   |
     (?: ZERO_LENGTH )?
 }x;
The higher level-loops preserve an additional state between iterations: whether the last match was zero-length. To break the loop, the following match after a zero-length match is prohibited to have a length of zero. This prohibition interacts with backtracking (see Backtracking), and so the second best match is chosen if the best match is of zero length.
For example:
$_ = 'bar';
s/\w??/<$&>/g;
results in <><b><><a><><r><>. At each position of the string the best match given by non-greedy ?? is the zero-length match, and the second best match is what is matched by \w. Thus zero-length matches alternate with one-character-long matches.
Similarly, for repeated m/()/g the second-best match is the match at the position one notch further in the string.
The additional state of being matched with zero-length is associated with the matched string, and is reset by each assignment to pos(). Zero-length matches at the end of the previous match are ignored during split.
  DB<25> $c = 0
  DB<26> print(($c++).": <$&>\n") while 'aaaabababab' =~ /a*(ab)*/g;
0: <aaaa>
1: <>
2: <a>
3: <>
4: <a>
5: <>
6: <a>
7: <>
8: <>
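The perlre example quoted above can be run as a script to confirm that loops over zero-length matches terminate. This is a sketch: the variable names are assumptions, and the second statement only prints how many matches the /g loop produced rather than relying on a particular count.

```perl
use strict;
use warnings;

# The s/\w??/<$&>/g example from perlre: zero-length matches alternate
# with one-character matches.
(my $marked = 'bar') =~ s/\w??/<$&>/g;
print "$marked\n";    # <><b><><a><><r><>

# A //g match in list context over a pattern that can match the empty
# string also finishes: Perl refuses two zero-length matches in a row
# at the same position.
my @pieces = ('foo' =~ /o?/g);
printf "%d matches\n", scalar @pieces;
```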
Lazy or non-greedy expressions make the NFA stop at the shortest string that matches the expression. They are written like their greedy counterparts with the suffix ? added:
{n,m}?
{n,}?
{n}?
*?
+?
??
Let us review what the section 'Matching-repetitions' in perlretut says:
Sometimes greed is not good. At times, we would like quantifiers to match a minimal piece of string, rather than a maximal piece. For this purpose, Larry Wall created the minimal match or non-greedy quantifiers ??, *?, +?, and {}?. These are the usual quantifiers with a ? appended to them. They have the following meanings:
a?? means: match 'a' 0 or 1 times. Try 0 first, then 1.
a*? means: match 'a' 0 or more times, i.e., any number of times, but as few times as possible
a+? means: match 'a' 1 or more times, i.e., at least once, but as few times as possible
a{n,m}? means: match at least n times, not more than m times, as few times as possible
a{n,}? means: match at least n times, but as few times as possible
a{n}? means: match exactly n times. Because we match exactly n times, a{n}? is equivalent to a{n} and is just there for notational consistency.
Let's look at the example above, but with minimal quantifiers:
$x = "The programming republic of Perl";
$x =~ /^(.+?)(e|r)(.*)$/; # matches,
                          # $1 = 'Th'
                          # $2 = 'e'
                          # $3 = ' programming republic of Perl'
The minimal string that will allow both the start of the string ^ and the alternation to match is Th, with the alternation e|r matching e. The second quantifier .* is free to gobble up the rest of the string.
$x =~ /(m{1,2}?)(.*?)$/;  # matches,
                          # $1 = 'm'
                          # $2 = 'ming republic of Perl'
The first string position that this regexp can match is at the first m in programming. At this position, the minimal m{1,2}? matches just one m. Although the second quantifier .*? would prefer to match no characters, it is constrained by the end-of-string anchor $ to match the rest of the string.
$x =~ /(.*?)(m{1,2}?)(.*)$/;  # matches,
                              # $1 = 'The progra'
                              # $2 = 'm'
                              # $3 = 'ming republic of Perl'
In this regexp, you might expect the first minimal quantifier .*? to match the empty string, because it is not constrained by a ^ anchor to match the beginning of the word. Principle 0 applies here, however. Because it is possible for the whole regexp to match at the start of the string, it will match at the start of the string. Thus the first quantifier has to match everything up to the first m. The second minimal quantifier matches just one m and the third quantifier matches the rest of the string.
$x =~ /(.??)(m{1,2})(.*)$/;  # matches,
                             # $1 = 'a'
                             # $2 = 'mm'
                             # $3 = 'ing republic of Perl'
Just as in the previous regexp, the first quantifier .?? can match earliest at position a, so it does. The second quantifier is greedy, so it matches mm, and the third matches the rest of the string.
We can modify principle 3 above to take into account non-greedy quantifiers:
Principle 3: If there are two or more elements in a regexp, the leftmost greedy (non-greedy) quantifier, if any, will match as much (little) of the string as possible while still allowing the whole regexp to match. The next leftmost greedy (non-greedy) quantifier, if any, will try to match as much (little) of the string remaining available to it as possible, while still allowing the whole regexp to match. And so on, until all the regexp elements are satisfied.
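The contrast between a greedy quantifier and its non-greedy counterpart is easy to see on a string with several delimiters. The following sketch uses an assumed sample string and variable names chosen for the example.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy .+ extends to the last '>' that still lets the regexp match.
my ($greedy) = $html =~ /<(.+)>/;     # 'b>bold</b> and <i>italic</i'

# Non-greedy .+? stops at the first '>' that lets the regexp match.
my ($lazy)   = $html =~ /<(.+?)>/;    # 'b'

print "greedy: $greedy\nlazy:   $lazy\n";
```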
casiano@millo:~/Lperltesting$ perl -wde 0
main::(-e:1):   0
  DB<1> x ('1'x34) =~ m{^(11+)\1+$}
0  11111111111111111
  DB<2> x ('1'x34) =~ m{^(11+?)\1+$}
????????????????????????????????????
Just like alternation, quantifiers are also susceptible to backtracking. Here is a step-by-step analysis of the example
$x = "the cat in the hat";
$x =~ /^(.*)(at)(.*)$/; # matches,
                        # $1 = 'the cat in the h'
                        # $2 = 'at'
                        # $3 = ''   (0 matches)
Start with the first letter in the string 't'.
The first quantifier '.*' starts out by matching the whole string 'the cat in the hat'.
'a' in the regexp element 'at' doesn't match the end of the string. Backtrack one character.
'a' in the regexp element 'at' still doesn't match the last letter of the string 't', so backtrack one more character.
Now we can match the 'a' and the 't'.
Move on to the third element '.*'. Since we are at the end of the string and '.*' can match 0 times, assign it the empty string.
We are done!
The way a regexp is written can lead to large variations in performance. Let us review what the section 'Matching-repetitions' in perlretut says:
Most of the time, all this moving forward and backtracking happens quickly and searching is fast. There are some pathological regexps, however, whose execution time exponentially grows with the size of the string. A typical structure that blows up in your face is of the form
/(a|b+)*/;
The problem is the nested indeterminate quantifiers. There are many different ways of partitioning a string of length n between the + and *: one repetition with b+ of length n, two repetitions with the first b+ length k and the second with length n-k, m repetitions whose bits add up to length n, etc.
In fact there are an exponential number of ways to partition a string as a function of its length. A regexp may get lucky and match early in the process, but if there is no match, Perl will try every possibility before giving up. So be careful with nested *'s, {n,m}'s, and +'s.
The book Mastering Regular Expressions by Jeffrey Friedl [3] gives a wonderful discussion of this and other efficiency issues.
The following example removes the comments from a C program.
casiano@millo:~/Lperltesting$ cat comments.pl
#!/usr/bin/perl -w
use strict;

my $progname = shift @ARGV or die "Usage:\n$0 prog.c\n";
open(my $PROGRAM,"<$progname") || die "can't find $progname\n";
my $program = '';
{
  local $/ = undef;
  $program = <$PROGRAM>;
}
$program =~ s{
  /\*  # Match the opening delimiter
  .*?  # Match a minimal number of characters
  \*/  # Match the closing delimiter
}[]gsx;

print $program;
Let us see an example run. Suppose the input file is:
> cat hello.c
#include <stdio.h>
/* first
comment
*/
main() {
  printf("hello world!\n"); /* second comment */
}
Running the program on that input file produces the output:
> comments.pl hello.c
#include <stdio.h>

main() {
  printf("hello world!\n");
}
Let us see the difference in behavior between * and *? in the previous example:
pl@nereida:~/src/perl/perltesting$ perl5_10_1 -wde 0
main::(-e:1):   0
  DB<1> use re 'debug'; 'main() /* 1c */ { /* 2c */ return; /* 3c */ }' =~ qr{(/\*.*\*/)}; print "\n$1\n"
Compiling REx "(/\*.*\*/)"
Final program:
   1: OPEN1 (3)
   3:   EXACT </*> (5)
   5:   STAR (7)
   6:     REG_ANY (0)
   7:   EXACT <*/> (9)
   9: CLOSE1 (11)
  11: END (0)
anchored "/*" at 0 floating "*/" at 2..2147483647 (checking floating) minlen 4
Guessing start of match in sv for REx "(/\*.*\*/)" against "main() /* 1c */ { /* 2c */ return; /* 3c */ }"
Found floating substr "*/" at offset 13...
Found anchored substr "/*" at offset 7...
Starting position does not contradict /^/m...
Guessed: match at offset 7
Matching REx "(/\*.*\*/)" against "/* 1c */ { /* 2c */ return; /* 3c */ }"
   7* 1c */ {>               |  1:OPEN1(3)
   7 * 1c */ {>              |  3:EXACT </*>(5)
   9 <() /*> < 1c */ { />    |  5:STAR(7)
                                  REG_ANY can match 36 times out of 2147483647...
  41 <; /* 3c > <*/ }>       |  7:  EXACT <*/>(9)
  43 <; /* 3c */> < }>       |  9:  CLOSE1(11)
  43 <; /* 3c */> < }>       | 11:  END(0)
Match successful!

/* 1c */ { /* 2c */ return; /* 3c */
Freeing REx: "(/\*.*\*/)"

  DB<2> use re 'debug'; 'main() /* 1c */ { /* 2c */ return; /* 3c */ }' =~ qr{(/\*.*?\*/)}; print "\n$1\n"
Compiling REx "(/\*.*?\*/)"
Final program:
   1: OPEN1 (3)
   3:   EXACT </*> (5)
   5:   MINMOD (6)
   6:   STAR (8)
   7:     REG_ANY (0)
   8:   EXACT <*/> (10)
  10: CLOSE1 (12)
  12: END (0)
anchored "/*" at 0 floating "*/" at 2..2147483647 (checking floating) minlen 4
Guessing start of match in sv for REx "(/\*.*?\*/)" against "main() /* 1c */ { /* 2c */ return; /* 3c */ }"
Found floating substr "*/" at offset 13...
Found anchored substr "/*" at offset 7...
Starting position does not contradict /^/m...
Guessed: match at offset 7
Matching REx "(/\*.*?\*/)" against "/* 1c */ { /* 2c */ return; /* 3c */ }"
   7 * 1c */ {>              |  1:OPEN1(3)
   7 * 1c */ {>              |  3:EXACT </*>(5)
   9 <() /*> < 1c */ { />    |  5:MINMOD(6)
   9 <() /*> < 1c */ { />    |  6:STAR(8)
                                  REG_ANY can match 4 times out of 4...
  13 <* 1c > <*/ { /* 2c>    |  8:  EXACT <*/>(10)
  15 <1c */> < { /* 2c *>    | 10:  CLOSE1(12)
  15 <1c */> < { /* 2c *>    | 12:  END(0)
Match successful!

/* 1c */
Freeing REx: "(/\*.*?\*/)"

  DB<3>
See also the documentation in the section 'Matching-repetitions' in perlretut and the section 'Quantifiers' in perlre.
The patterns X[^X]*X and X.*?X, where X is an arbitrary character, are used almost interchangeably:
X[^X]*X: a string that contains no X inside and is delimited by Xs.
X.*?X: a string that starts at an X and ends at the X nearest to the opening X.
This equivalence breaks down when these assumptions do not hold. The following example tries to detect the double-quoted strings that end in an exclamation mark:
pl@nereida:~/Lperltesting$ cat negynogreedy.pl
#!/usr/bin/perl -w
use strict;

my $b = 'Ella dijo "Ana" y yo contesté: "Jamás!". Eso fué todo.';
my $a;
($a = $b) =~ s/".*?!"/-$&-/;
print "$a\n";

$b =~ s/"[^"]*!"/-$&-/;
print "$b\n";
Running the program we get:
> negynogreedy.pl
Ella dijo -"Ana" y yo contesté: "Jamás!"-. Eso fué todo.
Ella dijo "Ana" y yo contesté: -"Jamás!"-. Eso fué todo.
The operator =~ lets us ``bind'' a variable to a matching or substitution operation. If the operation is a substitution and you want to keep the original string, you need to make a copy first:
$d = $s;
$d =~ s/esto/por lo otro/;
Instead, you can shorten this a little with the following ``pearl'':
($d = $s) =~ s/esto/por lo otro/;
Note the parentheses: =~ binds more tightly than =, so they are needed to make the assignment happen first. The assignment yields $d as an lvalue, and the substitution is then applied to it.
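The copy-then-substitute idiom can be compared with the /r modifier, which returns the modified copy and leaves the original untouched. The /r modifier is not mentioned in the text above and requires Perl 5.14 or later; the sample string and the variable name $e are assumptions made for this sketch.

```perl
use strict;
use warnings;

my $s = "cambia esto";

# Classic idiom: copy first, then substitute on the copy.
(my $d = $s) =~ s/esto/por lo otro/;

# With /r (Perl 5.14+) the substitution returns the new string instead
# of modifying its target, so no parentheses are needed.
my $e = $s =~ s/esto/por lo otro/r;

print "$s\n$d\n$e\n";
```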
Relative backreferences allow us to write more reusable regular expressions. See the documentation in the section 'Relative-backreferences' in perlretut:
Counting the opening parentheses to get the correct number for a backreference is errorprone as soon as there is more than one capturing group. A more convenient technique became available with Perl 5.10: relative backreferences. To refer to the immediately preceding capture group one now may write \g{-1}, the next but last is available via \g{-2}, and so on.
Another good reason in addition to readability and maintainability for using relative backreferences is illustrated by the following example, where a simple pattern for matching peculiar strings is used:
$a99a = '([a-z])(\d)\2\1';   # matches a11a, g22g, x33x, etc.
Now that we have this pattern stored as a handy string, we might feel tempted to use it as a part of some other pattern:
$line = "code=e99e";
if ($line =~ /^(\w+)=$a99a$/){   # unexpected behavior!
    print "$1 is valid\n";
} else {
    print "bad line: '$line'\n";
}
But this doesn't match - at least not the way one might expect. Only after inserting the interpolated $a99a and looking at the resulting full text of the regexp is it obvious that the backreferences have backfired - the subexpression (\w+) has snatched number 1 and demoted the groups in $a99a by one rank. This can be avoided by using relative backreferences:
$a99a = '([a-z])(\d)\g{-1}\g{-2}';  # safe for being interpolated
The following program illustrates the point:
casiano@millo:~/Lperltesting$ cat -n backreference.pl
     1  use strict;
     2  use re 'debug';
     3
     4  my $a99a = '([a-z])(\d)\2\1';
     5  my $line = "code=e99e";
     6  if ($line =~ /^(\w+)=$a99a$/){ # unexpected behavior!
     7    print "$1 is valid\n";
     8  } else {
     9    print "bad line: '$line'\n";
    10  }
Here is the run:
casiano@millo:~/Lperltesting$ perl5.10.1 -wd backreference.pl
main::(backreference.pl:4):     my $a99a = '([a-z])(\d)\2\1';
  DB<1> c 6
main::(backreference.pl:6):     if ($line =~ /^(\w+)=$a99a$/){ # unexpected behavior!
  DB<2> x ($line =~ /^(\w+)=$a99a$/)
  empty array
  DB<4> $a99a = '([a-z])(\d)\g{-1}\g{-2}'
  DB<5> x ($line =~ /^(\w+)=$a99a$/)
0  'code'
1  'e'
2  9
The following text is taken from the section 'Named-backreferences' in perlretut:
Perl 5.10 also introduced named capture buffers and named backreferences. To attach a name to a capturing group, you write either (?<name>...) or (?'name'...). The backreference may then be written as \g{name}. It is permissible to attach the same name to more than one group, but then only the leftmost one of the eponymous set can be referenced. Outside of the pattern a named capture buffer is accessible through the %+ hash.
Assuming that we have to match calendar dates which may be given in one of the three formats yyyy-mm-dd, mm/dd/yyyy or dd.mm.yyyy, we can write three suitable patterns where we use 'd', 'm' and 'y' respectively as the names of the buffers capturing the pertaining components of a date. The matching operation combines the three patterns as alternatives:
$fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
$fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
$fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';
for my $d (qw( 2006-10-21 15.01.2007 10/31/2005 )){
    if ( $d =~ m{$fmt1|$fmt2|$fmt3} ){
        print "day=$+{d} month=$+{m} year=$+{y}\n";
    }
}
If any of the alternatives matches, the hash %+ is bound to contain the three key-value pairs.
Indeed, when we run the program:
casiano@millo:~/Lperltesting$ cat namedbackreferences.pl
use v5.10;
use strict;

my $fmt1 = '(?<y>\d\d\d\d)-(?<m>\d\d)-(?<d>\d\d)';
my $fmt2 = '(?<m>\d\d)/(?<d>\d\d)/(?<y>\d\d\d\d)';
my $fmt3 = '(?<d>\d\d)\.(?<m>\d\d)\.(?<y>\d\d\d\d)';

for my $d (qw( 2006-10-21 15.01.2007 10/31/2005 )){
  if ( $d =~ m{$fmt1|$fmt2|$fmt3} ){
    print "day=$+{d} month=$+{m} year=$+{y}\n";
  }
}
we get the output:
casiano@millo:~/Lperltesting$ perl5.10.1 -w namedbackreferences.pl
day=21 month=10 year=2006
day=15 month=01 year=2007
day=31 month=10 year=2005
As noted above:
... It is permissible to attach the same name to more than one group, but then only the leftmost one of the eponymous set can be referenced.
Let us see an example:
pl@nereida:~/Lperltesting$ perl5.10.1 -wdE 0
main::(-e:1):   0
  DB<1> # ... only the leftmost one of the eponymous set can be referenced
  DB<2> $r = qr{(?<a>[a-c])(?<a>[a-f])}
  DB<3> print $+{a} if 'ad' =~ $r
a
  DB<4> print $+{a} if 'cf' =~ $r
c
  DB<5> print $+{a} if 'ak' =~ $r
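When several groups share a name, %+ exposes only the leftmost one, but the companion hash %- (documented in perlvar, available since Perl 5.10) keeps every capture with that name as an array reference. The text above does not use %-; the sketch below, with assumed variable names, shows both side by side.

```perl
use strict;
use warnings;

my $r = qr{(?<a>[a-c])(?<a>[a-f])};

my ($leftmost, @all);
if ('ad' =~ $r) {
  $leftmost = $+{a};        # only the leftmost group named 'a'
  @all      = @{ $-{a} };   # every group named 'a', in pattern order
}

print "leftmost: $leftmost\nall: @all\n";
```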
Reescribamos el ejemplo de conversión de temperaturas usando paréntesis con nombre:
pl@nereida:~/Lperltesting$ cat -n c2f_5_10v2.pl 1 #!/usr/local/bin/perl5_10_1 -w 2 use strict; 3 4 print "Enter a temperature (i.e. 32F, 100C):\n"; 5 my $input = <STDIN>; 6 chomp($input); 7 8 $input =~ m/^ 9 (?<farenheit>[-+]?[0-9]+(?:\.[0-9]*)?)\s*[fF] 10 | 11 (?<celsius>[-+]?[0-9]+(?:\.[0-9]*)?)\s*[cC] 12 $/x; 13 14 my ($celsius, $farenheit); 15 if (exists $+{celsius}) { 16 $celsius = $+{celsius}; 17 $farenheit = ($celsius * 9/5)+32; 18 } 19 elsif (exists $+{farenheit}) { 20 $farenheit = $+{farenheit}; 21 $celsius = ($farenheit -32)*5/9; 22 } 23 else { 24 die "Expecting a temperature, so don't understand \"$input\".\n"; 25 } 26 27 printf "%.2f C = %.2f F\n", $celsius, $farenheit;
The exists function returns true if the key exists in the hash and false otherwise.
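Only the groups that actually took part in the match appear as keys of `%+`, which is why `exists` can tell the alternatives apart. A small sketch of this point; the sub name `scale_of` is made up for illustration and the pattern is simplified to integers:

```perl
use strict;
use warnings;

# scale_of is a hypothetical helper: it reports which named
# alternative matched, or 'none' when the input is not understood.
sub scale_of {
    my ($text) = @_;
    return 'none'
        unless $text =~ /^(?:(?<farenheit>\d+)\s*F|(?<celsius>\d+)\s*C)$/i;
    # Exactly one of the two names is a key of %+ at this point.
    return exists $+{celsius} ? 'celsius' : 'farenheit';
}

for my $text ('32F', '100 c', '12K') {
    my $scale = scale_of($text);
    print "$text -> $scale\n";
}
```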
Using names makes regular expressions more robust and easier to factor. Consider the following regexp, which uses positional notation:
    pl@nereida:~/Lperltesting$ perl5.10.1 -wde 0
    main::(-e:1):   0
      DB<1> x "abbacddc" =~ /(.)(.)\2\1/
    0  'a'
    1  'b'
Suppose we want to reuse the regexp under a repetition:
      DB<2> x "abbacddc" =~ /((.)(.)\2\1){2}/
      empty array
What happened? Introducing the new parenthesis forces us to renumber the positional references:
      DB<3> x "abbacddc" =~ /((.)(.)\3\2){2}/
    0  'cddc'
    1  'c'
    2  'd'
      DB<4> "abbacddc" =~ /((.)(.)\3\2){2}/; print "$&\n"
    abbacddc
This does not happen if we use names. The operator \k<a> refers to the value matched by the parenthesis named a:
      DB<5> x "abbacddc" =~ /((?<a>.)(?<b>.)\k<b>\k<a>){2}/
    0  'cddc'
    1  'c'
    2  'd'
Using named groups and \k (footnote 3.1) instead of absolute numeric references makes the regexp more reusable.
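Since `\k<name>` does not depend on group positions, a fragment written with named groups can be stored with `qr//` and embedded elsewhere unchanged. A short sketch (the variable names and the sample text are illustrative):

```perl
use strict;
use warnings;

# A self-contained fragment: its backreferences name their groups,
# so it does not matter how many groups precede it.
my $abba = qr/(?<a>.)(?<b>.)\k<b>\k<a>/;

# Embed it after an unrelated capture group and under a repetition.
my $text = 'id=abbacddc';
my ($key, $body);
if ($text =~ /^(\w+)=((?:$abba){2})$/) {
    ($key, $body) = ($1, $2);
}

print "$key: $body\n";
```

With numbered backreferences, the leading `(\w+)` group would have shifted every index inside the fragment.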
It is also possible to call the regular expression associated with a parenthesis. The following paragraph, taken from the 'Extended Patterns' section of perlre, explains how:
(?PARNO) (?-PARNO) (?+PARNO) (?R) (?0)
PARNO is a sequence of digits (not starting with 0) whose value reflects the paren-number of the capture buffer to recurse to. ....
Capture buffers contained by the pattern will have the value as determined by the outermost recursion. ....
If PARNO is preceded by a plus or minus sign then it is assumed to be relative, with negative numbers indicating preceding capture buffers and positive ones following. Thus (?-1) refers to the most recently declared buffer, and (?+1) indicates the next buffer to be declared.
Note that the counting for relative recursion differs from that of relative backreferences, in that with recursion unclosed buffers are included.
Let us look at an example:
    casiano@millo:~/Lperltesting$ perl5.10.1 -wdE 0
    main::(-e:1):   0
      DB<1> x "AABB" =~ /(A)(?-1)(?+1)(B)/
    0  'A'
    1  'B'
      # Parenthesis:       1   2    2                 1
      DB<2> x 'ababa' =~ /^((?:([ab])(?1)\g{-1}|[ab]?))$/
    0  'ababa'
    1  'a'
      DB<3> x 'bbabababb' =~ /^((?:([ab])(?1)\g{-1}|[ab]?))$/
    0  'bbabababb'
    1  'b'
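A common use of `(?PARNO)` is matching nested delimiters. The sketch below follows the balanced-parentheses idiom described in perlre: group 1 matches one parenthesized unit, and `(?1)` re-enters group 1 wherever a nested unit may appear. The variable name `$balanced` and the sample strings are illustrative:

```perl
use strict;
use warnings;

# Group 1: an opening paren, then any mix of non-paren runs and
# nested units (recursion into group 1), then a closing paren.
my $balanced = qr/^(\((?:[^()]++|(?1))*\))$/;

for my $text ('(2*(4+5/(3-2)))', '(a(b)', '()') {
    my $verdict = $text =~ $balanced ? 'balanced' : 'not balanced';
    print "$text: $verdict\n";
}
```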
The following rewrite of our basic example uses the Regexp::Common module to factor the regular expression:
    casiano@millo:~/src/perl/perltesting$ cat -n c2f_5_10v3.pl
         1  #!/soft/perl5lib/bin/perl5.10.1 -w
         2  use strict;
         3  use Regexp::Common;
         4
         5  print "Enter a temperature (i.e. 32F, 100C):\n";
         6  my $input = <STDIN>;
         7  chomp($input);
         8
         9  $input =~ m/^(?:
        10          (?<farenheit>$RE{num}{real})\s*[fF]
        11          |
        12          (?<celsius>$RE{num}{real})\s*[cC]
        13       )$/x;
        14
        15  my ($celsius, $farenheit);
        16  if ('celsius' ~~ %+) {
        17    $celsius = $+{celsius};
        18    $farenheit = ($celsius * 9/5)+32;
        19  }
        20  elsif ('farenheit' ~~ %+) {
        21    $farenheit = $+{farenheit};
        22    $celsius = ($farenheit -32)*5/9;
        23  }
        24  else {
        25    die "Expecting a temperature, so don't understand \"$input\".\n";
        26  }
        27
        28  printf "%.2f C = %.2f F\n", $celsius, $farenheit;
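The same factoring can be sketched with a plain `qr//` object when Regexp::Common is not at hand. The pattern `$real` below is a deliberately simple stand-in and does not reproduce `$RE{num}{real}`; the helper name `to_both` is hypothetical, and `exists` is used in place of the smart-match test:

```perl
use strict;
use warnings;

# Hand-written stand-in for $RE{num}{real} (an assumption of this
# sketch; the Regexp::Common pattern accepts more forms).
my $real = qr/[-+]?[0-9]+(?:\.[0-9]*)?/;

# to_both is a hypothetical helper: it returns the list
# (celsius, farenheit), or an empty list when the input is not understood.
sub to_both {
    my ($input) = @_;
    return unless $input =~ m/^(?:
          (?<farenheit>$real)\s*[fF]
        |
          (?<celsius>$real)\s*[cC]
        )$/x;
    return exists $+{celsius}
        ? ($+{celsius}, $+{celsius} * 9 / 5 + 32)
        : (($+{farenheit} - 32) * 5 / 9, $+{farenheit});
}

for my $input ('32F', '100 C') {
    my ($celsius, $farenheit) = to_both($input);
    printf "%.2f C = %.2f F\n", $celsius, $farenheit;
}
```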
The Regexp::Common module provides a large number of regular expressions, accessible through the %RE hash. An example of its use follows:
    casiano@millo:~/Lperltesting$ cat regexpcommonsynopsis.pl
    use strict;
    use Perl6::Say;
    use Regexp::Common;

    while (<>) {
        say q{a number}              if /$RE{num}{real}/;

        say q{a ['"`] quoted string} if /$RE{quoted}/;

        say q{a /.../ sequence}      if m{$RE{delimited}{'-delim'=>'/'}};

        say q{balanced parentheses}  if /$RE{balanced}{'-parens'=>'()'}/;

        die q{a #*@%-ing word}."\n"  if /$RE{profanity}/;

    }
A sample run follows:
    casiano@millo:~/Lperltesting$ perl regexpcommonsynopsis.pl
    43
    a number
    "2+2 es" 4
    a number
    a ['"`] quoted string
    x/y/z
    a /.../ sequence
    (2*(4+5/(3-2)))
    a number
    balanced parentheses
    fuck you!
    a #*@%-ing word
The following excerpt from the Regexp::Common documentation explains the simplified mode of use:
To access a particular pattern, %RE is treated as a hierarchical hash of hashes (of hashes...), with each successive key being an identifier. For example, to access the pattern that matches real numbers, you specify:
    $RE{num}{real}
and to access the pattern that matches integers:
    $RE{num}{int}
Deeper layers of the hash are used to specify flags: arguments that modify the resulting pattern in some way. For example, to access the pattern that matches base-2 real numbers with embedded commas separating groups of three digits (e.g. 10,101,110.110101101):
    $RE{num}{real}{-base => 2}{-sep => ','}{-group => 3}
Through the magic of Perl, these flag layers may be specified in any order (and even interspersed through the identifier keys!) so you could get the same pattern with:
    $RE{num}{real}{-sep => ','}{-group => 3}{-base => 2}
or:
    $RE{num}{-base => 2}{real}{-group => 3}{-sep => ','}
or even:
    $RE{-base => 2}{-group => 3}{-sep => ','}{num}{real}
etc.
Note, however, that the relative order amongst the identifier keys is significant. That is:
    $RE{list}{set}
would not be the same as:
    $RE{set}{list}
Let us see an example with the debugger:
    casiano@millo:~/Lperltesting$ perl -MRegexp::Common -wde 0
    main::(-e:1):   0
      DB<1> x 'numero: 10,101,110.110101101 101.1e-1 234' =~ m{($RE{num}{real}{-base => 2}{-sep => ','}{-group => 3})}g
    0  '10,101,110.110101101'
    1  '101.1e-1'
The regular expression for a real number is relatively complex:
    casiano@millo:~/src/perl/perltesting$ perl5.10.1 -wd c2f_5_10v3.pl
    main::(c2f_5_10v3.pl:5):   print "Enter a temperature (i.e. 32F, 100C):\n";
      DB<1> p $RE{num}{real}
    (?:(?i)(?:[+-]?)(?:(?=[0123456789]|[.])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0123456789]+))|))
If the -keep option is used, the pattern provided uses capturing parentheses:
    casiano@millo:~/Lperltesting$ perl -MRegexp::Common -wde 0
    main::(-e:1):   0
      DB<2> x 'one, two, three, four, five' =~ /$RE{list}{-pat => '\w+'}/
    0  1
      DB<3> x 'one, two, three, four, five' =~ /$RE{list}{-pat => '\w+'}{-keep}/
    0  'one, two, three, four, five'
    1  ', '
Perl 5.10 introduces the smart matching operator. The following text is taken almost verbatim from the site of the company Perl Training Australia (footnote 3.2):
Perl 5.10 introduces a new operator, called smart-match, written ~~. As the name suggests, smart-match tries to compare its arguments in an intelligent fashion. Using smart-match effectively allows many complex operations to be reduced to very simple statements.
Unlike many of the other features introduced in Perl 5.10, there's no need to use the feature pragma to enable smart-match, as long as you're using 5.10 it's available.
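The smart-match operator is always commutative. That means that $x ~~ $y works the same way as $y ~~ $x. You'll never have to remember which order to place your operands with smart-match. (This held in Perl 5.10.0; from Perl 5.10.1 on the operator is no longer commutative, and the result depends mostly on the type of the right operand, as the array reference session further on illustrates.)
Smart-match in action.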
As a simple introduction, we can use smart-match to do a simple string comparison between simple scalars. For example:
    use feature qw(say);

    my $x = "foo";
    my $y = "bar";
    my $z = "foo";

    say '$x and $y are identical strings' if $x ~~ $y;
    say '$x and $z are identical strings' if $x ~~ $z;    # Printed
If one of our arguments is a number, then a numeric comparison is performed:
    my $num   = 100;
    my $input = <STDIN>;

    say 'You entered 100' if $num ~~ $input;
This will print our message if our user enters 100, 100.00, +100, 1e2, or any other string that looks like the number 100.
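What "looks like the number 100" means can be made explicit with `looks_like_number` from the core module Scalar::Util. The sketch below performs a comparable test without smart-match; it is an approximation for illustration, not a description of how `~~` is implemented, and the sub name `is_hundred` is hypothetical:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# is_hundred is a hypothetical helper: true when the string is
# numeric and its value equals 100.
sub is_hundred {
    my ($input) = @_;
    return looks_like_number($input) && $input == 100 ? 1 : 0;
}

for my $input ('100', '100.00', '+100', '1e2', '100 apples') {
    printf "%-14s %s\n", "'$input'",
        is_hundred($input) ? 'matches' : 'does not match';
}
```

Checking `looks_like_number` first avoids the "isn't numeric" warning that `==` would raise for a string such as '100 apples'.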
We can also smart-match against a regexp:
    my $input = <STDIN>;

    say 'You said the secret word!' if $input ~~ /xyzzy/;
Smart-matching with a regexp also works with saved regexps created with qr.
So we can use smart-match to act like eq, == and =~, so what? Well, it does much more than that.
We can use smart-match to search a list:
    casiano@millo:~/Lperltesting$ perl5.10.1 -wdE 0
    main::(-e:1):   0
      DB<1> @friends = qw(Frodo Meriadoc Pippin Samwise Gandalf)
      DB<2> print "You're a friend" if 'Pippin' ~~ @friends
    You're a friend
      DB<3> print "You're a friend" if 'Mordok' ~~ @friends
It's important to note that searching an array with smart-match is extremely fast. It's faster than using grep, it's faster than using first from List::Util, and it's faster than walking through the loop with foreach, even if you do know all the clever optimisations.
This is the typical way to search for an element in an array in versions prior to 5.10:
    casiano@millo:~$ perl -wde 0
    main::(-e:1):   0
      DB<1> use List::Util qw{first}
      DB<2> @friends = qw(Frodo Meriadoc Pippin Samwise Gandalf)
      DB<3> x first { $_ eq 'Pippin'} @friends
    0  'Pippin'
      DB<4> x first { $_ eq 'Mordok'} @friends
    0  undef
We can also use smart-match to compare arrays:
      DB<4> @foo = qw(x y z xyzzy ninja)
      DB<5> @bar = qw(x y z xyzzy ninja)
      DB<7> print "Identical arrays" if @foo ~~ @bar
    Identical arrays
      DB<8> @bar = qw(x y z xyzzy nOnjA)
      DB<9> print "Identical arrays" if @foo ~~ @bar
      DB<10>
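Later Perl releases marked `~~` as experimental (5.18) and then deprecated it (5.38), so it can help to see what this array comparison amounts to without the operator. A minimal sketch, assuming both arrays hold plain defined strings; the sub name `arrays_equal` is hypothetical:

```perl
use strict;
use warnings;

# arrays_equal is a hypothetical helper: true when both array refs
# have the same length and equal strings at every position.
sub arrays_equal {
    my ($left, $right) = @_;
    return 0 unless @$left == @$right;    # lengths differ
    for my $i (0 .. $#$left) {
        return 0 unless $left->[$i] eq $right->[$i];
    }
    return 1;
}

my @foo = qw(x y z xyzzy ninja);
my @bar = qw(x y z xyzzy ninja);
print "Identical arrays\n" if arrays_equal(\@foo, \@bar);

@bar = qw(x y z xyzzy nOnjA);
print "Different arrays\n" unless arrays_equal(\@foo, \@bar);
```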
And even search inside an array using a string:
      DB<11> x @foo = qw(x y z xyzzy ninja)
    0  'x'
    1  'y'
    2  'z'
    3  'xyzzy'
    4  'ninja'
      DB<12> print "Array contains a ninja " if @foo ~~ 'ninja'
or using a regexp:
      DB<13> print "Array contains magic pattern" if @foo ~~ /xyz/
    Array contains magic pattern
      DB<14> print "Array contains magic pattern" if @foo ~~ /\d+/
Smart-match works with array references, too (footnote 3.3):
      DB<16> $array_ref = [ 1..10 ]
      DB<17> print "Array contains 10" if 10 ~~ $array_ref
    Array contains 10
      DB<18> print "Array contains 10" if $array_ref ~~ 10
      DB<19>
For a number and an array, it returns true if the scalar appears in a nested array:
    casiano@millo:~/Lperltesting$ perl5.10.1 -E 'say "ok" if 42 ~~ [23, 17, [40..50], 70];'
    ok
    casiano@millo:~/Lperltesting$ perl5.10.1 -E 'say "ok" if 42 ~~ [23, 17, [50..60], 70];'
    casiano@millo:~/Lperltesting$
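The nested lookup shown above can be approximated by an explicit recursive search. The following is only a sketch for numeric elements, not a rendition of the smart-match rules; the sub name `contains` is hypothetical:

```perl
use strict;
use warnings;

# contains is a hypothetical helper: true when $wanted equals some
# number in the list, descending into nested array references.
sub contains {
    my ($wanted, $list) = @_;
    for my $item (@$list) {
        if (ref $item eq 'ARRAY') {
            return 1 if contains($wanted, $item);
        }
        elsif ($item == $wanted) {
            return 1;
        }
    }
    return 0;
}

print "ok\n" if contains(42, [23, 17, [40 .. 50], 70]);
print "not found\n" unless contains(42, [23, 17, [50 .. 60], 70]);
```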
Of course, we can use smart-match with more than just arrays and scalars, it works with searching for the key in a hash, too!
      DB<19> %colour = ( sky => 'blue', grass => 'green', apple => 'red',)
      DB<20> print "I know the colour" if 'grass' ~~ %colour
    I know the colour
      DB<21> print "I know the colour" if 'cloud' ~~ %colour
      DB<22>
      DB<23> print "A key starts with 'gr'" if %colour ~~ /^gr/
    A key starts with 'gr'
      DB<24> print "A key starts with 'clou'" if %colour ~~ /^clou/
      DB<25>
You can even use it to see if the two hashes have identical keys:
      DB<26> print 'Hashes have identical keys' if %taste ~~ %colour;
    Hashes have identical keys
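The key comparison can also be spelled out by hand. In this sketch the sub name `same_keys` is hypothetical, and the contents of `%taste` are invented, since the session above does not show how that hash was defined:

```perl
use strict;
use warnings;

# same_keys is a hypothetical helper: true when both hash refs
# have exactly the same set of keys (values are ignored).
sub same_keys {
    my ($first, $second) = @_;
    return 0 unless keys(%$first) == keys(%$second);    # key counts differ
    for my $key (keys %$first) {
        return 0 unless exists $second->{$key};
    }
    return 1;
}

my %colour = (sky => 'blue', grass => 'green',  apple => 'red');
# Invented values: %taste is not defined in the session above.
my %taste  = (sky => 'none', grass => 'bitter', apple => 'sweet');

print "Hashes have identical keys\n" if same_keys(\%taste, \%colour);
```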
The behaviour of the smart matching operator is given by the following table, taken from the 'Smart matching in detail' section of perlsyn:
The behaviour of a smart match depends on what type of thing its arguments are. The behaviour is determined by the following table: the first row that applies determines the match behaviour (which is thus mostly determined by the type of the right operand). Note that the smart match implicitly dereferences any non-blessed hash or array ref, so the "Hash" and "Array" entries apply in those cases. (For blessed references, the "Object" entries apply.)
Note that the "Matching Code" column is not always an exact rendition. For example, the smart match operator short-circuits whenever possible, but grep does not.
    $a      $b        Type of Match Implied     Matching Code
    ======  =====     =====================     =============
    Any     undef     undefined                 !defined $a

    Any     Object    invokes ~~ overloading on $object, or dies

    Hash    CodeRef   sub truth for each key[1] !grep { !$b->($_) } keys %$a
    Array   CodeRef   sub truth for each elt[1] !grep { !$b->($_) } @$a
    Any     CodeRef   scalar sub truth          $b->($a)

    Hash    Hash      hash keys identical (every key is found in both hashes)
    Array   Hash      hash slice existence      grep { exists $b->{$_} } @$a
    Regex   Hash      hash key grep             grep /$a/, keys %$b
    undef   Hash      always false (undef can't be a key)
    Any     Hash      hash entry existence      exists $b->{$a}

    Hash    Array     hash slice existence      grep { exists $a->{$_} } @$b
    Array   Array     arrays are comparable[2]
    Regex   Array     array grep                grep /$a/, @$b
    undef   Array     array contains undef      grep !defined, @$b
    Any     Array     match against an array element[3]
                                                grep $a ~~ $_, @$b

    Hash    Regex     hash key grep             grep /$b/, keys %$a
    Array   Regex     array grep                grep /$b/, @$a
    Any     Regex     pattern match             $a =~ /$b/

    Object  Any       invokes ~~ overloading on $object, or falls back:
    Any     Num       numeric equality          $a == $b
    Num     numish[4] numeric equality          $a == $b
    undef   Any       undefined                 !defined($b)
    Any     Any       string equality           $a eq $b
    pl@nereida:~/Lperltesting$ cat twonumbers.pl
    $_ = "I have 2 numbers: 53147";
    @pats = qw{
      (.*)(\d*)
      (.*)(\d+)
      (.*?)(\d*)
      (.*?)(\d+)
      (.*)(\d+)$
      (.*?)(\d+)$
      (.*)\b(\d+)$
      (.*\D)(\d+)$
    };

    print "$_\n";
    for $pat (@pats) {
      printf "%-12s ", $pat;
      <>;
      if ( /$pat/ ) {
        print "<$1> <$2>\n";
      } else {
        print "FAIL\n";
      }
    }