Tables of Escapes, Metacharacters, Quantifiers, and Classes


What follows is a set of notation tables taken from perlre, the Perl regular expression documentation:

Metacharacters

The following metacharacters have their standard egrep-ish meanings:

    \     Quote the next metacharacter
    ^     Match the beginning of the line
    .     Match any character (except newline)
    $     Match the end of the line (or before newline at the end)
    |     Alternation
    ()    Grouping
    []    Character class
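
The table above can be exercised with a short script. This is only a sketch: the sample string and the variable names are made up for illustration and do not come from perlre.

```perl
use strict;
use warnings;

# Hypothetical sample input: two lines, the last one ending in a newline.
my $text = "cat 3.5\ndog\n";

# ^ (without /m) anchors at the start of the string.
my $starts_with_cat = $text =~ /^cat/ ? 1 : 0;

# $ also matches just before a trailing newline, so "dog" counts as the end.
my $ends_with_dog = $text =~ /dog$/ ? 1 : 0;

# . does not match a newline, so "5.d" cannot bridge the two lines.
my $dot_crosses_newline = $text =~ /5.d/ ? 1 : 0;

# \ quotes the next metacharacter: \. matches only a literal dot.
my ($number) = $text =~ /(\d\.\d)/;

# | alternates, () groups and captures, [] is a character class.
my ($animal) = $text =~ /^(c[aeiou]t|dog)/;

print "$starts_with_cat $ends_with_dog $dot_crosses_newline $number $animal\n";
```

With the /s modifier the dot would also match the newline, and with /m the ^ and $ anchors would apply to every line instead of the whole string.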

Standard greedy quantifiers

The following standard greedy quantifiers are recognized:

    *       Match 0 or more times
    +       Match 1 or more times
    ?       Match 1 or 0 times
    {n}     Match exactly n times
    {n,}    Match at least n times
    {n,m}   Match at least n but not more than m times

Non-greedy quantifiers

The following non-greedy quantifiers are recognized:

    *?       Match 0 or more times, not greedily
    +?       Match 1 or more times, not greedily
    ??       Match 0 or 1 time, not greedily
    {n}?     Match exactly n times, not greedily
    {n,}?    Match at least n times, not greedily
    {n,m}?   Match at least n but not more than m times, not greedily
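
To contrast the greedy and non-greedy forms, the following sketch runs both against the same input. The sample strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .+ takes as much as it can, then backs off to the LAST '>'.
my ($greedy) = $html =~ /<(.+)>/;

# Non-greedy: .+? takes as little as it can and stops at the FIRST '>'.
my ($lazy) = $html =~ /<(.+?)>/;

# Counted quantifiers behave the same way inside their allowed range.
my ($counted)      = 'aaaaa' =~ /^(a{2,3})/;     # prefers 3 repetitions
my ($counted_lazy) = 'aaaaa' =~ /^(a{2,3}?)/;    # prefers 2 repetitions

print "$greedy\n$lazy\n$counted\n$counted_lazy\n";
```

Both forms match the same set of strings overall; the quantifier only decides which of the possible matches is tried first.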

Possessive quantifiers

The following possessive quantifiers are recognized:

    *+       Match 0 or more times and give nothing back
    ++       Match 1 or more times and give nothing back
    ?+       Match 0 or 1 time and give nothing back
    {n}+     Match exactly n times and give nothing back (redundant)
    {n,}+    Match at least n times and give nothing back
    {n,m}+   Match at least n but not more than m times and give nothing back
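
"Give nothing back" means the quantifier never backtracks into what it has already consumed. A minimal sketch, assuming Perl 5.10 or later (where possessive quantifiers were introduced) and using made-up test strings:

```perl
use strict;
use warnings;

# Ordinary greedy +: a+ first takes all three a's, then gives one back
# so that the final 'a' in the pattern can match.
my $backtracking = 'aaa' =~ /^a+a$/ ? 1 : 0;

# Possessive ++: a++ takes all three a's and refuses to give any back,
# so the final 'a' in the pattern has nothing left to match.
my $possessive = 'aaa' =~ /^a++a$/ ? 1 : 0;

# A possessive quantifier is safe when what follows can never match
# what the quantified item matches, e.g. the body of a quoted string.
my ($body) = '"abc"' =~ /^"([^"]*+)"$/;

print "$backtracking $possessive $body\n";
```

The last pattern matches exactly what its greedy counterpart would match; the possessive form only lets the engine fail sooner on input with no closing quote.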

Escape sequences

    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL)
    \e          escape (think troff)  (ESC)
    \033        octal char            (example: ESC)
    \x1B        hex char              (example: ESC)
    \x{263a}    long hex char         (example: Unicode SMILEY)
    \cK         control char          (example: VT)
    \N{name}    named Unicode character
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase till \E (think vi)
    \U          uppercase till \E (think vi)
    \E          end case modification (think vi)
    \Q          quote (disable) pattern metacharacters till \E
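
A few of these escapes can be checked directly in double-quoted strings and patterns. The strings below are hypothetical examples, and `use charnames ':full'` is included only so that \N{name} also works on Perl versions that do not load the names automatically.

```perl
use strict;
use warnings;
use charnames ':full';

# \033, \x1B and \e are three spellings of the same ESC character.
my $same_escape = ("\033" eq "\x1B" and "\x1B" eq "\e") ? 1 : 0;

# \cK is Control-K, the vertical tab (code 11).
my $vt_code = ord("\cK");

# \x{...} and \N{name} can name the same Unicode character.
my $same_smiley = "\x{263a}" eq "\N{WHITE SMILING FACE}" ? 1 : 0;

# Case modification: \L lowercases up to \E, \u uppercases one character.
my $name  = 'perl REGEX';
my $title = "\u\L$name\E!";
my $loud  = "\Uloud\E quiet";

# \Q disables metacharacters in the interpolated text, so the dot in
# $literal matches only a literal dot.
my $literal      = 'a.b';
my $plain_match  = 'axb' =~ /^$literal$/      ? 1 : 0;
my $quoted_match = 'axb' =~ /^\Q$literal\E$/  ? 1 : 0;

print "$same_escape $vt_code $same_smiley $title $loud $plain_match $quoted_match\n";
```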

Exercise 3.1.5 Explain the output:

casiano@tonga:~$ perl -wde 0
main::(-e:1):   0
  DB<1> $x = '([a-z]+)'
  DB<2> x 'hola' =~ /$x/
0  'hola'
  DB<3> x 'hola' =~ /\Q$x/
  empty array
  DB<4> x '([a-z]+)' =~ /\Q$x/
0  1

Character Classes and other Special Escapes

    \w          Match a "word" character (alphanumeric plus "_")
    \W          Match a non-"word" character
    \s          Match a whitespace character
    \S          Match a non-whitespace character
    \d          Match a digit character
    \D          Match a non-digit character
    \pP         Match P, named property. Use \p{Prop} for longer names.
    \PP         Match non-P
    \X          Match eXtended Unicode "combining character sequence",
                equivalent to (?>\PM\pM*)
    \C          Match a single C char (octet) even under Unicode.
                NOTE: breaks up characters into their UTF-8 bytes,
                so you may end up with malformed pieces of UTF-8.
                Unsupported in lookbehind.
    \1          Backreference to a specific group.
                '1' may actually be any positive integer.
    \g1         Backreference to a specific or previous group,
    \g{-1}      number may be negative indicating a previous buffer and may
                optionally be wrapped in curly brackets for safer parsing.
    \g{name}    Named backreference
    \k<name>    Named backreference
    \K          Keep the stuff left of the \K, don't include it in $&
    \v          Vertical whitespace
    \V          Not vertical whitespace
    \h          Horizontal whitespace
    \H          Not horizontal whitespace
    \R          Linebreak
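
The less familiar entries in this table (relative and named backreferences, \K, \h and \R) are easier to follow with concrete input. The following is a sketch with made-up strings, assuming Perl 5.10 or later, where these escapes were introduced.

```perl
use strict;
use warnings;

# \D removes everything that is not a digit.
(my $digits = '(922) 31-80-00') =~ s/\D//g;

# \g{-1} refers to the group just before it: find a repeated word.
my ($doubled) = 'this is is a test' =~ /\b(\w+)\s+\g{-1}\b/;

# \k<q> refers to the named group q: the closing quote must be the
# same character as the opening one.
my $quoted = q{say "it's fine" now} =~ /(?<q>["'])(.*?)\k<q>/ ? $2 : undef;

# \K keeps the text to its left out of the match, so only the number
# is replaced.
(my $updated = 'price: 42 EUR') =~ s/price:\s*\K\d+/100/;

# \h matches horizontal whitespace (spaces and tabs, not newlines).
(my $squeezed = "a \t b") =~ s/\h+/ /g;

# \R matches any linebreak, treating CR LF as a single unit.
my @lines = split /\R/, "a\r\nb\nc";

print join('|', $digits, $doubled, $quoted, $updated, $squeezed, scalar @lines), "\n";
```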

Zero width assertions

Perl defines the following zero-width assertions:

    \b    Match a word boundary
    \B    Match except at a word boundary
    \A    Match only at beginning of string
    \Z    Match only at end of string, or before newline at the end
    \z    Match only at end of string
    \G    Match only at pos() (e.g. at the end-of-match position
          of prior m//g)
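
The differences between \Z and \z, and between \A and ^ under /m, show up only on multi-line input, and \G is mostly useful together with /g and /c. The sketch below illustrates both; the input strings and the token labels (NUM, ID, OP) are hypothetical.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# \Z tolerates a trailing newline, \z does not.
my $end_Z = $text =~ /two\Z/ ? 1 : 0;
my $end_z = $text =~ /two\z/ ? 1 : 0;

# Under /m, ^ matches at the start of every line, but \A still matches
# only at the very beginning of the string.
my $caret_m = $text =~ /^two/m  ? 1 : 0;
my $start_A = $text =~ /\Atwo/m ? 1 : 0;

# \b keeps "cat" from matching inside "concat".
my @cats = 'cat concat cat' =~ /\bcat\b/g;

# \G anchors each attempt where the previous /g match ended; /c keeps
# pos() unchanged when an alternative fails. Together they make a
# simple tokenizer.
my $input = 'x=12+y';
my @tokens;
pos($input) = 0;
while (pos($input) < length $input) {
    if    ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([a-z]+)/gc) { push @tokens, "ID:$1" }
    elsif ($input =~ /\G(.)/gcs)     { push @tokens, "OP:$1" }
}

print "$end_Z $end_z $caret_m $start_A ", scalar(@cats), " @tokens\n";
```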

The POSIX character class syntax

The POSIX character class syntax:

    [:class:]

is also available. Note that the [ and ] brackets are literal; they must always be used within a character class expression.

    # this is correct:
    $string =~ /[[:alpha:]]/;

    # this is not, and will generate a warning:
    $string =~ /[:alpha:]/;

Available classes

The available classes and their backslash equivalents (if available) are as follows:

    alpha
    alnum
    ascii
    blank
    cntrl
    digit    \d
    graph
    lower
    print
    punct
    space    \s
    upper
    word     \w
    xdigit

For example use [:upper:] to match all the uppercase characters. Note that the [] are part of the [::] construct, not part of the whole character class. For example:

    [01[:alpha:]%]

matches zero, one, any alphabetic character, and the percent sign.
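
As a quick check of how a POSIX class combines with ordinary members of a bracketed class, the following sketch filters a hypothetical list of strings with the class just shown and counts uppercase letters with [:upper:].

```perl
use strict;
use warnings;

# Hypothetical candidates: only strings made up entirely of 0, 1,
# alphabetic characters and % should pass.
my @candidates = ('10%', 'abc', '2x', 'a-b');
my @accepted   = grep { /^[01[:alpha:]%]+$/ } @candidates;

# [:upper:] must itself sit inside a bracketed class: [[:upper:]].
my $upper_count = () = 'Hello World' =~ /[[:upper:]]/g;

print "@accepted $upper_count\n";
```

Here '2x' is rejected because 2 is not in the class, and 'a-b' is rejected because of the hyphen.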

Equivalences to Unicode

The following equivalences to Unicode \p{} constructs and equivalent backslash character classes (if available), will hold:

    [[:...:]]   \p{...}        backslash

    alpha       IsAlpha
    alnum       IsAlnum
    ascii       IsASCII
    blank
    cntrl       IsCntrl
    digit       IsDigit        \d
    graph       IsGraph
    lower       IsLower
    print       IsPrint
    punct       IsPunct
    space       IsSpace
                IsSpacePerl    \s
    upper       IsUpper
    word        IsWord         \w
    xdigit      IsXDigit
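
One way to see a row of this table at work is to test the same character against all three notations. The sketch below uses a Greek letter (code point above 255, so the string is handled with Unicode semantics); the choice of characters is an assumption made for illustration.

```perl
use strict;
use warnings;

# GREEK SMALL LETTER ALPHA, written as an escape to keep the source ASCII.
my $alpha = "\x{3b1}";

# POSIX class and Unicode property agree that it is alphabetic.
my $posix_alpha = $alpha =~ /^[[:alpha:]]$/ ? 1 : 0;
my $prop_alpha  = $alpha =~ /^\p{IsAlpha}$/ ? 1 : 0;

# For "word" there is also a backslash form.
my $posix_word = $alpha =~ /^[[:word:]]$/ ? 1 : 0;
my $prop_word  = $alpha =~ /^\p{IsWord}$/ ? 1 : 0;
my $slash_word = $alpha =~ /^\w$/         ? 1 : 0;

print "$posix_alpha $prop_alpha $posix_word $prop_word $slash_word\n";
```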

Negated character classes

You can negate the [::] character classes by prefixing the class name with a '^'. This is a Perl extension. For example:

    POSIX           traditional   Unicode

    [[:^digit:]]    \D            \P{IsDigit}
    [[:^space:]]    \S            \P{IsSpace}
    [[:^word:]]     \W            \P{IsWord}
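
The three columns can be compared by applying each notation to the same input. A minimal sketch with a made-up ASCII string:

```perl
use strict;
use warnings;

my $source = 'a1 b2';

# Delete every non-digit using each of the three notations in turn.
(my $posix       = $source) =~ s/[[:^digit:]]//g;
(my $traditional = $source) =~ s/\D//g;
(my $unicode     = $source) =~ s/\P{IsDigit}//g;

print "$posix $traditional $unicode\n";
```

All three substitutions leave the same digits behind on this input.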


Casiano Rodríguez León
2012-05-22