The following metacharacters have their standard egrep-ish meanings:

    \    Quote the next metacharacter
    ^    Match the beginning of the line
    .    Match any character (except newline)
    $    Match the end of the line (or before newline at the end)
    |    Alternation
    ()   Grouping
    []   Character class

The following standard greedy quantifiers are recognized:

    *      Match 0 or more times
    +      Match 1 or more times
    ?      Match 1 or 0 times
    {n}    Match exactly n times
    {n,}   Match at least n times
    {n,m}  Match at least n but not more than m times

The following non-greedy quantifiers are recognized:

    *?      Match 0 or more times, not greedily
    +?      Match 1 or more times, not greedily
    ??      Match 0 or 1 time, not greedily
    {n}?    Match exactly n times, not greedily
    {n,}?   Match at least n times, not greedily
    {n,m}?  Match at least n but not more than m times, not greedily

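The difference between the greedy and non-greedy forms is easiest to see on a string with two possible end points. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = '<b>one</b> and <b>two</b>';

# Greedy: .* runs to the end of the string, then backtracks to the last </b>.
my ($greedy) = $text =~ /<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the first position where </b> can match.
my ($lazy) = $text =~ /<b>(.*?)<\/b>/;

print "greedy: $greedy\n";   # greedy: one</b> and <b>two
print "lazy:   $lazy\n";     # lazy:   one
```
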
The following possessive quantifiers are recognized:

    *+      Match 0 or more times and give nothing back
    ++      Match 1 or more times and give nothing back
    ?+      Match 0 or 1 time and give nothing back
    {n}+    Match exactly n times and give nothing back (redundant)
    {n,}+   Match at least n times and give nothing back
    {n,m}+  Match at least n but not more than m times and give nothing back

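"Give nothing back" means the quantified item never releases characters once it has taken them, even if that makes the overall match fail. A small sketch, assuming Perl 5.10 or later (where possessive quantifiers are available) and a made-up test string:

```perl
use strict;
use warnings;
use 5.010;

my $str = 'aaab';

# Greedy a+ first takes "aaa", then backtracks one "a" so that "ab" can match.
my $greedy = $str =~ /^a+ab/ ? 1 : 0;

# Possessive a++ takes "aaa" and gives nothing back, so "ab" can never match.
my $possessive = $str =~ /^a++ab/ ? 1 : 0;

print "greedy=$greedy possessive=$possessive\n";   # greedy=1 possessive=0
```
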
The following escape sequences are also recognized in patterns:

    \t          tab                   (HT, TAB)
    \n          newline               (LF, NL)
    \r          return                (CR)
    \f          form feed             (FF)
    \a          alarm (bell)          (BEL)
    \e          escape (think troff)  (ESC)
    \033        octal char            (example: ESC)
    \x1B        hex char              (example: ESC)
    \x{263a}    long hex char         (example: Unicode SMILEY)
    \cK         control char          (example: VT)
    \N{name}    named Unicode character
    \l          lowercase next char (think vi)
    \u          uppercase next char (think vi)
    \L          lowercase till \E (think vi)
    \U          uppercase till \E (think vi)
    \E          end case modification (think vi)
    \Q          quote (disable) pattern metacharacters till \E

    casiano@tonga:~$ perl -wde 0
    main::(-e:1):   0
      DB<1> $x = '([a-z]+)'
      DB<2> x 'hola' =~ /$x/
    0  'hola'
      DB<3> x 'hola' =~ /\Q$x/
      empty array
      DB<4> x '([a-z]+)' =~ /\Q$x/
    0  1

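The debugger session above can be mirrored in a standalone script. The sketch below repeats the three `\Q` matches and adds two small checks of the escape sequences listed earlier; the sample strings `jOHN sMITH` and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $x = '([a-z]+)';

# Interpolated as-is, $x is a pattern: one or more lowercase letters.
my $as_pattern = 'hola' =~ /$x/ ? 1 : 0;               # 1

# With \Q the metacharacters are quoted, so only the literal text matches.
my $quoted_hola    = 'hola'     =~ /\Q$x/ ? 1 : 0;     # 0
my $quoted_literal = '([a-z]+)' =~ /\Q$x/ ? 1 : 0;     # 1

# \e, \033 and \x1B are three spellings of the same character (ESC).
my $same_esc = ("\e" eq "\033" && "\033" eq "\x1B") ? 1 : 0;

# \L lowercases up to \E; \u then uppercases the first character.
my $name = "\u\LjOHN sMITH\E";

print "$as_pattern $quoted_hola $quoted_literal $same_esc $name\n";
# 1 0 1 1 John smith
```
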
    \w        Match a "word" character (alphanumeric plus "_")
    \W        Match a non-"word" character
    \s        Match a whitespace character
    \S        Match a non-whitespace character
    \d        Match a digit character
    \D        Match a non-digit character
    \pP       Match P, named property.  Use \p{Prop} for longer names.
    \PP       Match non-P
    \X        Match eXtended Unicode "combining character sequence",
              equivalent to (?>\PM\pM*)
    \C        Match a single C char (octet) even under Unicode.
              NOTE: breaks up characters into their UTF-8 bytes,
              so you may end up with malformed pieces of UTF-8.
              Unsupported in lookbehind.
    \1        Backreference to a specific group.
              '1' may actually be any positive integer.
    \g1       Backreference to a specific or previous group,
    \g{-1}    number may be negative indicating a previous buffer and may
              optionally be wrapped in curly brackets for safer parsing.
    \g{name}  Named backreference
    \k<name>  Named backreference
    \K        Keep the stuff left of the \K, don't include it in $&
    \v        Vertical whitespace
    \V        Not vertical whitespace
    \h        Horizontal whitespace
    \H        Not horizontal whitespace
    \R        Linebreak

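The backreference forms and `\K` are the least self-explanatory entries in the list above. The following sketch, assuming Perl 5.10 or later and using made-up sample strings, shows one use of each.

```perl
use strict;
use warnings;
use 5.010;

my $line = 'this is is a test';

# \1 refers back to the first capture group: find a doubled word.
my ($dup) = $line =~ /\b(\w+)\s+\1\b/;

# \g{-1} is a relative backreference to the most recently opened group.
my $relative = 'abab' =~ /(ab)\g{-1}/ ? 1 : 0;

# \k<year> refers back to the group named "year".
my $named = '2024-2024' =~ /(?<year>\d{4})-\k<year>/ ? 1 : 0;

# \K drops everything to its left from the match, so only "file" is replaced.
(my $path = 'dir/file.txt') =~ s{dir/\Kfile}{FILE};

print "$dup $relative $named $path\n";   # is 1 1 dir/FILE.txt
```
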
Perl defines the following zero-width assertions:

    \b  Match a word boundary
    \B  Match except at a word boundary
    \A  Match only at beginning of string
    \Z  Match only at end of string, or before newline at the end
    \z  Match only at end of string
    \G  Match only at pos() (e.g. at the end-of-match position
        of prior m//g)

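Two of these assertions are easy to confuse or overlook: `\Z` versus `\z`, and `\G`. The sketch below contrasts the first pair on a string ending in a newline, then uses `\G` with `m//gc` to walk a string token by token. The token labels `NUM` and `ID` and the input string are hypothetical.

```perl
use strict;
use warnings;

my $s = "line\n";
my $Z = $s =~ /line\Z/ ? 1 : 0;   # 1: \Z allows a trailing newline
my $z = $s =~ /line\z/ ? 1 : 0;   # 0: \z requires the very end of the string

# \G anchors each attempt at pos(); the /c flag keeps pos() after a failed
# match, so the alternatives can be tried in turn at the same position.
my $input = '12 abc 7';
my @tokens;
while (1) {
    if    ($input =~ /\G\s+/gc)      { next }
    elsif ($input =~ /\G(\d+)/gc)    { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([a-z]+)/gc) { push @tokens, "ID:$1" }
    else                             { last }
}

print "$Z $z @tokens\n";   # 1 0 NUM:12 ID:abc NUM:7
```
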
The POSIX character class syntax:

    [:class:]

is also available. Note that the [ and ] brackets are literal; they must always be used within a character class expression.

    # this is correct:
    $string =~ /[[:alpha:]]/;

    # this is not, and will generate a warning:
    $string =~ /[:alpha:]/;

The available classes and their backslash equivalents (if available) are as follows:

    alpha
    alnum
    ascii
    blank
    cntrl
    digit       \d
    graph
    lower
    print
    punct
    space       \s
    upper
    word        \w
    xdigit

For example, use [:upper:] to match all the uppercase characters. Note that the [] are part of the [::] construct, not part of the whole character class. For example:

    [01[:alpha:]%]

matches zero, one, any alphabetic character, and the percent sign.
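A quick way to check this is to anchor the class and try a few strings against it. This is only a sketch; the test strings are made up.

```perl
use strict;
use warnings;

# One or more characters, each of which is 0, 1, alphabetic, or %.
my $re = qr/^[01[:alpha:]%]+$/;

# 'a2' fails because 2 is not in the class; '' fails because + needs one char.
my @results = map { $_ =~ $re ? 1 : 0 } ('a1%', '10', 'a2', '');

print "@results\n";   # 1 1 0 0
```
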
The following equivalences to Unicode \p{} constructs and equivalent backslash character classes (if available) will hold:

    [[:...:]]   \p{...}         backslash

    alpha       IsAlpha
    alnum       IsAlnum
    ascii       IsASCII
    blank
    cntrl       IsCntrl
    digit       IsDigit         \d
    graph       IsGraph
    lower       IsLower
    print       IsPrint
    punct       IsPunct
    space       IsSpace
                IsSpacePerl     \s
    upper       IsUpper
    word        IsWord          \w
    xdigit      IsXDigit

You can negate the [::] character classes by prefixing the class name with a '^'. This is a Perl extension. For example:

    POSIX           traditional   Unicode

    [[:^digit:]]    \D            \P{IsDigit}
    [[:^space:]]    \S            \P{IsSpace}
    [[:^word:]]     \W            \P{IsWord}

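On plain ASCII input the three spellings of a negated class behave the same way. The sketch below deletes every non-digit from a made-up string using each form in turn; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $s = 'abc 123';

# Each substitution removes every character that is not a digit.
(my $posix       = $s) =~ s/[[:^digit:]]//g;
(my $traditional = $s) =~ s/\D//g;
(my $unicode     = $s) =~ s/\P{IsDigit}//g;

print "$posix $traditional $unicode\n";   # 123 123 123
```
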