Chapter 7 In the World of Regular Expressions
http://perldoc.perl.org/perlre.html
Using Simple Patterns
Unicode Properties
http://perldoc.perl.org/perluniprops.html
if (/\p{Space}/) {   # 26 different possible characters
    print "The string has some whitespace.\n";
}
if (/\p{Digit}/) {   # 411 different possible characters
    print "The string has a digit.\n";
}
if (/\P{Space}/) {   # Not space (many many characters!)
    print "The string has one or more non-whitespace characters.\n";
}
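A minimal sketch of why the Unicode properties matter: \p{Digit} and \p{Space} match characters outside ASCII, which a hand-written class like [0-9] does not. The sample characters (U+0663 ARABIC-INDIC DIGIT THREE and U+2003 EM SPACE) are picked only for illustration and written as \x{...} escapes so the source stays ASCII.

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";   # ARABIC-INDIC DIGIT THREE
my $em_space     = "\x{2003}";   # EM SPACE

# \p{Digit} matches any Unicode decimal digit, not just 0-9
my $is_digit_prop  = $arabic_three =~ /\p{Digit}/ ? 1 : 0;
# an ASCII-only character class does not
my $is_digit_ascii = $arabic_three =~ /[0-9]/     ? 1 : 0;
# \p{Space} includes Unicode whitespace beyond space, tab and newline
my $is_space_prop  = $em_space =~ /\p{Space}/     ? 1 : 0;

print "Digit property: $is_digit_prop, [0-9]: $is_digit_ascii, Space property: $is_space_prop\n";
```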
Back Reference
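A back reference matches the same text that an earlier capture group in the same pattern has just matched. You denote it as a backslash followed by the group number, like \1, \2, and so on. Don't confuse it with the match variables $1, $2, and so on, which hold the captures only after the whole pattern match has completed.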
$_ = "abba";
if (/(.)\1/) {   # matches 'bb'
    print "It matched same character next to itself!\n";
}

$_ = "Hello there, neighbor";
if (/(\S+) (\S+), (\S+)/) {
    print "words were $1 $2 $3\n";
}
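A short sketch contrasting the two, on an invented sentence: inside the pattern, \1 must match the very word that group 1 just captured; after the match succeeds, $1 holds that word.

```perl
use strict;
use warnings;

my $text = "Paris in the the spring";
my $doubled;
# \b(\w+) captures a whole word; " \1\b" then requires the same word again
if ($text =~ /\b(\w+) \1\b/) {
    $doubled = $1;   # after the match, $1 holds what group 1 captured
    print "Doubled word: $doubled\n";
}
```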
Perl 5.10 introduced a new way to denote back references. Instead of using the backslash and a number, you can use \g{N}, where N is the number of the back reference that you want to use. The braces keep the group number separate from any literal digits that follow it:
$_ = "aa11bb";
if (/(.)\g{1}11/) {
    print "It matched!\n";
}
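With \g{N} you can also specify a relative back reference: a negative N counts backward from where the back reference appears, so \g{-1} means the immediately preceding capture group. You can rewrite the last example to use -1 as the number to do the same thing: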
$_ = "aa11bb";
if (/(.)\g{-1}11/) {   # a plain ASCII hyphen-minus, not a dash
    print "It matched!\n";
}
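The advantage of the relative form shows up when another group is added in front of the original one. In this sketch, the leading (x) group and the "xaa11bb" string are made up for illustration: the absolute \g{1} now points at the wrong group, while \g{-1} still refers to the group immediately before it.

```perl
use strict;
use warnings;

my $s = "xaa11bb";

# Absolute: \g{1} always means group 1, which is now (x),
# so this needs x, any char, x, "11": no match here
my $absolute = $s =~ /(x)(.)\g{1}11/  ? 1 : 0;
# Relative: \g{-1} means the immediately preceding group, still (.)
my $relative = $s =~ /(x)(.)\g{-1}11/ ? 1 : 0;

print "absolute: $absolute, relative: $relative\n";
```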
Chapter 8 Matching with Regular Expressions
Match Modifiers
Case-Insensitive Matching with /i
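A minimal sketch of /i, using made-up input strings: adding the modifier lets one lowercase pattern match any mix of upper- and lowercase letters.

```perl
use strict;
use warnings;

my @answers = ("yes", "YES", "Yes", "yeah");
# /i ignores letter case, so one pattern accepts every spelling of "yes"
my @matched = grep { /^yes$/i } @answers;
print "Matched: @matched\n";
```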
Matching Any Character with /s
If you might have newlines in your strings, and you want the dot to be able to match them, the /s modifier will do the job.
Adding Whitespace with /x
The /x modifier allows you to add arbitrary whitespace to a pattern to make it easier to read:
/-?[0-9]+\.?[0-9]*/          # what is this doing?
/ -? [0-9]+ \.? [0-9]* /x    # a little better
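Under /x, an unescaped # also starts a comment inside the pattern, so each piece can be documented. A minimal sketch, storing the number pattern from the example above with qr// and testing it on a few made-up strings; the \A and \z anchors are an addition here so that each whole string has to be a number.

```perl
use strict;
use warnings;

# The same number pattern, spread out under /x with a comment per piece
my $number = qr/
    -?        # an optional minus sign
    [0-9]+    # one or more digits before the decimal point
    \.?       # an optional literal decimal point
    [0-9]*    # zero or more digits after it
/x;

my @found = grep { /\A$number\z/ } ("42", "-3.14", "7.", "abc");
print "Numbers: @found\n";
```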
The Match Variables
Noncapturing Parentheses
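To group without creating a match variable, use noncapturing parentheses, (?:PATTERN). Here (?:bronto) is optional but not captured, so the food choice is still $1:
if (/(?:bronto)?saurus (steak|burger)/) {
    print "Fred wants a $1\n";
}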
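A small sketch with invented order strings, showing that the optional (?:bronto) prefix does not use up a group number: the food choice lands in $1 whether or not the prefix is present.

```perl
use strict;
use warnings;

my @orders = ("brontosaurus burger", "saurus steak", "saurus salad");
my @wanted;
for my $order (@orders) {
    # (?:bronto)? groups the optional prefix without capturing it,
    # so (steak|burger) is still group 1
    if ($order =~ /(?:bronto)?saurus (steak|burger)/) {
        push @wanted, $1;
    }
}
print "Fred wants: @wanted\n";
```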
Named Captures
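To label a capture, use (?<LABEL>PATTERN). After a successful match, the captured text is available in the %+ hash under that label, as $+{LABEL}:
use v5.10;   # enables say
my $names = 'Fred or Barney';
if ( $names =~ m/(?<name1>\w+) (?:and|or) (?<name2>\w+)/ ) {
    say "I saw $+{name1} and $+{name2}";
}
Inside the same pattern, you can refer back to a named capture with \g{LABEL}:
$names = 'Fred Flintstone and Wilma Flintstone';
if ( $names =~ m/(?<last_name>\w+) and \w+ \g{last_name}/ ) {
    say "I saw $+{last_name}";
}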
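Named captures are stored in %+, but they are still numbered from left to right like ordinary groups, so in this sketch (reusing the 'Fred or Barney' string) $1 and $+{name1} hold the same text.

```perl
use strict;
use warnings;

my $names = 'Fred or Barney';
my ($first_named, $first_numbered);
if ($names =~ m/(?<name1>\w+) (?:and|or) (?<name2>\w+)/) {
    $first_named    = $+{name1};   # looked up by label in %+
    $first_numbered = $1;          # the same capture, looked up by position
    print "name1: $first_named, \$1: $first_numbered\n";
}
```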
The Automatic Match Variables
$& : holds the entire matched section
$` : holds whatever the regular expression engine had to skip over before it found the match
$' : holds the remainder of the string that the pattern never got to
if ("Hello there, neighbor" =~ /\s(\w+),/) {
    print "That was ($`)($&)($').\n";   # shows (Hello)( there,)( neighbor)
}
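In Perl 5.10 or later, you can use ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} instead of $`, $&, and $'. Add the /p modifier to the match so that these variables are set. In older Perls, merely mentioning $`, $&, or $' anywhere in a program slowed down every pattern match; /p confines that cost to the matches that ask for it.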
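A sketch of the same "Hello there, neighbor" example rewritten with the /p modifier and the ${^...} variables; it assumes Perl 5.10 or later.

```perl
use v5.10;   # ${^PREMATCH} and friends need Perl 5.10 or later
use strict;
use warnings;

my $result;
if ("Hello there, neighbor" =~ /\s(\w+),/p) {
    # With /p these hold the same text as $`, $& and $'
    $result = "(" . ${^PREMATCH} . ")(" . ${^MATCH} . ")(" . ${^POSTMATCH} . ")";
    print "That was $result.\n";
}
```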
General Quantifiers
Use a comma-separated pair of numbers inside curly braces ({}) to specify exactly how few and how many repetitions you want. So the pattern /a{5,15}/ matches from five to fifteen repetitions of the letter a.
If you leave off the second number, there is no upper limit: /(fred){3,}/ matches if there are three or more instances of fred in a row.
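A quick sketch checking both quantifiers against made-up strings. The a{5,15} test is anchored with \A and \z so the whole string has to fit the count; unanchored, /a{5,15}/ would also find a match inside a longer run of a's.

```perl
use strict;
use warnings;

my $seven_a     = ("a" x 7) =~ /\Aa{5,15}\z/ ? 1 : 0;     # 7 a's: within 5..15
my $four_a      = ("a" x 4) =~ /\Aa{5,15}\z/ ? 1 : 0;     # 4 a's: below the minimum
my $three_freds = "fredfredfred" =~ /(fred){3,}/ ? 1 : 0;  # three in a row
my $two_freds   = "fredfred"     =~ /(fred){3,}/ ? 1 : 0;  # only two
print "$seven_a $four_a $three_freds $two_freds\n";
```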
Precedence
From highest to lowest precedence:
Regular expression feature | Example
Parentheses (grouping or capturing) | (...), (?:...), (?<LABEL>...)
Quantifiers | a*, a+, a?, a{n,m}
Anchors and sequence | abc, ^, $, \A, \b, \z, \Z
Alternation | a|b|c
Atoms | a, [abc], \d, \1, \g{2}
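A short sketch (test strings invented) of why the table matters: anchors and sequence bind more tightly than alternation, so /^fred|barney$/ means "starts with fred, or ends with barney", not "is exactly fred or barney". Parentheses make the alternation happen first.

```perl
use strict;
use warnings;

my %result;
for my $s ("fred flintstone", "barney") {
    # Anchors bind tighter than |, so this is (^fred) OR (barney$)
    my $loose = $s =~ /^fred|barney$/ ? "yes" : "no";
    # Grouping first: the whole string must be fred or barney
    my $tight = $s =~ /^(fred|barney)$/ ? "yes" : "no";
    $result{$s} = "$loose/$tight";
    print "$s: loose=$loose tight=$tight\n";
}
```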
Chapter 9 Processing Text with Regular Expressions
Substitutions with s///
Global Replacements with /g
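A minimal sketch of s/// with and without /g, on a sample string: without /g only the first match is replaced; with /g every non-overlapping match is.

```perl
use strict;
use warnings;

my $once = "home, sweet home!";
$once =~ s/home/cave/;    # only the first "home" is replaced

my $all = "home, sweet home!";
$all =~ s/home/cave/g;    # every "home" is replaced

print "$once\n$all\n";
```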
Different Delimiters
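Like m//, s/// can use other delimiters. With paired delimiters such as braces or brackets, each part gets its own pair, and the replacement may even use a different delimiter from the pattern. All of these are acceptable: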
s#^https://#http://#;
s{fred}{barney};
s[fred](barney);
s<fred>#barney#;
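A sketch of why a different delimiter helps, using a placeholder URL: with # as the delimiter, the slashes need no escaping, while the default / delimiter forces every slash to be escaped. Both substitutions produce the same result.

```perl
use strict;
use warnings;

my $url = "https://www.example.com/";
# "#" as the delimiter: the slashes in the URL are ordinary characters
(my $plain = $url) =~ s#^https://#http://#;
# default "/" delimiter: every slash must be escaped
(my $escaped = $url) =~ s/^https:\/\//http:\/\//;
print "$plain\n$escaped\n";
```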
Case Shifting
\U: forces what follows to all uppercase
\L: forces what follows to all lowercase
\u: uppercases the next character
\l: lowercases the next character
\u\L: all lowercase, but capitalizes the first letter
\l\U: all uppercase, but lowercases the first letter
\E: turns off case shifting
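A sketch of these escapes in replacement strings, on a made-up sentence. In the second substitution, \E ends the \U so that only the second name is uppercased.

```perl
use strict;
use warnings;

my $s = "I saw Barney with Fred.";
(my $upper = $s) =~ s/(fred|barney)/\U$1/gi;              # uppercase both names
(my $swap  = $s) =~ s/(\w+) with (\w+)/\U$2\E with $1/i;  # \E stops the \U early

my $name = "fRED";
(my $title = $name) =~ s/(\w+)/\u\L$1/;                   # lowercase, then capitalize
print "$upper\n$swap\n$title\n";
```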
The split Operator
The join Function
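A minimal round-trip sketch of split and join, on a made-up colon-separated record: split breaks a string apart at every match of a pattern, while join glues a list back together with a plain string (not a pattern).

```perl
use strict;
use warnings;

my $record = "4:6:8:10:12";
my @fields = split /:/, $record;   # the separator is a regex pattern
my $joined = join "-", @fields;    # the glue is an ordinary string
print "@fields\n$joined\n";
```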
m// in List Context
my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
my @words = ($text =~ /([a-z]+)/ig);
print "Result: @words\n";   # Result: Fred dropped a ton granite block on Mr Slate
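When the pattern has more than one capture group, a global match in list context returns every group of every match, in order. A sketch (sample data invented) that relies on that ordering to build a hash from first name to last name:

```perl
use strict;
use warnings;

my $data = "Barney Rubble Fred Flintstone Wilma Flintstone";
# Each match contributes two items (first name, last name) to the list
my %last_name = ($data =~ /(\w+)\s+(\w+)/g);
for my $first (sort keys %last_name) {
    print "$first $last_name{$first}\n";
}
```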
Nongreedy Quantifiers
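+? : one or more, but as few as possible
*? : zero or more, but as few as possible
?? : zero or one, preferring zero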
$_ = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";
s#<BOLD>(.*?)</BOLD>#$1#g;   # $_ is now "I thought you said Fred and Velma, not Wilma"
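To see what the extra ? buys you, here is a sketch of the same substitution with and without it, side by side on the same sample string. The greedy .* runs to the last </BOLD>, swallowing the markup in between.

```perl
use strict;
use warnings;

my $text = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";

(my $greedy = $text) =~ s#<BOLD>(.*)</BOLD>#$1#g;    # .* runs to the LAST </BOLD>
(my $lazy   = $text) =~ s#<BOLD>(.*?)</BOLD>#$1#g;   # .*? stops at the FIRST </BOLD>

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```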