Thursday, May 2, 2013

Learning Perl, 6th Edition: Notes (Part 3)

Chapter 7 In the World of Regular Expressions


Using Simple Patterns

Unicode Properties


if (/\p{Space}/) { # 26 different possible characters
    print "The string has some whitespace.\n";
}

if (/\p{Digit}/) { # 411 different possible characters
    print "The string has a digit.\n";
}

if (/\P{Space}/) { # Not space (many many characters!)
    print "The string has one or more non-whitespace characters.\n";
}

Back Reference

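You denote a back reference as a backslash followed by a number, like \1, \2, and so on. Inside the pattern, a back reference matches the same text that the corresponding capture group has already matched.
After a successful match, the same captures are available outside the pattern as the match variables $1, $2, and so on.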

$_ = "abba";
if (/(.)\1/) { # matches 'bb'
    print "It matched same character next to itself!\n";
}

$_ = "Hello there, neighbor";
if (/(\S+) (\S+), (\S+)/) {
    print "words were $1 $2 $3\n";
}

Perl 5.10 introduced a new way to denote back references. Instead of using the backslash and a number, you can use \g{N}, where N is the number of the back reference that you want to use. The braces make it clear where the number ends, which matters when literal digits follow the back reference.
$_ = "aa11bb";
if (/(.)\g{1}11/) {
    print "It matched!\n";
}

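You can also specify a relative back reference. A negative number counts backward from the current position, so \g{-1} means the capture group immediately before it. You can rewrite the last example to use -1 as the number to do the same thing:

$_ = "aa11bb";
if (/(.)\g{-1}11/) {
    print "It matched!\n";
}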
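A relative back reference is most useful when the number of groups before it may change. The following is only a minimal sketch; the sample string and the extra leading group are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample string, chosen for illustration.
my $text = "xaa11bb";

# \g{-1} refers to the capture group immediately before it, so the
# pattern keeps working even though another group was added in front.
my $matched = '';
if ($text =~ /(x)(.)\g{-1}11/) {
    $matched = "$1$2$2" . "11";   # rebuild the matched text from the captures
    print "Matched '$matched'; group 2 captured '$2'\n";
}
```

With an absolute reference the pattern would have needed \g{2} here, and it would need editing again if more groups were added.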

Chapter 8 Matching with Regular Expressions

Match Modifiers

Case-Insensitive Matching with /i

Matching Any Character with /s

If you might have newlines in your strings, and you want the dot to be able to match them, the /s modifier will do the job.
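As a small sketch of the difference (the two-line string is made up for illustration), the same pattern can be tried with and without /s:

```perl
use strict;
use warnings;

# Hypothetical two-line string for illustration.
my $text = "Barney saw it\nFred saw it too\n";

# Without /s the dot does not match a newline, so the match cannot span lines.
my $without_s = ($text =~ /Barney.*Fred/)  ? 1 : 0;

# With /s the dot also matches newline characters.
my $with_s    = ($text =~ /Barney.*Fred/s) ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```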

Adding Whitespace with /x

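The /x modifier allows you to add arbitrary whitespace to a pattern, in order to make it easier to read. Under /x, whitespace inside the pattern is ignored, so a literal space has to be written as \s or escaped with a backslash.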

/-?[0-9]+\.?[0-9]*/               # what is this doing?
/ -? [0-9]+ \.? [0-9]* /x         # a little better
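Because /x also allows comments inside the pattern, the same regular expression can be spread over several lines. This is a sketch only; the variable name $number_re and the test inputs are assumptions for illustration:

```perl
use strict;
use warnings;

# The same number pattern, one piece per line with a comment each.
my $number_re = qr/
    -?          # an optional minus sign
    [0-9]+      # one or more digits before the decimal point
    \.?         # an optional decimal point
    [0-9]*      # optional digits after the decimal point
/x;

for my $candidate ('42', '-3.14', 'abc') {   # hypothetical inputs
    my $verdict = $candidate =~ /\A$number_re\z/ ? 'looks like' : 'does not look like';
    print "'$candidate' $verdict a number\n";
}
```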

The Match Variables

Noncapturing Parentheses

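To group without creating a match variable, you use noncapturing parentheses, (?:PATTERN). Here (?:bronto) only groups, so the first capturing group, and therefore $1, is (steak|burger):
if (/(?:bronto)?saurus (steak|burger)/) {
    print "Fred wants a $1\n";
}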

Named Captures

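To label a match variable, you use (?<LABEL>PATTERN). After a successful match, the captured text is available in the %+ hash as $+{LABEL}.
use 5.010;  # named captures and say need Perl 5.10 or later

my $names = 'Fred or Barney';
if ( $names =~ m/(?<name1>\w+) (?:and|or) (?<name2>\w+)/ ) {
    say "I saw $+{name1} and $+{name2}";
}

A named capture can also be used as a back reference inside the same pattern, with \g{LABEL}:

my $names = 'Fred Flintstone and Wilma Flintstone';
if ( $names =~ m/(?<last_name>\w+) and \w+ \g{last_name}/ ) {
    say "I saw $+{last_name}";
}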

The Automatic Match Variables

$& : entire matched section
$` : holds whatever the regular expression engine had to skip over before it found the match
$' : has the remainder of the string that the pattern never got to

if ("Hello there, neighbor" =~ /\s(\w+),/) {
    print "That was ($`)($&)($').\n";       # prints: That was (Hello)( there,)( neighbor).
}

In Perl 5.10 and later, if you add the /p modifier to the match, you can use ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} instead of $`, $&, and $'.
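A minimal sketch of the earlier example rewritten with the Perl 5.10 variables; note the /p modifier on the match (the variable names $pre, $match, and $post are only illustrative):

```perl
use 5.010;
use strict;
use warnings;

my ($pre, $match, $post);
if ("Hello there, neighbor" =~ /\s(\w+),/p) {
    # /p makes these three variables available for this match
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "That was ($pre)($match)($post).\n";
}
```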

General Quantifiers

A general quantifier is a comma-separated pair of numbers inside curly braces ({}) that specifies the minimum and maximum number of repetitions you want.
For example, the pattern /a{5,15}/ matches from five to fifteen repetitions of the letter a.
If you omit the second number but keep the comma, there is no upper limit, so /(fred){3,}/ matches if there are three or more instances of fred in a row.
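The two quantifier patterns above can be checked against a few strings. This is a rough sketch; the test strings and the @checks and @results arrays are assumptions for illustration:

```perl
use strict;
use warnings;

# Each entry pairs a hypothetical test string with one of the patterns above.
my @checks = (
    [ 'aaaa',         qr/a{5,15}/    ],   # only four a's: too few
    [ 'aaaaaaa',      qr/a{5,15}/    ],   # seven a's: within range
    [ 'fredfred',     qr/(fred){3,}/ ],   # two freds: too few
    [ 'fredfredfred', qr/(fred){3,}/ ],   # three freds: enough
);

my @results;
for my $check (@checks) {
    my ($string, $pattern) = @$check;
    push @results, ($string =~ $pattern ? 1 : 0);
    print "$string: ", ($results[-1] ? "match" : "no match"), "\n";
}
```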


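Regular expression precedence, from highest to lowest:

Regular expression feature             Example
Parentheses (grouping or capturing)    (...), (?:...), (?<LABEL>...)
Quantifiers                            a*, a+, a?, a{n,m}
Anchors and sequence                   abc, ^, $, \A, \b, \z, \Z
Alternation                            a|b|c
Atoms                                  a, [abc], \d, \1, \g{2}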

Chapter 9 Processing Text with Regular Expressions

Substitutions with s///

Global Replacements with /g
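A short sketch of the difference between a plain s/// and one with /g; the sample string and variable names are only illustrative:

```perl
use strict;
use warnings;

my $text = "home, sweet home!";

# Without /g, only the first match is replaced.
(my $first_only = $text) =~ s/home/cave/;

# With /g, every non-overlapping match is replaced.
(my $all = $text) =~ s/home/cave/g;

print "$first_only\n";   # cave, sweet home!
print "$all\n";          # cave, sweet cave!
```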

Different Delimiters

These are acceptable.
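The original examples are not preserved in these notes, so the following is only a sketch of substitutions written with delimiters other than the slash. The URL and names are made up for illustration:

```perl
use strict;
use warnings;

my $url  = "https://www.example.com/";   # hypothetical URL
my $name = "fred";

# A non-paired delimiter is used three times, just like the slash.
(my $plain = $url) =~ s#^https://#http://#;

# Paired delimiters need two pairs: one for the pattern, one for the replacement.
(my $braces = $name) =~ s{fred}{barney};

# The two pairs do not have to be the same kind.
(my $mixed = $name) =~ s[fred](barney);

print "$plain $braces $mixed\n";
```

Choosing # for the URL avoids having to escape every slash in the pattern.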

Case Shifting

\U: forces what follows to all uppercase
\L: forces what follows to all lowercase
\u: uppercases the next character
\l: lowercases the next character
\u\L: all lowercase, but capitalizes the first letter
\l\U: all uppercase, but lowercases the first letter
\E: turns off case shifting
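The escapes above can be tried in a substitution or in any double-quoted string. A minimal sketch, with sample strings and variable names chosen for illustration:

```perl
use strict;
use warnings;

my $text = "I saw Barney with Fred.";

# \U uppercases everything that follows in the replacement.
(my $upper = $text) =~ s/(fred|barney)/\U$1/gi;

# \L lowercases everything that follows.
(my $lower = $text) =~ s/(fred|barney)/\L$1/gi;

# \E ends the case shift, so only $2 is uppercased here.
(my $limited = $text) =~ s/(\w+) with (\w+)/\U$2\E with $1/i;

# \u\L: lowercase everything, then capitalize the first letter.
my $name     = "fRED";
my $greeting = "Hello, \u\L$name\E!";

print "$upper\n$lower\n$limited\n$greeting\n";
```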

The split Operator
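split breaks a string into a list of fields according to a pattern. The following is a small sketch with made-up input strings:

```perl
use strict;
use warnings;

# Split on a literal colon.
my @fields = split /:/, "abc:def:g:h";

# A single space string as the pattern is a special case: it splits on
# runs of whitespace and ignores leading whitespace.
my @words = split ' ', "  This is   a test. \n";

print scalar(@fields), " fields: @fields\n";
print scalar(@words), " words: @words\n";
```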

The join Function
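join does the opposite job: it glues a list of pieces together with a string (not a pattern) between each pair. A minimal sketch with illustrative values:

```perl
use strict;
use warnings;

# The first argument is the glue; the rest are the pieces.
my $joined = join ":", 4, 6, 8, 10, 12;

# With only one piece there is nothing to glue, so no separator appears.
my $single = join "-", "fred";

# Together with split, join can change the separator of a record.
my $record  = "abc:def:g:h";                 # hypothetical record
my $changed = join "-", split /:/, $record;

print "$joined\n$single\n$changed\n";
```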

m// in List Context

my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
my @words = ($text =~ /([a-z]+)/ig);
print "Result: @words\n";
# Result: Fred dropped a ton granite block on Mr Slate

Nongreedy Quantifiers


$_ = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";
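A sketch of what this string is meant to demonstrate, assuming the goal is to strip the <BOLD> tags: the greedy .* runs to the last closing tag, while the nongreedy .*? stops at the first one. The variable names are only illustrative:

```perl
use strict;
use warnings;

my $text = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";

# Greedy: .* grabs as much as possible, swallowing the tags in the middle.
(my $greedy = $text) =~ s#<BOLD>(.*)</BOLD>#$1#g;

# Nongreedy: .*? grabs as little as possible, so each tag pair is handled separately.
(my $nongreedy = $text) =~ s#<BOLD>(.*?)</BOLD>#$1#g;

print "$greedy\n";
print "$nongreedy\n";
```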

