Regular expressions are too large a topic to introduce here, but make sure that you understand these concepts. For tutorials, see perlrequick or perlretut. For the definitive documentation, see perlre.
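The m// and s/// operators report how they fared. In scalar context, m// returns true if it matched and false otherwise; with /g in list context it returns the list of matches. s/// returns the number of replacements it made. You can either use the result directly, or check it for truth.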
    if ( $str =~ /Diggle|Shelley/ ) {
        print "We found Pete or Steve!\n";
    }

    if ( my $n = ($str =~ s/this/that/g) ) {
        print qq{Replaced $n occurrence(s) of "this"\n};
    }
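As a minimal sketch of the difference between the two operators, the snippet below counts matches and replacements on a made-up string. The variable names are illustrative only. The `= () =` idiom forces list context so that m//g returns every match, which can then be counted.

```perl
use strict;
use warnings;

my $str = 'this and this and this';

# Scalar context: m// only reports success or failure.
my $found = ( $str =~ /this/ ) ? 1 : 0;

# List context with /g: m// returns every match, and assigning
# through an empty list counts them.
my $matches = () = $str =~ /this/g;

# s///g returns the number of substitutions made.
my $copy     = $str;
my $replaced = ( $copy =~ s/this/that/g );

print "found=$found matches=$matches replaced=$replaced\n";
```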
The capture variables, $1, etc., are not valid unless the match succeeded, and they're not cleared on failure, either.
    # BAD: Not checked, but at least it "works".
    my $str = 'Perl 101 rocks.';
    $str =~ /(\d+)/;
    print "Number: $1";      # Prints "Number: 101"

    # WORSE: Not checked, and the result is not what you'd expect
    $str =~ /(Python|Ruby)/;
    print "Language: $1";    # Prints "Language: 101"
Instead, you must check the return value from the match:
    # GOOD: Check the results
    my $str = 'Perl 101 rocks.';
    if ( $str =~ /(\d+)/ ) {
        print "Number: $1";      # Prints "Number: 101"
    }

    if ( $str =~ /(Python|Ruby)/ ) {
        print "Language: $1";    # Never gets here
    }
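Another way to avoid stale capture variables, sketched here with illustrative variable names, is to assign the captures in list context. A failed match returns an empty list, so the target variables stay undefined instead of holding values from an earlier match.

```perl
use strict;
use warnings;

my $str = 'Perl 101 rocks.';

# The list assignment is true only when the match succeeded.
my $number;
if ( my ($n) = $str =~ /(\d+)/ ) {
    $number = $n;
    print "Number: $n\n";
}

# On failure the list is empty, so $language is undef rather than
# a leftover from the previous match.
my ($language) = $str =~ /(Python|Ruby)/;
print "No language found\n" unless defined $language;
```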
/i - case insensitive match
/g - match multiple times
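    my $var   = "match match match";
    my $count = 0;    # avoid $a, which is special to sort

    # Scalar context: each iteration finds the next match
    while ( $var =~ /match/g ) { $count++; }
    print "$count\n";    # prints 3

    # List context: all matches are returned at once
    $count = 0;
    $count++ foreach ( $var =~ /match/g );
    print "$count\n";    # prints 3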
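The /i and /g flags can be combined. The following sketch, using a made-up string, counts matches with and without case sensitivity.

```perl
use strict;
use warnings;

my $var = 'Match match MATCH';

# Without /i only the all-lowercase word matches.
my $sensitive = () = $var =~ /match/g;

# With /i every spelling matches.
my $insensitive = () = $var =~ /match/gi;

print "sensitive=$sensitive insensitive=$insensitive\n";
```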
/m - ^ and $ change meaning
Normally, ^ means "start of string" and $ means "end of string". /m makes them mean start and end of line, respectively.
    $str = "one\ntwo\nthree";
    @a = $str =~ /^\w+/g;     # @a = ("one");
    @b = $str =~ /^\w+/gm;    # @b = ("one","two","three")
\A and \z match the start and end of the string regardless of /m. \Z is the same as \z except it will ignore a final newline.
/s - . also matches newline
    $str = "one\ntwo\nthree\n";
    $str =~ /^(.{8})/s;
    print $1;    # prints "one\ntwo\n"
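The anchors \A, \z and \Z mentioned earlier are easiest to tell apart on a string that ends in a newline. This is a small sketch with illustrative variable names; each variable holds 1 if the pattern matched and 0 if it did not.

```perl
use strict;
use warnings;

my $str = "one\ntwo\n";

# \Z tolerates a final newline, \z does not.
my $big_z   = $str =~ /two\Z/ ? 1 : 0;    # matches
my $small_z = $str =~ /two\z/ ? 1 : 0;    # does not match

# With /m, ^ matches at the start of any line,
# but \A still means start of string.
my $caret_m = $str =~ /^two/m  ? 1 : 0;   # matches
my $big_a_m = $str =~ /\Atwo/m ? 1 : 0;   # does not match

print "Z=$big_z z=$small_z ^=$caret_m A=$big_a_m\n";
```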
Capturing parentheses populate $1 and friends, numbered by the position of each opening parenthesis.

    my $str = "abc";
    $str =~ /(((a)(b))(c))/;
    print "1: $1 2: $2 3: $3 4: $4 5: $5\n";
    # prints: 1: abc 2: ab 3: a 4: b 5: c
If an opening parenthesis is followed by ?:, the group will not be captured.

    my $str = "abc";
    $str =~ /(?:a(b)c)/;
    print "$1\n";    # prints "b"
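A common use of a non-capturing group is to bound an alternation without shifting the numbering of the captures you care about. The sketch below uses a made-up title pattern and input purely for illustration.

```perl
use strict;
use warnings;

my $str = 'Dr. Diggle';

# (?:...) groups the alternation but does not capture it,
# so the surname lands in $1.
my $name;
if ( $str =~ /^(?:Mr|Ms|Dr)\.? (\w+)/ ) {
    $name = $1;
    print "Surname: $name\n";
}
```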
/x - allow whitespace and comments
This ugly behemoth

    my ($num) = $ARGV[0] =~ m/^\+?((?:(?<!\+)-)?(?:\d*\.)?\d+)$/x;

is more readable with whitespace and comments, as allowed by the /x flag.

    my ($num) = $ARGV[0] =~ m/^
        \+?              # An optional plus sign, to be discarded
        (                # Capture...
          (?:(?<!\+)-)?  # a negative sign, if there's no plus behind it,
          (?:\d*\.)?     # optional digits followed by a decimal point,
          \d+            # then one or more digits.
        )$/x;
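To see what the pattern accepts, here is a small sketch that wraps it in a hypothetical helper, `parse_num`, and runs it over a few sample inputs. The helper name and the inputs are assumptions for illustration, and the decimal point is written as `\.` so that it matches only a literal dot.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured number, or undef
# when the input does not match.
sub parse_num {
    my ($text) = @_;
    my ($num) = $text =~ m/^ \+? ( (?:(?<!\+)-)? (?:\d*\.)? \d+ )$/x;
    return $num;
}

for my $input ( '42', '+42', '-3.5', '.5', '+-1', 'abc' ) {
    my $num = parse_num($input);
    printf "%-4s => %s\n", $input, defined $num ? $num : '(no match)';
}
```

The leading plus is matched but left out of the capture, and an input such as `+-1` is rejected because the lookbehind refuses a minus sign that directly follows a plus.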
\Q and \E quote the regex metacharacters between them, so that they match literally.
    my $num = '3.1415';
    print "ok 1\n" if $num =~ /\Q3.14\E/;
    $num = '3X1415';
    print "ok 2\n" if $num =~ /\Q3.14\E/;
    print "ok 3\n" if $num =~ /3.14/;

prints

    ok 1
    ok 3
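\Q is most useful when the pattern comes from a variable whose contents should be matched literally. The sketch below uses illustrative variable names and also shows the built-in quotemeta function, which applies the same escaping to a string.

```perl
use strict;
use warnings;

my $needle = '3.14';
my $text   = 'pi is 3X14, roughly';

# Unquoted, the "." in $needle matches any character, including "X".
my $raw = $text =~ /$needle/ ? 1 : 0;

# Quoted, the "." must match a literal dot, so there is no match.
my $quoted = $text =~ /\Q$needle\E/ ? 1 : 0;

# quotemeta returns the escaped string itself.
my $escaped = quotemeta $needle;

print "raw=$raw quoted=$quoted escaped=$escaped\n";
```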
/e flag to s///
With /e, the replacement side of s/// is evaluated as Perl code. Use $1 and friends if necessary.

    my $str = "AbCdE\n";
    $str =~ s/(\w)/lc $1/eg;
    print $str;    # prints "abcde"

study
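Because the replacement is ordinary Perl code under /e, it can do arithmetic on the captured text as well. A minimal sketch with a made-up input string:

```perl
use strict;
use warnings;

my $str = 'a1 b20 c300';

# Work on a copy so the original string is preserved.
my $doubled = $str;
$doubled =~ s/(\d+)/$1 * 2/eg;

print "$doubled\n";
```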
study is not helpful in the vast majority of cases, and since Perl 5.16 it is a no-op. All it was designed to do is make a table of where the first occurrence of each of 256 bytes is in the string. This means that if you have a 1,000-character string, and you search for lots of strings that begin with a constant character, then the matcher can jump right to it. For example:
"This is a very long [... 900 characters skipped...] string that I have here, ending at position 1000"
Now, if you are matching this against the regex /Icky/, the matcher will try to find the first letter "I" that matches. That may take scanning through the first 900+ characters until you get to it. But what study does is build a table of the 256 possible bytes and where they first appear, so that in this case, the scanner can jump right to that position and start matching.
Run perl with -Mre=debug to see how the regex engine compiles and executes a pattern.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.