Regular Expressions

Regular expressions are too large a topic to introduce here, but make sure that you understand the concepts below. For tutorials, see perlrequick or perlretut. For the definitive documentation, see perlre.

Matches and replacements return a value.

In scalar context, m// returns true if the pattern matched and false otherwise, and s/// returns the number of replacements it made (or false if it made none). You can use the s/// count directly, or check either result for truth.

    if ( $str =~ /Diggle|Shelley/ ) {
        print "We found Pete or Steve!\n";
    }

    if ( my $n = ($str =~ s/this/that/g) ) {
        print qq{Replaced $n occurrence(s) of "this"\n};
    }
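
A plain m// in scalar context only reports success or failure, so counting matches takes a little more work. One common idiom, sketched here with illustrative variable names, is to run a /g match in list context and count the resulting list:

```perl
use strict;
use warnings;

my $text = 'this and this and this';

# A /g match in list context returns every match; assigning that list
# to () and then to a scalar yields the number of elements.
my $matches = () = $text =~ /this/g;
print "Matched $matches time(s)\n";      # Matched 3 time(s)

# s///g returns the number of replacements directly.
my $replaced = ( $text =~ s/this/that/g );
print "Replaced $replaced time(s)\n";    # Replaced 3 time(s)
```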

Don't use capture variables without checking that the match succeeded.

The capture variables, $1, etc., are not valid unless the match succeeded, and a failed match does not clear them: they keep whatever the last successful match set.

    # BAD: Not checked, but at least it "works".
    my $str = 'Perl 101 rocks.';
    $str =~ /(\d+)/;
    print "Number: $1"; # Prints "Number: 101";

    # WORSE: Not checked, and the result is not what you'd expect
    $str =~ /(Python|Ruby)/;
    print "Language: $1"; # Prints "Language: 101";

Instead, you must check the return value from the match:

    # GOOD: Check the results
    my $str = 'Perl 101 rocks.';
    if ( $str =~ /(\d+)/ ) {
        print "Number: $1"; # Prints "Number: 101";
    }

    if ( $str =~ /(Python|Ruby)/ ) {
        print "Language: $1"; # Never gets here
    }

In list context, m// returns a list of the captured substrings, or an empty list if the match failed.
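
A minimal sketch of a list-context match; the time string and variable names are made up for illustration:

```perl
use strict;
use warnings;

my $time = '12:34:56';

# Without /g, a list-context match returns the captured groups.
my ($hours, $minutes, $seconds) = $time =~ /(\d+):(\d+):(\d+)/;
print "$hours hours, $minutes minutes, $seconds seconds\n";

# With /g and no capture groups, it returns every matched substring.
my @numbers = $time =~ /\d+/g;
print "@numbers\n";    # 12 34 56

# A failed match returns an empty list, so the variable stays undefined.
my ($year) = $time =~ /(\d{4})/;
print "No year found\n" unless defined $year;
```

Because a failed match leaves the variables undefined, this form sidesteps the stale $1 problem as long as you test the result before using it.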

Common match flags

/i - case insensitive match
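
A small sketch of /i in action; the list of names is invented for the example:

```perl
use strict;
use warnings;

my @found;
for my $name ( 'perl', 'PERL', 'Perl', 'Python' ) {
    # /i lets the pattern match regardless of letter case.
    push @found, $name if $name =~ /^perl$/i;
}
print "@found\n";    # perl PERL Perl
```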

/g - match multiple times

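    my $var = "match match match";

    # $a and $b are reserved for sort, so count in a different variable
    my $count = 0;
    while ($var =~ /match/g) { $count++; }
    print "$count\n"; # prints 3

    $count = 0;
    $count++ foreach ($var =~ /match/g);
    print "$count\n"; # prints 3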

/m - ^ and $ change meaning
- Ordinarily, ^ means "start of string" and $ means "end of string" (or just before a final newline)
- /m makes them mean start and end of each line, respectively
```
    $str = "one\ntwo\nthree";
    @a = $str =~ /^\w+/g;  # @a = ("one");
    @b = $str =~ /^\w+/gm; # @b = ("one","two","three")
```
- Use \A and \z for start and end of string regardless of /m
- \Z is the same as \z except that it also matches just before a final newline
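
The differences between these anchors can be checked with a short sketch; the test string and variable names here are illustrative only:

```perl
use strict;
use warnings;

my $str = "one\ntwo\n";

# Under /m, $ matches before any newline, so "one" ends a line.
my $dollar_m = $str =~ /one$/m    ? 1 : 0;    # 1
# \z matches only at the very end, and a newline still follows "two".
my $z_bare   = $str =~ /two\z/m   ? 1 : 0;    # 0
my $z_nl     = $str =~ /two\n\z/m ? 1 : 0;    # 1
# \Z also matches just before a final newline.
my $big_z    = $str =~ /two\Z/m   ? 1 : 0;    # 1
# \A stays anchored to the start of the string, even with /m and /g.
my @starts   = $str =~ /\A\w+/gm;             # ("one")

print "$dollar_m $z_bare $z_nl $big_z @starts\n";    # 1 0 1 1 one
```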

/s - . also matches newline

    $str = "one\ntwo\nthree\n";
    $str =~ /^(.{8})/s;
    print $1; # prints "one\ntwo\n"

Capture variables `$1` and friends

Sets of capturing parentheses are stored in numeric variables

Parentheses are numbered left to right, in the order of their opening parentheses:

    my $str = "abc";
    $str =~ /(((a)(b))(c))/;
    print "1: $1 2: $2 3: $3 4: $4 5: $5\n";
    # prints: 1: abc 2: ab 3: a 4: b 5: c

There is no upper limit on the number of capturing parentheses and variables
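
As a quick illustration that the numbering continues past $9, here is a sketch with twelve groups (the string is arbitrary):

```perl
use strict;
use warnings;

my $letters = 'abcdefghijkl';
my @last;

if ( $letters =~ /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)/ ) {
    # Two-digit capture variables work just like $1 through $9.
    @last = ( $10, $11, $12 );
    print "10: $10 11: $11 12: $12\n";    # 10: j 11: k 12: l
}
```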

Avoid capture with `?:`

If an opening parenthesis is followed by ?:, the group is not captured

Useful if you don't want the matches to be saved

    my $str = "abc";
    $str =~ /(?:a(b)c)/;
    print "$1\n"; # prints "b"
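
A typical use is grouping an alternation without spending a capture variable on it. A minimal sketch, with a made-up name string:

```perl
use strict;
use warnings;

my $name = 'Ms. Lovelace';
my $surname;

# (?:...) groups the titles for alternation but does not capture,
# so the surname is still $1.
if ( $name =~ /^(?:Mr|Ms|Dr)\.\s+(\w+)/ ) {
    $surname = $1;
    print "Surname: $surname\n";    # Surname: Lovelace
}
```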

Allow easier reading with the `/x` switch

If you're doing something tricky with a regex, comment it.

You can do this with the /x flag.

This ugly behemoth

    my ($num) = $ARGV[0] =~ m/^\+?((?:(?<!\+)-)?(?:\d*\.)?\d+)$/x;

is more readable with whitespace and comments, as allowed by the /x flag.

    my ($num) =
        $ARGV[0] =~ m/^ \+?         # An optional plus sign, to be discarded
                    (               # Capture...
                    (?:(?<!\+)-)?   # a negative sign, if there's no plus behind it,
                    (?:\d*\.)?      # optional digits followed by a decimal point,
                    \d+             # then one or more digits.
                    )$/x;

Whitespace and comments are stripped unless escaped.
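
This matters when the pattern has to match a literal space. The sketch below, using an arbitrary test string, shows a bare space being ignored under /x and three ways to keep it:

```perl
use strict;
use warnings;

my $greeting = 'hello world';

# The bare space is stripped, so this pattern is really /helloworld/.
my $bare    = $greeting =~ /hello world/x    ? 1 : 0;    # 0
# An escaped space, a bracketed space, and \s all survive /x.
my $escaped = $greeting =~ /hello\ world/x   ? 1 : 0;    # 1
my $class   = $greeting =~ /hello[ ]world/x  ? 1 : 0;    # 1
my $ws      = $greeting =~ /hello \s world/x ? 1 : 0;    # 1

print "$bare $escaped $class $ws\n";    # 0 1 1 1
```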

Automatically quote your regexes with `\Q` and `\E`

Text between \Q and \E has its regex metacharacters escaped automatically

Variables are interpolated before quoting, so \Q won't escape a literal dollar sign

    my $num = '3.1415';
    print "ok 1\n" if $num =~ /\Q3.14\E/;
    $num = '3X1415';
    print "ok 2\n" if $num =~ /\Q3.14\E/;
    print "ok 3\n" if $num =~ /3.14/;

prints

    ok 1
    ok 3
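
Because interpolation happens first, \Q is most useful for matching the contents of a variable literally. A sketch, where $needle stands in for hypothetical user-supplied text:

```perl
use strict;
use warnings;

my $needle = 'a.b*c';    # hypothetical user-supplied text
my @lines  = ( 'a.b*c', 'aXbbbc', 'xx a.b*c yy' );

# With \Q...\E the dot and star are matched literally.
my @literal = grep { /\Q$needle\E/ } @lines;
# Without it they act as metacharacters.
my @pattern = grep { /$needle/ } @lines;

print "literal: @literal\n";    # literal: a.b*c xx a.b*c yy
print "pattern: @pattern\n";    # pattern: aXbbbc
```

The built-in quotemeta function applies the same quoting to a string outside of a pattern.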

Execute code with `/e` flag to `s///`

With /e, the replacement side of s/// is evaluated as Perl code, and its result becomes the replacement text

    my $str = "AbCdE\n";
    $str =~ s/(\w)/lc $1/eg;
    print $str; # prints "abcde"

Use $1 and friends in the replacement code if necessary
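
For instance, the replacement code can do arithmetic on a captured number. A minimal sketch with an invented price list:

```perl
use strict;
use warnings;

my $prices = 'tea 3, cake 5';

# Copy the string, then replace each number with twice its value.
# The replacement side is Perl code, so $1 * 2 is computed per match.
( my $doubled = $prices ) =~ s/(\d+)/$1 * 2/eg;

print "$doubled\n";    # tea 6, cake 10
```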

Know when to use `study`

study is not helpful in the vast majority of cases, and since Perl 5.16 it is a no-op. In older versions, all it does is make a table of where the first occurrence of each of the 256 possible bytes is in the string. This means that if you have a 1,000-character string, and you search for lots of strings that begin with a constant character, then the matcher can jump right to it. For example:

"This is a very long [... 900 characters skipped...] string that I have here, ending at position 1000"

Now, if you are matching this against the regex /Icky/, the matcher will try to find the first letter "I" that matches. That may take scanning through the first 900+ characters until you get to it. But what study does is build a table of the 256 possible bytes and where they first appear, so that in this case, the scanner can jump right to that position and start matching.

Handle multi-line regexes

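Use re => debug

The re pragma's debug mode prints how a regex is compiled and how it is executed. Enable it from the command line with:

    -Mre=debug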

Want to contribute?

Submit a PR to github.com/petdance/perl101