Regular Expressions

Regular expressions are too large a topic to introduce here, but make sure that you understand the concepts below. For tutorials, see perlrequick or perlretut. For the definitive documentation, see perlre.

Matches and replacements return a useful value.

In scalar context, the m// operator returns true if the match succeeded and false if it did not. The s/// operator returns the number of replacements it made, or false if it made none. You can either use the return value directly, or check it for truth.

    if ( $str =~ /Diggle|Shelley/ ) {
        print "We found Pete or Steve!\n";
    }

    if ( my $n = ($str =~ s/this/that/g) ) {
        print qq{Replaced $n occurrence(s) of "this"\n};
    }
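
As a small sketch of the two return values side by side (the sample string and variable names are made up for illustration): m// in scalar context reports only success or failure, while s///g reports a count. To count matches without modifying the string, one common idiom is to assign the match to an empty list first, which forces list context.

```perl
use strict;
use warnings;

my $text = 'this and this and this';   # hypothetical sample data

# Scalar context: m// yields true (1) on success, not a count.
my $found = ( $text =~ /this/ );

# Assigning to () forces list context; the outer scalar assignment
# then receives the number of /g matches.
my $matches = () = $text =~ /this/g;

# s///g returns the number of replacements performed.
my $replaced = ( $text =~ s/this/that/g );

print "found=$found matches=$matches replaced=$replaced\n";
print "$text\n";
```

The counting line leaves $text untouched; only the s///g line changes it.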

Don't use capture variables without checking that the match succeeded.

The capture variables ($1, $2, and so on) are only meaningful after a successful match. A failed match does not clear them, so they keep whatever values the last successful match left behind.

    # BAD: Not checked, but at least it "works".
    my $str = 'Perl 101 rocks.';
    $str =~ /(\d+)/;
    print "Number: $1"; # Prints "Number: 101";

    # WORSE: Not checked, and the result is not what you'd expect
    $str =~ /(Python|Ruby)/;
    print "Language: $1"; # Prints "Language: 101";

Instead, you must check the return value from the match:

    # GOOD: Check the results
    my $str = 'Perl 101 rocks.';
    if ( $str =~ /(\d+)/ ) {
        print "Number: $1"; # Prints "Number: 101";
    }

    if ( $str =~ /(Python|Ruby)/ ) {
        print "Language: $1"; # Never gets here
    }

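In list context, m// returns a list of the captured substrings. With /g and no capturing parentheses, it returns a list of every match. A failed match returns the empty list.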
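
A minimal sketch of m// in list context, using a made-up input line. Assigning the captures straight to named variables avoids $1 and friends entirely, which also sidesteps the stale-capture problem described above.

```perl
use strict;
use warnings;

my $line = 'user=alice id=42';   # hypothetical input

# With capture groups, list context returns the captures themselves.
my ( $user, $id ) = $line =~ /user=(\w+) \s+ id=(\d+)/x;

# With /g and no capture groups, it returns every whole match.
my @words = $line =~ /[a-z]+/g;

# A failed match returns the empty list, so the variable stays undefined.
my ($missing) = $line =~ /group=(\w+)/;

print "user=$user id=$id\n";
print "words: @words\n";
print "missing is ", ( defined $missing ? 'defined' : 'undefined' ), "\n";
```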

Common match flags

  • /i - case insensitive match
  • /g - match globally, finding every occurrence
        my $var = "match match match";
        my $count = 0;

        $count++ while $var =~ /match/g;
        print "$count\n"; # prints 3

        $count = 0;
        $count++ foreach ( $var =~ /match/g );
        print "$count\n"; # prints 3
  • /m - ^ and $ change meaning
    • Ordinarily, ^ matches only at the start of the string and $ only at the end of the string (or just before a final newline)
    • /m makes them match at the start and end of every line, respectively
          $str = "one\ntwo\nthree";
          @a = $str =~ /^\w+/g;  # @a = ("one")
          @b = $str =~ /^\w+/gm; # @b = ("one","two","three")
    • Use \A and \z for start and end of string regardless of /m
    • \Z is like \z, except that it also matches just before a final newline
  • /s - . also matches newline
        $str = "one\ntwo\nthree\n";
        $str =~ /^(.{8})/s;
        print $1; # prints "one\ntwo\n"
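
The anchors described under /m can be compared directly. This is only a sketch on a made-up two-line string with a trailing newline; each variable records whether one pattern matched.

```perl
use strict;
use warnings;

my $str = "one\ntwo\n";   # hypothetical sample ending in a newline

my $dollar  = $str =~ /two$/   ? 1 : 0;  # $ tolerates the final newline
my $big_z   = $str =~ /two\Z/  ? 1 : 0;  # so does \Z
my $small_z = $str =~ /two\z/  ? 1 : 0;  # \z demands the very end of the string
my $caret_m = $str =~ /^two/m  ? 1 : 0;  # with /m, ^ matches after a newline
my $big_a_m = $str =~ /\Atwo/m ? 1 : 0;  # \A ignores /m

print "dollar=$dollar Z=$big_z z=$small_z caret_m=$caret_m A_m=$big_a_m\n";
```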

Capture variables $1 and friends

  • The text matched by each set of capturing parentheses is stored in a numbered variable
  • Parentheses are numbered left to right, in the order of their opening parentheses:
        my $str = "abc";
        $str =~ /(((a)(b))(c))/;
        print "1: $1 2: $2 3: $3 4: $4 5: $5\n";
        # prints: 1: abc 2: ab 3: a 4: b 5: c
  • There is no upper limit on the number of capturing parentheses and variables

Avoid capture with ?:

  • If an opening parenthesis is followed by ?:, the group is not captured
  • Useful when you need grouping but don't want the match saved
        my $str = "abc";
        $str =~ /(?:a(b)c)/;
        print "$1\n"; # prints "b"
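
A typical use is grouping an alternation without using up a capture number. The sketch below runs on a made-up date string and compares the capturing and non-capturing forms.

```perl
use strict;
use warnings;

my $date = 'Sat 2024-03-09';   # hypothetical input

# Capturing group: the day name takes the first slot.
my @with_capture = $date =~ /(Sat|Sun) \s+ (\d{4})-(\d{2})/x;

# Non-capturing group: only the year and month are returned.
my ( $year, $month ) = $date =~ /(?:Sat|Sun) \s+ (\d{4})-(\d{2})/x;

print scalar(@with_capture), " captures: @with_capture\n";
print "year=$year month=$month\n";
```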

Allow easier reading with the /x switch

  • If you're doing something tricky with a regex, comment it.
  • You can do this with the /x flag.

    This ugly behemoth

        my ($num) = $ARGV[0] =~ m/^\+?((?:(?<!\+)-)?(?:\d*\.)?\d+)$/;

    is more readable with whitespace and comments, as allowed by the /x flag.

        my ($num) =
            $ARGV[0] =~ m/^ \+?        # An optional plus sign, to be discarded
                        (              # Capture...
                        (?:(?<!\+)-)?  # a minus sign, if there's no plus behind it,
                        (?:\d*\.)?     # optional digits followed by a decimal point,
                        \d+            # then one or more digits.
                        )$/x;
  • Under /x, whitespace and comments in the pattern are ignored unless escaped, so a literal space or # must be escaped or placed in a character class.
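
As a sketch of that last point, the pattern below needs to match a literal space and a literal # under /x. The input string is made up for illustration.

```perl
use strict;
use warnings;

my $input = 'item #7';   # hypothetical input

my $re = qr/
    item    # the literal word
    [ ]     # a single space, written as a character class
    \#      # a literal hash sign, escaped so it does not start a comment
    (\d+)   # capture the number
/x;

my ($n) = $input =~ $re;
print "n=$n\n";
```

Note that a comment inside the pattern cannot contain the pattern delimiter, since Perl finds the end of the pattern before it looks at the comments.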

Automatically quote your regexes with \Q and \E

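  • Everything between \Q and \E has its regex metacharacters escaped automatically
  • Dollar signs are not protected: variables are interpolated first, and the interpolated text is then escaped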
        my $num = '3.1415';
        print "ok 1\n" if $num =~ /\Q3.14\E/;
        $num = '3X1415';
        print "ok 2\n" if $num =~ /\Q3.14\E/;
        print "ok 3\n" if $num =~ /3.14/;

    prints

        ok 1
        ok 3
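
\Q is most useful with interpolated variables, where the text to match is not known in advance. A small sketch, assuming a made-up search string that happens to contain metacharacters:

```perl
use strict;
use warnings;

my $needle = 'a.b*c';    # hypothetical user-supplied text
my $plain  = 'a.b*c';
my $other  = 'aXbbbc';

# Quoted: the needle matches only itself, character for character.
my $quoted_hits_plain = $plain =~ /\Q$needle\E/ ? 1 : 0;
my $quoted_hits_other = $other =~ /\Q$needle\E/ ? 1 : 0;

# Unquoted: . and * act as metacharacters, so unrelated text matches too.
my $raw_hits_other = $other =~ /$needle/ ? 1 : 0;

# quotemeta is the function form of \Q.
my $escaped = quotemeta $needle;

print "$quoted_hits_plain $quoted_hits_other $raw_hits_other $escaped\n";
```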

Execute code with /e flag to s///

  • The replacement side is evaluated as Perl code, and its result is used as the replacement text
        my $str = "AbCdE\n";
        $str =~ s/(\w)/lc $1/eg;
        print $str; # prints "abcde"
  • $1 and friends are available inside the replacement code
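
Because the replacement is ordinary code, it can also compute a new value from the captured text. The sketch below doubles every number in a made-up string, working on a copy so that the original is left alone.

```perl
use strict;
use warnings;

my $prices = 'apple 3, pear 12';   # hypothetical input

# Copy first, then substitute on the copy; $1 holds each number in turn.
( my $doubled = $prices ) =~ s/(\d+)/$1 * 2/eg;

print "$prices\n";
print "$doubled\n";
```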

Know when to use study

study is not helpful in the vast majority of cases, and since Perl 5.16 it is a no-op. In older versions, all it does is build a table recording where each of the 256 possible byte values first occurs in the string. If you have a 1,000-character string and run many searches for patterns that begin with a constant character, the matcher can use that table to jump right to the character. For example:

"This is a very long [... 900 characters skipped...] string that I have here, ending at position 1000"

If you match this against the regex /Icky/, the matcher first looks for the letter "I", which may mean scanning through more than 900 characters before reaching it. study builds a table of the 256 possible bytes and where each first appears, so in this case the scanner can jump right to that position and start matching there.

Handle multi-line regexes

Debug regexes with use re 'debug'

    -Mre=debug
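
The -M switch above loads the re pragma with its debug option for a whole command-line run. Inside a program the same pragma can be limited to a block, which keeps the trace short. A minimal sketch, with a made-up string and pattern; the trace of how the regex is compiled and executed goes to STDERR, and its exact format varies between Perl versions.

```perl
use strict;
use warnings;

my $matched;
{
    # The pragma is lexically scoped, so only regexes in this block are traced.
    use re 'debug';
    $matched = 'foobar' =~ /o+b/ ? 1 : 0;
}

print "matched=$matched\n";
```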
