Perl Regular Expressions

    Metacharacters
  char   meaning
  ^      beginning of string
  $      end of string
  .      any character except newline
  *      match 0 or more times
  +      match 1 or more times
  ?      match 0 or 1 times; or: shortest match
  |      alternative
  ( )    grouping; "storing"
  [ ]    set of characters
  { }    repetition modifier
  \      quote or special
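A minimal sketch of a few of these metacharacters in action. The sample strings and the matches() helper are made up for illustration and are not part of the table.

```perl
use strict;
use warnings;

# matches() is a hypothetical helper for this sketch: 1 on match, 0 otherwise.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Illustrative strings only; they are not taken from the table.
print matches('cat',  qr/^cat$/),       "\n";  # ^ and $ anchor both ends of the string
print matches('cot',  qr/^c.t$/),       "\n";  # . stands for any one character except newline
print matches('dog',  qr/^(cat|dog)$/), "\n";  # | picks one alternative inside the ( ) group
print matches('cut',  qr/^c[aou]t$/),   "\n";  # [ ] matches one character from the set
print matches('3x14', qr/^3.14$/),      "\n";  # an unquoted . also matches the x
print matches('3x14', qr/^3\.14$/),     "\n";  # \. is a literal dot, so this one fails
```

The first five lines print 1 and the last prints 0, which shows why a metacharacter meant literally needs a \ in front of it.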

  Repetition
  a*     zero or more a's
  a+     one or more a's
  a?     zero or one a's (i.e., optional a)
  a{m}   exactly m a's
  a{m,}  at least m a's
  a{m,n} at least m but at most n a's
  repetition?  same as repetition, but the shortest match is taken

  \t     tab
  \n     newline
  \r     return (CR)
  \xhh   character with hex. code hh
  \b     "word" boundary
  \B     not a "word" boundary

  \w     matches any single character classified as a
         "word" character (alphanumeric or _)
  \W     matches any non-"word" character
  \s     matches any whitespace character (space, tab, newline)
  \S     matches any non-whitespace character
  \d     matches any digit character, equiv. to [0-9]
  \D     matches any non-digit character

  [characters] matches any of the characters in the sequence
  [x-y]        matches any of the characters from x to y
               (inclusively) in the ASCII code
  [\-]         matches the hyphen character -
  [\n]         matches the newline; other single character
               denotations with \ apply normally, too
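The notations above can be tried out with a short script. This is only a sketch: the input strings and variable names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical input, chosen only to exercise the notations above.
my $text = "id: 42\tname: foo_bar\n";

my ($digits) = $text =~ /(\d{2,4})/;        # between 2 and 4 digits
my ($word)   = $text =~ /name:\s+(\w+)/;    # \s+ skips whitespace, \w+ takes word characters
my ($hex)    = "A" =~ /(\x41)/;             # \x41 is the hex code of "A"
my @alone    = "cat concat cats" =~ /\bcat\b/g;  # \b keeps only the standalone word
my ($greedy) = "aaaa" =~ /(a{2,3})/;        # takes as many a's as allowed
my ($lazy)   = "aaaa" =~ /(a{2,3}?)/;       # trailing ? takes the shortest match
my ($signed) = "x-7" =~ /([\-0-9]+)/;       # [\-0-9] allows a hyphen or a digit

print "digits=$digits word=$word hex=$hex\n";
print "standalone matches: ", scalar(@alone), "\n";
print "greedy=$greedy lazy=$lazy signed=$signed\n";
```

With these inputs the captures are 42, foo_bar, A, one standalone match, aaa, aa and -7.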

Examples

How do I extract everything between the words “start” and “end”?

$mystring = "The start text always precedes the end of the end text.";
if ($mystring =~ m/start(.*)end/) {
    # .* is greedy, so the capture runs up to the last "end"
    print $1;
}
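Because the sample string contains “end” twice, it matters whether the repetition is greedy or not. A small sketch comparing the two, using the same string; the variable names $greedy and $lazy are only for this illustration.

```perl
use strict;
use warnings;

my $mystring = "The start text always precedes the end of the end text.";

# Greedy: .* grabs as much as possible, so the match ends at the last "end".
my ($greedy) = $mystring =~ m/start(.*)end/;

# Non-greedy: .*? grabs as little as possible, so the match ends at the first "end".
my ($lazy) = $mystring =~ m/start(.*?)end/;

print "greedy: '$greedy'\n";   # ' text always precedes the end of the '
print "lazy:   '$lazy'\n";     # ' text always precedes the '
```

Which one is wanted depends on whether “end” may occur inside the text being extracted.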

How do I extract a complete number, like the year?

$mystring = "[2004/04/13] The date of this article.";
if ($mystring =~ m/(\d+)/) {
    # \d+ matches the first run of digits, here 2004
    print "The first number is $1.";
}
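The same idea extends to pulling out every part of the date. A sketch, assuming the date always has the form yyyy/mm/dd as in the sample string; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $mystring = "[2004/04/13] The date of this article.";

# Three capture groups, one per date part; \/ is a quoted slash.
my ($year, $month, $day) = $mystring =~ m/(\d{4})\/(\d{2})\/(\d{2})/;
print "year=$year month=$month day=$day\n";   # year=2004 month=04 day=13

# With /g in list context, every run of digits is returned.
my @numbers = $mystring =~ m/(\d+)/g;
print "found ", scalar(@numbers), " numbers\n";   # found 3 numbers
```

The {4} and {2} counts make the pattern stricter than \d+, so it will not match a date written in another layout.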

# find word that is bolded
# returns: $1 = 'text'
$line = "This is some <b>text</b> with HTML and ";
$line =~ m/<b>(.*)<\/b>/i;
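When a line holds more than one bolded word, a greedy .* swallows everything between the first opening tag and the last closing tag. A sketch of the non-greedy alternative; the $line value here is a hypothetical string, and a regex like this is only suitable for simple, well-behaved markup.

```perl
use strict;
use warnings;

# Hypothetical line with two bold sections, one using upper-case tags.
my $line = "Mix <b>one</b> and <B>two</B> here.";

# Greedy: spans from the first <b> to the last </b>.
my ($greedy) = $line =~ m/<b>(.*)<\/b>/i;
print "greedy: $greedy\n";   # one</b> and <B>two

# Non-greedy with /g: one capture per bold section; /i ignores tag case.
my @bold = $line =~ m/<b>(.*?)<\/b>/gi;
print "bold: @bold\n";       # one two
```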
