Perl Regular Expressions
Metacharacters
char   meaning
^      beginning of string
$      end of string
.      any character except newline
*      match 0 or more times
+      match 1 or more times
?      match 0 or 1 times; or: shortest match
|      alternative
( )    grouping; "storing"
[ ]    set of characters
{ }    repetition modifier
\      quote or special
Repetition
a*           zero or more a's
a+           one or more a's
a?           zero or one a's (i.e., optional a)
a{m}         exactly m a's
a{m,}        at least m a's
a{m,n}       at least m but at most n a's
repetition?  same as repetition, but the shortest match is taken

\t     tab
\n     newline
\r     return (CR)
\xhh   character with hex. code hh
\b     "word" boundary
\B     not a "word" boundary

\w     matches any single character classified as a "word" character (alphanumeric or _)
\W     matches any non-"word" character
\s     matches any whitespace character (space, tab, newline)
\S     matches any non-whitespace character
\d     matches any digit character, equiv. to [0-9]
\D     matches any non-digit character

[characters]  matches any of the characters in the sequence
[x-y]         matches any of the characters from x to y (inclusively) in the ASCII code
[\-]          matches the hyphen character -
[\n]          matches the newline; other single character denotations with \ apply normally, too

Examples
How do I extract everything between the words "start" and "end"?
$mystring = "The start text always precedes the end of the end text.";
if ($mystring =~ m/start(.*)end/) {
    print $1;
}
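Note that .* is greedy, so the pattern above captures everything up to the last "end" in the string. A minimal sketch of the difference, using the shortest-match form .*? from the metacharacter table; the variable names $greedy and $lazy are illustrative only:

```perl
use strict;
use warnings;

my $mystring = "The start text always precedes the end of the end text.";

# Greedy: .* runs up to the LAST "end" in the string.
my ($greedy) = $mystring =~ m/start(.*)end/;

# Non-greedy: .*? stops at the FIRST "end" after "start".
my ($lazy) = $mystring =~ m/start(.*?)end/;

print "greedy: '$greedy'\n";   # greedy: ' text always precedes the end of the '
print "lazy:   '$lazy'\n";     # lazy:   ' text always precedes the '
```

Which form is appropriate depends on whether the closing word can also occur inside the text being extracted.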
How do I extract a complete number, like the year?
$mystring = "[2004/04/13] The date of this article.";
if ($mystring =~ m/(\d+)/) {
    print "The first number is $1.";
}
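The same idea extends to pulling out each part of the date. The following is a sketch that assumes the date always has the form yyyy/mm/dd; the variable names $year, $month and $day are illustrative:

```perl
use strict;
use warnings;

my $mystring = "[2004/04/13] The date of this article.";

# \d{4} matches exactly four digits and \d{2} exactly two.
# The slashes are escaped because / is also the pattern delimiter.
my ($year, $month, $day);
if ($mystring =~ m/(\d{4})\/(\d{2})\/(\d{2})/) {
    ($year, $month, $day) = ($1, $2, $3);
    print "Year $year, month $month, day $day\n";   # Year 2004, month 04, day 13
}
```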
# find word that is bolded
# returns: $1 = 'text'
$line = "This is some <b>text</b> with HTML and ";
$line =~ m/<b>(.*)<\/b>/i;
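
When a line may contain several bolded words, a greedy (.*) would span from the first opening tag to the last closing tag. A possible sketch using a non-greedy capture with the /g modifier; the sample string and the array name @bold are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample line with two bolded words.
my $line = "This is <b>some</b> text with <B>bold</B> tags.";

# /g in list context returns every capture; /i lets <b> also match <B>;
# .*? keeps each match inside a single pair of tags.
my @bold = $line =~ m/<b>(.*?)<\/b>/gi;

print join(", ", @bold), "\n";   # some, bold
```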
