Perl Regular Expressions
Metacharacters
char meaning ^ beginning of string $ end of string . any character except newline * match 0 or more times + match 1 or more times ? match 0 or 1 times; or: shortest match | alternative ( ) grouping; "storing" [ ] set of characters { } repetition modifier quote or special
Repetition a* zero or more a's a+ one or more a's a? zero or one a's (i.e., optional a) a{m} exactly m a's a{m,} at least m a's a{m,n} at least m but at most n a's repetition? t tab n newline r return (CR) xhh character with hex. code hh b "word" boundary B not a "word" boundary w matches any single character classified as a "word" character (alphanumeric or _) W matches any non-"word" character s matches any whitespace character (space, tab, newline) S matches any non-whitespace character d matches any digit character, equiv. to [0-9] D matches any non-digit character [characters] matches any of the characters in the sequence [x-y] matches any of the characters from x to y (inclusively) in the ASCII code [-] matches the hyphen character - [n] matches the newline; other single character denotations with apply normally, too
Examples
How do I extract everything between a the words “start” and “end”?
$mystring = “The start text always precedes the end of the end text.”;
if($mystring =~ m/start(.*)end/) {
print $1;
}
How do I extract a complete number, like the year?
$mystring = “[2004/04/13] The date of this article.”;
if($mystring =~ m/(d+)/) {
print “The first number is $1.”;
}
# find word that is bolded
# returns: $1 = ‘text’
$line = “This is some text with HTML and “;
$line =~ m/(.*)/i;