Perl Regular Expressions

Metacharacters
char   meaning
^      beginning of string
$      end of string
.      any character except newline
*      match 0 or more times
+      match 1 or more times
?      match 0 or 1 times; or: shortest match
|      alternative
( )    grouping; "storing"
[ ]    set of characters
{ }    repetition modifier
\      quote or special

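
To see several of these metacharacters working together, here is a minimal Perl sketch. The word list is a made-up sample, and each grep keeps only the words whose pattern matches:

```perl
use strict;
use warnings;

# Hypothetical sample words used only for illustration
my @words = ('cat', 'catalog', 'dog', 'concat');

# ^ and $ anchor the match to the beginning and end of the string
my @exact_cat = grep { /^cat$/ } @words;

# | separates alternatives; ( ) groups them so both anchors apply to each one
my @cat_or_dog = grep { /^(cat|dog)$/ } @words;

# . matches any single character except newline
my @c_any_t = grep { /^c.t/ } @words;

# \ turns a metacharacter back into a literal: \. matches only a real dot
my @dotted = grep { /\./ } ('3.14', '314');

print "@exact_cat | @cat_or_dog | @c_any_t | @dotted\n";   # cat | cat dog | cat catalog | 3.14
```

Note the parentheses in ^(cat|dog)$: without them, ^cat|dog$ means "begins with cat, or ends with dog", which would also accept catalog.
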
Repetition

a*           zero or more a's
a+           one or more a's
a?           zero or one a (i.e., an optional a)
a{m}         exactly m a's
a{m,}        at least m a's
a{m,n}       at least m but at most n a's
repetition?  same as repetition, but the shortest match is taken
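
A short sketch of the repetition forms, run against a few made-up strings. The last two matches show the effect of a trailing ? on the same pattern:

```perl
use strict;
use warnings;

# Hypothetical test strings: zero to four a's followed by a d
my @strings = ('d', 'ad', 'aad', 'aaad', 'aaaad');

# a? : zero or one a before the d
my @optional  = grep { /^a?d$/ } @strings;      # d ad
# a{2,3} : at least two but at most three a's
my @two_three = grep { /^a{2,3}d$/ } @strings;  # aad aaad
# a{3,} : three or more a's
my @three_up  = grep { /^a{3,}d$/ } @strings;   # aaad aaaad

# Appending ? makes the repetition take the shortest match instead of the longest
my ($greedy) = 'aaaa' =~ /(a{2,3})/;    # 'aaa'
my ($lazy)   = 'aaaa' =~ /(a{2,3}?)/;   # 'aa'

print "@optional | @two_three | @three_up | $greedy | $lazy\n";
```
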

Special notations with \

\t     tab
\n     newline
\r     return (CR)
\xhh   character with hex code hh
\b     "word" boundary (zero-width: matches a position, not a character)
\B     not a "word" boundary
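
The following sketch uses a hypothetical sample string to count matches with and without word boundaries, and to check for single characters given by escape or by hex code:

```perl
use strict;
use warnings;

# Hypothetical sample text: "cat" three times, a tab, and a trailing newline
my $text = "cat concatenate cats\tdone\n";

# "() = ..." forces list context; assigning that to a scalar yields the match count
my $plain = () = $text =~ /cat/g;       # 3: every occurrence of "cat"
my $whole = () = $text =~ /\bcat\b/g;   # 1: only the stand-alone word "cat"
my $inner = () = $text =~ /\Bcat/g;     # 1: "cat" not at the start of a word

# \t, \n and \xhh each match one specific character
my $has_tab = $text =~ /\t/    ? 1 : 0;  # 1
my $ends_nl = $text =~ /\n$/   ? 1 : 0;  # 1
my $hex_c   = $text =~ /^\x63/ ? 1 : 0;  # 1: hex 63 is "c"

print "$plain $whole $inner $has_tab $ends_nl $hex_c\n";   # 3 1 1 1 1 1
```
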

\w     matches any single "word" character (alphanumeric or _)
\W     matches any non-"word" character
\s     matches any whitespace character (space, tab, newline, and similar)
\S     matches any non-whitespace character
\d     matches any digit character; equivalent to [0-9] for ASCII text
\D     matches any non-digit character
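
A minimal sketch of these classes applied to a made-up log line; each comment names the class being exercised:

```perl
use strict;
use warnings;

# Hypothetical log line used only for illustration
my $input = "user_42  logged in at 09:15";

# \w+ : runs of word characters (letters, digits, underscore)
my @words  = $input =~ /(\w+)/g;
# \d+ : runs of digits, found anywhere, including inside "user_42"
my @digits = $input =~ /(\d+)/g;
# \s+ : one or more whitespace characters, so the double space yields no empty field
my @fields = split /\s+/, $input;
# \D+ : the leading run of non-digits
my ($prefix) = $input =~ /^(\D+)/;

print join(',', @words), "\n";    # user_42,logged,in,at,09,15
print join(',', @digits), "\n";   # 42,09,15
print join(',', @fields), "\n";   # user_42,logged,in,at,09:15
print "$prefix\n";                # user_
```
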

Sets of characters

[characters]  matches any one of the characters in the sequence
[x-y]         matches any character from x to y (inclusive) in ASCII code order
[\-]          matches the hyphen character -
[\n]          matches the newline; other single-character notations
              with \ apply normally, too
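
A sketch of bracketed sets on made-up inputs, including the escaped hyphen and newline forms listed above:

```perl
use strict;
use warnings;

# [aeiou] : any one of the listed characters (hypothetical sample phrase)
my $vowels = () = "regular expression" =~ /[aeiou]/g;          # 7

# [0-9] : a range; [\-] : a literal hyphen (hypothetical sample date)
my $date   = "2004-04-13";
my $is_iso = $date =~ /^[0-9]{4}[\-][0-9]{2}[\-][0-9]{2}$/ ? 1 : 0;  # 1
my @parts  = split /[\-]/, $date;                               # 2004 04 13

# [\n] : escapes such as \n keep their meaning inside brackets
my $breaks = () = "one\ntwo\nthree" =~ /[\n]/g;                # 2

print "$vowels $is_iso @parts $breaks\n";   # 7 1 2004 04 13 2
```
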

Examples

How do I extract everything between the words “start” and “end”?

my $mystring = "The start text always precedes the end of the end text.";

if ($mystring =~ m/start(.*)end/) {
    # prints " text always precedes the end of the "
    print "$1\n";
}
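
The capture above is greedy: because .* matches as much as it can, $1 runs up to the last "end" in the string. To stop at the first "end" instead, add the ? modifier ("shortest match" in the metacharacter table). A minimal side-by-side sketch on the same sentence:

```perl
use strict;
use warnings;

my $mystring = "The start text always precedes the end of the end text.";

# Greedy: .* grabs as much as possible, then backs off to the LAST "end"
my ($greedy) = $mystring =~ m/start(.*)end/;

# Non-greedy: .*? grabs as little as possible, stopping at the FIRST "end"
my ($lazy) = $mystring =~ m/start(.*?)end/;

print "[$greedy]\n";   # [ text always precedes the end of the ]
print "[$lazy]\n";     # [ text always precedes the ]
```
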

How do I extract a complete number, like the year?

my $mystring = "[2004/04/13] The date of this article.";

if ($mystring =~ m/(\d+)/) {
    print "The first number is $1.\n";   # The first number is 2004.
}
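
The example above stops at the first number. This sketch shows two common follow-ups on the same string: collecting every number with the /g modifier, and capturing year, month, and day separately by matching the literal brackets and slashes:

```perl
use strict;
use warnings;

my $mystring = "[2004/04/13] The date of this article.";

# In list context, m//g returns every captured number, not just the first one
my @numbers = $mystring =~ m/(\d+)/g;

# Escaped \[ \] and \/ match the literal brackets and slashes around the date
my ($year, $month, $day) = $mystring =~ m/\[(\d{4})\/(\d{2})\/(\d{2})\]/;

print "@numbers\n";                           # 2004 04 13
print "year=$year month=$month day=$day\n";   # year=2004 month=04 day=13
```
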

# find the word that is in bold
# returns: $1 = 'text'
my $line = "This is some text with HTML <b>text</b> and more.";
$line =~ m/<b>(.*?)<\/b>/i;
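
To collect every bold run rather than only the first, loop over m//g. This sketch uses a hypothetical HTML snippet and braces as the pattern delimiters; it assumes simple, non-nested tags, since a regex is not a general HTML parser:

```perl
use strict;
use warnings;

# Hypothetical HTML snippet with two bold runs in different letter case
my $html = "Some <b>bold</b> words and <B>more bold</B> text.";

my @bold;
# m{...} uses braces as delimiters, so the / in </b> needs no backslash.
# /g resumes each search where the last one ended; /i lets <B> match too;
# .*? keeps each capture inside a single <b>...</b> pair.
while ($html =~ m{<b>(.*?)</b>}gi) {
    push @bold, $1;
}

print join('|', @bold), "\n";   # bold|more bold
```
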
