Python is a convenient language that’s often used for scripting, data science, and web development.
In this article, we’ll look at matching newlines, case-insensitive matching, the sub method, verbose mode, and combining regex flags.
Matching Newlines with the Dot Character
We can use the re.DOTALL constant to match newlines.
For instance, we can use it as in the following code:
import re
regex = re.compile(r'.*', re.DOTALL)
matches = regex.search('Jane\nJoe')
Then we get 'Jane\nJoe' as the value returned by matches.group().
Without re.DOTALL, as in the following example:
import re
regex = re.compile(r'.*')
matches = regex.search('Jane\nJoe')
we get 'Jane' as the value returned by matches.group(), since the dot stops matching at the newline.
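The two behaviors can be compared side by side. The following is a minimal sketch; the names text, with_dotall, and without_dotall are only illustrative:

```python
import re

text = 'Jane\nJoe'

# With re.DOTALL, the dot also matches the newline character.
with_dotall = re.compile(r'.*', re.DOTALL).search(text).group()

# Without it, the dot stops at the first newline.
without_dotall = re.compile(r'.*').search(text).group()

print(repr(with_dotall))     # 'Jane\nJoe'
print(repr(without_dotall))  # 'Jane'
```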
Summary of Regex Symbols
The following is a summary of regex symbols:
? - matches 0 or 1 of the preceding group
* - matches 0 or more of the preceding group
+ - matches 1 or more of the preceding group
{n} - matches exactly n of the preceding group
{n,} - matches n or more of the preceding group
{,n} - matches 0 to n of the preceding group
{n,m} - matches at least n and at most m of the preceding group
{n,m}? or *? or +? - performs a non-greedy match of the preceding group
^foo - matches a string beginning with foo
foo$ - matches a string that ends with foo
. - matches any character except a newline
\d, \w, and \s - match a digit, word, or whitespace character respectively
\D, \W, and \S - match anything except a digit, word, or whitespace character respectively
[abc] - matches any one of the characters between the brackets, such as a, b, or c
[^abc] - matches any character except a, b, or c
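A few of these symbols can be tried out directly. The following is a small sketch; the sample strings and variable names are made up for illustration:

```python
import re

# ? makes the preceding character optional.
optional = re.findall(r'colou?r', 'color colour')   # ['color', 'colour']

# {n} matches exactly n repetitions.
exact = re.findall(r'\d{3}', '12 345 6789')         # ['345', '678']

# + is greedy by default; +? matches as little as possible.
greedy = re.search(r'<.+>', '<a><b>').group()       # '<a><b>'
non_greedy = re.search(r'<.+?>', '<a><b>').group()  # '<a>'

# ^ and $ anchor the match to the start and end of the string.
starts_with_foo = re.search(r'^foo', 'foo bar') is not None
ends_with_foo = re.search(r'foo$', 'bar foo') is not None

# [abc] matches any listed character; [^abc] matches any other character.
in_class = re.findall(r'[abc]', 'cab driver')       # ['c', 'a', 'b']
not_in_class = re.findall(r'[^abc]', 'cabs')        # ['s']
```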
Case-Insensitive Matching
We can pass in re.I, which is short for re.IGNORECASE, to do case-insensitive matching.
For instance, we can write:
import re
regex = re.compile(r'foo', re.I)
matches = regex.findall('FOO foo fOo fOO Foo')
Then matches has the value ['FOO', 'foo', 'fOo', 'fOO', 'Foo'].
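As a quick check, the short and long flag names can be used interchangeably, and the flag can also be passed to the module-level re.findall function without compiling first. This is a sketch with illustrative variable names:

```python
import re

text = 'FOO foo fOo'

# re.I is an alias of re.IGNORECASE, so both compare equal.
same_flag = re.I == re.IGNORECASE

# The module-level function accepts the flag as its third argument.
matches = re.findall(r'foo', text, re.IGNORECASE)

# Without the flag, only the exact lowercase form matches.
strict_matches = re.findall(r'foo', text)
```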
Substituting Strings with the sub() Method
We can use the sub method to replace all substring matches with the given string.
For instance, we can write:
import re
regex = re.compile(r'\d{3}-\d{3}-\d{4}')
new_string = regex.sub('SECRET', 'Jane\'s number is 123-456-7890. Joe\'s number is 555-555-1212')
Since sub replaces every match in the string passed in as the 2nd argument with the replacement passed in as the 1st argument, and returns a new string, new_string has the value of:
"Jane's number is SECRET. Joe's number is SECRET"
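To illustrate that sub leaves the input untouched, the sketch below keeps the input in a variable and also uses the optional count argument of sub to replace only the first match. The names original, redacted, and first_only are illustrative:

```python
import re

regex = re.compile(r'\d{3}-\d{3}-\d{4}')
original = "Jane's number is 123-456-7890. Joe's number is 555-555-1212"

# sub returns a new string; the original string is not modified.
redacted = regex.sub('SECRET', original)

# count limits how many matches are replaced, starting from the left.
first_only = regex.sub('SECRET', original, count=1)

print(redacted)
print(first_only)
```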
Verbose Mode
We can use re.VERBOSE to make the regex engine ignore whitespace and comments inside a pattern.
For instance, we can write:
import re
regex = re.compile(r'\d{3}-\d{3}-\d{4} # phone regex', re.VERBOSE)
matches = regex.findall('Jane\'s number is 123-456-7890. Joe\'s number is 555-555-1212')
Then matches has the value ['123-456-7890', '555-555-1212'] since the whitespace and comment in our regex are ignored when we pass in the re.VERBOSE option.
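Verbose mode is most useful when a pattern is spread over several lines. The following sketch rewrites the phone number regex above as a triple-quoted raw string; the comments inside the pattern are illustrative labels only:

```python
import re

regex = re.compile(r'''
    \d{3}   # first group of three digits
    -       # separator
    \d{3}   # second group of three digits
    -       # separator
    \d{4}   # final group of four digits
''', re.VERBOSE)

matches = regex.findall("Jane's number is 123-456-7890. Joe's number is 555-555-1212")
print(matches)  # ['123-456-7890', '555-555-1212']
```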
Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE
We can combine re.IGNORECASE, re.DOTALL, and re.VERBOSE with the pipe (|) operator, which is Python's bitwise OR.
For instance, we can do a case-insensitive search that also ignores whitespace and comments in the pattern by writing:
import re
regex = re.compile(r'jane # jane', re.IGNORECASE | re.VERBOSE)
matches = regex.findall('Jane\'s number is 123-456-7890. Joe\'s number is 555-555-1212')
Then matches has the value ['Jane'] since we passed in re.IGNORECASE and combined it with re.VERBOSE using the | symbol to do a case-insensitive search.
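All three flags can be combined in the same way. The following is a sketch with a made-up pattern and input string, where re.DOTALL lets the dot cross the newlines between the two names:

```python
import re

regex = re.compile(r'''
    jane   # first name, matched in any letter case
    .*     # anything in between, including newlines
    joe    # second name, matched in any letter case
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

text = 'JANE\nand\nJoe'
match = regex.search(text)
matched_text = match.group()
print(repr(matched_text))  # 'JANE\nand\nJoe'
```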
Conclusion
We can pass in different flags to the re.compile function to adjust how regex searches are done.
re.IGNORECASE lets us do a case-insensitive search.
re.VERBOSE makes the regex engine ignore whitespace and comments in our regex.
re.DOTALL lets the dot character match newline characters as well.
The 3 constants above can be combined with the | operator.
The sub method leaves the original string unchanged and returns a new string in which all the matches are replaced with what we passed in.