Categories
Python

More Things We Can Do With Regexes and Python

Python is a convenient language that’s often used for scripting, data science, and web development.

In this article, we’ll look at newline matches, case insensitive matching, and the sub method.

Matching Newlines with the Dot Character

We can use the re.DOTALL constant to match newlines.

For instance, we can use it as in the following code:

import re
regex = re.compile(r'.*', re.DOTALL)
matches = regex.search('Jane\nJoe')

Then we get 'Jane\nJoe' as the value returned by matches.group().

Without re.DOTALL, as in the following example:

import re
regex = re.compile(r'.*')
matches = regex.search('Jane\nJoe')

we get 'Jane' as the value returned by matches.group().
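To see the difference side by side, here is a minimal sketch that runs the same pattern with and without re.DOTALL. The variable names are ours, picked only for illustration.

```python
import re

text = 'Jane\nJoe'

# Without re.DOTALL, the dot stops at the newline.
without_dotall = re.compile(r'.*').search(text).group()

# With re.DOTALL, the dot matches the newline as well.
with_dotall = re.compile(r'.*', re.DOTALL).search(text).group()

print(repr(without_dotall))  # 'Jane'
print(repr(with_dotall))     # 'Jane\nJoe'
```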

Summary of Regex Symbols

The following is a summary of regex symbols:

  • ? — matches 0 or 1 of the preceding group
  • * — matches 0 or more of the preceding group
  • + — matches one or more of the preceding group
  • {n} — matches exactly n of the preceding group
  • {n,} — matches n or more of the preceding group
  • {,n} — matches 0 to n of the preceding group
  • {n,m} — matches n to m of the preceding group
  • {n,m}? or *? or +? performs a non-greedy match of the preceding group
  • ^foo — matches a string beginning with foo
  • foo$ — matches a string that ends with foo
  • . matches any character except for a newline
  • \d, \w, and \s match a digit, word, or space character respectively
  • \D, \W, and \S match anything except a digit, word, or space character respectively
  • [abc] — matches any character between the brackets, like a, b, or c
  • [^abc] — matches any character but a, b, or c
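A few of the symbols above can be tried out in a short sketch. The sample text and variable names below are made up for illustration.

```python
import re

text = 'aaa 42 b_c'

greedy = re.search(r'a+', text).group()         # + takes as many as it can
non_greedy = re.search(r'a+?', text).group()    # +? takes as few as it can
exactly_two = re.search(r'a{2}', text).group()  # {2} takes exactly two
digits = re.findall(r'\d', text)                # every digit
in_class = re.findall(r'[abc]', text)           # only a, b, or c
not_in_class = re.findall(r'[^abc\s]', text)    # anything but a, b, c, or whitespace
starts_with = bool(re.search(r'^aaa', text))    # anchored to the start
ends_with = bool(re.search(r'c$', text))        # anchored to the end

print(greedy, non_greedy, exactly_two)  # aaa a aa
print(digits)                           # ['4', '2']
print(in_class)                         # ['a', 'a', 'a', 'b', 'c']
print(not_in_class)                     # ['4', '2', '_']
```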

Case-Insensitive Matching

We can pass in re.I, the short form of re.IGNORECASE, to do case-insensitive matching.

For instance, we can write:

import re  
regex = re.compile(r'foo', re.I)  
matches = regex.findall('FOO foo fOo fOO Foo')

Then matches has the value ['FOO', 'foo', 'fOo', 'fOO', 'Foo'].

Substituting Strings with the sub() Method

We can use the sub method to replace all substring matches with the given string.

For instance, we can write:

import re  
regex = re.compile(r'\d{3}-\d{3}-\d{4}')
new_string = regex.sub('SECRET', 'Jane\'s number is 123-456-7890. Joe\'s number is 555-555-1212')

Since sub replaces every match in the string passed in as the 2nd argument and returns a new string, new_string has the value of:

"Jane's number is SECRET. Joe's number is SECRET"

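As a small sketch of how sub behaves, the code below checks that the original string is left untouched. It also uses the optional count argument of sub, which this article doesn't cover, to replace only the first match. The variable names are ours.

```python
import re

regex = re.compile(r'\d{3}-\d{3}-\d{4}')
original = "Jane's number is 123-456-7890. Joe's number is 555-555-1212"

# Replace every match; a new string is returned.
all_replaced = regex.sub('SECRET', original)

# count=1 limits the replacement to the first match only.
first_only = regex.sub('SECRET', original, count=1)

print(all_replaced)  # Jane's number is SECRET. Joe's number is SECRET
print(first_only)    # Jane's number is SECRET. Joe's number is 555-555-1212
print(original)      # the original string is unchanged
```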
Verbose Mode

We can use re.VERBOSE to make the regex engine ignore whitespace and comments in a regex.

For instance, we can write:

import re  
regex = re.compile(r'\d{3}-\d{3}-\d{4} # phone regex', re.VERBOSE)
matches = regex.findall('Jane\'s number is 123-456-7890. Joe\'s number is 555-555-1212')

Then matches has the value ['123-456-7890', '555-555-1212'] since the whitespace and comment in our regex are ignored by passing in the re.VERBOSE option.
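Verbose mode is most useful when the pattern is spread over several lines. The following is a sketch of the same phone regex written that way; the comment labels inside the pattern are our own.

```python
import re

# Whitespace and everything after # are ignored inside the pattern.
phone_regex = re.compile(r'''
    \d{3}   # area code
    -
    \d{3}   # exchange code
    -
    \d{4}   # station code
''', re.VERBOSE)

matches = phone_regex.findall(
    "Jane's number is 123-456-7890. Joe's number is 555-555-1212"
)
print(matches)  # ['123-456-7890', '555-555-1212']
```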

Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE

We can combine re.IGNORECASE, re.DOTALL, and re.VERBOSE with a pipe (|) operator.

For instance, we can do a case-insensitive search that ignores whitespace and comments in the regex by writing:

import re
regex = re.compile(r'jane # jane', re.IGNORECASE | re.VERBOSE)
matches = regex.findall('Jane\'s number is 123-456-7890. Joe\'s number is 555-555-1212')

Then matches has the value ['Jane'] since we passed in re.IGNORECASE and combined it with re.VERBOSE with the | symbol to do a case-insensitive search.
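All three flags can be combined in the same way. Here is a sketch with a made-up pattern and input that relies on each of them at once.

```python
import re

flags = re.IGNORECASE | re.DOTALL | re.VERBOSE
regex = re.compile(r'''
    jane   # first name, in any case
    .*     # anything in between, including newlines
    joe    # second name, in any case
''', flags)

match = regex.search('JANE\nand\nJoe')
print(repr(match.group()))  # 'JANE\nand\nJoe'
```

Without re.DOTALL, the .* part could not cross the newlines, so search would return None for this input.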

Conclusion

We can pass in different flags to the re.compile function to adjust how regex searches are done.

re.IGNORECASE lets us do a case-insensitive search.

re.VERBOSE makes the regex engine ignore whitespace and comments in our regex.

re.DOTALL lets the dot character match newline characters as well.

The 3 constants above can be combined with the | operator.

The sub method makes a copy of the string, replaces all the matches with what we passed in, and returns the string with the replacements.


Intro to Python Boolean and Conditional Statements

Python is a convenient language that’s often used for scripting, data science, and web development.

In this article, we’ll look at how to use booleans and conditional statements in our Python programs.

Boolean Values

Boolean values take the value True or False. Both are always written with an uppercase first letter.

They can be used in expressions like anything else. For example, we can write:

foo = True

Comparison Operators

Comparison operators are used for comparing 2 values. They evaluate their operands to a single boolean value.

The following comparison operators are included with Python:

  • == — equal to
  • != — not equal to
  • < — less than
  • > — greater than
  • <= — less than or equal to
  • >= — greater than or equal to

For example, we can write the following:

1 == 1

returns True .

1 != 2

also returns True since 1 and 2 are not equal.

'hello' == 'Hello'

returns False since string comparison is case-sensitive.

== is the equal to comparison operator, while the = is the assignment operator that assigns the right operand to the variable on the left.

Boolean Operators

The and operator takes 2 boolean values and returns one boolean value given the 2 operands.

It returns True if both operands are True. Otherwise, it returns False.

The or operator takes 2 boolean values and returns one boolean value given the 2 operands.

It returns True if one or both operands are True . Otherwise, it returns False .

The not operator is a unary operator, which means it takes one operand.

It returns the negated value of the operand. This means that not True returns False and not False returns True .
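The rules above can be checked with a small truth table. This is only a sketch; the variable names are ours.

```python
# Every combination of two boolean operands with the results of and / or.
truth_table = [
    (a, b, a and b, a or b)
    for a in (True, False)
    for b in (True, False)
]

for a, b, both, either in truth_table:
    print(f'{a!s:5} {b!s:5} and={both!s:5} or={either!s:5}')

# not flips a single operand.
negations = [(value, not value) for value in (True, False)]
print(negations)  # [(True, False), (False, True)]
```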

Mixing Boolean and Comparison Operators

We can mix booleans and comparison operators since comparison operators return booleans.

For example, we can write:

(1 < 2) and (4 < 5)

which returns True .

Or:

(1 == 2) and (4 == 5)

which returns False .

Flow Control

We can combine conditions and blocks of code to create a program that has flow control.

The conditions can be used with if alone, with a combination of if and elif, or with a combination of if, elif, and else together.

Blocks are indented. They begin when indentation increases and they can have other blocks nested in them.

Blocks end when the indentation decreases to zero or to the containing block’s indentation.

For example, we can write the following if block:

print('Enter your name')  
name=input()  
if name == 'Mary':  
  print('Hello Mary')

The code above asks for the name and displays 'Hello Mary' if the name entered is 'Mary'.

We can add a nested if block as follows:

print('Enter your name')  
name=input()  
print('Enter your age')  
age=input()  
if name == 'Mary':  
  print('Hello Mary')  
  if int(age) < 18:  
    print('You are a girl')  
  else:  
    print('You are a woman')

In the code above, we have a nested if block that nests the age check in the name check.

We have the else block which runs if int(age) < 18 returns False.

If we have more than 2 cases, we can use the elif keyword for checking and running code if alternative cases are True .

For example, we can use it as follows:

print('Enter your name')  
name=input()  
if name == 'Mary':  
  print('Hello Mary')  
elif name == 'Alex':  
  print('Hello Alex')  
elif name == 'Jane':  
  print('Hello Jane')  
else:  
  print('I do not know you')

Now if we enter Mary , Alex or Jane , we’ll see the Hello sentences displayed. Otherwise, we see I do not know you displayed.

Note that we always have a colon at the end of the if, elif, and else lines.

The blocks are also indented. This is mandatory in Python to denote blocks.
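To try the same if, elif, and else chain without typing input each time, we can wrap it in a function. The greet function below is a hypothetical helper written for this sketch, not part of the original example.

```python
def greet(name):
    """Return a greeting for the names we know (hypothetical helper)."""
    if name == 'Mary':
        return 'Hello Mary'
    elif name == 'Alex':
        return 'Hello Alex'
    elif name == 'Jane':
        return 'Hello Jane'
    else:
        # Runs only when none of the conditions above are True.
        return 'I do not know you'

print(greet('Alex'))  # Hello Alex
print(greet('Bob'))   # I do not know you
```

Only the first branch whose condition is True runs, so the order of the elif lines matters when conditions can overlap.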

Conclusion

Booleans are values that are either True or False.

Comparison operators can be used to build expressions from other values. We can compare numbers and check if strings are equal.

They return boolean values, so they can be combined with the boolean operators to return boolean values.

The and and or operators are used to combine expressions built with comparison operators.

We can then use them in if statements to run code conditionally. For alternative cases, we can use the elif and else keywords, which have to follow an if block.


Quick Intro to Python Loops

Python is a convenient language that’s often used for scripting, data science, and web development.

In this article, we’ll look at various kinds of loops that we can use in Python apps to run repeated code.

while Loop Statements

We can use the while loop to run code repeatedly while a condition is True .

It consists of the while keyword, a condition to evaluate, a colon, and then the code to run indented below it.

For example, we can write the following while loop to print a message repeatedly:

x = 0  
while x < 5:  
    print('Hello.')  
    x = x + 1

In the code above, we have x set to 0. Then we use the while loop to print ‘Hello.’. Next, we increment x by 1. We do this repeatedly until x reaches 5.

while loops are useful for keeping a loop running until a condition is met. The loop doesn’t need a fixed number of iterations known in advance.

For example, we can use the while loop until the user guesses the right number as follows:

guess = 0  
while int(guess) != 5:  
  print('Guess a number')  
  guess = input()  
print('You got it')

In the code above, as long as guess doesn’t evaluate to 5 when we convert it to an integer, the while loop will keep running.

Once we enter the right guess, which is 5, the loop will end.
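The same loop can be tried without keyboard input by feeding it a list of guesses. The count_guesses function and its arguments are hypothetical names made up for this sketch.

```python
def count_guesses(guesses, answer=5):
    """Go through guesses until one equals answer; return how many were used.

    Hypothetical helper: the guesses list stands in for repeated input() calls.
    It assumes the right answer appears somewhere in guesses.
    """
    attempts = 0
    guess_iter = iter(guesses)
    guess = None
    while guess != answer:
        guess = int(next(guess_iter))  # convert the text to an integer
        attempts += 1
    return attempts

print(count_guesses(['1', '9', '5', '7']))  # 3
```

The loop body runs 3 times here, and the last guess in the list is never looked at because the condition is already False.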

break Statements

The break keyword is used to exit a loop immediately, before its condition becomes False.

For example, we can rewrite the example above, with break instead of the condition in the while loop as follows:

guess = 0  
while True:  
  if int(guess) == 5:  
    break  
  print('Guess a number')  
  guess = input()  
print('You got it')

In the code above, we have an infinite while loop that we end with break when int(guess) returns 5.

The rest of the code works the same way as before.
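break is also handy for stopping a search as soon as we find what we want. Here is a sketch with made-up data that looks for the first negative number in a list.

```python
numbers = [4, 8, -3, 7, -1]

index = 0
first_negative = None
while index < len(numbers):
    if numbers[index] < 0:
        first_negative = numbers[index]
        break  # leave the loop right away; the rest is not checked
    index += 1

print(first_negative)  # -3
print(index)           # 2
```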

continue Statements

We can use the continue statement to move on to the next iteration of the loop.

For example, we can use it as follows:

x = 0  
while x < 5:  
  x = x + 1  
  if x == 2:  
    continue  
  print(x)

The code above prints the value of x if it’s not 2. This is because if x is 2, we run continue to skip to the next iteration.
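To make the skipped iteration visible, this sketch collects the values in a list instead of printing them one by one. The printed list is our own addition.

```python
printed = []
x = 0
while x < 5:
    # Increment before continue, otherwise x would stay at 2 forever.
    x = x + 1
    if x == 2:
        continue  # skip the append below for this iteration only
    printed.append(x)

print(printed)  # [1, 3, 4, 5]
```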

Truthy and Falsy Values

Python has the concept of truthy and falsy values. Truthy values are automatically converted to True when we use them where we have condition checks.

Falsy values are converted to False when we use them for condition checks.

0, 0.0, '' (the empty string), None, and empty collections such as [] are all considered False, while most other values are considered True.

For example, we can write a program to prompt users to enter a name and won’t stop until they enter one as follows:

name = ''  
while not name:  
  print('Enter your name:')  
  name = input()  
print('Your name is', name)

In the code above, we use not name to check if name is an empty string or not. If it is, then we keep showing 'Enter your name:' until they enter one.

Once they do, we display the last line with the name.
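We can check how a value behaves in a condition by using it with not or by passing it to the built-in bool function. The list of sample values below is ours.

```python
values = [0, 0.0, '', None, [], 1, -1, 'a', ' ', [0]]

falsy = [v for v in values if not v]
truthy = [v for v in values if v]

print(falsy)   # [0, 0.0, '', None, []]
print(truthy)  # [1, -1, 'a', ' ', [0]]
print(bool(' '), bool(''))  # True False
```

Note that a string with only a space in it is truthy, so the name prompt above would accept it.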

for Loops and the range() Function

We can use the for loop to loop through a certain number of items.

For example, we can use the for loop with the range function to display numbers from 0 to 4 as follows:

for i in range(5):  
    print(i)

In the code above, the range function produces integers starting from 0, going up by 1 on each iteration, up to the number passed into the range function minus 1.

As we can see, the for loop consists of the for keyword, a variable name, the in keyword, a call to the range function, a colon, and then the block of code to run in the loop.

We can also use break and continue statements inside for loops as we did in while loops.

The range function can take up to 3 arguments, where the first is the starting number and the 2nd argument is the ending number. The ending number itself is never produced, so the loop stops before reaching it.

The 3rd argument is the increment to increase the variable by in each iteration.

For example, we can write the following code to print all odd numbers between 1 and 10:

for i in range(1, 10, 2):  
    print(i)

We should see:

1  
3  
5  
7  
9

printed because in our range function call, we passed in 1 as the starting number, 10 as the ending number, and 2 to increment i by 2 in each iteration.
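A quick way to see what a range call produces is to turn it into a list. The sketch below also includes a negative step, which isn't covered above, to count downwards.

```python
up_to_five = list(range(5))          # start defaults to 0, step to 1
odd_numbers = list(range(1, 10, 2))  # start, end (excluded), step
countdown = list(range(10, 0, -3))   # a negative step counts down

print(up_to_five)   # [0, 1, 2, 3, 4]
print(odd_numbers)  # [1, 3, 5, 7, 9]
print(countdown)    # [10, 7, 4, 1]
```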

Conclusion

We can use while loops to repeatedly run a block of code until a condition is met. This means the loop can run an indeterminate number of iterations.

break is used for exiting a loop early. continue is used to skip to the next iteration of the loop.

for loops are used to repeatedly run code a finite number of times. They can be used with the range function to do the finite iteration.


Python String Methods You May Have Missed

Python is a convenient language that’s often used for scripting, data science, and web development.

In this article, we’ll look at how to use Python string methods to manipulate strings.

The upper(), lower(), isupper(), and islower() Methods

The upper method converts all characters of a string to upper case and returns it.

For instance, given the following string:

msg = 'Hello Jane'

Then running msg.upper() returns 'HELLO JANE'.

The lower method converts all characters of a string to lower case and returns it.

Therefore, msg.lower() returns 'hello jane'.

isupper checks if all the letters in the string are upper case and the string has at least one letter.

For instance, if we have:

msg = 'HELLO JANE'

Then msg.isupper() returns True .

islower checks if all the letters in the string are lower case and the string has at least one letter. For instance, given the following string:

msg = 'hello jane'

Then msg.islower() returns True .

upper and lower can be chained together since they both return strings.

For instance, we can write:

msg.upper().lower()

Then we get:

'hello jane'

returned.

The isX() Methods

There are also other methods for checking for various aspects of the string.

isalpha checks if the whole string consists of only letters and isn’t blank.

For instance, given the following string:

msg = 'hello jane'

Then msg.isalpha() returns False since it has a space in it.

isalnum checks if a string consists only of letters and numbers and isn’t blank, and returns True if it does.

For example, given the following string:

msg = 'hello'

Then msg.isalnum() returns True .

isdecimal returns True if the string consists only of decimal digit characters and isn’t blank.

For instance, if we have:

msg = '12345'

Then msg.isdecimal() returns True .

isspace returns True if the string only consists of tabs, spaces, and newlines and isn’t blank.

For instance, if we have the following string:

msg = '\n '

Then msg.isspace() returns True .

istitle returns True if the string only has words that begin with an upper case letter followed by only lower case letters.

For instance, if we have the following string:

msg = 'Hello World'

Then msg.istitle() returns True .
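To compare these methods, the sketch below runs each of them against a handful of sample strings and lists which checks pass. The samples and the checks dictionary are made up for illustration.

```python
samples = ['hello', 'hello jane', 'abc123', '12345', '\n ', 'Hello World', '']

# Map each sample string to the result of every isX method.
checks = {
    s: {
        'isalpha': s.isalpha(),
        'isalnum': s.isalnum(),
        'isdecimal': s.isdecimal(),
        'isspace': s.isspace(),
        'istitle': s.istitle(),
    }
    for s in samples
}

for s, result in checks.items():
    passed = [name for name, ok in result.items() if ok]
    print(repr(s), '->', passed)
```

The empty string fails every check, which is what "isn't blank" means in the descriptions above.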

The startswith() and endswith() Methods

The startswith method returns True if a string starts with the substring passed in as the argument.

For instance:

'Hello, world'.startswith('Hello')

returns True .

The endswith method returns True if a string ends with the substring passed in as the argument.

For instance:

'Hello, world!'.endswith('world!')

returns True since our string ends with world! .
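Both methods are case-sensitive, and both also accept a tuple of strings to try, which the examples above don't show. Here is a short sketch with a made-up file name.

```python
filename = 'report.csv'

is_report = filename.startswith('report')
# A tuple checks several endings at once.
is_data_file = filename.endswith(('.csv', '.json'))
# The comparison is case-sensitive.
wrong_case = 'Hello, world'.startswith('hello')

print(is_report, is_data_file, wrong_case)  # True True False
```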

The join() and split() Methods

The join method combines multiple strings in a list into one string, using the string that it’s called on as the separator.

For instance, we can write:

','.join(['apple', 'orange', 'grape'])

which returns 'apple,orange,grape'.

The string that it’s called on is inserted between the entries.

The split method is used to split a string into a list of substrings by the separator passed in as the argument.

For instance:

'My name is Jane'.split(' ')

returns ['My', 'name', 'is', 'Jane'].
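join and split undo each other when they use the same separator. The sketch below also shows split called with no argument, which splits on any run of whitespace. The sample strings are ours.

```python
fruits = ['apple', 'orange', 'grape']

joined = ', '.join(fruits)
round_trip = joined.split(', ')

# With no argument, split treats any run of whitespace as one separator.
words = 'My   name is\tJane'.split()
# With ' ' as the separator, each extra space produces an empty string.
by_space = 'My   name'.split(' ')

print(joined)      # apple, orange, grape
print(round_trip)  # ['apple', 'orange', 'grape']
print(words)       # ['My', 'name', 'is', 'Jane']
print(by_space)    # ['My', '', '', 'name']
```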

Splitting Strings with the partition() Method

The partition method splits a string into text before and after a separator string.

For instance:

'My name is Jane'.partition('is')

returns:

('My name ', 'is', ' Jane')

We can use the multiple assignment syntax to assign the parts into their own variables since the string it’s called on is always split into 3 parts.

For instance, we write the following:

before, sep, after = 'My name is Jane'.partition('is')

Then before has the value 'My name '. sep is 'is', and after is ' Jane'.
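partition only splits at the first occurrence of the separator, and it still returns 3 parts when the separator is missing. Here is a sketch with made-up strings.

```python
key, sep, value = 'name=Jane'.partition('=')

# Only the first separator is used for the split.
first_only = 'a=b=c'.partition('=')

# If the separator isn't found, the last 2 parts are empty strings.
missing = 'no separator here'.partition('=')

print(key, value)  # name Jane
print(first_only)  # ('a', '=', 'b=c')
print(missing)     # ('no separator here', '', '')
```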

Justifying Text with the rjust(), ljust(), and center() Methods

The rjust method right-justifies a string by padding it on the left with spaces until it reaches the total length passed in as the argument.

For instance:

'foo'.rjust(5)

returns:

'  foo'

It also takes a second argument, which is a fill character to use instead of spaces. For instance, 'foo'.rjust(5, '-') returns '--foo'.

ljust left-justifies a string by padding it on the right with spaces until it reaches the total length passed in as the argument.

For instance:

'foo'.ljust(5)

returns:

'foo  '

It also takes a second argument, which is a fill character to use instead of spaces. For instance, 'foo'.ljust(5, '*') returns 'foo**'.

The center method centers a string by padding it on both sides with spaces until it reaches the total length passed in as the argument.

For instance:

'foo'.center(15)

returns:

'      foo      '

It also takes a second argument, which is a fill character to use instead of spaces. For instance, 'foo'.center(5, '*') returns '*foo*'.
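These methods are handy for lining up text in columns. The following is a sketch with made-up data that builds a small table 15 characters wide.

```python
rows = [('apples', 12), ('kiwis', 3), ('oranges', 150)]

# Centered title, padded with dashes to the full table width.
header = 'STOCK'.center(15, '-')

# Names padded on the right to 10 characters, counts padded on the left to 5.
lines = [name.ljust(10, '.') + str(count).rjust(5) for name, count in rows]

print(header)
for line in lines:
    print(line)
# -----STOCK-----
# apples....   12
# kiwis.....    3
# oranges...  150
```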

Conclusion

Python has string methods to convert strings to upper and lower case.

We can also pad strings with spaces and other characters.

Multiple strings can also be joined together. Also, a string can be split into multiple strings.

There are also many methods to check strings for various characteristics.


Using Regex with Python

Python is a convenient language that’s often used for scripting, data science, and web development.

In this article, we’ll look at how to use regex with Python to make finding text easier.

Finding Patterns of Text with Regular Expressions

Regular expressions, or regexes, are descriptions for a pattern of text.

For instance, \d represents a single digit. We can combine characters to create regexes to search text.

To use regexes to search for text, we have to import the re module and then create a regex object with a regex string as follows:

import re  
phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

The code above has the regex to search for a North American phone number.

Then if we have the following string:

msg = 'Joe\'s phone number is 555-555-1212'

We can look for the phone number inside msg with the regex object’s search method as follows:

import re  
phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')
msg = 'Joe\'s phone number is 555-555-1212'  
match = phone_regex.search(msg)

When we inspect the match object, we see something like:

<re.Match object; span=(22, 34), match='555-555-1212'>

Then we can return a string representation of the match by calling the group method:

phone = match.group()

phone has the value '555-555-1212' .
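search returns None when nothing matches, so calling group on the result directly would raise an error in that case. Here is a sketch of one way to guard against that; find_phone is a hypothetical helper written for this example.

```python
import re

phone_regex = re.compile(r'\d{3}-\d{3}-\d{4}')

def find_phone(text):
    """Return the first phone number in text, or None (hypothetical helper)."""
    match = phone_regex.search(text)
    if match is None:
        return None
    return match.group()

print(find_phone("Joe's phone number is 555-555-1212"))  # 555-555-1212
print(find_phone('Joe has no phone'))                    # None
```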

Grouping with Parentheses

We can use parentheses to group different parts of the pattern so that each part gets its own match entry.

To do that with our phone number regex, we can write:

phone_regex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

Then when we call search, we can either get the whole matched string, or individual match groups.

group takes an integer that lets us get the parts that are matched by the groups.

Therefore, we can rewrite our program to get the whole match and the individual parts of the phone number as follows:

import re  
phone_regex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
msg = 'Joe\'s phone number is 123-456-7890'  
match = phone_regex.search(msg)  
phone = match.group()  
area_code = match.group(1)  
exchange_code = match.group(2)  
station_code = match.group(3)

In the code above, phone should be '123-456-7890' since we passed in nothing to group. Passing in 0 also returns the same thing.

area_code should be '123' since we passed in 1 to group , which returns the first group match.

exchange_code should be '456' since we passed in 2 to group , which returns the 2nd group match.

Finally, station_code should be '7890' since we passed in 3 to group , which returns the 3rd group match.

If we want to pass in parentheses or any other special character as a character of the pattern rather than a symbol for the regex, then we have to put a \ before it.
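As a sketch of escaping, the pattern below matches a phone number written with literal parentheses around the area code. The phone format and variable names are made up for illustration. It also uses the groups method of the match object to get all the groups at once.

```python
import re

# \( and \) match literal parentheses; the unescaped ones create groups.
phone_regex = re.compile(r'\((\d{3})\) (\d{3}-\d{4})')
match = phone_regex.search('My number is (415) 555-4242.')

whole = match.group()         # the entire match
area_code = match.group(1)    # first group
main_number = match.group(2)  # second group
all_groups = match.groups()   # every group as a tuple

print(whole)       # (415) 555-4242
print(all_groups)  # ('415', '555-4242')
```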

Matching Multiple Groups with the Pipe

We can use the | symbol, which is called a pipe, to match one of many expressions.

For instance, we write the following to get the match:

import re  
name_regex = re.compile('Jane|Joe')  
msg = 'Jane and Joe'  
match = name_regex.search(msg)  
match = match.group()

match should be 'Jane' since this is the first match that’s found according to the regex.

We can combine pipes and parentheses to find a part of a string. For example, we can write the following code:

import re  
snow_regex = re.compile(r'snow(man|mobile|shoe)')  
msg = 'I am walking on a snowshoe'  
snow_match = snow_regex.search(msg)  
match = snow_match.group()  
group_match = snow_match.group(1)

to get the whole match with match , which has the value 'snowshoe' .

group_match should have the partial group match, which is 'shoe' .
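Note that search returns whichever alternative shows up first in the text, not the one listed first in the pattern. The sketch below uses a made-up message to show this, along with findall to get every match.

```python
import re

name_regex = re.compile(r'Jane|Joe')
msg = 'Joe met Jane, then Joe left'

first = name_regex.search(msg).group()  # leftmost match in the text
everyone = name_regex.findall(msg)      # every match, in order

print(first)     # Joe
print(everyone)  # ['Joe', 'Jane', 'Joe']
```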

Optional Matching with the Question Mark

We can add a question mark to the end of a group, which makes the group optional for matching purposes.

For example, we can write:

import re  
snow_regex = re.compile(r'snow(shoe)?')  
msg = 'I am walking on a snowshoe'  
msg_2 = 'I am walking on snow'  
snow_match = snow_regex.search(msg)  
snow_match_2 = snow_regex.search(msg_2)

Then snow_match.group() returns 'snowshoe' and snow_match.group(1) returns 'shoe' .

Since the (shoe) group is optional, snow_match_2.group() returns 'snow' and snow_match_2.group(1) returns None .
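Optional groups are useful when part of the text may be left out. Here is a sketch that applies the same idea to a phone number with an optional area code; the pattern and sample strings are ours.

```python
import re

# The area code and its dash are grouped together and made optional.
phone_regex = re.compile(r'(\d{3}-)?\d{3}-\d{4}')

with_area = phone_regex.search('Call 415-555-1212 today')
without_area = phone_regex.search('Call 555-1212 today')

print(with_area.group(), with_area.group(1))        # 415-555-1212 415-
print(without_area.group(), without_area.group(1))  # 555-1212 None
```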

Conclusion

We can use regexes to find patterns in strings. They’re denoted by a set of characters that defines a pattern.

In Python, we can use the re module to create a regex object from a string.

Then we can use it to do searches with the search method.

We can define groups with parentheses. Once we do that, we can call group on the match object returned by search.

Each group is returned when we pass in an integer to get it by its position.

We can make groups optional with a question mark appended after the group.