Categories
Python Answers

How to extract text from MS Word files in Python?

Sometimes, we want to extract text from MS Word files in Python.

In this article, we’ll look at how to extract text from MS Word files in Python.

How to extract text from MS Word files in Python?

To extract text from MS Word (.docx) files in Python, we can use the standard library's zipfile module, since a .docx file is a ZIP archive of XML files. This approach doesn't work for legacy binary .doc files, which are not ZIP archives.

For instance, we write

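import re
import zipfile

# A .docx file is a ZIP archive; the main body text is stored in word/document.xml
with zipfile.ZipFile('/path/to/file/mydocument.docx') as docx:
    content = docx.read('word/document.xml').decode('utf-8')
# Strip every XML tag, keeping only the text between the tags
cleaned = re.sub(r'<[^>]+>', '', content)
print(cleaned)

to open the Word file as a ZipFile object, using a with statement so the archive is closed afterwards.

Then we call read with 'word/document.xml' to read the XML part that holds the document body. It returns bytes.

And we call decode to decode those bytes as UTF-8 text.

Next, we call re.sub to replace every tag with an empty string. Because the tags are removed without inserting any separator, text from consecutive paragraphs runs together.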
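Stripping the tags runs paragraphs together. If you need the paragraphs kept apart, one option is to parse document.xml with the standard library's xml.etree.ElementTree instead of a regex. Here is a minimal sketch that does that. It assumes the usual WordprocessingML layout, where each paragraph is a <w:p> element and its visible text sits in <w:t> elements. docx_paragraphs is a hypothetical helper name. Like the regex version, it only reads the main body, not headers, footers or footnotes, which are stored in other XML parts of the archive.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

def docx_paragraphs(path):
    """Return the text of each paragraph in a .docx file as a list of strings."""
    with zipfile.ZipFile(path) as docx:
        root = ET.fromstring(docx.read('word/document.xml'))
    paragraphs = []
    # Each <w:p> is a paragraph; join the text of all its <w:t> elements
    for p in root.iter(W_NS + 'p'):
        paragraphs.append(''.join(t.text or '' for t in p.iter(W_NS + 't')))
    return paragraphs
```

For example, print('\n'.join(docx_paragraphs('/path/to/file/mydocument.docx'))) prints one paragraph per line.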

Conclusion

To extract text from MS Word (.docx) files in Python, we can use the zipfile module to read word/document.xml and strip its XML tags.

How to fix lost connection to MySQL server during query with Python?

Sometimes, we want to fix lost connection to MySQL server during query with Python.

In this article, we’ll look at how to fix lost connection to MySQL server during query with Python.

How to fix lost connection to MySQL server during query with Python?

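One common cause of the "Lost connection to MySQL server during query" error is a query or row that is larger than the server's max_allowed_packet limit, for instance a large BLOB or a big multi-row INSERT. In that case, we can fix it by increasing the max_allowed_packet size.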

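To do this, we write

cursor = connection.cursor()
cursor.execute('SET GLOBAL max_allowed_packet=67108864')  # 64 MB

to get a cursor from connection, an open DB-API connection (for instance one created with MySQLdb or mysql.connector). Then we run a statement that raises the global max_allowed_packet to 64 MB (67108864 bytes). The session value of max_allowed_packet is read-only, so we set the global value instead. This requires administrative privileges and only affects connections opened afterwards, so we reconnect before retrying the failing query.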

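Then, to keep the setting after the MySQL server restarts, we add max_allowed_packet=64M under the [mysqld] section of /etc/mysql/my.cnf to set the max packet size to 64 MB, and restart the server. The location of my.cnf can differ between systems.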
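The SET statement takes a plain byte count, while my.cnf uses the 64M shorthand. They describe the same limit because MySQL reads size suffixes such as K, M and G in option files as powers of 1024. The sketch below just checks that arithmetic. mysql_size_to_bytes is a hypothetical helper, not part of any MySQL driver.

```python
# Multipliers for the K, M and G size suffixes (powers of 1024)
SIZE_SUFFIXES = {'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3}

def mysql_size_to_bytes(value):
    """Convert an option-file size such as '64M' into a number of bytes."""
    value = value.strip().upper()
    if value and value[-1] in SIZE_SUFFIXES:
        return int(value[:-1]) * SIZE_SUFFIXES[value[-1]]
    # No suffix: the value is already a byte count
    return int(value)

print(mysql_size_to_bytes('64M'))  # 67108864
```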

Conclusion

To fix lost connection to MySQL server during query with Python when the cause is an oversized packet, we increase the max_allowed_packet size.

How to remove non-ASCII characters but leave periods and spaces with Python?

Sometimes, we want to remove non-ASCII characters but leave periods and spaces with Python.

In this article, we’ll look at how to remove non-ASCII characters but leave periods and spaces with Python.

How to remove non-ASCII characters but leave periods and spaces with Python?

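To remove non-ASCII characters but leave periods and spaces with Python, we can use string.printable. This is a string of the printable ASCII characters: digits, letters, punctuation and whitespace, which includes the period and the space. We then filter out every character that isn't in it.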

For instance, we write

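import string

s = "some\x00string. with\x15 funny characters"
# string.printable only contains ASCII characters, so both non-ASCII
# characters and ASCII control characters such as \x00 are dropped
printable = set(string.printable)
filtered = ''.join(filter(lambda x: x in printable, s))
print(filtered)  # somestring. with funny characters

to create a set from string.printable so that membership checks are fast.

Then we call filter with s and a function that returns whether the character x is in the printable set.

filter returns an iterator over the characters of s that are in printable, so we call ''.join to turn them back into a string. Note that string.printable also contains tabs and newlines, so those are kept as well.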
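The example string above only contains ASCII control characters. Here is a minimal sketch showing that accented, non-ASCII letters are removed too. It wraps the same membership test in a hypothetical helper, keep_printable, and uses a generator expression in place of filter and lambda.

```python
import string

# string.printable: ASCII digits, letters, punctuation and whitespace
PRINTABLE = set(string.printable)

def keep_printable(text):
    """Return text with every character outside string.printable removed."""
    return ''.join(ch for ch in text if ch in PRINTABLE)

print(keep_printable('café. naïve\x00 text'))  # caf. nave text
```

The periods and spaces survive because both are in string.printable.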

Conclusion

To remove non-ASCII characters but leave periods and spaces with Python, we can keep only the characters that appear in string.printable.

How to add a new column to a CSV file in Python?

Sometimes, we want to add a new column to a CSV file in Python.

In this article, we’ll look at how to add a new column to a CSV file in Python.

How to add a new column to a CSV file in Python?

To add a new column to a CSV file in Python, we can use pandas to load the file into a DataFrame, add the column, and write it back out.

For instance, we write

import pandas as pd

csv_input = pd.read_csv('input.csv')
csv_input['Berries'] = csv_input['Name']
csv_input.to_csv('output.csv', index=False)

to call read_csv to load input.csv into a DataFrame.

Then we add the Berries column to the DataFrame by assigning it the values of the Name column.

Then we call to_csv with the file we want to save to, setting index to False so the DataFrame's row index isn't written out as an extra column.
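If pandas isn't available, the same task can be done with the standard library's csv module instead. This is a swapped-in technique, not the pandas approach above. Here is a minimal sketch. add_column is a hypothetical helper that reads rows with csv.DictReader, copies an existing column into a new one, and writes the rows with csv.DictWriter. The demo uses in-memory io.StringIO objects instead of real files.

```python
import csv
import io

def add_column(src, dst, new_name, source_name):
    """Copy CSV rows from src to dst, adding a new_name column filled from source_name."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + [new_name])
    writer.writeheader()
    for row in reader:
        row[new_name] = row[source_name]  # copy the existing column's value
        writer.writerow(row)

# Demo with in-memory text instead of real files
src = io.StringIO('Name,Price\nblueberry,3\nraspberry,4\n')
dst = io.StringIO()
add_column(src, dst, 'Berries', 'Name')
print(dst.getvalue())
```

With real files, open them with newline='', for example with open('input.csv', newline='') as src, open('output.csv', 'w', newline='') as dst: add_column(src, dst, 'Berries', 'Name').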

Conclusion

To add a new column to a CSV file in Python, we can use Pandas.

How to kill a while loop with a keystroke in Python?

Sometimes, we want to kill a while loop with a keystroke in Python.

In this article, we’ll look at how to kill a while loop with a keystroke in Python.

How to kill a while loop with a keystroke in Python?

To kill a while loop with a keystroke in Python, we can press Ctrl+C and catch the KeyboardInterrupt exception it raises.

For instance, we write

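import time

def do_something():
    time.sleep(0.1)  # placeholder for the real work on each iteration

try:
    while True:
        do_something()
except KeyboardInterrupt:
    print('Loop stopped')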

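to put an infinite while loop inside the try block. Here do_something is just a placeholder for the work the loop does.

And we catch the KeyboardInterrupt exception in the except block. Python raises it when we press Ctrl+C, so the loop ends and the program continues after the try statement instead of exiting with a traceback.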
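If the code after the loop needs to know how far the loop got, one option is to wrap the same pattern in a function. Here is a minimal sketch. run_until_interrupted is a hypothetical helper, not a standard-library function. It calls work repeatedly and returns how many iterations completed before Ctrl+C.

```python
def run_until_interrupted(work):
    """Call work() until Ctrl+C raises KeyboardInterrupt; return completed iterations."""
    completed = 0
    try:
        while True:
            work()
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C ends up here instead of stopping the program with a traceback
        pass
    return completed
```

For example, count = run_until_interrupted(do_something) runs until Ctrl+C is pressed and then continues with the number of finished iterations.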

Conclusion

To kill a while loop with a keystroke in Python, we can press Ctrl+C and catch the KeyboardInterrupt exception it raises.