Categories
Python Answers

How to merge two dictionaries in a single expression with Python?

Sometimes, we want to merge two dictionaries in a single expression with Python.

In this article, we’ll look at how to merge two dictionaries in a single expression with Python.

How to merge two dictionaries in a single expression with Python?

To merge two dictionaries in a single expression with Python, we can use ** unpacking inside a dictionary literal or the | operator.

For instance, we write:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
z = {**x, **y}
print(z)

Then z is {'a': 1, 'b': 3, 'c': 4}, since the value of 'b' from y overrides the one from x.
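
To see why the order matters, here is a small sketch using the same x and y: when a key appears in both dictionaries, the value from the dictionary unpacked last wins, and merging builds a new dictionary without modifying either original.

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

# The dictionary unpacked last wins for shared keys such as 'b'.
xy = {**x, **y}
yx = {**y, **x}
print(xy)  # {'a': 1, 'b': 3, 'c': 4}
print(yx)  # {'b': 2, 'c': 4, 'a': 1}

# Merging builds a new dictionary; x and y are left unchanged.
print(x)  # {'a': 1, 'b': 2}
print(y)  # {'b': 3, 'c': 4}
```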

Unpacking dictionaries with ** inside a dictionary literal works in Python 3.5 or later.

We can also use the | operator with Python 3.9 or later.

To use it, we write:

x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
z = x | y

And we get the same value for z.
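
On Python 3.9 or later, there is also an in-place form, |=, which updates the left-hand dictionary much like x.update(y). A minimal sketch comparing the two:

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

z = x | y      # builds a new dict; x is not changed by this line
x |= y         # updates x in place, like x.update(y)
print(z == x)  # True
```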

Conclusion

To merge two dictionaries in a single expression with Python, we can use ** unpacking inside a dictionary literal or the | operator.


How to check whether a file exists without exceptions with Python?

Sometimes, we want to check whether a file exists without exceptions with Python.

In this article, we’ll look at how to check whether a file exists without exceptions with Python.

How to check whether a file exists without exceptions with Python?

To check whether a file exists without exceptions with Python, we can use the os.path.isfile function.

For instance, we write:

import os.path
fname = './foo.txt'
os.path.isfile(fname) 

fname is a file path string.

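os.path.isfile returns True if fname refers to an existing regular file. If there's no file at the fname path, or the path points to something else such as a directory, it returns False instead of raising an exception.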
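
The sketch below creates a throwaway directory with tempfile (used only for this demonstration) to show a few cases. isfile is True only for an existing regular file, so it's False for a directory, while os.path.exists is True for any existing path. pathlib's Path.is_file is an equivalent object-oriented option.

```python
import os.path
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'foo.txt')
    missing = os.path.isfile(file_path)      # False: nothing created yet
    Path(file_path).write_text('hello')
    present = os.path.isfile(file_path)      # True: now a regular file
    dir_is_file = os.path.isfile(tmp)        # False: a directory is not a file
    dir_exists = os.path.exists(tmp)         # True: exists() accepts any path
    via_pathlib = Path(file_path).is_file()  # True: pathlib equivalent

print(missing, present, dir_is_file, dir_exists, via_pathlib)
```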

Conclusion

To check whether a file exists without exceptions with Python, we can use the os.path.isfile function.


How to add a ternary conditional expression with Python?

Sometimes, we want to add a ternary conditional expression with Python.

In this article, we’ll look at how to add a ternary conditional expression with Python.

How to add a ternary conditional expression with Python?

To add a ternary conditional expression with Python, we can use the following format:

a if condition else b

where a and b are expressions. If condition is truthy, the whole expression evaluates to a; otherwise, it evaluates to b.

For instance, if we have:

'true' if True else 'false'

Then the string 'true' is returned, since the condition is True.
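
Because only the selected branch is evaluated, a conditional expression can also guard a computation that would otherwise fail. A small sketch (the describe function is just an illustration):

```python
def describe(n):
    # Pick a label based on the remainder of n divided by 2.
    return 'even' if n % 2 == 0 else 'odd'

print(describe(4))  # even
print(describe(7))  # odd

# The unselected branch is never evaluated, so 10 / d does not run when d is 0.
d = 0
ratio = 0 if d == 0 else 10 / d
print(ratio)  # 0
```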

Conclusion

To add a ternary conditional expression with Python, we can use the following format:

a if condition else b

Web Scraping with Beautiful Soup — Equality, Copies, and Parsing Part of a Document

We can get data from web pages with Beautiful Soup.

It lets us parse the DOM and extract the data we want.

In this article, we’ll look at how to scrape HTML documents with Beautiful Soup.

Comparing Objects for Equality

We can compare objects for equality.

For example, we can write:

from bs4 import BeautifulSoup
markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
soup = BeautifulSoup(markup, 'html.parser')
first_b, second_b = soup.find_all('b')
print(first_b == second_b)
print(first_b.previous_element == second_b.previous_element)

The first print prints True because the two b elements have the same tag name, attributes, and contents.

The 2nd print prints False because the elements right before the two b elements are different strings: 'I want ' and ' and more '.
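
Note that == compares markup, not identity. A quick sketch with the same markup shows that the two b tags are equal but are still two distinct objects in the tree:

```python
from bs4 import BeautifulSoup

markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
soup = BeautifulSoup(markup, 'html.parser')
first_b, second_b = soup.find_all('b')

same_markup = first_b == second_b  # True: same name, attributes, and contents
same_object = first_b is second_b  # False: two separate nodes in the tree
print(same_markup, same_object)
```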

Copying Beautiful Soup Objects

We can copy Beautiful Soup objects.

We can use the copy module to do this:

from bs4 import BeautifulSoup
import copy

markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
soup = BeautifulSoup(markup, 'html.parser')
p_copy = copy.copy(soup.p)
print(p_copy)

The copy is considered to be equal to the original.
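
The copy is still a separate object, though. As the Beautiful Soup documentation describes, it is detached from the original tree, so its parent is None and changing it leaves the original alone. A sketch:

```python
import copy

from bs4 import BeautifulSoup

markup = "<p>I want <b>pizza</b> and more <b>pizza</b>!</p>"
soup = BeautifulSoup(markup, 'html.parser')
p_copy = copy.copy(soup.p)

equal = p_copy == soup.p      # True: the copy has the same markup
identical = p_copy is soup.p  # False: it is a different object
print(equal, identical, p_copy.parent)  # the copy has no parent

# Changing the copy leaves the original tree untouched.
p_copy.b.string = 'pasta'
print(soup.p.b.string)  # pizza
```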

Parsing Only Part of a Document

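If we only care about some of the elements in a document, we can describe them with SoupStrainer objects so that Beautiful Soup parses only the matching parts. For example, we can create strainers like these: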

from bs4 import BeautifulSoup, SoupStrainer

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')
only_a_tags = SoupStrainer("a")
only_tags_with_id_link2 = SoupStrainer(id="link2")

def is_short_string(string):
    return string is not None and len(string) < 10
only_short_strings = SoupStrainer(string=is_short_string)

print(only_a_tags)
print(only_tags_with_id_link2)
print(only_short_strings)

A SoupStrainer describes which elements we want to keep.

We can match elements by tag name, by an attribute such as id, or with a function that receives each string and returns whether to keep it.

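Then we see output like the following printed. The exact format of a printed SoupStrainer varies between Beautiful Soup versions, and the u prefix shows that this output came from Python 2:

a|{}
None|{'id': u'link2'}
None|{'string': <function is_short_string at 0x00000000036FC908>}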
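
The strainers only describe what to keep; to actually parse part of a document, we pass one to the BeautifulSoup constructor as the parse_only argument. Here is a sketch with a shortened version of the same HTML. Per the Beautiful Soup documentation, this works with html.parser and lxml, while html5lib always parses the whole document.

```python
from bs4 import BeautifulSoup, SoupStrainer

html_doc = """<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>"""

# Keep only <a> tags; everything else is skipped while parsing.
only_a_soup = BeautifulSoup(html_doc, 'html.parser', parse_only=SoupStrainer("a"))
names = [a.get_text() for a in only_a_soup.find_all('a')]
print(names)  # ['Elsie', 'Lacie', 'Tillie']

# Keep only the tag whose id is "link2".
link2_soup = BeautifulSoup(html_doc, 'html.parser', parse_only=SoupStrainer(id="link2"))
print(link2_soup.a['href'])  # http://example.com/lacie
```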

Conclusion

We can parse part of a document, compare parsed objects for equality, and copy objects with Beautiful Soup.


Web Scraping with Beautiful Soup — Encoding

We can get data from web pages with Beautiful Soup.

It lets us parse the DOM and extract the data we want.

In this article, we’ll look at how to scrape HTML documents with Beautiful Soup.

Output Formatters

We can control how Beautiful Soup escapes characters in its output with formatters.

For example, we can write:

from bs4 import BeautifulSoup
french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'html.parser')
print(soup.prettify(formatter="html"))

We pass formatter="html" to prettify so that characters like é are written as HTML entities like &eacute; in the output.
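
For comparison, here is a sketch that renders the same p tag with the default "minimal" formatter and with the "html" formatter. It uses Tag.decode, which returns a string without prettify's extra line breaks and indentation. The minimal formatter only escapes &, < and >, so é stays as-is, while the html formatter also turns é into &eacute;.

```python
from bs4 import BeautifulSoup

french = "<p>Il a dit &lt;&lt;Sacr&eacute; bleu!&gt;&gt;</p>"
soup = BeautifulSoup(french, 'html.parser')

minimal = soup.p.decode(formatter="minimal")  # the default formatter
html = soup.p.decode(formatter="html")        # also uses named entities

print(minimal)  # é is kept as a plain character
print(html)     # é is written as &eacute;
```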

We can also use the html5 formatter, which works like the html formatter but leaves out the closing slash in void elements such as br.

For example, we can write:

from bs4 import BeautifulSoup
br = BeautifulSoup("<br>", 'html.parser').br
print(br.prettify(formatter="html"))
print(br.prettify(formatter="html5"))

Then from the first print, we see:

<br/>

And from the 2nd print, we see:

<br>

Also, we can set the formatter to None:

from bs4 import BeautifulSoup
link_soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>', 'html.parser')
print(link_soup.a.encode(formatter=None))

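Then no escaping is done at all, so the & in the URL is left as-is instead of being converted to &amp;. Since encode returns bytes, the result is printed as a bytes literal. Be aware that skipping escaping can produce invalid HTML.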
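
A sketch comparing the default formatter with formatter=None on the same link makes the difference visible:

```python
from bs4 import BeautifulSoup

link_soup = BeautifulSoup('<a href="http://example.com/?foo=val1&bar=val2">A link</a>', 'html.parser')

escaped = str(link_soup.a)                # default "minimal" formatter
raw = link_soup.a.encode(formatter=None)  # no escaping; returns bytes
print(escaped)  # the & in the URL becomes &amp;
print(raw)      # the & in the URL is kept as-is
```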

get_text()

We can call the get_text method to get all the text inside an element, or inside the whole document, as a single string.

For instance, we can write:

from bs4 import BeautifulSoup
markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'html.parser')

print(soup.get_text())

Then we see:

I linked to example.com

printed.

We can specify the string used to join the bits of text together by passing it as an argument.

For example, if we write:

from bs4 import BeautifulSoup
markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'html.parser')

print(soup.get_text('|'))

Then we see:

I linked to |example.com|
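
If we don't want the surrounding newlines, get_text also accepts strip=True, which trims whitespace from each bit of text before joining. The stripped_strings generator yields the same trimmed pieces one at a time. A sketch:

```python
from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">\nI linked to <i>example.com</i>\n</a>'
soup = BeautifulSoup(markup, 'html.parser')

# strip=True trims whitespace from each bit of text before joining.
joined = soup.get_text('|', strip=True)
print(joined)  # I linked to|example.com

# stripped_strings yields the trimmed pieces one at a time.
pieces = list(soup.stripped_strings)
print(pieces)  # ['I linked to', 'example.com']
```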

Encodings

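When we pass in a bytestring, Beautiful Soup guesses its encoding and converts it to Unicode. The original_encoding attribute tells us which encoding it detected.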

For example, we can write:

from bs4 import BeautifulSoup
markup = b"<h1>Sacr\xc3\xa9 bleu!</h1>"
soup = BeautifulSoup(markup, 'html.parser')
print(soup.original_encoding)

Then soup.original_encoding is 'utf-8'.

If the guess is wrong, we can specify the encoding ourselves with the from_encoding parameter.

For instance, we can write:

from bs4 import BeautifulSoup
markup = b"<h1>\xed\xe5\xec\xf9</h1>"
soup = BeautifulSoup(markup, 'html.parser', from_encoding="iso-8859-8")
print(soup.h1)
print(soup.original_encoding)

Passing from_encoding to the BeautifulSoup constructor tells it to decode the bytes as ISO-8859-8 (Hebrew) instead of guessing, so soup.h1 contains the Hebrew text we expect.

Also, we can call encode on a parsed node to convert it to a bytestring in the given encoding.

For example, we can write:

from bs4 import BeautifulSoup
markup = "<b>\N{SNOWMAN}</b>"
snowman_soup = BeautifulSoup(markup, 'html.parser')
tag = snowman_soup.b
print(tag.encode("latin-1"))

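to encode the b tag as Latin-1. Since Latin-1 has no snowman character, Beautiful Soup writes it as the numeric character reference &#9731;.

Then we see:

b'<b>&#9731;</b>'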

printed.
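
For contrast, a sketch that encodes the same tag as UTF-8, which can represent the snowman directly as the three bytes E2 98 83:

```python
from bs4 import BeautifulSoup

snowman_soup = BeautifulSoup("<b>\N{SNOWMAN}</b>", 'html.parser')
tag = snowman_soup.b

utf8_bytes = tag.encode("utf-8")      # the snowman is kept as raw UTF-8 bytes
latin1_bytes = tag.encode("latin-1")  # not representable, so &#9731; is used
print(utf8_bytes)    # b'<b>\xe2\x98\x83</b>'
print(latin1_bytes)  # b'<b>&#9731;</b>'
```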

Unicode, Dammit

We can use the UnicodeDammit class from Beautiful Soup to guess the encoding of a bytestring and convert it to Unicode.

For example, we can write:

from bs4 import BeautifulSoup, UnicodeDammit
dammit = UnicodeDammit(b"Sacr\xc3\xa9 bleu!")
print(dammit.unicode_markup)
print(dammit.original_encoding)

Then dammit.unicode_markup is 'Sacré bleu!' and dammit.original_encoding is 'utf-8'.
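
Since detection is only a guess, we can pass a list of encodings to try first as the second argument when we already have a good idea of the encoding, as the smart quotes example does with windows-1252. A sketch with a Latin-1 bytestring:

```python
from bs4 import UnicodeDammit

# \xe9 is é in Latin-1; the list tells UnicodeDammit which encodings to try first.
dammit = UnicodeDammit(b"Sacr\xe9 bleu!", ["latin-1", "iso-8859-1"])
print(dammit.unicode_markup)     # Sacré bleu!
print(dammit.original_encoding)  # latin-1
```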

Smart Quotes

We can use Unicode, Dammit to convert Microsoft smart quotes to HTML or XML entities:

from bs4 import BeautifulSoup, UnicodeDammit
markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"
print(UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="html").unicode_markup)
print(UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="xml").unicode_markup)

Then we get:

<p>I just &ldquo;love&rdquo; Microsoft Word&rsquo;s smart quotes</p>

from the first print and:

<p>I just &#x201C;love&#x201D; Microsoft Word&#x2019;s smart quotes</p>

from the 2nd print.
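
We can also set smart_quotes_to to "ascii" to get plain ASCII quotes, or leave it out so the smart quotes become the matching Unicode characters. A sketch of both:

```python
from bs4 import UnicodeDammit

markup = b"<p>I just \x93love\x94 Microsoft Word\x92s smart quotes</p>"

# "ascii" replaces the smart quotes with plain " and ' characters.
ascii_version = UnicodeDammit(markup, ["windows-1252"], smart_quotes_to="ascii").unicode_markup
# Without smart_quotes_to, they are decoded to Unicode curly quotes.
unicode_version = UnicodeDammit(markup, ["windows-1252"]).unicode_markup

print(ascii_version)
print(unicode_version)
```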

Conclusion

Beautiful Soup can work with strings in various encodings.