Categories
Python Answers

How to construct a Python Pandas DataFrame from items in a nested dictionary?

Sometimes, we want to construct a Python Pandas DataFrame from items in a nested dictionary.

In this article, we’ll look at how to construct a Python Pandas DataFrame from items in a nested dictionary.

How to construct a Python Pandas DataFrame from items in a nested dictionary?

To construct a Python Pandas DataFrame from items in a nested dictionary, we can use a dictionary comprehension to build one DataFrame per top-level key and then combine them with pd.concat.

For instance, we write

pd.concat({k: pd.DataFrame(v).T for k, v in user_dict.items()}, axis=0)

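to call pd.concat with a dictionary that we create with a dictionary comprehension.

For each key k and nested value v returned by user_dict.items, we create a DataFrame from v and transpose it with T. Assuming each v is itself a dictionary of dictionaries, the keys of v become the rows.

pd.concat then stacks the data frames along axis 0 and uses the keys of the dictionary as the outer level of the resulting index.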
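
The article doesn't show what user_dict looks like, so here is a minimal runnable sketch with made-up sample data. The user IDs, category names and attribute names are assumptions for illustration only, and the names frames and df are ours.

```python
import pandas as pd

# Hypothetical sample data: outer keys are user IDs, inner keys are
# categories, and the innermost dictionaries hold the attributes.
user_dict = {
    12: {"Category 1": {"att_1": 1, "att_2": "whatever"},
         "Category 2": {"att_1": 23, "att_2": "another"}},
    15: {"Category 1": {"att_1": 10, "att_2": "foo"},
         "Category 2": {"att_1": 30, "att_2": "bar"}},
}

# One data frame per user; T turns the categories into rows.
frames = {user_id: pd.DataFrame(categories).T
          for user_id, categories in user_dict.items()}

# Stack the frames; the dictionary keys become the outer index level.
df = pd.concat(frames, axis=0)
print(df)
```

With this data, the result should have a two-level index of user ID and category, and one column per attribute.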

Conclusion

To construct a Python Pandas DataFrame from items in a nested dictionary, we can use a dictionary comprehension to build one DataFrame per top-level key and then combine them with pd.concat.

How to add Python Pandas data to an existing csv file?

Sometimes, we want to add Python Pandas data to an existing csv file.

In this article, we’ll look at how to add Python Pandas data to an existing csv file.

How to add Python Pandas data to an existing csv file?

To add Python Pandas data to an existing csv file, we can write the data to a csv with to_csv.

For instance, we write

df.to_csv('my_csv.csv', mode='a', header=False)

to call to_csv with the CSV file path.

The mode is set to 'a' for append.

And header is set to False so the header row won’t be written again.
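
One thing to watch out for is that header=False assumes the file already has a header row. The following is a rough sketch of a helper that writes the header only when the file is new or empty. The append_to_csv name, the temporary file path, and the choice of index=False (to leave out the row index) are our own assumptions, not part of the original snippet.

```python
import os
import tempfile

import pandas as pd


def append_to_csv(df, path):
    """Append df to path, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    df.to_csv(path, mode="a", header=write_header, index=False)


# Hypothetical file location in a fresh temporary directory.
csv_path = os.path.join(tempfile.mkdtemp(), "my_csv.csv")

append_to_csv(pd.DataFrame({"a": [1, 2], "b": [3, 4]}), csv_path)  # creates the file
append_to_csv(pd.DataFrame({"a": [5], "b": [6]}), csv_path)        # appends one row

combined = pd.read_csv(csv_path)
print(combined)
```

The appended data frame needs to have the same columns in the same order as the existing file, since to_csv doesn't check this for us.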

Conclusion

To add Python Pandas data to an existing csv file, we can write the data to a csv with to_csv.

How to select with complex criteria from a Python Pandas DataFrame?

Sometimes, we want to select with complex criteria from a Python Pandas DataFrame.

In this article, we’ll look at how to select with complex criteria from a Python Pandas DataFrame.

How to select with complex criteria from a Python Pandas DataFrame?

To select with complex criteria from a Python Pandas DataFrame, we can call the query method.

For instance, we write

import pandas as pd

from random import randint
# range replaces xrange, which only exists in Python 2
df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
                   'B': [randint(1, 9) * 10 for x in range(10)],
                   'C': [randint(1, 9) * 100 for x in range(10)]})
df.query('B > 50 and C != 900')

to create a data frame df with pd.DataFrame.

Then we call df.query with a string expression that has the conditions for the values we’re looking for.

It returns a new data frame with the rows where B is bigger than 50 and C isn’t 900.
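
To make the result easier to check, here is a small sketch with fixed values instead of random ones. It also shows the equivalent boolean mask and how to refer to a Python variable in the query string with @. The sample values and the threshold name are made up for illustration.

```python
import pandas as pd

# Fixed sample data so the outcome is predictable.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [40, 60, 70, 90],
    "C": [100, 900, 300, 500],
})

# Same selection written three ways.
by_query = df.query("B > 50 and C != 900")
by_mask = df[(df["B"] > 50) & (df["C"] != 900)]

threshold = 50  # local variables are referenced with @ inside the string
by_variable = df.query("B > @threshold and C != 900")

print(by_query)
```

With the boolean mask, each condition needs its own parentheses since & binds tighter than the comparison operators.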

Conclusion

To select with complex criteria from a Python Pandas DataFrame, we can call the query method.

How to convert an XML file to a Python Pandas dataframe?

To convert an XML file to a Python Pandas dataframe, we parse the XML into an object and then we create a dataframe from it.

For instance, we write

import pandas as pd
import xml.etree.ElementTree as ET

xml_str = '<?xml version="1.0" encoding="utf-8"?>\n<response>\n <head>\n  <code>\n   200\n  </code>\n </head>\n <body>\n  <data id="0" name="All Categories" t="2018052600" tg="1" type="category"/>\n  <data id="13" name="RealEstate.com.au [H]" t="2018052600" tg="1" type="publication"/>\n </body>\n</response>'

etree = ET.fromstring(xml_str)
dfcols = ['id', 'name']

# Collect one row per data element, then build the data frame once.
# DataFrame.append was removed in pandas 2.0.
rows = [[node.get('id'), node.get('name')] for node in etree.iter(tag='data')]
df = pd.DataFrame(rows, columns=dfcols)

df.head()

to call ET.fromstring with xml_str to create an XML tree object.

Next, we use a list comprehension to loop through the data elements returned by etree.iter and collect the id and name attribute values of each node as one row.

Then we call pd.DataFrame with rows and the dfcols column names to create the data frame in one step.

We build the data frame this way since DataFrame.append was removed in pandas 2.0.
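
As a variation, if we want every attribute of each data element instead of only id and name, we can pass the attrib dictionary of each node to pd.DataFrame. This is only a sketch that uses a shortened copy of the XML string above, and the names root and df_all are ours.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened copy of the sample XML, body part only.
xml_str = (
    '<response><body>'
    '<data id="0" name="All Categories" t="2018052600" tg="1" type="category"/>'
    '<data id="13" name="RealEstate.com.au [H]" t="2018052600" tg="1" type="publication"/>'
    '</body></response>'
)

root = ET.fromstring(xml_str)

# Each node.attrib is a dict of attribute names to string values,
# so a list of them maps straight to rows and columns.
df_all = pd.DataFrame([node.attrib for node in root.iter('data')])
print(df_all)
```

All the values are strings at this point, so numeric columns like id would still need to be converted, for example with astype.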

How to create scatter by category plots in Python Pandas and Pyplot?

To create scatter by category plots in Python Pandas and Pyplot, we can group the data frame by category and plot each group on the same axes that we create with the subplots method.

For instance, we write

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(1974)

num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Plot
fig, ax = plt.subplots()
ax.margins(0.05)
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()

to call np.random.random to create some random data.

And then we convert it to a data frame with DataFrame.

Next, we call plt.subplots to create the figure and the axes.

Then we loop through the groups we got from groupby and call ax.plot to plot the values.

Next, we call ax.legend to add a legend.

And then we call plt.show to show the plot.
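
The same grouping idea also works with ax.scatter in place of ax.plot. Below is a small sketch with fixed sample points. The data values, the Agg backend (picked so it runs without a display) and the output file path are all assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Fixed sample points with a category label for each one.
df = pd.DataFrame({
    "x": [0.1, 0.4, 0.5, 0.7, 0.9, 0.3],
    "y": [0.2, 0.8, 0.3, 0.6, 0.1, 0.5],
    "label": ["a", "b", "a", "c", "b", "c"],
})

fig, ax = plt.subplots()
# One scatter call per category, so each gets its own color and legend entry.
for name, group in df.groupby("label"):
    ax.scatter(group["x"], group["y"], s=80, label=name)
ax.legend(title="label")

# Hypothetical output location in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "scatter_by_category.png")
fig.savefig(out_path)

handles, legend_labels = ax.get_legend_handles_labels()
print(legend_labels)
```

Since each call to ax.scatter draws one group, Matplotlib assigns the next color in its default cycle to each category.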