Categories
Python Answers

How to create a Python Pandas DataFrame from a string?

To create a Python Pandas DataFrame from a string, we use the StringIO class with read_csv.

For instance, we write

from io import StringIO

import pandas as pd

TESTDATA = StringIO("""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
""")

df = pd.read_csv(TESTDATA, sep=";")

to wrap the string in a StringIO instance, which behaves like an open file.

And then we call read_csv with the TESTDATA object and set sep to ";", the separator between the values in each row.
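As a quick check of the steps above, here is a minimal sketch that parses a small string and inspects the result. The variable name raw_text and the sample values are only for illustration.

```python
from io import StringIO

import pandas as pd

# Illustrative sample data: semicolon-separated text held in an ordinary string.
raw_text = "col1;col2;col3\n1;4.4;99\n2;4.5;200\n3;4.7;65\n4;3.2;140\n"

# StringIO wraps the string in a file-like object that read_csv can consume.
df = pd.read_csv(StringIO(raw_text), sep=";")

print(df.shape)   # number of rows and columns parsed
print(df.dtypes)  # column types inferred by read_csv
```

Passing the StringIO object rather than the raw string matters because read_csv treats a plain string argument as a file path.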

How to convert a Python Pandas GroupBy output from Series to DataFrame?

To convert a Python Pandas GroupBy output from Series to DataFrame, we can use the reset_index method.

For instance, we write

import pandas

df1 = pandas.DataFrame( { 
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } )

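g1 = df1.groupby(["Name", "City"]).size().reset_index(name="count")

to call groupby to group the rows by the Name and City columns.

And then we call size to get the number of rows in each group as a series. We use size rather than count because, in recent Pandas versions, count only reports on columns that are not group keys, and df1 has none.

And then we call reset_index to return the values as a dataframe, with the group sizes in a column named count.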
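To make the Series to DataFrame step visible, here is a minimal sketch that keeps the intermediate result. It uses size to count the rows per group, and the names sizes and "count" are assumptions chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() returns a Series whose index holds the (Name, City) pairs.
sizes = df1.groupby(["Name", "City"]).size()

# reset_index moves the index levels into columns;
# name= labels the column that holds the Series values.
g1 = sizes.reset_index(name="count")

print(type(sizes).__name__)  # Series
print(type(g1).__name__)     # DataFrame
print(g1)
```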

How to group by in group by and average with Python Pandas?

To group by in group by and average with Python Pandas, we can use the mean method.

For instance, we write

df.groupby(['org']).mean().groupby(['cluster']).mean()

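to call groupby to group the rows by the org column value.

And then we call mean to get the mean of each numeric column within each org.

Then we call groupby again to group those per-org means by the cluster column. This step assumes cluster is numeric and every org belongs to a single cluster, so that the cluster value survives the first mean unchanged.

And finally we call mean again to get the mean of those grouped values.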
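Here is a minimal sketch of the two-level average on made-up data. The columns cluster, org and time and their values are assumptions for illustration. It also shows a variant that groups by both columns first, which does not rely on cluster being numeric.

```python
import pandas as pd

# Illustrative data: each org belongs to exactly one cluster.
df = pd.DataFrame({
    "cluster": [1, 1, 2, 2, 2, 3],
    "org": ["a", "a", "h", "c", "d", "w"],
    "time": [8, 6, 34, 23, 74, 6],
})

# Chained form: mean per org, then mean of those per-org means per cluster.
chained = df.groupby(["org"]).mean().groupby(["cluster"]).mean()

# Variant: mean per (cluster, org) pair, then mean of those per cluster.
per_org = df.groupby(["cluster", "org"], as_index=False)["time"].mean()
per_cluster = per_org.groupby("cluster")["time"].mean()

print(chained)
print(per_cluster)
```

Both forms average the org means rather than the raw rows, so an org with many rows does not outweigh an org with few.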

How to remove Python Pandas rows with duplicate indices?

To remove Python Pandas rows with duplicate indices, we call index.duplicated with negation.

For instance, we write

df3 = df3[~df3.index.duplicated(keep='first')]

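to get a boolean array that marks the repeated index labels of the df3 data frame with

df3.index.duplicated

With keep='first', the first occurrence of each label is not marked as a duplicate.

And then we negate the array with ~ and use it to select rows, which keeps one row per index label.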
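A minimal sketch of the mask on a tiny data frame may help. The sample index labels and the names mask and deduped are only for illustration.

```python
import pandas as pd

# Illustrative frame in which the index label "a" appears twice.
df3 = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "a", "c"])

# True marks a label that has already appeared earlier in the index.
mask = df3.index.duplicated(keep="first")

# ~ flips the mask, so only the first row for each label is kept.
deduped = df3[~mask]

print(mask)
print(deduped)
```

Using keep='last' instead would keep the final row for each label.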

How to add new column to dataframe which is a copy of the index column with Python Pandas?

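To add new column to dataframe which is a copy of the index column with Python Pandas, we can assign df.index to a new column. The index can also be used directly without copying it, for example as the horizontal axis of a plot.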

For instance, we write

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))

plt.plot(df.index, df[0])
plt.show()

to call plt.plot with df.index as the horizontal axis and the values of column 0 as the vertical axis.
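If we do want the index as a regular column, a minimal sketch is to assign df.index to a new column. The column names value and date are assumptions for illustration.

```python
import pandas as pd

# Illustrative frame with a date index.
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.date_range("2000-01-01", periods=3),
)

# Assigning the index to a new column name copies its values into that column.
df["date"] = df.index

print(df)
```

The index itself is left in place, so the frame can still be looked up by date.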