Category: Python Answers

How to get percentage of total with groupby with Python Pandas?

Post author By John Au-Yeung
Post date March 26, 2022

To get percentage of total with groupby with Python Pandas, we can divide each value in a column by the total of its group, which we compute with groupby and transform('sum').

For instance, we write

df['sales'] / df.groupby('state')['sales'].transform('sum')

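to divide each value in the sales column by the total sales of its state. transform('sum') returns the group total aligned to the original rows, so the division works row by row. The result is a fraction of the state total, and we multiply it by 100 to get a percentage.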
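
As a small worked example, the sketch below applies the same division to made-up data. Only the column names state and sales come from the text above. The sample values and the pct_of_state column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for this sketch.
df = pd.DataFrame({
    'state': ['CA', 'CA', 'NY', 'NY', 'NY'],
    'sales': [100, 300, 50, 50, 100],
})

# Total sales per state, repeated on every row of that state.
state_totals = df.groupby('state')['sales'].transform('sum')

# Share of each row within its state, as a percentage.
df['pct_of_state'] = 100 * df['sales'] / state_totals
print(df)
```

Here the CA rows add up to 400 and the NY rows to 200, so the percentages within each state add up to 100.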

How to append existing Excel sheet with new dataframe using Python Pandas?

Post author By John Au-Yeung
Post date March 26, 2022

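To append a new dataframe to an existing Excel sheet with Python Pandas, we can use ExcelWriter in append mode.

For instance, we write

import pandas as pd

with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    # max_row is the last used row, so the new rows start right below it
    # (this assumes the sheet already holds some data)
    start_row = writer.sheets['Existing_sheetname'].max_row
    data_df.to_excel(writer, sheet_name='Existing_sheetname', startrow=start_row, header=False)

to call ExcelWriter with the Excel file path, the openpyxl engine and mode set to 'a', which loads the existing workbook instead of creating a new one.

Setting if_sheet_exists to 'overlay' lets us write into a sheet that already exists without deleting its contents. This option needs pandas 1.4 or later.

Then we get the existing sheet from writer.sheets and use its max_row as the first row to write to.

Next, we call to_excel with writer, a sheet name that already exists and startrow to write the rows of the data_df dataframe below the existing data. header=False stops the column names from being written a second time.

The with block saves and closes the writer when it ends.

Older code loads the workbook with openpyxl.load_workbook, assigns it to writer.book, fills writer.sheets by hand and calls writer.save(). That pattern no longer works in pandas 2.0 and later, where writer.save() was removed.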

How to replace NaNs by preceding or next values in a Python Pandas DataFrame?

Post author By John Au-Yeung
Post date March 26, 2022

To replace NaNs by preceding or next values in a Python Pandas DataFrame, we can use the fillna method with the method argument set to 'ffill' or 'bfill'.

For instance, we write

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df.fillna(method='ffill')

to call fillna on dataframe df with the method argument set to 'ffill' to fill each NaN with the last valid value above it in the same column.

We can also set method to 'bfill' to fill each NaN with the next valid value below it in the same column.

Since pandas 2.1, the method argument of fillna is deprecated. Calling df.ffill() or df.bfill() gives the same result.
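
To see what both directions do to the dataframe above, here is a minimal sketch that uses the DataFrame.ffill and DataFrame.bfill methods in place of the method argument. The variable names forward and backward are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: each NaN takes the last valid value above it in its column.
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it in its column.
backward = df.bfill()

print(forward)
print(backward)
```

After the forward fill no NaN is left, since every column starts with a valid value. After the backward fill the first two columns still have NaNs at the bottom, because no valid value comes after them.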

How to fix Python Pandas Error tokenizing data?

Post author By John Au-Yeung
Post date March 26, 2022

To fix Python Pandas Error tokenizing data, we call read_csv with the on_bad_lines argument set to 'skip'.

For instance, we write

data = pd.read_csv('file1.csv', on_bad_lines='skip')

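to call read_csv with the file path and the on_bad_lines argument set to 'skip' to skip the bad lines when reading the CSV. A bad line is a row with more fields than the parser expects, which is what raises the error. The on_bad_lines argument needs pandas 1.3 or later. Skipped rows are dropped without a message, so if many rows go missing, it is worth checking that the delimiter passed with sep matches the file.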
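
The sketch below reproduces the error and the fix on a small in-memory CSV, so we can try it without a file. The raw string and the failed flag are assumptions made for this example. It needs pandas 1.3 or later for on_bad_lines.

```python
import io
import pandas as pd

# Hypothetical CSV text: the header has 3 fields, but the second data row has 4.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# Without on_bad_lines, the extra field makes the parser raise ParserError.
try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError as exc:
    failed = True
    print(exc)

# With on_bad_lines='skip', the malformed row is dropped and the rest is read.
data = pd.read_csv(io.StringIO(raw), on_bad_lines='skip')
print(data)
```

Only the two well-formed rows are left in data.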

How to do multiple aggregations of the same column using Python Pandas with GroupBy.agg()?

Post author By John Au-Yeung
Post date March 26, 2022

To do multiple aggregations of the same column using Python Pandas with GroupBy.agg(), we can call groupby and then pass named aggregations to agg.

For instance, we write

df.groupby('group').agg(
    a_sum=('a', 'sum'),
    a_mean=('a', 'mean'),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_range=('d', lambda x: x.max() - x.min())
)

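to call agg on the groups returned by groupby with keyword arguments to compute aggregate values for various columns.

Each keyword is the name of an output column, and its value is a tuple with the input column and the function to apply. We compute the sum and the mean of column a, the mean of b, the sum of c and the difference between the maximum and minimum values of d with agg.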
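
To make the output concrete, here is the same call run on a small made-up dataframe. The column names match the example above, while the sample values and the result variable are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical sample data with two groups, 'x' and 'y'.
df = pd.DataFrame({
    'group': ['x', 'x', 'y'],
    'a': [1, 2, 3],
    'b': [4.0, 6.0, 8.0],
    'c': [1, 1, 1],
    'd': [10, 4, 7],
})

# Each keyword becomes an output column: name=(input column, function).
result = df.groupby('group').agg(
    a_sum=('a', 'sum'),
    a_mean=('a', 'mean'),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_range=('d', lambda x: x.max() - x.min()),
)
print(result)
```

The result has one row per group and one column per keyword, so column a shows up twice, once as a_sum and once as a_mean.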