Categories
Python Answers

How to concatenate strings from several rows using Python Pandas groupby?

To concatenate strings from several rows using Python Pandas groupby, we can use the transform method.

For instance, we write

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()

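to select the name, text, and month columns and group the rows by name and month.

Then we get the text column from the grouped data and call transform with a lambda function that joins the strings in each group with commas. Since transform returns a result with the same length as the original data frame, every row in a group gets the same joined string.

Finally, we call drop_duplicates to drop the repeated rows so that one row remains per group.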
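As a rough sketch of the steps above, the following runs the same transform approach on a small, made-up data frame and compares it with agg, which collapses each group to one row directly. The sample values and the variable names joined, via_transform, and via_agg are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the snippet above.
df = pd.DataFrame({
    "name": ["ann", "ann", "bob", "bob"],
    "month": [1, 1, 1, 2],
    "text": ["hi", "there", "yo", "hey"],
})

# transform keeps one value per original row, so rows in a group repeat the joined string
joined = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))
via_transform = df.assign(text=joined)[["name", "text", "month"]].drop_duplicates()

# agg returns one row per group, so no drop_duplicates call is needed
via_agg = df.groupby(["name", "month"], as_index=False)["text"].agg(",".join)

print(via_transform)
print(via_agg)
```

Both results should hold one row per name and month pair. The agg version may be the simpler choice when the original rows are not needed afterwards.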


How to count the frequency that a value occurs in a Python Pandas dataframe column?

To count the frequency that a value occurs in a Python Pandas dataframe column, we call groupby and size.

For instance, we write

import pandas as pd

df = pd.DataFrame({'a':list('abssbab')})
df.groupby('a').size()

to create the df data frame.

Then we call groupby with 'a' to group the rows by the values in the 'a' column.

Then we call size to get the number of rows in each group. We use size rather than count here because count only counts the non-null values in the other columns, and df has no columns besides 'a'.
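As a small sketch using the same data frame, value_counts on the column is another way to get the frequencies, alongside groupby with size. The variable names counts and by_group are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": list("abssbab")})

# value_counts returns a Series of frequencies, sorted from most to least common
counts = df["a"].value_counts()

# groupby + size returns the same frequencies, ordered by the group key
by_group = df.groupby("a").size()

print(counts)
print(by_group)
```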


How to use Python Pandas to get rows which are NOT in another dataframe?

To use Python Pandas to get rows which are NOT in another dataframe, we can use isin with negation.

For instance, we write

df1[~df1.index.isin(df2.index)]

to call isin on df1.index to check which index labels of df1 are also in df2.index.

And then we negate the result with ~ to get the rows in df1 whose index labels aren’t in df2. Note that this compares index labels, not the values in the columns.
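The sketch below applies the index-based filter to two small, made-up data frames. It then shows one possible way to compare by column values instead, using merge with indicator=True. The sample data, the column name x, and the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with string index labels
df1 = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"x": [10, 99]}, index=["a", "d"])

# Rows of df1 whose index labels do not appear in df2's index
only_in_df1 = df1[~df1.index.isin(df2.index)]

# Alternative: compare by the values in column x rather than by index.
# indicator=True adds a _merge column marking where each row came from.
merged = df1.merge(df2, on="x", how="left", indicator=True)
by_value = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(only_in_df1)
print(by_value)
```

Note that merge returns a new default integer index, so the original labels of df1 are not kept in by_value.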


How to split a dataframe into multiple dataframes with Python Pandas?

To split a dataframe into multiple dataframes with Python Pandas, we can use a list comprehension with groupby.

For instance, we write

[v for k, v in df.groupby('name')]

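to call df.groupby with 'name' to group the rows by the name column. Iterating over the grouped object gives us pairs of the group key k and a data frame v with the rows in that group, so the list comprehension returns a list with one data frame per distinct name.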
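As a brief sketch on made-up data, the list comprehension can be paired with a dict comprehension when we want to look up each piece by its group key. The sample values, the score column, and the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "cat"],
    "score": [1, 2, 3, 4],
})

# One data frame per distinct name, in sorted key order
frames = [group for _, group in df.groupby("name")]

# Same split, but keyed by name for direct lookup
frames_by_name = {key: group for key, group in df.groupby("name")}

print(len(frames))
print(frames_by_name["ann"])
```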


How to do aggregation in Python Pandas?

To do aggregation in Python Pandas, we can use groupby and aggregation methods.

For instance, we write

df1 = df.groupby(['A', 'B'], as_index=False)['C'].sum()

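to get the sum of the column C values for each combination of column A and B values. We call groupby to group the rows by columns A and B, select column C, and then call sum to add up the values in each group. Setting as_index to False keeps A and B as regular columns instead of moving them into the index.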

We can also use agg after groupby to do aggregation.

For instance, we write

df5 = df.groupby(['A', 'B']).agg(['mean','sum'])

to call groupby to do the same grouping and call agg to return the mean and sum of each remaining column for every group. This assumes the remaining columns are numeric.
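The sketch below runs both calls on a small, made-up data frame and adds named aggregation, where agg takes keyword arguments that map an output column name to a (column, function) pair. The sample data, the extra column D, and the output names c_total and d_mean are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; A and B are group keys, C and D are numeric
df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": ["p", "p", "p", "q"],
    "C": [1, 2, 3, 4],
    "D": [10.0, 20.0, 30.0, 40.0],
})

# Sum of C for each (A, B) pair, with A and B kept as columns
df1 = df.groupby(["A", "B"], as_index=False)["C"].sum()

# Mean and sum of every remaining column; the result has two-level column labels
df5 = df.groupby(["A", "B"]).agg(["mean", "sum"])

# Named aggregation: pick a different function per column and name the outputs
named = df.groupby(["A", "B"], as_index=False).agg(
    c_total=("C", "sum"),
    d_mean=("D", "mean"),
)

print(df1)
print(df5)
print(named)
```

Named aggregation gives flat column names, which can be easier to work with than the two-level columns that agg returns when given a list of functions.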