Categories
Python Answers

How to create a new column based on values from other columns, or apply a function of multiple columns row-wise, with Python Pandas?

To create a new column from the values in other columns, or to apply a function of several columns to each row, we can use the DataFrame apply method with axis=1.

For instance, we write

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

to call apply on the data frame df with axis=1, so the lambda receives one row at a time and calls the function f with that row's col_1 and col_2 values.

Then we assign the returned values to the new column col_3.
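Below is a minimal runnable sketch of the snippet above. The function f and the sample values are hypothetical stand-ins, assumed only for illustration:

```python
import pandas as pd

# Hypothetical function of two values; any function of two arguments works here
def f(a, b):
    return a + b

df = pd.DataFrame({'col_1': [1, 2, 3], 'col_2': [10, 20, 30]})

# axis=1 passes each row to the lambda as a Series, so x.col_1 and x.col_2 are scalars
df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

print(df)
```

Because apply with axis=1 calls the Python function once per row, it can be slow on large data frames. When f is simple arithmetic like this, the vectorized form df['col_1'] + df['col_2'] gives the same result faster.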

How to apply a function to two columns of a Python Pandas dataframe?

To apply a function to two columns of a Python Pandas dataframe, we can use the apply method with axis=1.

For instance, we write

df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

to call apply on the data frame df with axis=1, so the lambda receives one row at a time and calls the function f with that row's col_1 and col_2 values.

Then we assign the returned values to the new column col_3.
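As a sketch with a function that isn't plain arithmetic, here is a hypothetical f that builds a text label from a string column and a number column. The function and the data are made up for illustration:

```python
import pandas as pd

# Hypothetical function of two column values: combines a name and an age into a label
def f(name, age):
    return "{} ({})".format(name, age)

df = pd.DataFrame({'col_1': ['Ann', 'Bob'], 'col_2': [31, 45]})

# Each row is passed to the lambda; f receives that row's col_1 and col_2 values
df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)

print(df)
```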

How to convert Python dict into a Pandas dataframe?

To convert a Python dict into a Pandas dataframe, we can call the DataFrame constructor with the dictionary.

For instance, we write

import pandas as pd

dict_ = {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}
pd.DataFrame([dict_])

to call DataFrame with the dict_ dictionary wrapped in a list. Each dict in the list becomes one row, so we get a one-row data frame whose columns are the dict's keys.
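A short sketch contrasting the two common dict shapes. The second dict and its values are made up for illustration:

```python
import pandas as pd

dict_ = {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}

# Wrapping the dict in a list gives one row; the keys become column names.
# Passing dict_ directly (without the list) would raise ValueError, because
# all the values are scalars and no index is given.
df = pd.DataFrame([dict_])
print(df)

# When each value is a list of equal length, the dict can be passed directly:
# each key becomes a column and each list element becomes a row
df2 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df2)
```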

How to delete DataFrame rows in Pandas based on a column value?

To delete DataFrame rows in Pandas based on a column value, we index the data frame with a boolean condition that selects the rows we want to keep.

For instance, we write

df = df[df.line_race != 0]

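to keep only the rows of df whose line_race value isn't 0 and assign the filtered data frame back to df. The expression df.line_race != 0 returns a boolean Series, and indexing df with it returns the rows where it is True.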
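A minimal runnable sketch with made-up data (the horse column is hypothetical) showing which rows survive the filter:

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'horse': ['A', 'B', 'C', 'D'], 'line_race': [0, 3, 0, 5]})

# The comparison builds a boolean Series; indexing with it keeps rows where it is True
mask = df.line_race != 0
df = df[mask]

# The surviving rows keep their original index labels (1 and 3);
# call df.reset_index(drop=True) if you want a fresh 0, 1, ... index
print(df)
```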

How to apply multiple functions to multiple groupby columns with Python Pandas?

To apply multiple functions to multiple groupby columns with Python Pandas, we can use the groupby and agg methods.

For instance, we write

df.groupby('group').agg(
    a_sum=('a', 'sum'),
    a_mean=('a', 'mean'),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_range=('d', lambda x: x.max() - x.min())
)

to call agg on the groups returned by groupby. Each keyword argument names an output column and maps it to a (column, aggregation) tuple.

For each group, we compute the sum and mean of column a, the mean of column b, the sum of column c, and the difference between the maximum and minimum values of column d.
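A self-contained version of the example with made-up data, so the output columns can be checked. Named aggregation like this needs pandas 0.25 or later:

```python
import pandas as pd

# Made-up sample data with two groups, 'x' and 'y'
df = pd.DataFrame({
    'group': ['x', 'x', 'y', 'y'],
    'a': [1, 2, 3, 4],
    'b': [10, 20, 30, 40],
    'c': [5, 5, 5, 5],
    'd': [1, 7, 2, 9],
})

# Each keyword is an output column name; each tuple is (input column, aggregation)
result = df.groupby('group').agg(
    a_sum=('a', 'sum'),
    a_mean=('a', 'mean'),
    b_mean=('b', 'mean'),
    c_sum=('c', 'sum'),
    d_range=('d', lambda x: x.max() - x.min()),
)

print(result)
```

The result is indexed by the group labels, with one column per keyword argument in the order they were given.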