Categories
Python Answers

How to get statistics for each group using Python Pandas GroupBy?

To get statistics for each group using Python Pandas GroupBy, we can call the size method, which counts the rows in each group.

For instance, we write

df.groupby(['col1', 'col2']).size().reset_index(name='counts')

to call groupby with a list of columns.

Then we call size to get the row counts.

And then we call reset_index to return the result as a data frame with the counts in the 'counts' column.
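
The steps above can be sketched end to end as follows. The sample data and the value column are hypothetical, and the agg call is an assumption about what other per-group statistics might be wanted, not part of the original snippet.

```python
import pandas as pd

# Hypothetical sample data; col1 and col2 match the snippet above.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b", "b"],
    "col2": ["x", "x", "x", "y", "y"],
    "value": [1, 2, 3, 4, 5],
})

# Row count per (col1, col2) group, returned as a regular data frame.
counts = df.groupby(["col1", "col2"]).size().reset_index(name="counts")

# Several statistics of the hypothetical value column per group.
stats = (
    df.groupby(["col1", "col2"])["value"]
    .agg(["count", "mean", "max"])
    .reset_index()
)

print(counts)
print(stats)
```

Here size counts every row in a group, while the count aggregation skips NaN values in the selected column.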

How to set value for particular cell in Python Pandas DataFrame using index?

To set value for particular cell in Python Pandas DataFrame using index, we can use the at accessor. The older set_value method was deprecated in pandas 0.21 and removed in pandas 1.0.

For instance, we write

df.at['C', 'x'] = 10

to set the value in the 'C' row and 'x' column of the df data frame to 10.
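
As a runnable sketch on current pandas versions, the at accessor sets a single cell by row label and column label. The frame below is hypothetical sample data; loc and iat are shown only as related options.

```python
import pandas as pd

# Hypothetical frame with row labels A, B, C and columns x, y.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["A", "B", "C"])

# Set one cell by row label and column label.
df.at["C", "x"] = 10

# loc also works for a single cell and accepts slices and lists as well.
df.loc["C", "y"] = 20

# iat is the position-based counterpart: first row, first column.
df.iat[0, 0] = 99

print(df)
```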

How to filter Python Pandas dataframe using ‘in’ and ‘not in’ like in SQL?

To filter Python Pandas dataframe using ‘in’ and ‘not in’ like in SQL, we call the isin method.

For instance, we write

df[df.country.isin(countries_to_keep)]

to call df.country.isin to get the rows that have the country column set to one of the values in the countries_to_keep list.

We can negate isin with ~, so we can write

df[~df.country.isin(countries_to_keep)]

to get the rows whose country column isn't set to any of the values in the countries_to_keep list.
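
Both filters can be tried together in a short sketch. The data frame contents and the sales column are hypothetical; only the country column and countries_to_keep come from the snippets above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "country": ["US", "UK", "DE", "FR"],
    "sales": [10, 20, 30, 40],
})
countries_to_keep = ["US", "DE"]

# Like SQL's IN: rows whose country is in the list.
kept = df[df["country"].isin(countries_to_keep)]

# Like SQL's NOT IN: ~ inverts the boolean mask.
dropped = df[~df["country"].isin(countries_to_keep)]

print(kept)
print(dropped)
```

The bracket form df["country"] is used instead of df.country so the code still works when a column name clashes with a data frame attribute or contains spaces.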

How to shuffle Python Pandas DataFrame rows?

To shuffle Python Pandas DataFrame rows, we call the data frame sample method.

For instance, we write

df.sample(frac=1)

to call sample on the df data frame.

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means to return all rows in random order.
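
A minimal sketch of the shuffle, assuming a small hypothetical frame. The random_state argument and the reset_index call are optional additions: the first makes the order reproducible, and the second replaces the shuffled index labels with a fresh 0..n-1 index.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": range(5)})

# Shuffle all rows; random_state fixes the seed so reruns give the same order.
shuffled = df.sample(frac=1, random_state=0)

# The original index labels travel with the rows; drop them for a clean index.
shuffled = shuffled.reset_index(drop=True)

print(shuffled)
```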

How to count the NaN values in a column in Python Pandas DataFrame?

To count the NaN values in a column in Python Pandas DataFrame, we call isna and sum.

For instance, we write

import numpy as np
import pandas as pd
s = pd.Series([1, 2, 3, np.nan, np.nan])
count = s.isna().sum()

to create a series with pd.Series.

Then we call isna to return a boolean series that is True wherever the value is NaN.

And then we call sum to count the True values, which gives the number of NaN values.
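
The same idea applies to a column of a data frame. A minimal sketch, assuming a hypothetical frame with columns a and b:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values.
df = pd.DataFrame({"a": [1, np.nan, 3], "b": [np.nan, np.nan, 6]})

# NaN count for a single column.
count_a = df["a"].isna().sum()

# NaN count for every column at once, returned as a series.
per_column = df.isna().sum()

print(count_a)
print(per_column)
```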