Categories
Python Answers

How to easily share a sample dataframe using df.to_dict() with Python Pandas?

To easily share a sample dataframe using df.to_dict() with Python Pandas, we can call to_dict with the 'split' orient. This turns the dataframe into a plain dict that others can copy and turn back into a dataframe.

For instance, we write

import plotly.express as px
df = px.data.iris().head(10)
df.to_dict('split')

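to call to_dict with 'split' on the first 10 rows of the iris dataset. It returns a dict with 'index', 'columns', and 'data' keys holding the row labels, the column names, and the row values, which we can paste into a question or answer.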
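
The round trip can be sketched as follows. It uses a small hand-made dataframe instead of the plotly iris data so it runs with pandas alone; the column names and values are made up for illustration. The reader rebuilds the frame by passing the three parts of the dict back to pd.DataFrame.

```python
import pandas as pd

# A small made-up frame standing in for the data you want to share
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Serialize it; the result is a plain dict that can be pasted into a question
shared = df.to_dict("split")
print(shared)

# The reader rebuilds the frame from the pasted dict
rebuilt = pd.DataFrame(
    shared["data"], index=shared["index"], columns=shared["columns"]
)
print(rebuilt)
```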

How to drop rows of a Python Pandas DataFrame whose value in a certain column is NaN?

To drop rows of a Python Pandas DataFrame whose value in a certain column is NaN, we call the notna method.

For instance, we write

df = df[df['EPS'].notna()]

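to filter out the rows whose 'EPS' value is NaN by calling the notna method on the column. notna returns a boolean Series that is True wherever the value is not NaN.

Then we select the matching rows with df[df['EPS'].notna()] and assign the result back to df.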
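
As a sketch with made-up data, the same filter can also be written with DataFrame.dropna, restricted to the 'EPS' column with the subset parameter. Both approaches keep the same rows.

```python
import numpy as np
import pandas as pd

# Made-up data: the second row has no EPS value
df = pd.DataFrame({"ticker": ["A", "B", "C"], "EPS": [1.5, np.nan, 2.0]})

# Boolean-mask approach: keep rows where EPS is present
kept = df[df["EPS"].notna()]

# Equivalent: drop rows that have NaN in the 'EPS' column only
dropped = df.dropna(subset=["EPS"])

print(kept)
```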

How to do fuzzy match merge with Python Pandas?

To do fuzzy match merge with Python Pandas, we can use the fuzzymatcher library.

To install it, we run

pip install fuzzymatcher 

Then we use it by writing

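import pandas as pd
from fuzzymatcher import link_table, fuzzy_left_join

df1 = pd.DataFrame({'Col1':['Microsoft', 'Google', 'Amazon', 'IBM']})
df2 = pd.DataFrame({'Col2':['Mcrsoft', 'gogle', 'Amason', 'BIM']})

# match df1.Col1 against df2.Col2
left_on = ["Col1"]
right_on = ["Col2"]

# table of candidate matches, each with a match score
matches = link_table(df1, df2, left_on, right_on)

# df1 with its best-matching df2 row joined on
merged = fuzzy_left_join(df1, df2, left_on, right_on)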

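to create two dataframes, df1 and df2, whose columns hold misspelled versions of the same company names.

Then we call link_table to match the rows of df1 and df2 on the columns listed in left_on and right_on. It returns a table of candidate matches with a match score for each pair. To merge the two dataframes directly, we call fuzzy_left_join with the same arguments, which returns df1 with the best-matching row of df2 joined onto each row.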
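
If installing fuzzymatcher isn't an option, a rough alternative can be sketched with the standard library's difflib instead. This is a different technique from fuzzymatcher's scoring: difflib.get_close_matches picks the most similar string by sequence-matching ratio. The fuzzy_merge helper and its cutoff parameter are hypothetical names for illustration, and the 0.6 cutoff is just difflib's default.

```python
import difflib

import pandas as pd


def fuzzy_merge(left, right, left_on, right_on, cutoff=0.6):
    """Hypothetical helper: left-join `right` onto `left` using the closest
    case-insensitive string match found by difflib (not fuzzymatcher)."""
    # Lowercased candidate -> original value, so we can join on the real key
    candidates = {str(v).lower(): v for v in right[right_on]}

    def best_match(value):
        # Best candidate with similarity ratio >= cutoff, or None if none qualify
        hits = difflib.get_close_matches(
            str(value).lower(), list(candidates), n=1, cutoff=cutoff
        )
        return candidates[hits[0]] if hits else None

    out = left.copy()
    out["_match_key"] = out[left_on].map(best_match)
    merged = out.merge(right, how="left", left_on="_match_key", right_on=right_on)
    return merged.drop(columns="_match_key")


df1 = pd.DataFrame({"Col1": ["Microsoft", "Google", "Amazon", "IBM"]})
df2 = pd.DataFrame({"Col2": ["Mcrsoft", "gogle", "Amason", "BIM"]})

result = fuzzy_merge(df1, df2, "Col1", "Col2")
print(result)
```

Raising cutoff makes matching stricter; rows with no candidate above it get NaN in the right-hand columns, as in an ordinary left join.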

How to select DataFrame rows between two dates with Python Pandas?

To select DataFrame rows between two dates with Python Pandas, we can use a boolean mask.

For instance, we write

df['date'] = pd.to_datetime(df['date'])  
mask = (df['date'] > start_date) & (df['date'] <= end_date)

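to convert the 'date' column entries to datetime64 values with pd.to_datetime so they can be compared with start_date and end_date.

Then we create the mask with (df['date'] > start_date) & (df['date'] <= end_date). It is True for rows strictly after start_date and up to and including end_date.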

And then we get the filtered rows between start_date and end_date with

df.loc[mask]
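
Here is a self-contained sketch with made-up dates. It also shows Series.between, which, unlike the mask above, includes start_date in the result because it includes both endpoints by default.

```python
import pandas as pd

# Made-up data with dates stored as strings
df = pd.DataFrame(
    {
        "date": ["2021-01-01", "2021-01-05", "2021-01-10", "2021-01-15"],
        "value": [1, 2, 3, 4],
    }
)
df["date"] = pd.to_datetime(df["date"])

start_date = pd.Timestamp("2021-01-01")
end_date = pd.Timestamp("2021-01-10")

# Strictly after start_date, up to and including end_date
mask = (df["date"] > start_date) & (df["date"] <= end_date)
in_range = df.loc[mask]

# Series.between includes both endpoints by default
inclusive = df.loc[df["date"].between(start_date, end_date)]

print(in_range)
print(inclusive)
```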

How to do conditional replace with Python Pandas?

To do conditional replace with Python Pandas, we can build a boolean mask that selects the rows we want to change and then assign a new value to those rows with df.loc.

For instance, we write

mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0

to create a mask that selects the rows whose my_channel values are bigger than 20000.

Then we use

df.loc[mask, column_name] = 0

to set the my_channel value to 0 in every row selected by mask.
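
A runnable sketch with made-up values is below. It also shows Series.mask as an alternative: it returns a new Series with 0 wherever the condition is True, which we then assign back to the column instead of writing through .loc.

```python
import pandas as pd

# Made-up data
df = pd.DataFrame({"my_channel": [100, 25000, 5000, 30000]})

# .loc approach: set every value above 20000 to 0
mask = df["my_channel"] > 20000
column_name = "my_channel"
df.loc[mask, column_name] = 0

# Alternative: Series.mask replaces values where the condition holds
other = pd.DataFrame({"my_channel": [100, 25000, 5000, 30000]})
other["my_channel"] = other["my_channel"].mask(other["my_channel"] > 20000, 0)

print(df)
```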