Best way to perform a large number of Pandas joins
I am trying to use Pandas to do a simple lookup between two data frames. I have a master data frame (left) and a lookup data frame (right). I want to left join them on the matching integer code and return the item title from item_df.
I can see a rough solution based on a key-value pair idea, but it seems cumbersome. My approach is to merge the data frames using col3 and name as the key columns and keep the value I want from the right frame, which is title. I then drop the key column I joined on, so all I have left is the value. Now let's say I want to do this several times with my own naming conventions. For this I use rename to rename the value I merged in. I then repeat the merge operation and rename the next joined column to something like second_title (see example below).
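The merge, drop, and rename sequence described above can be folded into one call per lookup. This is a minimal sketch, assuming the lookup frame always has a key column called name and a value column called title, and that each code appears only once in name (duplicates would multiply rows in the result). The idea is to rename the lookup frame's columns before the merge: its key then shares the left key's name (so there is no extra name column to drop) and its value already carries the desired output name (so there is nothing to rename afterwards). The helper add_lookup is hypothetical; DataFrame.merge, DataFrame.rename, and DataFrame.pipe are standard pandas methods.

```python
import pandas as pd


def add_lookup(left: pd.DataFrame, lookup: pd.DataFrame, left_key: str,
               new_name: str, key: str = "name", value: str = "title") -> pd.DataFrame:
    """Hypothetical helper: left-join lookup[value] onto left, matching left[left_key] to lookup[key].

    The lookup frame is renamed first, so the join key lines up with left_key
    (no leftover column to drop) and the value column already has new_name
    (no rename needed after the merge).
    """
    right = lookup[[key, value]].rename(columns={key: left_key, value: new_name})
    return left.merge(right, how="left", on=left_key)


master_df = pd.DataFrame({"col1": [3, 4, 8, 10], "col2": [5, 6, 9, 10], "col3": [50, 55, 59, 60]})
item_df = pd.DataFrame({"name": [55, 59, 50, 5, 6, 7], "title": ["p1", "p2", "p3", "p4", "p5", "p6"]})

# pipe passes the running frame as the first argument, so each lookup is one chained call
combined_df = (
    master_df
    .pipe(add_lookup, item_df, "col3", "first_title")
    .pipe(add_lookup, item_df, "col2", "second_title")
)
print(combined_df)
```

Each additional lookup is one more .pipe(...) line with its own key column and output name.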
Is there a less cumbersome way to perform this repeated operation without constantly dropping the extra columns that are merged in and renaming the new column between each merge step?
Example code below:
import pandas as pd
master_dict: dict = {'col1': [3,4,8,10], 'col2': [5,6,9,10], 'col3': [50,55,59,60]}
master_df: pd.DataFrame = pd.DataFrame(master_dict)
item_dict: dict = {'name': [55,59,50,5,6,7], 'title': ['p1','p2','p3','p4','p5','p6']}
item_df: pd.DataFrame = pd.DataFrame(item_dict)
print(master_df.head())
#    col1  col2  col3
# 0     3     5    50
# 1     4     6    55
# 2     8     9    59
# 3    10    10    60
print(item_df.head())
#    name title
# 0    55    p1
# 1    59    p2
# 2    50    p3
# 3     5    p4
# 4     6    p5
# merge on col3 and name
combined_df = pd.merge(master_df, item_df, how='left', left_on='col3', right_on='name')
# rename title to "first_title"
combined_df.rename(columns={'title': 'first_title'}, inplace=True)
combined_df.drop(columns=['name'], inplace=True)  # remove 'name' column that was joined in from right frame
# repeat the operation for "second_title", this time matching col2 against name
combined_df = pd.merge(combined_df, item_df, how='left', left_on='col2', right_on='name')
combined_df.rename(columns={'title': 'second_title'}, inplace=True)
combined_df.drop(columns=['name'], inplace=True)
print(combined_df.head())
#    col1  col2  col3 first_title second_title
# 0     3     5    50          p3           p4
# 1     4     6    55          p1           p5
# 2     8     9    59          p2          NaN
# 3    10    10    60         NaN          NaN
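Because each step only pulls one column (title) from the lookup frame, a merge may not be needed at all. A minimal sketch of a swapped-in technique, Series.map, assuming each code in item_df's name column is unique (deduplicate first, e.g. with drop_duplicates('name'), if it is not): build a code-to-title Series once, then map every key column through it. The lookups dictionary of output name to key column is a hypothetical naming scheme standing in for the "manual naming conventions" above.

```python
import pandas as pd

master_df = pd.DataFrame({"col1": [3, 4, 8, 10], "col2": [5, 6, 9, 10], "col3": [50, 55, 59, 60]})
item_df = pd.DataFrame({"name": [55, 59, 50, 5, 6, 7], "title": ["p1", "p2", "p3", "p4", "p5", "p6"]})

# Build the lookup once: index = integer code, values = title.
# Assumes the codes in item_df["name"] are unique.
title_by_code = item_df.set_index("name")["title"]

# Hypothetical naming map: output column name -> key column in master_df.
lookups = {"first_title": "col3", "second_title": "col2"}

# map() returns NaN for codes missing from the lookup, matching a left join
combined_df = master_df.assign(
    **{new_col: master_df[key_col].map(title_by_code) for new_col, key_col in lookups.items()}
)
print(combined_df)
```

Unlike merge, which returns a frame with a fresh default index, map and assign keep master_df's original index, and no helper columns are ever created, so nothing has to be dropped or renamed.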
from Recent Questions - Stack Overflow https://ift.tt/2JiIW8L