2021-07-29

How can I convert a Pandas DataFrame to a three level nested dictionary?

How can I convert a Pandas DataFrame to a three level nested dictionary using column names?

The columns in question are not the first three columns. I want to group by the column artist, then by the column album, and the keys need to be case insensitive, preferably without using defaultdict.

This is a minimal reproducible example:

from collections import defaultdict
from itertools import product
from pandas import DataFrame

# Target structure: tree[a][b][c] -> [d, e, f]
tree = defaultdict(lambda: defaultdict(dict))
# Empty frame with three string columns and three integer columns
columns = {'a': str(), 'b': str(), 'c': str(), 'd': int(), 'e': int(), 'f': int()}
df = DataFrame(columns, index=[])
for i, j, k in product('abcd', repeat=3):
    tree[i][j][k] = list(map('abcd'.index, (i, j, k)))
    # Append one row per (a, b, c) combination
    df.loc[len(df)] = [i, j, k, *map('abcd'.index, (i, j, k))]

How can I get a nested dictionary similar to tree from df?

I am really sorry I can't provide any actual examples, because they wouldn't be minimal.

I tried to use .groupby(), but I have only ever seen it used with one column, and I really don't know what to do with the pandas.core.groupby.generic.DataFrameGroupBy object it returns. I only started using it today.
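For reference, a minimal sketch of grouping by more than one column, assuming the small a to f frame from the example above stands in for the real data. .groupby() accepts a list of column names, and iterating over the returned object yields a tuple of key values together with the matching sub-frame. The names rows, tree2 and leaves are made up for illustration, and the frame is rebuilt in one step so the snippet runs on its own.

```python
from itertools import product
from pandas import DataFrame

# Rebuild the 64-row example frame in one step (same content as df above).
rows = [[i, j, k, *map('abcd'.index, (i, j, k))]
        for i, j, k in product('abcd', repeat=3)]
df = DataFrame(rows, columns=list('abcdef'))

tree2 = {}
# Grouping by a list of columns yields ((a, b), sub-frame) pairs.
for (a, b), group in df.groupby(['a', 'b']):
    # Map each lowercased value of column c to its [d, e, f] list.
    leaves = {
        c.lower(): values
        for c, values in zip(group['c'], group[['d', 'e', 'f']].values.tolist())
    }
    # setdefault/update merges groups whose keys differ only by case.
    tree2.setdefault(a.lower(), {}).setdefault(b.lower(), {}).update(leaves)

print(tree2['a']['b']['c'])  # [0, 1, 2]
```

This still loops in Python, but only once per (a, b) group rather than once per row for the outer two levels.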


Currently I can do this:

tree1 = {}
for index, row in df.iterrows():
    # Lowercase each key once so that lookups are case insensitive
    a, b, c = (row[col].lower() for col in ('a', 'b', 'c'))
    # setdefault creates the missing inner dicts on first use
    tree1.setdefault(a, {}).setdefault(b, {})[c] = [row['d'], row['e'], row['f']]

I actually implemented case-insensitive str and dict classes, but for the sake of brevity (they are very long) I won't use them here.

But according to this answer https://stackoverflow.com/a/55557758/16383578 such a method is bad. What is a better way?
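One possible direction, sketched under the assumption that the objection is to the per-row overhead of .iterrows(): lowercase the key columns once with the .str accessor, pull the value columns out as plain lists, and zip them together. The names tree3, keys and values are hypothetical, and the frame is rebuilt here only so the snippet is self-contained.

```python
from itertools import product
from pandas import DataFrame

# Same 64-row example frame as above, built in one step.
rows = [[i, j, k, *map('abcd'.index, (i, j, k))]
        for i, j, k in product('abcd', repeat=3)]
df = DataFrame(rows, columns=list('abcdef'))

# Lowercase each key column once, vectorised, instead of once per row.
keys = [df[col].str.lower() for col in ('a', 'b', 'c')]
# One [d, e, f] list of plain Python ints per row.
values = df[['d', 'e', 'f']].values.tolist()

tree3 = {}
for a, b, c, leaf in zip(*keys, values):
    # Plain dicts only: setdefault creates the inner levels as needed.
    tree3.setdefault(a, {}).setdefault(b, {})[c] = leaf
```

Whether this is fast enough depends on the size of the real data; it avoids building a Series for every row, which is the main cost of .iterrows().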



from Recent Questions - Stack Overflow https://ift.tt/3lj2Txp
https://ift.tt/eA8V8J
