2020-10-28

How to extract row counts based on multiple descriptors in a column of a CSV file and then export the output to a new CSV using a bash/Python script?

I am working with a CSV file (hundreds of rows) containing data as follows. I would like to get the count of each element per gene, in CSV or tab-separated format.

Input

    Gene     Element   
 ---------- ---------- 
  STBZIP1    G-box     
  STBZIP1    G-box     
  STBZIP1    MYC       
  STBZIP1    MYC       
  STBZIP1    MYC       
  STBZIP10   MYC       
  STBZIP10   MYC       
  STBZIP10   MYC       
  STBZIP10   G-box     
  STBZIP10   G-box     
  STBZIP10   G-box     
  STBZIP10   G-box     

Expected output

    Gene     G-box   MYC  
 ---------- ------- ----- 
  STBZIP1        2     3  
  STBZIP10       4     3  

Can someone please help me come up with a bash (or Python) script for this?

Update

I am trying the following and am stuck for the time being:

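import pandas as pd
# Load the gene/element table; expects "Gene" and "Element" columns.
df = pd.read_csv("Promoter_Element_Distribution.csv")
print(df)
# Count rows per (Gene, Element) pair, then pivot the elements into columns.
counts = df.groupby(['Gene', 'Element']).size().unstack(fill_value=0)
print(counts)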
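
For comparison, here is a minimal sketch of the same count-and-pivot step using only the Python standard library, including the export to CSV that the question asks for. It assumes the input is comma-separated with a `Gene,Element` header row. The helper names `count_elements` and `write_counts` are hypothetical, and the sample rows are copied from the table above so that the block runs on its own.

```python
import csv
import io
from collections import Counter


def count_elements(rows):
    """Count rows per (gene, element) pair and pivot elements into columns."""
    pair_counts = Counter((row["Gene"], row["Element"]) for row in rows)
    genes = sorted({gene for gene, _ in pair_counts})
    elements = sorted({element for _, element in pair_counts})
    header = ["Gene"] + elements
    # A Counter returns 0 for pairs that never occur, so gaps are filled with 0.
    table = [
        [gene] + [pair_counts[(gene, element)] for element in elements]
        for gene in genes
    ]
    return header, table


def write_counts(header, table, out_file):
    """Write the pivoted counts as CSV to an open text file object."""
    writer = csv.writer(out_file)
    writer.writerow(header)
    writer.writerows(table)


# Sample rows from the question (assumed CSV layout: Gene,Element).
sample = """Gene,Element
STBZIP1,G-box
STBZIP1,G-box
STBZIP1,MYC
STBZIP1,MYC
STBZIP1,MYC
STBZIP10,MYC
STBZIP10,MYC
STBZIP10,MYC
STBZIP10,G-box
STBZIP10,G-box
STBZIP10,G-box
STBZIP10,G-box
"""

rows = list(csv.DictReader(io.StringIO(sample)))
header, table = count_elements(rows)

# An in-memory buffer stands in for the output file here.
out = io.StringIO()
write_counts(header, table, out)
print(out.getvalue())
```

To run this against the real data, the `io.StringIO(sample)` reader could be swapped for `open("Promoter_Element_Distribution.csv", newline="")`, and the output buffer for a file opened with `open(path, "w", newline="")`, where `path` is whatever output name is wanted. On the pandas side, the DataFrame returned by the `groupby(...).size().unstack(fill_value=0)` call can be written out directly with its `to_csv` method.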


from Recent Questions - Stack Overflow https://ift.tt/2G85iIY