How to extract row counts based on multiple descriptors in a column from a CSV and then export the output to a new CSV using a bash/Python script?

By Ritesh Sahu - October 28, 2020

I am working with a CSV file (hundreds of rows) containing data as shown below. I would like to get the count of each element for each gene, in CSV or tab-delimited format.

Input

    Gene     Element   
 ---------- ---------- 
  STBZIP1    G-box     
  STBZIP1    G-box     
  STBZIP1    MYC       
  STBZIP1    MYC       
  STBZIP1    MYC       
  STBZIP10   MYC       
  STBZIP10   MYC       
  STBZIP10   MYC       
  STBZIP10   G-box     
  STBZIP10   G-box     
  STBZIP10   G-box     
  STBZIP10   G-box

Expected output

    Gene     G-box   MYC  
 ---------- ------- ----- 
  STBZIP1        2     3  
  STBZIP10       4     3

Can someone please help me come up with a bash (or Python) script for this?
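One possible approach that needs only the Python standard library is sketched below. It assumes the file is comma-separated with a header row `Gene,Element`; the inline sample, the function names `count_elements` and `write_table`, and the in-memory output buffer are illustrative choices, not part of the original data or script.

```python
import csv
import io
from collections import Counter


def count_elements(rows):
    """Count rows per (Gene, Element) pair from an iterable of dict rows."""
    counts = Counter((row["Gene"], row["Element"]) for row in rows)
    genes = sorted({gene for gene, _ in counts})
    elements = sorted({element for _, element in counts})
    return genes, elements, counts


def write_table(genes, elements, counts, out):
    """Write one row per gene and one column per element to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Gene", *elements])
    for gene in genes:
        # A Counter returns 0 for pairs that never occurred in the input.
        writer.writerow([gene, *(counts[(gene, element)] for element in elements)])


# Inline sample mirroring the input above (assumed comma-separated layout).
sample = io.StringIO(
    "Gene,Element\n"
    "STBZIP1,G-box\nSTBZIP1,G-box\n"
    "STBZIP1,MYC\nSTBZIP1,MYC\nSTBZIP1,MYC\n"
    "STBZIP10,MYC\nSTBZIP10,MYC\nSTBZIP10,MYC\n"
    "STBZIP10,G-box\nSTBZIP10,G-box\nSTBZIP10,G-box\nSTBZIP10,G-box\n"
)

genes, elements, counts = count_elements(csv.DictReader(sample))
out = io.StringIO()
write_table(genes, elements, counts, out)
result = out.getvalue()
print(result)
```

To work on real files, the `io.StringIO` objects could be swapped for `open(path, newline="")` handles for the input and output CSVs.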

Update

I am trying the following, but I am stuck for the time being:

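import pandas as pd

# Read the two-column CSV (Gene, Element)
df = pd.read_csv("Promoter_Element_Distribution.csv")
print(df)
# Count rows per (Gene, Element) pair, then pivot the elements into columns
counts = df.groupby(['Gene', 'Element']).size().unstack(fill_value=0)
print(counts)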
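
The `groupby(...).size().unstack(fill_value=0)` call above already builds the table of counts; what may be missing is keeping the result and exporting it. A minimal sketch follows, assuming a comma-separated file with a `Gene,Element` header. The inline sample stands in for the real file, and the output file name mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# Inline sample mirroring the question's input; with the real data this would be
# pd.read_csv("Promoter_Element_Distribution.csv").
sample = io.StringIO(
    "Gene,Element\n"
    "STBZIP1,G-box\nSTBZIP1,G-box\n"
    "STBZIP1,MYC\nSTBZIP1,MYC\nSTBZIP1,MYC\n"
    "STBZIP10,MYC\nSTBZIP10,MYC\nSTBZIP10,MYC\n"
    "STBZIP10,G-box\nSTBZIP10,G-box\nSTBZIP10,G-box\nSTBZIP10,G-box\n"
)
df = pd.read_csv(sample)

# Count rows per (Gene, Element) pair and pivot the elements into columns.
counts = df.groupby(["Gene", "Element"]).size().unstack(fill_value=0)

# Drop the "Element" label on the column axis so only the data labels remain.
counts.columns.name = None

# With no path, to_csv returns the CSV as a string; passing a path such as
# "element_counts.csv" (hypothetical name) would write a file instead.
csv_text = counts.to_csv()
print(csv_text)
```

For tab-delimited output, `counts.to_csv(sep="\t")` should produce the same table with tabs as separators.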

from Recent Questions - Stack Overflow https://ift.tt/2G85iIY