How to count rows by multiple values in a CSV column and export the output to a new CSV using a bash/Python script?
I am working with a CSV file (hundreds of rows) containing data like the following. For each gene, I would like to count how many times each element occurs, and write the result in CSV or tab-separated format.
Input
Gene Element
---------- ----------
STBZIP1 G-box
STBZIP1 G-box
STBZIP1 MYC
STBZIP1 MYC
STBZIP1 MYC
STBZIP10 MYC
STBZIP10 MYC
STBZIP10 MYC
STBZIP10 G-box
STBZIP10 G-box
STBZIP10 G-box
STBZIP10 G-box
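The sample above is whitespace-aligned with a dashed rule under the header, not comma-separated. If the real file looks like this (an assumption; the question calls it a CSV), `pd.read_csv` with its default comma separator will not split it into two columns. A minimal sketch, assuming every data line has exactly two whitespace-separated fields and that gene and element names contain no spaces; `parse_listing` is a hypothetical helper name:

```python
def parse_listing(text):
    """Return (gene, element) pairs from a whitespace-aligned two-column listing.

    Skips blank lines, the header row, and the dashed separator row.
    Assumes gene and element names contain no spaces.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line
        if fields == ["Gene", "Element"]:
            continue  # header row
        if all(set(f) == {"-"} for f in fields):
            continue  # dashed separator row under the header
        pairs.append((fields[0], fields[1]))
    return pairs


# A few lines copied from the sample input above
sample = """Gene Element
---------- ----------
STBZIP1 G-box
STBZIP1 MYC
STBZIP10 G-box
"""
print(parse_listing(sample))
```

The resulting pairs can then be counted directly, without going through a CSV parser.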
Expected output
Gene G-box MYC
---------- ------- -----
STBZIP1 2 3
STBZIP10 4 3
Can someone help me write a bash (or Python) script to do this?
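A minimal standard-library sketch, assuming the file is comma-separated with a `Gene,Element` header row. The function names `count_elements` and `write_count_table` are hypothetical. It counts each (gene, element) pair, then writes one row per gene and one column per element, filling in 0 where a pair never occurs:

```python
import csv
import io
import sys
from collections import Counter


def count_elements(rows):
    """Count (Gene, Element) pairs from an iterable of dicts with those keys."""
    return Counter((row["Gene"], row["Element"]) for row in rows)


def write_count_table(counts, out, delimiter=","):
    """Write a gene-by-element count table; use a tab delimiter for TSV output."""
    genes = sorted({gene for gene, _ in counts})
    elements = sorted({element for _, element in counts})
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerow(["Gene"] + elements)
    for gene in genes:
        # Counter returns 0 for missing keys, so absent pairs are written as 0
        writer.writerow([gene] + [counts[(gene, e)] for e in elements])


# Demo on an in-memory copy of the sample data; for the real file, replace
# io.StringIO(...) with open("Promoter_Element_Distribution.csv", newline="").
sample = io.StringIO(
    "Gene,Element\n"
    + "STBZIP1,G-box\n" * 2
    + "STBZIP1,MYC\n" * 3
    + "STBZIP10,MYC\n" * 3
    + "STBZIP10,G-box\n" * 4
)
counts = count_elements(csv.DictReader(sample))
write_count_table(counts, sys.stdout)
```

To save the table instead of printing it, pass a file opened with `open("Element_counts.csv", "w", newline="")` (a hypothetical output name) as `out`, and `delimiter="\t"` for tab-separated output.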
Update
I am trying the following, but I am stuck for now:
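import pandas as pd

df = pd.read_csv("Promoter_Element_Distribution.csv")  # assumes comma-separated with a Gene,Element header
print(df)
# Count rows per (Gene, Element) pair, then pivot the Element values into columns
counts = df.groupby(['Gene', 'Element']).size().unstack(fill_value=0)
print(counts)
counts.to_csv("Element_counts.csv")  # hypothetical output filename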
from Recent Questions - Stack Overflow https://ift.tt/2G85iIY