How to extract all the cells between two marker cells in pandas, taking them from the column that has the most cells between those markers?
I have multiple financial statements and not all of them have the same entries - some have more entries than others - and I'd like to consolidate them all into a single one that has all of the entries.
I was able to do it manually in Excel since there aren't that many, but I'd like to have the computer do it to double-check that I got them all.
So, here's what I did: I created a dataframe where each column has the entry names from one of the financial statements.
FinancialStatement1 | FinancialStatement2 | FinancialStatement3 |
---|---|---|
REVENUES | REVENUES | REVENUES |
Revenue1 | Revenue1 | Revenue1 |
Revenue2 | Revenue2 | Revenue2 |
EXPENSES | EXPENSES | Revenue3 |
Expense1 | Expense1 | EXPENSES |
Expense2 | PROFIT | Expense1 |
Expense3 | - | Expense2 |
PROFIT | - | PROFIT |
- | - | - |
My idea was to run a script that counts the cells between two consecutive group titles in each column and then, from the column with the most cells, copies all of the strings between those two titles into a 'Consolidated' column, including the first title but not the last.
My end result would look like this:
FinancialStatement1 | FinancialStatement2 | FinancialStatement3 | Consolidated |
---|---|---|---|
REVENUES | REVENUES | REVENUES | REVENUES |
Revenue1 | Revenue1 | Revenue1 | Revenue1 |
Revenue2 | Revenue2 | Revenue2 | Revenue2 |
EXPENSES | EXPENSES | Revenue3 | Revenue3 |
Expense1 | Expense1 | EXPENSES | EXPENSES |
Expense2 | PROFIT | Expense1 | Expense1 |
Expense3 | - | Expense2 | Expense2 |
PROFIT | - | PROFIT | Expense3 |
- | - | - | PROFIT |
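The idea described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions that are not in the original post: the group titles are known in advance and listed in order (`GROUP_TITLES`), every column contains every title, the frame uses the default integer index, and `"-"` marks an empty cell. The helper names `group_entries` and `consolidate` are hypothetical.

```python
import pandas as pd

# Sample data mirroring the table above; "-" marks an empty cell.
df = pd.DataFrame({
    "FinancialStatement1": ["REVENUES", "Revenue1", "Revenue2", "EXPENSES",
                            "Expense1", "Expense2", "Expense3", "PROFIT", "-"],
    "FinancialStatement2": ["REVENUES", "Revenue1", "Revenue2", "EXPENSES",
                            "Expense1", "PROFIT", "-", "-", "-"],
    "FinancialStatement3": ["REVENUES", "Revenue1", "Revenue2", "Revenue3",
                            "EXPENSES", "Expense1", "Expense2", "PROFIT", "-"],
})

# Assumption: the group titles are known and appear in this order in every column.
GROUP_TITLES = ["REVENUES", "EXPENSES", "PROFIT"]


def group_entries(column, start, end=None):
    """Return the entries from `start` (inclusive) up to `end` (exclusive)."""
    values = column.tolist()
    begin = values.index(start)
    # The last group has no closing title, so only the title itself is kept.
    stop = begin + 1 if end is None else values.index(end)
    return values[begin:stop]


def consolidate(frame, titles):
    """For each group, keep the longest run of entries found in any column."""
    result = []
    for i, start in enumerate(titles):
        end = titles[i + 1] if i + 1 < len(titles) else None
        runs = [group_entries(frame[col], start, end) for col in frame.columns]
        result.extend(max(runs, key=len))
    return result


consolidated = consolidate(df, GROUP_TITLES)

# Grow the frame if the consolidated list is longer than any single statement,
# then pad the new column with "-" so the lengths match.
n_rows = max(len(df), len(consolidated))
df = df.reindex(index=range(n_rows), fill_value="-")
df["Consolidated"] = consolidated + ["-"] * (n_rows - len(consolidated))

print(df)
```

Note that this follows the "longest list wins" idea literally: if two statements had the same number of entries in a group but different names, only one of the two lists would be kept, so a union of the entries might be the safer check.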
I'm a beginner in pandas and, so far, here's what I came up with by searching here on Stack Overflow:
import pandas as pd

# `file` is the path to the Excel workbook
df = pd.read_excel(file)
df['Consolidated']=0
df.head()
df['Consolidated'].iloc[1] = df['FinancialStatement1'][df['FinancialStatement1'].between(
    'REVENUES', 'EXPENSES',
    inclusive=False
)].tolist()
However, this code gives me the warning "A value is trying to be set on a copy of a slice from a DataFrame". I tried using only df.iloc[3,0], but that doesn't work either. In any case, this code won't do what I want, since it doesn't pick the list with the most items between the two group titles.
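For context, two things seem to be going on in the attempt above. The warning is what pandas emits for chained assignment (`df['Consolidated'].iloc[1] = ...` selects a column first and then assigns into the result), and `between` compares values rather than row positions, so for strings it is a lexicographic test. The sketch below uses a shortened, made-up column to show both points; the explicit comparison stands in for `between` with both ends excluded, and the positional slice is just one possible alternative, not a definitive fix.

```python
import pandas as pd

# Shortened, made-up column for illustration only.
df = pd.DataFrame({
    "FinancialStatement1": ["REVENUES", "Revenue1", "Revenue2",
                            "EXPENSES", "Expense1", "PROFIT"],
})
df["Consolidated"] = "-"

col = df["FinancialStatement1"]

# Value-based test, as `between` with both ends excluded would do:
# no string sorts after "REVENUES" and before "EXPENSES", so nothing matches.
by_value = col[(col > "REVENUES") & (col < "EXPENSES")].tolist()

# Position-based selection: find where each title sits, then slice by position.
values = col.tolist()
start = values.index("REVENUES")  # first title, included
end = values.index("EXPENSES")    # second title, excluded
by_position = col.iloc[start:end].tolist()

# Assign through a single indexer on the frame itself (no chained indexing).
target = df.columns.get_loc("Consolidated")
df.iloc[start:end, target] = by_position

print(by_value)
print(by_position)
print(df)
```

Assigning with one `df.iloc[rows, column]` call writes into the frame directly, which is the pattern the pandas warning message itself points towards.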
from Recent Questions - Stack Overflow https://ift.tt/3Eqycg2