Adding dates for unique EIDs row-wise to get Years of Service
I have the following dataset:
EID | Company | Start_Date | End_Date | T_F |
---|---|---|---|---|
A111 | ABC | 2015-07-20 | NaT | True |
B111 | DEF | 1983-06-01 | NaT | False |
B111 | ABC | 2017-01-01 | NaT | True |
C111 | GHI | 1980-10-01 | 1981-08-31 | True |
D111 | JKL | 1973-05-01 | 1977-11-30 | True |
E111 | ABC | 2006-04-24 | NaT | True |
F111 | ABC | 1991-06-10 | 1994-12-15 | False |
F111 | MNO | 1994-12-01 | 2002-08-31 | False |
F111 | ABC | 2002-08-01 | NaT | True |
G111 | ABC | 1979-01-01 | NaT | True |
H111 | ABC | 2002-02-01 | NaT | True |
The expected output is as follows:
EID | Company | Start_Date | End_Date | T_F | YoS |
---|---|---|---|---|---|
A111 | ABC | 2015-07-20 | NaT | True | NaN |
B111 | DEF | 1983-06-01 | NaT | False | (2017-01-01) - (1983-06-01) |
B111 | ABC | 2017-01-01 | NaT | True | NaN |
C111 | GHI | 1980-10-01 | 1981-08-31 | True | (1981-08-31) - (1980-10-01) |
D111 | JKL | 1973-05-01 | 1977-11-30 | True | (1977-11-30) - (1973-05-01) |
E111 | ABC | 2006-04-24 | NaT | True | NaN |
F111 | ABC | 1991-06-10 | 1994-12-15 | False | (2002-08-01) - (1991-06-10) |
F111 | MNO | 1994-12-01 | 2002-08-31 | False | NaN |
F111 | ABC | 2002-08-01 | NaT | True | NaN |
G111 | ABC | 1979-01-01 | NaT | True | NaN |
H111 | ABC | 2002-02-01 | NaT | True | NaN |
This is what I am trying to do:
- Where an EID has only one record and company is ABC YoS column should be NULL. End_Date is always blank in these cases.
- Where an EID has multiple records and his/her last record is company ABC then YoS column will be Start date of first company - Start date of ABC company.
- Where an EID has only one record and company is not ABC then YoS will be calculated as End_Date - Start_Date
- Only the first record will have YoS value other records will contain NaN value.
- If an employee has multiple records 99% of the times employees last records will be ABC company.
I tried the following code but this is only half part (or incorrect) I believe:
result.loc[~(result.CLEAN_NAME == 'HONEYWELL / HON') & (result.T_F == False),'Hon_StartDate'] = result['Start_Date']
Any leads would be appreciated. Thanks.!
from Recent Questions - Stack Overflow https://ift.tt/3fJZovU
https://ift.tt/eA8V8J
Comments
Post a Comment