2022-06-22

Using numpy.where to calculate new pandas column, with multiple conditions

I'm having trouble coding this condition correctly. I'm creating a new pandas column in my dataframe, new_column, which subtracts a value from column test depending on the row index. I'm currently using this code to subtract a larger value at every fourth row (index 0, 4, 8, ...) and a smaller value everywhere else:

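import numpy as np
import pandas as pd

subtraction_value = 3
subtraction_value_2 = 6

data = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})

# data.index % 4 is nonzero (truthy) everywhere except at every fourth row
data['new_column'] = np.where(data.index % 4,
                              data['test'] - subtraction_value,
                              data['test'] - subtraction_value_2)
print(data['new_column'])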


[6,1,2,1,-5,0,-1,3,4,6]

However, I now want to apply the higher subtraction to the first two positions in the column, then the original value to the next three positions, then the higher value to another two, the original value to another three, and so on. I thought I could do it this way, with an | condition in my np.where call:

data['new_column'] = np.where((data.index%4) | (data.index%5),
                              data['test']-subtraction_value,
                              data['test']-subtraction_value_2)

However, this didn't work, and I feel my maths may be slightly off. My desired output would look like this:

print(data['new_column'])

[6,-2,2,1,-2,-3,-4,3,7,6]

As you can see, this slightly shifts the pattern. Can I still use numpy.where() here, or do I have to take a new approach? Any help would be greatly appreciated!
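
One possible approach, as a minimal sketch: np.where can still be used if the condition is built from a single modulus rather than two remainders combined with |. The sketch assumes the pattern repeats every 5 rows (2 larger subtractions, then 3 smaller ones); the names pos, attempted_mask and use_larger are illustrative and do not come from the original code. It also shows why the | attempt applies the larger subtraction only at row 0.

```python
import numpy as np
import pandas as pd

subtraction_value = 3
subtraction_value_2 = 6

data = pd.DataFrame({"test": [12, 4, 5, 4, 1, 3, 2, 5, 10, 9]})

# Row positions 0..n-1, independent of whatever index labels the frame has.
pos = np.arange(len(data))

# The attempted condition: the bitwise OR of the two remainders is zero only
# where both remainders are zero (positions 0, 20, 40, ...), so only row 0
# falls into the "larger subtraction" branch here.
attempted_mask = ((pos % 4) | (pos % 5)) != 0

# Assumed reading of the desired pattern: it repeats every 5 rows, and the
# first 2 rows of each block of 5 get the larger subtraction.
use_larger = (pos % 5) < 2

data["new_column"] = np.where(use_larger,
                              data["test"] - subtraction_value_2,
                              data["test"] - subtraction_value)
print(data["new_column"].tolist())
```

With the data exactly as posted, position 7 comes out as 5 - 3 = 2 rather than the 3 listed in the desired output; the other nine positions match.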


