pandas vectorized lookup without depreciated loockup()
My problem concerns lookup()
, which is to be depreciated. So I'm looking for an alternative. Documentation suggests using loc()
(which does not seem to work with a vectorized approach) or melt()
(which seems quite convoluted). Furthermore, the documentation suggests factorize()
which (I think) does not work for my setup.
Here is the problem: I have a 2-column DataFrame with x,y-values.
k = 20
y = random.choices(range(1,4),k=k)
x = random.choices(range(1,7),k=k)
tuples = list(zip(x,y))
df = pd.DataFrame(tuples, columns=["x", "y"])
df
And I have several DataFrames in crosstab-format of df
. For example one called Cij
:
Concordance table (Cij):
x 1 2 3 4 5 6 RTotal
y
1 16 15 13 NaN 5 NaN 108
2 NaN 12 NaN 15 NaN NaN 87
3 NaN NaN 6 NaN 13 14 121
I now want to perform a vectorized lookup in Cij
from xy-pairs in df
to generate a new column CrC
in df
. Which so far looked like this (plain and simple):
df["Crc"] = Cij.lookup(df["y"],df["x"])
How can I achieve the same thing without lookup()
? Or did I just not understand the suggested alternatives?
Thanks in advance!
Addendum: Working code example as requested.
data = [[1,1],[1,1],[1,2],[1,2],[1,2],[1,3],[1,3],[1,5],[2,2],[2,4],[2,4],[2,4],[2,4],[2,4],[3,3],[3,3],[3,5],[3,5],[3,5],[3,6],[3,6],[3,6],[3,6],[3,6]]
df = pd.DataFrame(data, columns=["y", "x"])
# crosstab of df
ct_a = pd.crosstab(df["y"], df["x"])
Cij = pd.DataFrame([], index=ct_a.index, columns=ct_a.columns) #one of several dfs in ct_a layout
#row-wise, than column-wise filling of Cij
for i in range(ct_a.shape[0]):
for j in range(ct_a.shape[1]):
if ct_a.iloc[i,j] != 0:
Cij.iloc[i,j]= ct_a.iloc[i+1:,j+1:].sum().sum()+ct_a.iloc[:i,:j].sum().sum()
#vectorized lookup, to be substituted with future-proof method
df["Crc"] = Cij.lookup(df["y"],df["x"])
Also, loop-based "filling" of Cij
is fine, since crosstabs of df
are always small. However, df
itself can be very large so vectorized lookup is a necessity.
Comments
Post a Comment