pandas vectorized lookup without deprecated lookup()
My problem concerns lookup(), which is deprecated, so I'm looking for an alternative. The documentation suggests using .loc (which does not seem to work for a vectorized, pairwise lookup) or melt() (which seems quite convoluted). It also suggests factorize(), which (I think) does not work for my setup.
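To illustrate the point about .loc, here is a minimal sketch. The small table `tab` and the label lists `rows` and `cols` are made up for this illustration and are not part of my actual setup. As far as I can tell, passing two lists to .loc selects the full cross product of labels rather than element-wise pairs:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 table, used only for illustration
tab = pd.DataFrame([[10, 11, 12], [20, 21, 22]], index=[1, 2], columns=[1, 2, 3])

rows = [1, 2, 2, 1]  # row labels to look up
cols = [3, 1, 2, 2]  # column labels, meant to be paired element-wise with rows

# .loc with two lists returns every row label combined with every column label,
# so the result is a 4x4 frame instead of 4 paired values
block = tab.loc[rows, cols]
print(block.shape)

# The paired values sit on the diagonal of that block, but building the
# whole block needs n*n cells for n lookups
paired = np.diag(block.to_numpy())
print(paired)
```

Taking the diagonal does give the paired values, but the intermediate frame grows quadratically with the number of lookups, which is why this does not look usable for a large df.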
Here is the problem: I have a 2-column DataFrame with x,y-values.
import random
import pandas as pd

k = 20
y = random.choices(range(1, 4), k=k)
x = random.choices(range(1, 7), k=k)
tuples = list(zip(x, y))
df = pd.DataFrame(tuples, columns=["x", "y"])
df
I also have several DataFrames in the crosstab layout of df. For example, one called Cij:
Concordance table (Cij):
x      1     2     3     4     5     6  RTotal
y
1     16    15    13   NaN     5   NaN     108
2    NaN    12   NaN    15   NaN   NaN      87
3    NaN   NaN     6   NaN    13    14     121
I now want to perform a vectorized lookup in Cij using the (x, y) pairs in df to generate a new column Crc in df. So far it looked like this (plain and simple):
df["Crc"] = Cij.lookup(df["y"],df["x"])
How can I achieve the same thing without lookup()? Or did I just not understand the suggested alternatives?
Thanks in advance!
Addendum: Working code example as requested.
import pandas as pd

data = [[1,1],[1,1],[1,2],[1,2],[1,2],[1,3],[1,3],[1,5],[2,2],[2,4],[2,4],[2,4],[2,4],[2,4],[3,3],[3,3],[3,5],[3,5],[3,5],[3,6],[3,6],[3,6],[3,6],[3,6]]
df = pd.DataFrame(data, columns=["y", "x"])
# crosstab of df
ct_a = pd.crosstab(df["y"], df["x"])
Cij = pd.DataFrame([], index=ct_a.index, columns=ct_a.columns)  # one of several dfs in ct_a layout
# row-wise, then column-wise filling of Cij
for i in range(ct_a.shape[0]):
    for j in range(ct_a.shape[1]):
        if ct_a.iloc[i, j] != 0:
            # counts strictly below-right of (i, j) plus counts strictly above-left
            Cij.iloc[i, j] = ct_a.iloc[i+1:, j+1:].sum().sum() + ct_a.iloc[:i, :j].sum().sum()
# vectorized lookup, to be substituted with future-proof method
df["Crc"] = Cij.lookup(df["y"], df["x"])
Also, loop-based "filling" of Cij is fine, since crosstabs of df are always small. However, df itself can be very large so vectorized lookup is a necessity.
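One possible replacement, as a minimal sketch rather than a definitive answer: translate the row and column labels to integer positions with Index.get_indexer() and index the underlying NumPy array pairwise. The names `row_pos` and `col_pos` are introduced here for illustration, and unlike the code above, Cij is created as a float table pre-filled with NaN (an assumption made so the result has a numeric dtype).

```python
import numpy as np
import pandas as pd

data = [[1,1],[1,1],[1,2],[1,2],[1,2],[1,3],[1,3],[1,5],[2,2],[2,4],[2,4],[2,4],
        [2,4],[2,4],[3,3],[3,3],[3,5],[3,5],[3,5],[3,6],[3,6],[3,6],[3,6],[3,6]]
df = pd.DataFrame(data, columns=["y", "x"])
ct_a = pd.crosstab(df["y"], df["x"])

# Same filling rule as in the example above, into a float table (NaN where the count is 0)
Cij = pd.DataFrame(np.nan, index=ct_a.index, columns=ct_a.columns)
for i in range(ct_a.shape[0]):
    for j in range(ct_a.shape[1]):
        if ct_a.iloc[i, j] != 0:
            Cij.iloc[i, j] = (ct_a.iloc[i + 1:, j + 1:].sum().sum()
                              + ct_a.iloc[:i, :j].sum().sum())

# Map each label to its integer position in Cij's index / columns
row_pos = Cij.index.get_indexer(df["y"])
col_pos = Cij.columns.get_indexer(df["x"])

# Pairwise (not cross-product) selection on the underlying array
df["Crc"] = Cij.to_numpy()[row_pos, col_pos]
print(df.head())
```

Note that get_indexer() returns -1 for labels it cannot find, and -1 would silently select the last row or column of the array. That cannot happen here because Cij is built from the crosstab of df itself, but for tables from another source a check such as `(row_pos >= 0).all()` before indexing seems advisable.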