pandas vectorized lookup without depreciated loockup()

By Ritesh Sahu - June 27, 2022

My problem concerns lookup(), which is to be depreciated. So I'm looking for an alternative. Documentation suggests using loc() (which does not seem to work with a vectorized approach) or melt() (which seems quite convoluted). Furthermore, the documentation suggests factorize() which (I think) does not work for my setup.

Here is the problem: I have a 2-column DataFrame with x,y-values.

k = 20
y = random.choices(range(1,4),k=k)
x = random.choices(range(1,7),k=k)
tuples = list(zip(x,y))
df = pd.DataFrame(tuples, columns=["x", "y"])
df

And I have several DataFrames in crosstab-format of df. For example one called Cij:

Concordance table (Cij):
x     1     2     3    4     5     6  RTotal
y                                           
1   16     15    13  NaN     5   NaN     108
2   NaN    12   NaN   15   NaN   NaN      87
3   NaN   NaN     6  NaN    13    14     121

I now want to perform a vectorized lookup in Cij from xy-pairs in df to generate a new column CrC in df. Which so far looked like this (plain and simple):

df["Crc"] = Cij.lookup(df["y"],df["x"])

How can I achieve the same thing without lookup()? Or did I just not understand the suggested alternatives?

Thanks in advance!

Addendum: Working code example as requested.

data = [[1,1],[1,1],[1,2],[1,2],[1,2],[1,3],[1,3],[1,5],[2,2],[2,4],[2,4],[2,4],[2,4],[2,4],[3,3],[3,3],[3,5],[3,5],[3,5],[3,6],[3,6],[3,6],[3,6],[3,6]]
df = pd.DataFrame(data, columns=["y", "x"])

# crosstab of df
ct_a = pd.crosstab(df["y"], df["x"])
Cij = pd.DataFrame([], index=ct_a.index, columns=ct_a.columns) #one of several dfs in ct_a layout

#row-wise, than column-wise filling of Cij
for i in range(ct_a.shape[0]):           
  for j in range(ct_a.shape[1]):
    if ct_a.iloc[i,j] != 0:
      Cij.iloc[i,j]= ct_a.iloc[i+1:,j+1:].sum().sum()+ct_a.iloc[:i,:j].sum().sum()

#vectorized lookup, to be substituted with future-proof method
df["Crc"] = Cij.lookup(df["y"],df["x"])

Also, loop-based "filling" of Cij is fine, since crosstabs of df are always small. However, df itself can be very large so vectorized lookup is a necessity.

Search This Blog

Theprogrammersfirst | A technical portal.

pandas vectorized lookup without depreciated loockup()

Comments

Post a Comment

Popular posts from this blog

Spring Elasticsearch Operations

Hibernate Search - Elasticsearch with JSON manipulation

Today Walkin 14th-Sept