2020-12-28

Why does .apply give me strange results when df.iterrows works fine?

I am using the .apply call below to perform regex-based operations and store the result in the email_account_letter_or_number_only column. The rule is: if email_account is all digits or contains no letters, store email_account unchanged in email_account_letter_or_number_only. Otherwise, use re.sub to keep only the letters (in other words, remove all digits and special characters).

concised_df['email_account_letter_or_number_only'] = concised_df['email_account'].apply(lambda x: concised_df['email_account'] if (str(concised_df['email_account']).isdigit() or not bool(re.search('[a-zA-Z]', str(concised_df['email_account'])))) else re.sub('[^A-Za-z]+', '', str(concised_df['email_account'])))

However, I am getting some really weird results as shown below:

email_account  email_account_letter_or_number_only
0018889        zzaninazzhanargulyazzkatezzxpianozzzacmeNameem...
nacho.taro     zzaninazzhanargulyazzkatezzxpianozzzacmeNameem...
nachth45678    zzaninazzhanargulyazzkatezzxpianozzzacmeNameem...
nacikita       zzaninazzhanargulyazzkatezzxpianozzzacmeNameem...
nacia_art      zzaninazzhanargulyazzkatezzxpianozzzacmeNameem...

Basically, every row in email_account_letter_or_number_only got assigned the same weird string.

However, when I used the df.iterrows() method below, the result was correct as expected:

for index, row in concised_df.iterrows():
    if str(row['email_account']).isdigit() or not bool(re.search('[a-zA-Z]', str(row['email_account']))):
        concised_df.at[index, 'email_account_letter_or_number_only'] = row['email_account']
    else:
        concised_df.at[index, 'email_account_letter_or_number_only'] = re.sub('[^A-Za-z]+', '', row['email_account'])

email_account  email_account_letter_or_number_only
0018889        0018889
nacho.taro     nachotaro
nachth45678    nachth
nacikita       nacikita
nacia_art      naciaart

Note that there is one small difference between the two versions in the last argument to re.sub(). In the .apply version, if I don't wrap that argument in str(), I get the error TypeError: expected string or bytes-like object. In the iterrows version, it runs fine without wrapping row['email_account'] in str(). I am not sure whether that is related to the strange result I got with .apply, but I wanted to call it out.

Could anyone please advise on what I might've done wrong in the .apply function?
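
One likely explanation, offered as a sketch rather than a definitive diagnosis: the lambda never uses its argument x. Every reference inside it is to concised_df['email_account'], which is the whole column, so str(...) produces the same printed Series for every row and re.sub reduces it to the same run of letters (note the "Nameem..." near the end, consistent with the "Name: email_account" footer of a printed Series). That would also account for the TypeError: without str(), re.sub receives a Series instead of a string. The version below passes each value to a named helper instead. The sample rows are copied from the output above, and the helper name letters_or_original is made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the rows shown in the question.
concised_df = pd.DataFrame(
    {"email_account": ["0018889", "nacho.taro", "nachth45678", "nacikita", "nacia_art"]}
)


def letters_or_original(value):
    """Return value unchanged if it contains no letters, else keep only its letters."""
    text = str(value)
    # An all-digit string also contains no letters, so this single check
    # covers both conditions from the original lambda.
    if not re.search("[a-zA-Z]", text):
        return value
    return re.sub("[^A-Za-z]+", "", text)


# apply() calls the helper once per element, so each row is processed on its own.
concised_df["email_account_letter_or_number_only"] = concised_df["email_account"].apply(
    letters_or_original
)
print(concised_df)
```

The same fix works with the original lambda if every concised_df['email_account'] inside it is replaced by x; the named function is only meant to be easier to read and test.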



from Recent Questions - Stack Overflow https://ift.tt/3nSJcdX
https://ift.tt/eA8V8J
