Why did .apply give me strange results when df.iterrows worked fine?
I am using the apply call below to perform regex-related operations and store the result in the email_account_letter_or_number_only column. Basically, if email_account is all digits or contains no letters, store email_account unchanged in email_account_letter_or_number_only. Otherwise, use re.sub to keep only the letters (in other words, remove all digits and special characters).
concised_df['email_account_letter_or_number_only'] = concised_df['email_account'].apply(lambda x: concised_df['email_account'] if (str(concised_df['email_account']).isdigit() or not bool(re.search('[a-zA-Z]', str(concised_df['email_account'])))) else re.sub('[^A-Za-z]+', '', str(concised_df['email_account'])))
However, I am getting some really weird results as shown below:
email_account | email_account_letter_or_number_only |
---|---|
0018889 | zzaninazzhanargulyazzkatezzxpianozzzacmeNameem... |
nacho.taro | zzaninazzhanargulyazzkatezzxpianozzzacmeNameem... |
nachth45678 | zzaninazzhanargulyazzkatezzxpianozzzacmeNameem... |
nacikita | zzaninazzhanargulyazzkatezzxpianozzzacmeNameem... |
nacia_art | zzaninazzhanargulyazzkatezzxpianozzzacmeNameem... |
Basically, every row in email_account_letter_or_number_only got assigned the same weird string.
However, when I used the df.iterrows() method below, the result was correct, as expected:
for index, row in concised_df.iterrows():
    if str(row['email_account']).isdigit() or not bool(re.search('[a-zA-Z]', str(row['email_account']))):
        concised_df.at[index, 'email_account_letter_or_number_only'] = row['email_account']
    else:
        concised_df.at[index, 'email_account_letter_or_number_only'] = re.sub('[^A-Za-z]+', '', row['email_account'])
email_account | email_account_letter_or_number_only |
---|---|
0018889 | 0018889 |
nacho.taro | nachotaro |
nachth45678 | nachth |
nacikita | nacikita |
nacia_art | naciaart |
Note that there is a small difference in the last argument to re.sub() between the iterrows and .apply versions. With .apply, if I don't wrap the email_account argument in str(), I get the error TypeError: expected string or bytes-like object. With iterrows, it ran fine without wrapping that argument in str(). I am not sure whether that's related to the strange result I got with .apply, but I wanted to call it out.
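A possible explanation for the str() difference, sketched on a small made-up Series (the sample values are borrowed from the tables above, and the variable names are hypothetical): inside the lambda, concised_df['email_account'] refers to the whole column rather than a single value. Passing that Series to re.sub raises TypeError, while str() turns the entire Series into its printed representation, which re.sub then strips down to letters.

```python
import re
import pandas as pd

# Hypothetical sample data, based on values shown in the tables above.
emails = pd.Series(["0018889", "nacho.taro", "nachth45678"], name="email_account")

# Passing the whole Series to re.sub fails, because re expects str or bytes.
try:
    re.sub("[^A-Za-z]+", "", emails)
    raised = False
except TypeError:
    raised = True

# str() on a Series returns its printed form: index labels, values, and a
# trailing "Name: ..., dtype: ..." line, all in one string.
as_text = str(emails)

# Stripping non-letters from that text glues every value together,
# followed by the letters of the trailing "Name: email_account, ..." line.
letters_only = re.sub("[^A-Za-z]+", "", as_text)

print(raised)
print(letters_only)
```

This would also be consistent with the "...Nameem..." fragment visible in the strange output, which looks like the "Name: email_account" footer of a printed Series with the punctuation removed.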
Could anyone please advise on what I might've done wrong in the .apply function?
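A minimal sketch of one likely fix, assuming the goal is the rule described at the top of the question: the function passed to .apply should work on its own argument (the single value pandas hands it for each row) instead of on concised_df['email_account']. The helper name keep_letters_or_original and the sample frame are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample frame using values from the tables above.
concised_df = pd.DataFrame(
    {"email_account": ["0018889", "nacho.taro", "nachth45678", "nacikita", "nacia_art"]}
)

def keep_letters_or_original(value):
    """Return value unchanged if it has no ASCII letters, else keep only its letters."""
    text = str(value)
    # An all-digit string has no letters, so a separate isdigit() check
    # is not needed here.
    if not re.search("[a-zA-Z]", text):
        return value
    return re.sub("[^A-Za-z]+", "", text)

# Series.apply calls the function once per element of the column.
concised_df["email_account_letter_or_number_only"] = (
    concised_df["email_account"].apply(keep_letters_or_original)
)

print(concised_df)
```

With these sample values, the new column matches the output of the iterrows version shown earlier.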
from Recent Questions - Stack Overflow https://ift.tt/3nSJcdX