Is there a step to use relative frequency instead of absolute frequency with step_tokenfilter() in recipes?
I'm building a regression model in R following this great approach by Emil Hvitfeldt and Julia Silge (https://smltar.com/mlregression#fnref7), and I was wondering whether it is possible to use relative frequency instead of absolute frequency in the preprocessing step step_tokenfilter(). I looked into it but couldn't find a function for this.
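To make clear what I mean by relative versus absolute frequency, here is a minimal base-R sketch. The toy tokens and the 20% threshold are made up purely for illustration, and this is not how step_tokenfilter() works internally, just the kind of filtering I have in mind.

```r
# Toy tokens, made up purely for illustration
tokens <- c("model", "data", "model", "year", "data", "model")

# Absolute frequency: raw count of each token
abs_freq <- table(tokens)

# Relative frequency: each token's share of all tokens
rel_freq <- abs_freq / sum(abs_freq)

# Keep tokens that account for at least 20% of all tokens
# (hypothetical threshold, chosen only for this example)
keep <- names(rel_freq)[as.vector(rel_freq) >= 0.2]
print(keep)
```

With absolute counts the cutoff depends on corpus size, whereas a relative cutoff like the one above stays comparable across corpora of different sizes.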
Here is my current code, which uses tf-idf on the 1000 most frequent tokens instead.
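library(recipes)
library(textrecipes)  # provides the text steps used below
# data_train and stopwords_list are my own objects (not shown here)
data_rec <- recipe(year ~ sentence_lemma, data = data_train) %>%
  step_tokenize(sentence_lemma) %>%
  step_stopwords(sentence_lemma, custom_stopword_source = stopwords_list) %>%
  step_tokenfilter(sentence_lemma, max_tokens = 1e3) %>%  # keep the 1000 most frequent tokens
  step_tfidf(sentence_lemma) %>%
  step_normalize(all_predictors())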
Thanks in advance for any help ;)