In #17 we discussed whether to support or include stemming within tidytext and decided against it, since stemming approaches are quite diverse and already work well with tidy data principles. I see that is already true of your project:
library(tidyverse)
library(tidytext)
library(abbrevTexts)
tidy_p_and_p <-
  tibble(txt = janeaustenr::prideprejudice) %>%
  unnest_tokens(word, txt)

p_and_p_dict <-
  makeAbbrStemDict(
    term.vec = tidy_p_and_p$word,
    min.len = 3,
    min.share = .6
  )

tidy_p_and_p %>%
  left_join(p_and_p_dict, by = c("word" = "parent")) %>%
  mutate(word = coalesce(terminal.child, word)) %>%
  anti_join(get_stopwords()) %>%
  count(word, sort = TRUE)
#> Joining, by = "word"
#> # A tibble: 4,940 × 2
#>    word          n
#>    <chr>     <int>
#>  1 mr          785
#>  2 elizabeth   635
#>  3 darcy       417
#>  4 said        401
#>  5 though      344
#>  6 mrs         343
#>  7 ever        334
#>  8 much        327
#>  9 bennet      323
#> 10 bingley     306
#> # … with 4,930 more rows

## to compare
tidy_p_and_p %>%
  anti_join(get_stopwords()) %>%
  count(word, sort = TRUE)
#> Joining, by = "word"
#> # A tibble: 6,404 × 2
#>    word          n
#>    <chr>     <int>
#>  1 mr          785
#>  2 elizabeth   597
#>  3 said        401
#>  4 darcy       373
#>  5 mrs         343
#>  6 much        326
#>  7 must        305
#>  8 bennet      294
#>  9 miss        283
#> 10 jane        264
#> # … with 6,394 more rows
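For comparison, here is a minimal sketch of the same tidy pattern with a different stemmer. It assumes the SnowballC package is installed and uses its `wordStem()` function (Porter stemming by default); the `stem` column and the `stem_counts` name are illustrative choices, not anything from abbrevTexts or tidytext. The point is only that a stemmer that maps a character vector to a character vector can be dropped into a `mutate()` call without tidytext needing built-in support.

```r
library(dplyr)
library(tidytext)
library(SnowballC)  # assumption: installed separately; provides wordStem()

stem_counts <-
  tibble(txt = janeaustenr::prideprejudice) %>%
  unnest_tokens(word, txt) %>%
  # drop stop words before stemming so they match the lexicon as written
  anti_join(get_stopwords(), by = "word") %>%
  # wordStem() takes a character vector and returns the stemmed vector
  mutate(stem = wordStem(word)) %>%
  count(stem, sort = TRUE)

stem_counts
```

Unlike the dictionary join above, this applies a rule-based stemmer directly, so the resulting stems are not always real words.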
Hi Julia! I'd be happy if my autostemming algorithm became part of the tidytext package!
https://github.com/edvardoss/abbrevTexts