tokenizers.bpe (0.1.0)

Byte Pair Encoding Text Tokenization.

Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library, which is a fast implementation of Byte Pair Encoding (BPE).
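
As a rough illustration of what Byte Pair Encoding does, the sketch below learns merge rules from a tiny word list and then applies them to a new word. It is a toy version written for clarity, not the 'YouTokenToMe' implementation and not this package's API. The names learn_bpe, merge_pair and encode are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Each distinct word starts as a tuple of single characters, with its count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself so the
        # result is deterministic (an arbitrary choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split `word` into subword units by applying the merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(rules)
    print(encode("lowly", rules))
```

Frequent character sequences end up as single subword units, while rare words fall back to smaller pieces, so the vocabulary size stays fixed and no word is out of vocabulary at the character level.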

Maintainer: Jan Wijffels
Author(s): Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), [cph], Gregory Popovitch [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), The Abseil Authors [ctb, cph] (Files at src/parallel_hashmap (Apache License, Version 2.0)), Ivan Belonogov [ctb, cph] (Files at src/youtokentome (MIT License))

License: MPL-2.0

Uses: Rcpp

Released 10 months ago.




Related packages: corpora, gsubfn, kernlab, languageR, lsa, tm, wordnet, zipfR, RWeka, RKEA, openNLP, skmeans, tau, tm.plugin.mail, lda, textcat, topicmodels, tm.plugin.dc, textir, movMF (20 best matches, based on common tags).
