Closed
Description
Is your feature request related to a problem? Please describe.
I want to use a multilingual model from Hugging Face (https://huggingface.co/intfloat/multilingual-e5-large). Its tokenizer is a SentencePiece unigram tokenizer, so I am unable to port the model to C#/ONNX.
Describe the solution you'd like
Support for the unigram SentencePiece tokenizer in the Microsoft.ML.Tokenizers package.
Describe alternatives you've considered
BlingFire, but it no longer seems to be maintained, and it is unclear whether it would return exactly the same token IDs.
Thank you for your time and effort (the library in general is great!)