Closed
Labels
enhancement (New feature or request)
Description
Currently we use spaCy when converting token classification datasets (more precisely, NER datasets): it turns a sequence of BIO tags into spans so that we can prompt the LLM in natural language.
Goal: Write our own conversion functions for both directions, BIO tags -> spans and spans -> BIO tags, by searching for substrings in the text. It is important to keep the tokenization of the original dataset, which is currently an issue. Additionally, this lets us remove the spaCy dependency entirely.
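
A rough sketch of what the two functions could look like, to make the goal concrete. The names `bio_to_spans` and `spans_to_bio` are hypothetical, and the sketch makes several assumptions that are not from the codebase: a span is a `(text, label)` pair, the text shown to the LLM is the dataset tokens joined by single spaces, and a span is mapped back to the first untagged occurrence that lines up with token boundaries. The original tokenization is never changed.

```python
from typing import List, Tuple

Span = Tuple[str, str]  # (entity text, label); assumed representation


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collect entity spans from BIO tags, joining tokens with spaces."""
    spans: List[Span] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new or tag == "O":
            # Close the entity that was open, if any.
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
        if starts_new:
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
    if current:
        spans.append((" ".join(current), label))
    return spans


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Map spans back to BIO tags by substring search in the joined text."""
    text = " ".join(tokens)
    start_to_tok, end_to_tok = {}, {}
    pos = 0
    for i, token in enumerate(tokens):
        start_to_tok[pos] = i
        end_to_tok[pos + len(token)] = i
        pos += len(token) + 1  # +1 for the joining space
    tags = ["O"] * len(tokens)
    for span_text, label in spans:
        if not span_text:
            continue
        search_from = 0
        while True:
            idx = text.find(span_text, search_from)
            if idx == -1:
                break  # span not found on token boundaries; left untagged
            end = idx + len(span_text)
            if idx in start_to_tok and end in end_to_tok:
                first, last = start_to_tok[idx], end_to_tok[end]
                if all(t == "O" for t in tags[first:last + 1]):
                    tags[first] = "B-" + label
                    for i in range(first + 1, last + 1):
                        tags[i] = "I-" + label
                    break
            search_from = idx + 1
    return tags


tokens = ["Angela", "Merkel", "visited", "New", "York", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
spans = bio_to_spans(tokens, tags)
round_trip = spans_to_bio(tokens, spans)
```

Requiring matches to start and end on token boundaries is what keeps the dataset tokenization intact: a span that the LLM returns with altered spacing or wording simply stays untagged instead of forcing a re-tokenization.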