Remove spacy dependency #55

@whoisjones

Description

Currently we use spaCy when converting token classification datasets (more precisely, NER datasets): it turns a sequence of BIO tags into spans so that we can prompt the LLM in natural language.

Goal: Write our own conversion functions for BIO tags -> spans and spans -> BIO tags by searching for substrings in the text. It is important to keep the tokenization of the original dataset, which is currently an issue. Additionally, we remove the spaCy dependency entirely.
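
A minimal sketch of what such a conversion could look like, assuming the dataset provides pre-tokenized sentences with one BIO tag per token. All names here (`bio_to_spans`, `spans_to_bio`, `find_span`) are hypothetical and not part of the existing code base. Spans are stored as token indices, so the original tokenization is never changed. The substring search assumes the text shown to the LLM is the tokens joined by single spaces, and it only returns the first occurrence.

```python
from typing import List, Optional, Tuple

# (start_token, end_token_exclusive, label); token indices, not characters
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collect contiguous B-/I- runs into token-level spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # An open span continues only on an I- tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        # A stray I- tag without an open span is treated as a span start.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Write spans back as BIO tags over the original tokens."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def find_span(tokens: List[str], entity_text: str, label: str) -> Optional[Span]:
    """Locate entity_text in the space-joined tokens and map it to a token span.

    Assumption: the text is " ".join(tokens). A match that covers only part
    of a token is widened to the whole token.
    """
    offsets = []
    pos = 0
    for tok in tokens:
        offsets.append((pos, pos + len(tok)))
        pos += len(tok) + 1  # +1 for the joining space
    char_start = " ".join(tokens).find(entity_text)
    if char_start == -1:
        return None
    char_end = char_start + len(entity_text)
    covered = [i for i, (s, e) in enumerate(offsets) if s < char_end and e > char_start]
    if not covered:
        return None
    return (covered[0], covered[-1] + 1, label)


tokens = ["Angela", "Merkel", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
spans = bio_to_spans(tags)
roundtrip = spans_to_bio(len(tokens), spans)
```

In this example `roundtrip` equals `tags`, and `find_span(tokens, "Paris", "LOC")` maps an entity string, such as one returned by the LLM, back to token indices. Handling repeated mentions and overlapping spans is left open here.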

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
