
List of useful resources (dictionaries, corpora, papers, models, etc.) for NLP work on French-based Caribbean creoles and similar creole languages


cibeah/nlp-kreyol-resources


nlp-kreyol-resources


Datasets 📂

Ready-to-use

Open source annotated corpora.

Other corpora resources

> Written

  • Tatoeba [link]

    2,123 sentences in Guadeloupean Creole, some with English or French translations.

  • Kréyolad [link]

    All of Jude Duranty's contributions to the newspaper Antilla, from 2004 to 2023.
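The Tatoeba sentences can be extracted from Tatoeba's bulk exports. The sketch below assumes the tab-separated `sentences.csv` layout (sentence id, language code, text) and `links.csv` layout (sentence id, translation id) from Tatoeba's download page, and assumes `gcf` is the language code Tatoeba uses for Guadeloupean Creole; check both against the current export before relying on it. The function name `parallel_pairs` is hypothetical.

```python
import csv
import io


def parallel_pairs(sentences_file, links_file, src="gcf", tgt="fra"):
    """Return (source_text, target_text) pairs from Tatoeba-style TSV exports.

    sentences_file: iterable of lines "id<TAB>lang<TAB>text" (assumed sentences.csv layout)
    links_file: iterable of lines "sentence_id<TAB>translation_id" (assumed links.csv layout)
    src/tgt: language codes; "gcf" for Guadeloupean Creole is an assumption.
    """
    # QUOTE_NONE: sentence text may contain quote characters that are not CSV quoting.
    sentences = {}
    for row in csv.reader(sentences_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 3:
            sentences[int(row[0])] = (row[1], row[2])

    pairs = {}  # keyed by (src_id, tgt_id) so a link listed in both directions counts once
    for row in csv.reader(links_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue
        a, b = int(row[0]), int(row[1])
        if a not in sentences or b not in sentences:
            continue
        if sentences[a][0] == tgt and sentences[b][0] == src:
            a, b = b, a  # normalise direction to src -> tgt
        if sentences[a][0] == src and sentences[b][0] == tgt:
            pairs[(a, b)] = (sentences[a][1], sentences[b][1])
    return list(pairs.values())


if __name__ == "__main__":
    # Toy placeholder data, not real Tatoeba content.
    toy_sentences = io.StringIO("1\tgcf\t[gcf sentence]\n2\tfra\t[fra translation]\n")
    toy_links = io.StringIO("1\t2\n2\t1\n")
    print(parallel_pairs(toy_sentences, toy_links))
```

With the real exports, pass files opened with `open(path, encoding="utf-8", newline="")`; set `tgt="eng"` to get English translations instead of French.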

> Oral

  • CREOLORAL [link] (Not open source)

    Anne Zribi-Hertz, Emmanuel Schang, Herby Glaude

    2012

    3 hours of spontaneous spoken Martinican (MQ) and Guadeloupean (GP) Creole, with transcriptions and French translations. Not open source, but it may be (partially?) available here, possibly without annotations [link, 19 audio recordings].

  • Pawolotek [link]

    Simone Lagrand, "Titak - Panorama sonore du parler martiniquais" (a sound panorama of Martinican speech)

    2022

    45 short audio files (a few seconds to a few minutes each) in Martinican Creole, recorded in Martinique.

> Unreleased

Tracking unreleased corpora.

  • Data for Noun Phrases in mixed Martinican Creole and French: Evidence for an Underspecified Language Model

    Christelle Lengrai, Juliette Moustin, Pascal Vaillant

    2016

    Recorded from radio broadcasts in Martinique in 2005-2006; transcribed and annotated.

  • Data for How to Parse a Creole: When Martinican Creole Meets French

    2022

    Martinican Creole treebank: 240 fully annotated sentences. Not publicly available.

Models 📈

Classification

  • Guadeloupean Creole Language Identification Tool [link]

    2020, William Soto
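The method behind Soto's tool is not described in this list. As a point of reference only, a common baseline for language identification is a naive Bayes classifier over character n-grams; the sketch below implements that generic baseline (trigrams, uniform class priors, add-one smoothing) and is not Soto's implementation. The names `char_ngrams` and `NgramLanguageID` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lower-cased character n-grams, with a space added at each end as a boundary."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageID:
    """Multinomial naive Bayes over character n-grams (uniform priors, add-one smoothing)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}  # label -> Counter of n-gram frequencies
        self.totals = {}  # label -> total number of n-grams seen for that label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        return self

    def predict(self, text):
        # +1 leaves probability mass for n-grams never seen in training.
        v = len(self.vocab) + 1
        grams = char_ngrams(text, self.n)

        def log_likelihood(label):
            c, total = self.counts[label], self.totals[label]
            return sum(math.log((c[g] + 1) / (total + v)) for g in grams)

        # Uniform priors: the label with the highest log-likelihood wins.
        return max(self.counts, key=log_likelihood)
```

`fit()` takes parallel lists of texts and labels (for example, Creole and French sentences drawn from the corpora listed above); `predict()` returns a single label.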

Translation

  • CreoleM2M [link]

    2023, Raj Dabre

    Multilingual translation model built with Hugging Face, supporting 26 creoles, including Saint Lucian, Seychellois, Mauritian and Haitian Creole. An online playground is available here.

Speech Recognition & Query

  • ASR + Query-by-Example [link]

    2022, Cécile Macaire et al.

    Guadeloupean and Mauritian Creole. Goal: designing linguistic tools for language documentation.
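Query-by-example search looks for occurrences of a spoken query inside longer recordings. A classic way to do this, shown here as a generic sketch and not as the method of Macaire et al., is to compare frame-level feature sequences with dynamic time warping (DTW) over a sliding window. Feature extraction (e.g. MFCCs or frames from a pretrained speech model) is assumed to have been done already; `dtw_distance` and `search` are hypothetical names.

```python
import math


def dtw_distance(query, segment):
    """DTW alignment cost between two sequences of feature vectors.

    Frames are compared with Euclidean distance; the total cost is divided by
    len(query) + len(segment) to make scores for different lengths roughly comparable.
    """
    n, m = len(query), len(segment)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], segment[j - 1])
            # Standard step pattern: advance in the query, the segment, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def search(query, utterance, window=None, step=1):
    """Slide a fixed-size window over the utterance; return (best_start, best_score).

    Lower scores mean a closer match; returns (None, inf) if the utterance is too short.
    """
    window = window or len(query)
    best_start, best_score = None, float("inf")
    for start in range(0, len(utterance) - window + 1, step):
        score = dtw_distance(query, utterance[start:start + window])
        if score < best_score:
            best_start, best_score = start, score
    return best_start, best_score
```

Each frame is a list of floats; in practice a threshold on the returned score would decide whether the query was found at all.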

Papers 📃

Building Datasets & Corpora

French-based

  • Case Study on Data Collection of Kreol Morisien, a Low-Resourced Creole Language [link]

    David Joshen Bastien, Vijay Prakash Chumroo, Johan Patrice Bastien

    2022, IST-Africa Conference

  • MorisienMT: A Dataset for Mauritian Creole Machine Translation [paper, dataset]

    Raj Dabre, Aneerav Sukhoo

    2022

  • Krik: First Steps into Crowdsourcing POS tags for Kréyòl Gwadloupéyen [link]

    Alice Millour, Karën Fort

    2018

Others

  • JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset [link]

    Ruth-Ann Armstrong, John Hewitt, Christopher Manning

    2022

Classification

French-based

  • Language Identification of Guadeloupean Creole [paper, code]

    William Soto

    2020

POS Tagging

  • Krik: First Steps into Crowdsourcing POS tags for Kréyòl Gwadloupéyen [link]

    Alice Millour, Karën Fort

    2018

  • How to Parse a Creole: When Martinican Creole Meets French [link]

    Ludovic Mompelat, Daniel Dakota, Sandra Kübler

    2022
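Crowdsourcing efforts such as Krik collect several annotations per token, which then have to be merged into a single tag. Krik's own aggregation procedure is not described in this list; the sketch below shows the simplest common option, per-token majority voting with an agreement score. The function name `aggregate_tags` and the Universal Dependencies-style tags in the example are illustrative assumptions, not Krik's tagset.

```python
from collections import Counter


def aggregate_tags(annotations):
    """Merge several annotators' tag sequences for one tokenised sentence.

    annotations: list of tag lists, one per annotator, all the same length.
    Returns a list of (majority_tag, agreement) per token, where agreement is
    the share of annotators who chose that tag. Ties go to the tag seen first.
    """
    if not annotations:
        raise ValueError("need at least one annotation")
    n_tokens = len(annotations[0])
    if any(len(a) != n_tokens for a in annotations):
        raise ValueError("all annotators must tag the same tokens")
    merged = []
    for i in range(n_tokens):
        votes = Counter(a[i] for a in annotations)
        # most_common keeps first-encountered order among equal counts.
        tag, count = votes.most_common(1)[0]
        merged.append((tag, count / len(annotations)))
    return merged
```

For example, three annotators tagging the same three tokens as `PRON VERB NOUN`, `PRON VERB VERB` and `PRON AUX NOUN` yield `PRON` (agreement 1.0), `VERB` (2/3) and `NOUN` (2/3); low-agreement tokens are natural candidates for expert review.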

Translation

French-based

  • Kreol Morisien to English and English to Kreol Morisien Translation System using Attention and Transformer Model [link]

    Zaheenah Boodeea, Sameerchand Pudaruth

    2020

Speech Recognition

  • Automatic Speech Recognition and Query By Example for Creole Languages Documentation [link, code]

    Cécile Macaire, Didier Schwab, Benjamin Lecouteux, Emmanuel Schang

    2022

    Guadeloupean & Mauritian Creoles

Others

General

  • On Language Models for Creoles [link]

    Heather Lent, Emanuele Bugliarello, Miryam de Lhoneux, Chen Qiu, Anders Søgaard

    2021, Conference on Computational Natural Language Learning

  • What a Creole Wants, What a Creole Needs [link]

    Heather Lent, Kelechi Ogueji, Miryam de Lhoneux, Orevaoghene Ahia, Anders Søgaard

    2022

  • Ancestor-to-Creole Transfer is Not a Walk in the Park [link]

    Heather Lent, Emanuele Bugliarello, Anders Søgaard

    2022

  • African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation [link]

    Nathaniel Romney Robinson, Matthew Dean Stutzman, Stephen D. Richardson, David R Mortensen

    2023

Other

  • Une grammaire formelle du créole martiniquais pour la génération automatique (A formal grammar of Martinican Creole for automatic generation) [link]

    Pascal Vaillant

    2003
