Nonpharma diacritics on Sqlite database are broken #242

@zdavatz

Report by https://github.com/hx-paco

The Nonpharma_db SQLite database is configured for UTF-8, but the files downloaded to populate it are ANSI-encoded, so diacritics show up as '�'.

For example:

BIOLIGO Selenium L�s pr�paration comptoir 250 ml

Maybe the description string needs to be converted to UTF-8 before filling the database? Something like this, perhaps? (Sorry, my C++ is very rusty :P)


#include <string>

// Convert ISO-8859-1 (Latin-1) bytes to UTF-8. Assumes the "ANSI" input is
// Latin-1; Windows-1252 bytes 0x80-0x9F would need a lookup table instead.
std::string to_utf8(const std::string &str) {
    std::string out;
    out.reserve(str.size() * 2);
    for (unsigned char c : str) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII: copy as-is
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
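
If some of the downloaded files turn out to be UTF-8 already, converting them blindly would double-encode the diacritics. A rough sketch of a guard, assuming the "ANSI" input is really ISO-8859-1 (that is a guess, not something confirmed for these files); the names `looks_like_utf8`, `latin1_to_utf8` and `ensure_utf8` are made up for illustration and are not from the code base:

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check (hypothetical helper). It verifies lead and
// continuation bytes only; it does not reject overlong encodings or
// surrogate code points.
bool looks_like_utf8(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;                 // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 4-byte sequence
        else return false;                       // stray continuation or invalid lead byte
        if (i + extra >= s.size()) return false; // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cont = static_cast<unsigned char>(s[i + k]);
            if ((cont & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Assumption: input is ISO-8859-1, where each byte value equals its
// Unicode code point, so every byte >= 0x80 becomes a 2-byte sequence.
std::string latin1_to_utf8(const std::string &s) {
    std::string out;
    out.reserve(s.size() * 2);
    for (unsigned char c : s) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Convert only when the bytes are not already well-formed UTF-8.
std::string ensure_utf8(const std::string &s) {
    return looks_like_utf8(s) ? s : latin1_to_utf8(s);
}
```

With this, a Latin-1 byte such as 0xE9 (é) would be rewritten as the two bytes 0xC3 0xA9 before the insert, while strings that are already UTF-8 pass through unchanged. The check is a heuristic: a Latin-1 string can by coincidence be valid UTF-8, so knowing the real encoding of each download would be more reliable.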

Many thanks and regards
See: zdavatz/oddb2xml#100
