feature: handle multiple alphabets #543

adamdecaf · 2024-03-20T14:55:49Z

Slack: https://moov-io.slack.com/archives/CFUCEBGH2/p1710500854485369

I get some results with curl 'http://localhost:8084/search?q=wiam+wahhab' and with curl 'http://localhost:8084/search?q=الخليلي+سيف'.
It's the same person, and even though the results aren't the same, it shows that you already handle other alphabets.

The first link is a study of the phonetization rules of the Arabic language, and the second is just a table of the different ways the English phonetization is written.
https://ccc.inaoep.mx/~villasen/bib/reglas%20de%20fonetizacion%20Arabe.pdf
http://www.aurint.de/phonetic_transcription.htm
The goal is not a 100% trustworthy translation; that is impossible with phonetic transcription. Luckily, there is a Jaro-Winkler pass.
The majority of the list data is in the Latin alphabet. So I suppose it would be too big to transcribe persons, BUT if we do one big transcription of all the list data, only once, so that every entry has a phonetic transcription in the different alphabets, it wouldn't be too big.
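To illustrate why a Jaro-Winkler pass tolerates imperfect phonetic transcription, here is a minimal sketch of the textbook algorithm. It is not the project's own scoring code, which may be tuned differently; the function names and the 0.1 prefix scale are assumptions chosen for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # Two Latin spellings that could come from transcribing the same name.
    for a, b in [("mohammed", "muhammad"), ("mohammed", "mohamed")]:
        print(a, b, round(jaro_winkler(a, b), 3))
```

Because the score degrades gradually with each differing character, two transcriptions of the same name that disagree on a vowel or a doubled consonant can still score well above unrelated names.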
The execution flow would be:
get the list data
transcribe it into the different alphabets
STORE the transcriptions in the database as tables "arabic", "latin", "mandarin", etc., and mark whether each row is the original data or a transcription
get the person to check
get the alphabet/language of the person's data (you already do that with the package "stopwords")
search only the tables of the same alphabet AND lower the minimum score if the table's alphabet isn't the original one from the list
Of course it will be a lot of work to transcribe into all the alphabets, AND each alphabet can have different phonetizations (like English vs. French). But after a lot of thinking and research, I concluded that this is the best solution that is neither too big nor less trustworthy.
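The flow proposed above could be sketched roughly as follows. This is a toy, in-memory illustration under several assumptions: the `Entry` type, `detect_script`, `build_tables`, and `search` are hypothetical names, script detection uses Unicode character names as a stand-in for the "stopwords" package, `difflib.SequenceMatcher` stands in for the Jaro-Winkler pass, the transcriber is a hand-written lookup rather than a real phonetization, and the 0.85 and 0.10 thresholds are arbitrary.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Entry:
    entity_id: str     # hypothetical identifier of the list entry
    name: str          # the name as written in this table's alphabet
    is_original: bool  # True if the list itself published this spelling


def detect_script(text: str) -> str:
    """Guess the dominant script from Unicode character names (a stand-in)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # "ARABIC LETTER ALEF" -> "ARABIC", "LATIN SMALL LETTER W" -> "LATIN"
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts.most_common(1)[0][0].lower() if counts else "unknown"


def build_tables(list_data, transcribers):
    """One-time job: file each name under its own script plus every transcription."""
    tables = {}
    for entity_id, name in list_data:
        source = detect_script(name)
        tables.setdefault(source, []).append(Entry(entity_id, name, True))
        for target, transcribe in transcribers.items():
            if target != source:
                tables.setdefault(target, []).append(
                    Entry(entity_id, transcribe(name), False)
                )
    return tables


def search(tables, query, min_score=0.85, transcription_relief=0.10):
    """Search only the table matching the query's script.

    Transcribed rows get a lower minimum score than original rows.
    """
    hits = []
    for entry in tables.get(detect_script(query), []):
        # SequenceMatcher is a stdlib stand-in for the Jaro-Winkler pass.
        score = SequenceMatcher(None, query.lower(), entry.name.lower()).ratio()
        threshold = min_score if entry.is_original else min_score - transcription_relief
        if score >= threshold:
            hits.append((entry.entity_id, entry.name, round(score, 3), entry.is_original))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hand-written stand-in for a real Latin-to-Arabic phonetic transcriber.
HAND_TRANSCRIBED = {"Mohammed Ali": "محمد علي"}
TABLES = build_tables(
    [("entity-1", "Mohammed Ali")],
    {"arabic": lambda name: HAND_TRANSCRIBED[name]},
)

if __name__ == "__main__":
    print(search(TABLES, "محمد علي"))
    print(search(TABLES, "Mohammed Ali"))
```

Keeping the original-or-transcription flag on each row is what lets the search relax the minimum score only for transcribed spellings, while original list data keeps the stricter threshold.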

Projects:

Arabic Phonetic Mapping Algorithm.pdf
Arabic Phonetization .pdf

https://chat.openai.com/share/b4c607ca-3ab2-4e5e-97ac-870357424fdc

Related: #150


adamdecaf added the bug and enhancement labels Mar 20, 2024
