Crates.io | IndicScriptSwap |
lib.rs | IndicScriptSwap |
version | 0.6.0 |
source | src |
created_at | 2023-04-25 20:43:54.10255 |
updated_at | 2023-05-23 20:07:17.879625 |
description | This is a tool to help transliterate between various indic scripts. It is not ready yet and has many issues. If you encounter any issues, please contact me (https://github.com/mssrprad/transliterate-ferris/tree/cli or pradyumna.malladi2003@gmail.com) |
homepage | https://github.com/mssrprad/transliterate-ferris/tree/cli |
repository | https://github.com/mssrprad/transliterate-ferris/tree/cli |
max_upload_size | |
id | 848803 |
size | 3,761,905 |
IMP: Don't use rust-analyzer if you're working on the code. It has crashed my pc multiple times on both vscode and nvim. I suspect the data.rs is causing this issue.
INSTALLATION:
cargo install IndicScriptSwap
USAGE:
IndicScriptSwap --source devanagari --destination telugu --conversion i2i --input data.txt --output out.txt
Note: Piping is also supported for input and output:
cat data.txt | IndicScriptSwap --source devanagari --destination telugu --conversion i2i
Rust Library to transliterate between scripts.
serde
and json
dependencies and reading from data.rs would improve the build time but it made it much much worse.aksharamukha-python
on the same conversion. (Can be made even better)Despite these issues it seems sufficient for basic conversion requirement.
git clone https://github.com/MSSRPRAD/ transliterate-ferris.git
cd transliterate-ferris
time cargo run --release 1>out.txt 2>error.txt
out.txt
contains the output.
error.txt
contains the warnings and error messages emmitted by the program.
main.rs
contains a sort of 'demo'. The input text is given by:
let input: String = "अस्त्य् उत्तरस्यां दिशि देवतात्मा हिमालयो नाम नगाधिराजः । पूर्वापरौ तोयनिधी विगाह्य स्थितः पृथिव्या इव मानदण्डः ॥" .to_string();
You can change it to try different test cases.
The output generated for the above input text was:
"asty uttarasyAM diSi devatAtmA himAlayo nAma nagADirAjaH .\n pUrvAparO toyaniDI vigAhya sTitaH pfTivyA iva mAnadaRqaH .."
Many more transliterations between scripts not mentioned here can be achieved. Do try them all! (See src/data.rs for the various script names)
source key | destination key | function |
---|---|---|
devanagari | slp1 | convert_indic_to_roman() |
devanagari | iast | convert_indic_to_roman() |
devanagari | hk | convert_indic_to_roman() |
telugu | slp1 | convert_indic_to_roman() |
telugu | iast | convert_indic_to_roman() |
telugu | hk | convert_indic_to_roman() |
kannada | slp1 | convert_indic_to_roman() |
kannada | iast | convert_indic_to_roman() |
kannada | hk | convert_indic_to_roman() |
slp1 | devanagari | convert_roman_to_indic() |
slp1 | telugu | convert_roman_to_indic() |
slp1 | kannada | convert_roman_to_indic() |
slp1 | itrans | convert_roman_to_roman() |
iast | slp1 | convert_roman_to_roman() |
slp1 | iast | convert_roman_to_roman() |
iast | slp1 | convert_roman_to_roman() |
iast | itrans | convert_roman_to_roman() |
iast | hk | convert_roman_to_roman() |
devanagari | telugu | convert_roman_to_roman() |
telugu | devanagari | convert_roman_to_roman() |
devanagari | kannada | convert_roman_to_roman() |
kannada | devanagari | convert_roman_to_roman() |
kannada | telugu | convert_roman_to_roman() |
telugu | kannada | convert_roman_to_roman() |
Primary Focus was transliterating between devanagari to slp1 as many programs require input in slp1 and rust has no transliteration library from devanagari -> slp1 yet. It turned out making the convert_roman_to_indic() and convert_roman_to_roman() was not much different from this so I did that too.
LatinScripts = ['IASTLOC', 'RomanColloquial', 'ISOPali', 'RomanKana', 'BarahaNorth', 'BarahaSouth', 'Mongolian', 'SLP1', 'Wancho', 'Mro', 'IASTPali', 'HanifiRohingya','Ariyaka', 'RomanReadable', 'Aksharaa', 'WarangCiti', 'SoraSompeng','WX-kok','Avestan','ISO','IAST','HK','Titus','Itrans','Velthuis','WX','Inter','IPA','TolongSiki','Santali','RussianCyrillic']