Crates.io | tanaka |
lib.rs | tanaka |
version | 0.1.0 |
source | src |
created_at | 2023-12-06 23:44:10.592762 |
updated_at | 2023-12-06 23:44:10.592762 |
description | A Rust interface to the Tanaka Corpus of parallel Japanese-English sentences |
homepage | https://gitlab.com/johngavingraham/tanaka |
repository | https://gitlab.com/johngavingraham/tanaka.git |
max_upload_size | |
id | 1060527 |
size | 7,038,890 |
A Rust interface to the Tanaka Corpus of parallel Japanese-English sentences.
The standard corpus is bundled with the crate: simply call examples() (or examples_subset()) to load it. The bundled data adds a few megabytes to the library, so each data set can be excluded by disabling its respective feature flag.
use tanaka::Corpus;
let corpus = Corpus::examples();
println!("{:?}", corpus.examples[0]);
Otherwise, load the version of the corpus you want into a string, and parse it:
use tanaka::Corpus;
let text = "A: 彼は忙しいですか。 Is he busy?#ID=303692_100005\n\
B: 彼(かれ)[01] は 忙しい(いそがしい) ですか";
let corpus = Corpus::parse(text).unwrap();
println!("{:?}", corpus.examples[0]);
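To make the shape of the "B:" line in the sample above more concrete, here is a minimal, standalone sketch that splits such a line into tokens. It does not use the tanaka crate and is not how the crate parses its input. The Token type and the parse_token and parse_b_line functions are hypothetical names chosen for illustration. The sketch assumes only what the sample shows: tokens are separated by whitespace, a reading may follow the headword in parentheses, and a number may follow in square brackets (taken here to be a sense index).

```rust
/// One token of a "B:" line.
/// Hypothetical type for illustration; not part of the tanaka crate's API.
#[derive(Debug, PartialEq)]
struct Token {
    headword: String,
    reading: Option<String>,
    sense: Option<String>,
}

/// Returns the text between the first `open` and the next `close`,
/// or None if either delimiter is missing.
fn between(s: &str, open: char, close: char) -> Option<String> {
    let start = s.find(open)? + open.len_utf8();
    let len = s[start..].find(close)?;
    Some(s[start..start + len].to_string())
}

/// Splits one whitespace-delimited token into headword and annotations.
fn parse_token(raw: &str) -> Token {
    // The headword runs up to the first annotation bracket, if any.
    let end = raw
        .find(|c: char| matches!(c, '(' | '['))
        .unwrap_or(raw.len());
    Token {
        headword: raw[..end].to_string(),
        reading: between(raw, '(', ')'),
        sense: between(raw, '[', ']'),
    }
}

/// Parses a whole "B:" line; the prefix is optional in this sketch.
fn parse_b_line(line: &str) -> Vec<Token> {
    line.strip_prefix("B:")
        .unwrap_or(line)
        .split_whitespace()
        .map(parse_token)
        .collect()
}

fn main() {
    let tokens = parse_b_line("B: 彼(かれ)[01] は 忙しい(いそがしい) ですか");
    for token in &tokens {
        println!("{:?}", token);
    }
}
```

For real use, prefer Corpus::parse as shown above. This sketch ignores any other annotations the corpus format may carry and does no error handling.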