Crates.io | lindera-py |
lib.rs | lindera-py |
version | |
source | src |
created_at | 2024-12-06 01:34:18.926525 |
updated_at | 2024-12-06 05:37:41.148227 |
description | Python binding for Lindera. |
homepage | https://github.com/lindera-morphology/lindera-py |
repository | https://github.com/lindera-morphology/lindera-py |
max_upload_size | |
id | 1473798 |
size | 0 |
Python binding for Lindera, a Japanese morphological analysis engine.
# Install Python
% pyenv install 3.12.3
# Clone lindera-py project repository
% git clone git@github.com:lindera/lindera-py.git
% cd lindera-py
# Set Python version for this project
% pyenv local 3.12.3
# Make Python virtual environment
% python -m venv .venv
# Activate Python virtual environment
% source .venv/bin/activate
# Initialize lindera-py project
(.venv) % make init
Build lindera-py and install it into the virtual environment. This step takes a long time because the library is built with all of the dictionaries included.
(.venv) % make maturin-develop
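`maturin develop` installs the built module into the currently active virtual environment, so a failed or misplaced build is often caused by running it outside `.venv`. The script below is a minimal sketch of a pre-flight check. It assumes the Python 3.12 version pinned with pyenv above. The helper names `in_virtualenv` and `check_environment` are hypothetical and are not part of lindera-py.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != sys.base_prefix


def check_environment(expected=(3, 12)) -> list:
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(expected):
        problems.append(
            f"expected Python {expected[0]}.{expected[1]}, found {found[0]}.{found[1]}"
        )
    if not in_virtualenv():
        problems.append("no virtual environment is active (run `source .venv/bin/activate`)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(f"warning: {issue}")
    if not issues:
        print("environment looks ready")
```

Comparing `sys.prefix` with `sys.base_prefix` is the check the standard `venv` module documents, so the script does not depend on the `VIRTUAL_ENV` variable that the activate script sets.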
from lindera_py import Segmenter, Tokenizer, load_dictionary


def main():
    # load the dictionary
    dictionary = load_dictionary("ipadic")

    # create a segmenter
    segmenter = Segmenter("normal", dictionary)

    # create a tokenizer
    tokenizer = Tokenizer(segmenter)

    text = "関西国際空港限定トートバッグを東京スカイツリーの最寄り駅であるとうきょうスカイツリー駅で買う"
    print(f"text: {text}\n")

    # tokenize the text
    tokens = tokenizer.tokenize(text)

    for token in tokens:
        print(token.text)


if __name__ == "__main__":
    main()
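When processing many texts, it is presumably cheaper to build the dictionary, segmenter, and tokenizer once and reuse them than to repeat `load_dictionary` for every input. The helper below is a hypothetical sketch, not part of lindera-py. It relies only on the two behaviors the example above uses: `tokenizer.tokenize(text)` returns an iterable of tokens, and each token exposes its surface form as `.text`.

```python
from typing import Iterable, List


def surface_forms(tokenizer, texts: Iterable[str]) -> List[List[str]]:
    """Tokenize each text with one shared tokenizer and return the
    surface forms (token.text) for each input text."""
    results = []
    for text in texts:
        # Same call as in main(): each returned token carries its surface form in .text
        results.append([token.text for token in tokenizer.tokenize(text)])
    return results
```

With lindera-py installed, `tokenizer` would be the `Tokenizer(segmenter)` object built in `main()`, for example `surface_forms(tokenizer, ["東京スカイツリー", "関西国際空港"])`. Because the helper uses duck typing, it can also be exercised with a stand-in tokenizer in tests that do not have the compiled extension available.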