Crates.io | lindera-ko-dic |
lib.rs | lindera-ko-dic |
version | |
source | src |
created_at | 2022-03-07 16:45:56.330426 |
updated_at | 2024-11-30 13:41:33.571508 |
description | A Korean morphological dictionary for ko-dic. |
homepage | https://github.com/lindera-morphology/lindera |
repository | https://github.com/lindera-morphology/lindera |
max_upload_size | 52428800 |
id | 545109 |
size | 0 |
This repository contains mecab-ko-dic.
Information about the dictionary format and part-of-speech tags used by mecab-ko-dic is documented in this Google Spreadsheet, linked to from mecab-ko-dic's repository readme.
Note that ko-dic has one fewer feature column than NAIST JDIC and carries an altogether different set of information (for example, it does not provide the "original form" of the word).
The tags are a slight modification of those specified by 세종 (Sejong), presumably the tag set of the Sejong corpus. The mappings from Sejong to mecab-ko-dic's tag names are given in the `태그 v2.0` tab of the above-linked spreadsheet.
The dictionary format is fully specified (in Korean) in the `사전 형식 v2.0` tab of the spreadsheet. Any blank values default to `*`.
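The blank-value rule can be illustrated with a small sketch. The function names below are hypothetical and not part of Lindera or mecab-ko-dic, and the splitting assumes plain comma-separated rows without quoted fields.

```rust
/// Replace a blank feature value with the `*` placeholder.
/// (Hypothetical helper, not part of any library API.)
fn normalize_field(value: &str) -> &str {
    let trimmed = value.trim();
    if trimmed.is_empty() {
        "*"
    } else {
        trimmed
    }
}

/// Split one CSV row on commas and normalize every field.
/// Assumption: no quoted fields that contain commas.
fn normalize_row(line: &str) -> Vec<&str> {
    line.split(',').map(normalize_field).collect()
}
```

Applied to a row whose optional columns were left empty, `normalize_row` returns the same number of fields with every empty one replaced by `*`.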
Index | Name (Korean) | Name (English) | Notes |
---|---|---|---|
0 | 표면 | Surface | |
1 | 왼쪽 문맥 ID | Left context ID | |
2 | 오른쪽 문맥 ID | Right context ID | |
3 | 비용 | Cost | |
4 | 품사 태그 | part-of-speech tag | See 태그 v2.0 tab on spreadsheet |
5 | 의미 부류 | semantic class | (too few examples for me to be sure) |
6 | 종성 유무 | presence or absence of a final consonant | T for true; F for false; else * |
7 | 읽기 | reading | usually matches surface, but may differ for foreign words e.g. Chinese character words |
8 | 타입 | type | One of: Inflect (활용); Compound (복합명사); or Preanalysis (기분석) |
9 | 첫번째 품사 | first part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return VV |
10 | 마지막 품사 | last part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return EP |
11 | 표현 | expression | Describes how an inflected form (활용), compound noun (복합명사), or pre-analyzed entry (기분석) is composed |
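As a rough sketch of how the twelve columns above could be read into a typed record, the following Rust maps each index to a field. The struct, its field names, the integer widths, and the helper functions are all assumptions made for illustration; none of this is Lindera's actual API, and the parser assumes rows without quoted commas.

```rust
use std::str::FromStr;

/// One row of the twelve-column layout (illustrative; not Lindera's API).
#[derive(Debug)]
struct KoDicEntry {
    surface: String,
    left_context_id: u32, // integer widths are an assumption
    right_context_id: u32,
    cost: i32,
    pos_tag: String,
    semantic_class: String,
    has_final_consonant: Option<bool>, // None for `*` or any other value
    reading: String,
    entry_type: String,
    first_pos: String,
    last_pos: String,
    expression: String,
}

/// Parse a numeric column, naming the column in the error message.
fn parse_num<T: FromStr>(name: &str, value: &str) -> Result<T, String> {
    value
        .trim()
        .parse::<T>()
        .map_err(|_| format!("column `{name}` is not a number: {value:?}"))
}

/// Parse one CSV row. Assumes no quoted fields that contain commas.
fn parse_entry(line: &str) -> Result<KoDicEntry, String> {
    let cols: Vec<&str> = line.split(',').collect();
    if cols.len() < 12 {
        return Err(format!("expected at least 12 columns, found {}", cols.len()));
    }
    Ok(KoDicEntry {
        surface: cols[0].to_string(),
        left_context_id: parse_num("left context ID", cols[1])?,
        right_context_id: parse_num("right context ID", cols[2])?,
        cost: parse_num("cost", cols[3])?,
        pos_tag: cols[4].to_string(),
        semantic_class: cols[5].to_string(),
        has_final_consonant: match cols[6] {
            "T" => Some(true),
            "F" => Some(false),
            _ => None,
        },
        reading: cols[7].to_string(),
        entry_type: cols[8].to_string(),
        first_pos: cols[9].to_string(),
        last_pos: cols[10].to_string(),
        expression: cols[11].to_string(),
    })
}

/// Take the first and last components of a `+`-joined tag,
/// mirroring the relationship shown for columns 9 and 10.
fn first_and_last_pos(tag: &str) -> (&str, &str) {
    let first = tag.split('+').next().unwrap_or(tag);
    let last = tag.rsplit('+').next().unwrap_or(tag);
    (first, last)
}
```

With the tag from the table, `first_and_last_pos("VV+EM+VX+EP")` yields `("VV", "EP")`. The dictionary already stores these values in columns 9 and 10, so the helper only demonstrates how they relate to column 4.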
Index | Name (Korean) | Name (English) | Notes |
---|---|---|---|
0 | 표면 | Surface | |
1 | 품사 태그 | part-of-speech tag | See 태그 v2.0 tab on spreadsheet |
2 | 읽기 | reading | usually matches surface, but may differ for foreign words e.g. Chinese character words |
Index | Name (Korean) | Name (English) | Notes |
---|---|---|---|
0 | 표면 | Surface | |
1 | 왼쪽 문맥 ID | Left context ID | |
2 | 오른쪽 문맥 ID | Right context ID | |
3 | 비용 | Cost | |
4 | 품사 태그 | part-of-speech tag | See 태그 v2.0 tab on spreadsheet |
5 | 의미 부류 | semantic class | (too few examples for me to be sure) |
6 | 종성 유무 | presence or absence of a final consonant | T for true; F for false; else * |
7 | 읽기 | reading | usually matches surface, but may differ for foreign words e.g. Chinese character words |
8 | 타입 | type | One of: Inflect (활용); Compound (복합명사); or Preanalysis (기분석) |
9 | 첫번째 품사 | first part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return VV |
10 | 마지막 품사 | last part-of-speech | e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return EP |
11 | 표현 | expression | Describes how an inflected form (활용), compound noun (복합명사), or pre-analyzed entry (기분석) is composed |
12 | - | - | Columns from index 12 onward may be added freely. |
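Assuming the three-column table and the twelve-plus-column table describe two accepted row layouts for the same kind of CSV file, a reader could tell them apart by column count. The sketch below does only that; the `RowLayout` type and `classify_row` function are hypothetical names, not part of Lindera, and rows with quoted commas are not handled.

```rust
/// The two row layouts described by the tables above (hypothetical type).
#[derive(Debug, PartialEq)]
enum RowLayout<'a> {
    /// Three columns: surface, part-of-speech tag, reading.
    Simple {
        surface: &'a str,
        pos_tag: &'a str,
        reading: &'a str,
    },
    /// Twelve or more columns, in the order given by the larger table.
    Detailed(Vec<&'a str>),
}

/// Classify one CSV row by its column count.
/// Assumption: no quoted fields that contain commas.
fn classify_row(line: &str) -> Result<RowLayout<'_>, String> {
    let cols: Vec<&str> = line.split(',').map(str::trim).collect();
    match cols.len() {
        3 => Ok(RowLayout::Simple {
            surface: cols[0],
            pos_tag: cols[1],
            reading: cols[2],
        }),
        n if n >= 12 => Ok(RowLayout::Detailed(cols)),
        n => Err(format!("expected 3 or at least 12 columns, found {n}")),
    }
}
```

Rejecting any other column count is a design choice of this sketch; a real loader may be more lenient or stricter.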
The API reference is available. Please see the following URL: