lindera-ko-dic

Crates.io	lindera-ko-dic
lib.rs	lindera-ko-dic
version
source	src
created_at	2022-03-07 16:45:56.330426
updated_at	2024-11-30 13:41:33.571508
description	A Korean morphological dictionary for ko-dic.
homepage	https://github.com/lindera-morphology/lindera
repository	https://github.com/lindera-morphology/lindera
max_upload_size	52428800
id	545109
size	0

Minoru Osuka (mosuka)

documentation

https://docs.rs/lindera-ko-dic

README

Lindera ko-dic

Dictionary version

This repository contains mecab-ko-dic.

Dictionary format

Information about the dictionary format and part-of-speech tags used by mecab-ko-dic is documented in this Google Spreadsheet, which is linked from the mecab-ko-dic repository readme.

Note that ko-dic has one fewer feature column than NAIST JDIC and carries an altogether different set of information (e.g. it doesn't provide the "original form" of the word).

The tags are a slight modification of the 세종 (Sejong) tag set, the part-of-speech scheme used by the 21st Century Sejong Project corpus. The mappings from Sejong to mecab-ko-dic's tag names are given in tab 태그 v2.0 on the above-linked spreadsheet.

The dictionary format is specified fully (in Korean) in tab 사전 형식 v2.0 of the spreadsheet. Any blank values default to *.

Index	Name (Korean)	Name (English)	Notes
0	표면	Surface
1	왼쪽 문맥 ID	Left context ID
2	오른쪽 문맥 ID	Right context ID
3	비용	Cost
4	품사 태그	part-of-speech tag	See `태그 v2.0` tab on spreadsheet
5	의미 부류	semantic class	(too few examples for me to be sure)
6	종성 유무	presence or absence of a final consonant	`T` for true; `F` for false; else `*`
7	읽기	reading	usually matches surface, but may differ for foreign words e.g. Chinese character words
8	타입	type	One of: `Inflect` (활용); `Compound` (복합명사); or `Preanalysis` (기분석)
9	첫번째 품사	first part-of-speech	e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return `VV`
10	마지막 품사	last part-of-speech	e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return `EP`
11	표현	expression	`활용, 복합명사, 기분석이 어떻게 구성되는지 알려주는 필드` – A field that describes how an inflection, compound noun, or pre-analysis entry is composed
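
The column layout above can be sketched as a small Rust parser. This is only an illustration of the table, not Lindera's own loader: the `KoDicEntry` struct, the `parse_entry` and `first_and_last_pos` functions, and the integer widths are hypothetical names and choices, and the sketch assumes that no field contains a quoted comma.

```rust
/// One row of the ko-dic CSV, following the twelve columns in the table above.
/// (Hypothetical type for illustration; not part of the lindera-ko-dic API.)
#[derive(Debug, PartialEq)]
pub struct KoDicEntry {
    pub surface: String,
    pub left_context_id: u32,
    pub right_context_id: u32,
    pub cost: i32,
    pub pos_tag: String,
    pub semantic_class: String,      // column 5, 의미 부류
    pub has_final_consonant: String, // column 6: "T", "F", or "*"
    pub reading: String,
    pub entry_type: String,          // "Inflect", "Compound", "Preanalysis", or "*"
    pub first_pos: String,
    pub last_pos: String,
    pub expression: String,
}

/// Parses one CSV line into a `KoDicEntry`.
/// Assumption: fields never contain quoted commas, so a plain split is enough.
pub fn parse_entry(line: &str) -> Result<KoDicEntry, String> {
    let cols: Vec<&str> = line.trim_end().split(',').collect();
    if cols.len() != 12 {
        return Err(format!("expected 12 columns, found {}", cols.len()));
    }
    // Blank values default to "*", as the format description states.
    let text = |i: usize| -> String {
        if cols[i].is_empty() {
            "*".to_string()
        } else {
            cols[i].to_string()
        }
    };
    // Numeric columns must parse; report which column failed otherwise.
    let left = cols[1]
        .parse::<u32>()
        .map_err(|e| format!("left context ID: {}", e))?;
    let right = cols[2]
        .parse::<u32>()
        .map_err(|e| format!("right context ID: {}", e))?;
    let cost = cols[3]
        .parse::<i32>()
        .map_err(|e| format!("cost: {}", e))?;
    Ok(KoDicEntry {
        surface: cols[0].to_string(),
        left_context_id: left,
        right_context_id: right,
        cost,
        pos_tag: text(4),
        semantic_class: text(5),
        has_final_consonant: text(6),
        reading: text(7),
        entry_type: text(8),
        first_pos: text(9),
        last_pos: text(10),
        expression: text(11),
    })
}

/// Returns the first and last components of a '+'-joined part-of-speech tag,
/// mirroring the examples given for columns 9 and 10.
pub fn first_and_last_pos(tag: &str) -> (&str, &str) {
    let first = tag.split('+').next().unwrap_or(tag);
    let last = tag.rsplit('+').next().unwrap_or(tag);
    (first, last)
}
```

Calling `first_and_last_pos("VV+EM+VX+EP")` yields the pair `VV` and `EP`, matching the notes for columns 9 and 10. Context IDs and costs in a real dictionary come from mecab-ko-dic itself, so any sample line fed to `parse_entry` should be taken from the dictionary rather than invented.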

User dictionary format (CSV)

Simple version

Index	Name (Korean)	Name (English)	Notes
0	표면	Surface
1	품사 태그	part-of-speech tag	See `태그 v2.0` tab on spreadsheet
2	읽기	reading	usually matches surface, but may differ for foreign words e.g. Chinese character words
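
As a rough illustration of the three-column layout, the helper below builds one row of a simple user dictionary. The function name `simple_user_row` is hypothetical and not part of Lindera; the sketch assumes values contain no commas and falls back to the surface when no separate reading is supplied, since the table notes that the two usually match.

```rust
/// Formats one row of the simple user dictionary: surface, part-of-speech tag, reading.
/// Assumption: none of the values contains a comma, so no CSV quoting is applied.
pub fn simple_user_row(surface: &str, pos_tag: &str, reading: Option<&str>) -> String {
    // The reading usually matches the surface, so use the surface when none is given.
    let reading = reading.unwrap_or(surface);
    format!("{},{},{}", surface, pos_tag, reading)
}
```

For example, `simple_user_row("하네다공항", "NNP", None)` produces a row whose reading repeats the surface; the entry itself is only an illustrative proper noun.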

Detailed version

Index	Name (Korean)	Name (English)	Notes
0	표면	Surface
1	왼쪽 문맥 ID	Left context ID
2	오른쪽 문맥 ID	Right context ID
3	비용	Cost
4	품사 태그	part-of-speech tag	See `태그 v2.0` tab on spreadsheet
5	의미 부류	semantic class	(too few examples for me to be sure)
6	종성 유무	presence or absence of a final consonant	`T` for true; `F` for false; else `*`
7	읽기	reading	usually matches surface, but may differ for foreign words e.g. Chinese character words
8	타입	type	One of: `Inflect` (활용); `Compound` (복합명사); or `Preanalysis` (기분석)
9	첫번째 품사	first part-of-speech	e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return `VV`
10	마지막 품사	last part-of-speech	e.g. given a part-of-speech tag of "VV+EM+VX+EP", would return `EP`
11	표현	expression	`활용, 복합명사, 기분석이 어떻게 구성되는지 알려주는 필드` – A field that describes how an inflection, compound noun, or pre-analysis entry is composed
12	-	-	Columns from index 12 onward can be added freely.
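
One way to tell the two user dictionary layouts apart is by column count: three columns for the simple version, twelve or more for the detailed one. The sketch below works under that assumption, which the tables suggest but which is not confirmed here as Lindera's actual behaviour. The `UserDictRow` enum and `classify_row` function are hypothetical names, and quoted commas are not handled.

```rust
/// A user dictionary row, classified by its number of columns.
/// (Hypothetical type for illustration; not part of the lindera-ko-dic API.)
#[derive(Debug, PartialEq)]
pub enum UserDictRow<'a> {
    /// Simple version: surface, part-of-speech tag, reading.
    Simple {
        surface: &'a str,
        pos_tag: &'a str,
        reading: &'a str,
    },
    /// Detailed version: the twelve core columns plus any freely added ones.
    Detailed {
        core: Vec<&'a str>,
        extra: Vec<&'a str>,
    },
}

/// Classifies one CSV line as a simple or detailed row.
/// Assumption: fields never contain quoted commas.
pub fn classify_row(line: &str) -> Result<UserDictRow<'_>, String> {
    let cols: Vec<&str> = line.trim_end().split(',').collect();
    match cols.len() {
        3 => Ok(UserDictRow::Simple {
            surface: cols[0],
            pos_tag: cols[1],
            reading: cols[2],
        }),
        n if n >= 12 => Ok(UserDictRow::Detailed {
            // Indices 0 to 11 are the documented columns.
            core: cols[..12].to_vec(),
            // Index 12 onward is free-form extension data.
            extra: cols[12..].to_vec(),
        }),
        n => Err(format!("expected 3 or at least 12 columns, found {}", n)),
    }
}
```

Rows with any other column count are rejected rather than guessed at, which keeps a malformed line from being silently read as the wrong layout.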

API reference

The API reference is available at the following URL:

https://docs.rs/lindera-ko-dic

Commit count: 505