kgdata

Crates.io: kgdata
lib.rs: kgdata
version: 4.0.1
source: src
created_at: 2023-05-21 12:48:31.728333
updated_at: 2024-03-28 23:01:35.390048
description: Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)
homepage: https://github.com/binh-vu/kgdata
repository: https://github.com/binh-vu/kgdata
id: 869928
size: 249,785
Binh Vu (binh-vu)

README

kgdata

KGData is a library for processing dumps of Wikipedia and Wikidata. It can:

  • Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references)
  • Create embedded key-value databases to access entities from the dumps.
  • Extract Wikidata ontology.
  • Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
  • Create Pyserini indices to search Wikidata’s entities.
  • and more
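
To make the clean-up step in the first bullet concrete, here is a minimal sketch of resolving redirects and removing dangling references on toy data. It does not use kgdata's API: the function names `resolve_redirect` and `clean_entities`, the dictionary layout, and the sample ids are all assumptions made for illustration, and the library's actual pipeline may work quite differently.

```python
from typing import Dict, List


def resolve_redirect(entity_id: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain until reaching an id that is not redirected.

    Hypothetical helper, not part of kgdata. The `seen` set guards
    against redirect cycles so the loop always terminates.
    """
    seen = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id


def clean_entities(
    entities: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Return a copy in which every reference is resolved through the
    redirect table and references to unknown entities are dropped."""
    cleaned = {}
    for entity_id, refs in entities.items():
        resolved = (resolve_redirect(ref, redirects) for ref in refs)
        cleaned[entity_id] = [ref for ref in resolved if ref in entities]
    return cleaned


# Toy data (assumed): Q3 was merged into Q2, and Q99 is not in the dump.
entities = {"Q1": ["Q3", "Q99"], "Q2": ["Q1"]}
redirects = {"Q3": "Q2"}
cleaned = clean_entities(entities, redirects)
```

In this toy run the reference from Q1 to Q3 is rewritten to Q2, and the reference to Q99 is dropped because no such entity exists in the data.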

For full documentation, please see the website.

Installation

From PyPI (using pre-built binaries):

pip install kgdata[spark]   # omit [spark] to choose the Spark version yourself, e.g. when your cluster runs a different version