Crates.io | covid19db |
lib.rs | covid19db |
version | 2.0.9 |
source | src |
created_at | 2020-08-11 00:55:28.522296 |
updated_at | 2020-10-31 00:06:12.932231 |
description | Utility for building and accessing COVID-19 datasets |
homepage | https://github.com/jgoerzen/covid19db |
repository | https://github.com/jgoerzen/covid19db/ |
max_upload_size | |
id | 275203 |
size | 234,068 |
This repository contains tools to generate a COVID-19 database for research and analysis, and links to a pre-generated database. The database is a self-contained SQLite database that can be used on any platform.
The program in this library can be run on your machine to download data from the Internet and assemble your own database. The process takes approximately two minutes, and you can run it as often as you like to obtain the latest data. Alternatively, you can download a database that is generated daily.
You can download a compressed database for yourself here: covid19db.zip.
This file is automatically regenerated daily.
This data is used in the COVID-19 in Kansas project. It has graphs automatically updated daily with a unique perspective on various data.
Besides the SQLite command-line tools, here are some other tips on using the data:
Please note that various included data sources request or require attribution. Please give credit to the original sources of data (e.g., The New York Times) and to aggregators in your work.
You can find a complete database schema in dbschema.rs. The views defined there are intended to be the primary way to access the database. A Rust API for sqlx is also provided for select tables. Direct source data download URLs are in loader.rs.
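To see which tables and views a given copy of the database actually contains, a minimal sketch along these lines may help. It uses only Python's standard-library sqlite3 module. The function name list_objects is hypothetical, and the in-memory stand-in objects exist only so the sketch runs without the real file; to inspect the real database, connect to the database file instead.

```python
import sqlite3


def list_objects(conn):
    """Return (type, name) pairs for every table and view in the database."""
    rows = conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') ORDER BY type, name"
    )
    return [(kind, name) for kind, name in rows]


if __name__ == "__main__":
    # For the real database, pass the path of the database file instead.
    conn = sqlite3.connect(":memory:")
    # Stand-in objects (assumptions); the real schema is in dbschema.rs.
    conn.execute("CREATE TABLE demo_table (id INTEGER)")
    conn.execute("CREATE VIEW demo_view AS SELECT id FROM demo_table")
    for kind, name in list_objects(conn):
        print(kind, name)
    conn.close()
```

Querying sqlite_master is a general SQLite technique and does not depend on anything specific to this project.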
Here are the sources:
cdataset is from the COVID-19 derived datasets project, which includes data from Johns Hopkins University, the New York Times, and ECDC. This integrates the "combined" set, so you will almost certainly want to use a WHERE dataset='foo' clause in every query so that you use only a single dataset. Running select distinct dataset from cdataset order by dataset; will show you the available datasets. Please see the derived datasets link above for a description of the sources and the augmentation done there. Additional augmentation is done on reading in to this system: the factbook_population column is filled in using the Johns Hopkins data (see below).
loc_lookup is from the Johns Hopkins dataset, the bulk of which is already included above in cdataset. This table represents the UID_ISO_FIPS_LookUp_Table.csv file, which contains county-level population data that is integrated into cdataset or can be queried separately.
rtlive is from rt.live. Julian dates and YYYY-MM-DD dates are added to the CSV source; no other changes were made.
covidtracking is from the COVID Tracking Project data downloads. Julian dates and Y/M/D dates are added to the CSV source; no other changes were made. covidtracking_us uses the data in covidtracking to present the same kind of view.
owid is from the Our World in Data COVID-19 dataset. Julian dates and Y/M/D dates are added to the CSV source.
These are potential future integrations:
Commands like these should do it:
git clone https://github.com/jgoerzen/covid19db
cd covid19db
cargo run --release
You will then get a file named covid19.db in the working directory. Just use this with SQLite.
With these commands, you can verify these results for yourself. If you don't already have Rust installed, see the Rust installation page.
The Rust API is pretty skeletal at the moment, but you can browse the docs.
This is a rapidly changing field, and the data providers change their schemas fairly frequently. I attempt to mitigate the impact. If you avoid things like SELECT * and instead name your columns explicitly, you will minimize the impact on yourself in the event of API changes.
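As an illustration of this advice, here is a hedged sketch in Python (standard-library sqlite3) of a query that names its columns explicitly and, as recommended for cdataset, restricts itself to a single dataset. Only the cdataset table name and its dataset column come from this document. The date and cases columns, the dataset labels, the helper names, and the sample rows are invented stand-ins so the code runs on its own; check dbschema.rs for the real column names.

```python
import sqlite3


def cases_for_dataset(conn, dataset):
    """Fetch rows for one dataset, naming columns instead of using SELECT *."""
    query = (
        "SELECT date, cases FROM cdataset "
        "WHERE dataset = ? ORDER BY date"
    )
    return conn.execute(query, (dataset,)).fetchall()


def make_demo_db():
    """Build an in-memory stand-in; column names and rows are assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cdataset (dataset TEXT, date TEXT, cases INTEGER)")
    conn.executemany(
        "INSERT INTO cdataset VALUES (?, ?, ?)",
        [
            ("source_a", "2020-03-01", 10),
            ("source_b", "2020-03-01", 12),
            ("source_a", "2020-03-02", 15),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    # List the dataset labels, as in: select distinct dataset from cdataset
    labels = conn.execute(
        "SELECT DISTINCT dataset FROM cdataset ORDER BY dataset"
    ).fetchall()
    print(labels)
    print(cases_for_dataset(conn, "source_a"))
    conn.close()
```

Because the query lists only the columns it needs, a column added or reordered upstream would not change the shape of the rows it returns. A renamed or removed column would still break the query, but with an explicit error rather than silently shifted data.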
This data is used by the Kansas COVID-19 Charts project and perhaps others.
This code is Copyright (c) 2019-2020 John Goerzen
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
This repository contains only tools for obtaining data and no data itself, though the data itself may be available elsewhere on Github. If you use the data accumulated by this program, or download it, you may be required to acknowledge the source. Here are some details:
In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.
If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more expanded description of the data, you could say “Data from The New York Times, based on reports from state and local health agencies.”
If you use it in an online presentation, we would appreciate it if you would link to our U.S. tracking page at https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.
If you use this data, please let us know at covid-data@nytimes.com.
See our LICENSE for the full terms of use for this data.
This license is co-extensive with the Creative Commons Attribution-NonCommercial 4.0 International license, and licensees should refer to that license (CC BY-NC) if they have questions about the scope of the license.
We just ask that you cite Rt.live as the source and link where appropriate.
You are welcome to copy, distribute, and develop data and website content from The COVID Tracking Project at The Atlantic for all healthcare, medical, journalistic and non-commercial uses, including any personal, editorial, academic, or research purposes.
The COVID Tracking Project at The Atlantic data and website content is published under a Creative Commons CC BY-NC-4.0 license, which requires users to attribute the source and license type (CC BY-NC-4.0) when sharing our data or website content. The COVID Tracking Project at The Atlantic also grants permission for any derivative use of this data and website content that supports healthcare or medical research (including institutional use by public health and for-profit organizations), or journalistic usage (by nonprofit or for-profit organizations). All other commercial uses are not permitted under the Creative Commons license, and will require permission from The COVID Tracking Project at The Atlantic.
"All our research and visualizations are free to use by everyone for all purposes." source
Visualizations and text: All our charts, maps, and text is licensed under a very permissive ‘Creative Commons’ (CC) license: The CC-BY license. The BY stands for ‘by attribution’ and this means you are free to take whatever is useful for your work. You just need to provide credit to Our World in Data and our underlying sources (see below).
This data is a manual import from the Kansas Department of Health and Environment and the Harvey County Health Department.