Crates.io | osm-tag-csv-history |
lib.rs | osm-tag-csv-history |
version | 0.5.0 |
source | src |
created_at | 2020-01-06 17:21:08.503938 |
updated_at | 2022-03-26 06:54:32.314804 |
description | Use CSV tools to see who's mapping what in OpenStreetMap. |
repository | https://git.sr.ht/~ebel/osm-tag-csv-history |
id | 195868 |
size | 94,445 |
Given an OSM history file, osm-tag-csv-history produces a CSV file in which each row describes one change (addition, removal or modification) to a tag on an OSM object.
Planet.OpenStreetMap.org provides a “full history” file, updated every week. You can download the latest one from there, but it is very large (⚠ 99+ GB! ⚠).
Download it over BitTorrent with:
aria2c --seed-time 0 https://planet.openstreetmap.org/pbf/full-history/history-latest.osm.pbf.torrent
Geofabrik provides a download service which includes full history files for many regions & countries. You must log in to it with your OpenStreetMap account. You can also use this tool on regular, non-history, OSM data files.
If you have Rust installed, you can install it with:
cargo install osm-tag-csv-history
You can also download prebuilt binaries from the GitHub releases page (e.g. the v0.3.0 release).
osm-tag-csv-history -i mydata.osm.pbf -o mydata.csv.gz
The output is automatically compressed with gzip if the filename ends in .gz. Use a .csv filename for CSV output, or .tsv for TSV (tab-separated) output.
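Because the output may or may not be gzip-compressed depending on its filename, a small helper can apply the same convention when reading the file back. This is only a sketch: peek_output is a hypothetical helper name, not part of osm-tag-csv-history, and it assumes gzip is installed.

```sh
# peek_output FILE [ROWS]: print the first ROWS lines (default 5) of an
# output file, decompressing it first when the filename ends in .gz.
# (peek_output is a made-up name for this sketch, not part of the tool.)
peek_output() {
    file=$1
    rows=${2:-5}
    case "$file" in
        *.gz) gzip -dc "$file" ;;   # compressed output, e.g. mydata.csv.gz
        *)    cat "$file" ;;        # plain .csv or .tsv output
    esac | head -n "$rows"
}
```

For example, peek_output mydata.csv.gz 3 would show the header line and the first two rows.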
By default, all tag changes are included. With the --tag/-t argument (which can be given more than once), only changes to those tags are included in the output.
To produce a CSV with only changes to the highway or building tags, run this command:
osm-tag-csv-history -i mydata.osm.pbf -o mydata.csv -t highway -t building
By default, all OSM objects in the file are included. With --object-types/-T, only some types are output, e.g. -T wr for only ways & relations.
Use --uid to only output changes made by the OSM user with that uid (can be specified multiple times).
Many programmes can read CSV files. It's also possible to use hacky Unix command line programmes to calculate who's adding fuel stations (amenity=fuel in OSM) in Ireland:
osm-tag-csv-history -i ./ireland-and-northern-ireland-internal.osh.pbf -o - --no-header | grep '^amenity,fuel,' | cut -d, -f9 | sort | uniq -c | sort -n | tail -n 20
Here we find every time someone has upgraded a building from building=yes to something else:
osm-tag-csv-history -i data.osh.pbf -o - --no-header | grep -P '^building,[^,]+,yes,' | cat -n
And with some other command line tools, we can get a list of who's doing the most to make OSM more descriptive by upgrading building=yes:
osm-tag-csv-history -i data.osh.pbf -o - --no-header | grep -P '^building,[^,]+,yes,' | xsv select 8 | sort | uniq -c | sort -n | tail -n 20
osmium getid
The id column (column 4) can be used by osmium-tool to filter an OSM file by object id. This is how you extract all the pet shops in OSM into a file:
osm-tag-csv-history -i country-latest.osm.pbf -o - --no-header | grep '^shop,pet,' | xsv select 4 | osmium getid -i - country-latest.osm.pbf -o pets.osm.pbf -r
(For this simple case, osmium's tag filtering is probably better.)
This programme runs on non-history files just fine. The old_value and old_version columns will be empty. This can be a way to convert OSM data into CSV format for further processing.
The Geofabrik Public Download Service provides non-history files which do not include some metadata, like usernames, uids or changeset_ids. This tool can run on them, and will give an empty value for username, and 0 for uid & changeset_id.
If you have an OSM account, you can get full metadata from the internal service.
Records are separated by a newline (\n). A header line is included by default, but it can be turned off with --no-header (or forcibly included with --header).
If any string (e.g. tag value, username) contains a newline or a similar character, it is escaped with a backslash (i.e. a newline is written as 2 characters, \ then n).
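To show what undoing that escaping could look like, here is a rough sketch of a decoder for a single field. The text above only states that a newline becomes \ then n. That tabs are written as \t and a literal backslash as \\ are assumptions of this sketch, and unescape_field is a hypothetical helper name, not part of the tool. Any other backslash sequence is passed through unchanged.

```sh
# unescape_field: read one escaped field per line on stdin, print it decoded.
# Assumptions (not confirmed by the README): tab is written as \t and a
# literal backslash as \\. Only the \n escape is documented.
unescape_field() {
    awk '{
        out = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\\" && i < n) {
                i++
                d = substr($0, i, 1)
                if (d == "n")       out = out "\n"    # documented escape
                else if (d == "t")  out = out "\t"    # assumed escape
                else if (d == "\\") out = out "\\"    # assumed escape
                else                out = out "\\" d  # unknown: keep as-is
            } else {
                out = out c
            }
        }
        print out
    }'
}
```

Run this as the last step, on a field that has already been extracted, since decoding a real newline earlier would break the one-record-per-line layout.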
The columns can be changed with --columns/-C (e.g. -C key,new_value,uid).
The default value is key,new_value,old_value,id,new_version,old_version,datetime,username,uid,changeset_id
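Since --columns can change both the set and the order of columns, pipelines that pick a field by position (such as cut -d, -f9) depend on the default layout. One possible alternative, sketched here, is to keep the header line and look the column up by name. col_by_name is a hypothetical helper, and like cut it assumes that no field contains a comma.

```sh
# col_by_name NAME: read CSV (with its header line) on stdin and print only
# the column called NAME. Hypothetical helper for this sketch; it splits on
# every comma, so it mishandles fields that themselves contain commas.
col_by_name() {
    awk -F, -v name="$1" '
        NR == 1 {
            # Find the position of the requested column in the header.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) { print "no such column: " name > "/dev/stderr"; exit 1 }
            next
        }
        { print $col }
    '
}
```

With the header left on, something like osm-tag-csv-history -i data.osh.pbf -o - | col_by_name username | sort | uniq -c | sort -n should then keep working even if the column list is changed.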
Default values, in order
key: The tag key.
new_value: The current/new value. "" (empty string) if the current version doesn't have this key (i.e. it has been removed from the object).
old_value: The previous value. "" (empty string) if the previous version didn't have this key.
id: The object type and id. The first character is the type (n/w/r), then the id. n123 is the node with id 123. This format is used by osmium-tool to filter an OSM file by object id.
new_version: The current/new version number.
old_version: The previous version number. "" (empty string) for the first version of an object.
datetime: Date time (RFC3339 format in UTC) this version of the object was created.
username: The username of the user who made the change (remember: in OSM, users can change their username; UIDs remain constant).
uid: The user id of the user.
changeset_id: The id of the changeset in which this change was made.
Other available columns
object_type_short/object_type_long: OSM type of the object (n/w/r, or node/way/relation).
raw_id: OSM id of the object.
epoch_datetime: Date time (Unix epoch time) this version of the object was created. This is how the data is stored in an OSM PBF file. Using this (rather than the ISO string datetime) makes processing about 15% faster, because the integer epoch seconds don't need to be converted to an ISO datetime string.
tag_count_delta: 0 if the tag was changed, +1 if the tag was added, -1 if the tag was removed. This is a more robust way to determine if a tag was added or removed. Think of it as “the change in the number of OSM objects with this key”.
Imagine this simple file:
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="osmium/1.7.1">
<node id="1" version="1" timestamp="2019-01-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="2">
<tag k="place" v="city"/>
<tag k="name" v="Nice City"/>
</node>
<node id="1" version="2" timestamp="2019-03-01T12:30:00Z" lat="0.0" lon="0.0" user="Bob" uid="2" changeset="10">
<tag k="place" v="city"/>
<tag k="name" v="Nice City"/>
<tag k="population" v="1000000"/>
</node>
<node id="2" version="1" timestamp="2019-04-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="20">
<tag k="amenity" v="restaurant"/>
<tag k="name" v="TastyEats"/>
</node>
<node id="2" version="2" timestamp="2019-04-01T02:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="21">
<tag k="amenity" v="restaurant"/>
<tag k="name" v="TastyEats"/>
<tag k="cuisine" v="regional"/>
</node>
<node id="2" version="3" timestamp="2019-04-01T03:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="22">
<tag k="amenity" v="restaurant"/>
<tag k="name" v="TastyEats"/>
<tag k="cuisine" v="burger"/>
</node>
<node id="2" version="4" timestamp="2019-04-01T03:00:00Z" lat="1.0" lon="0.0" user="Alice" uid="12" changeset="22">
<tag k="amenity" v="restaurant"/>
<tag k="name" v="TastyEats"/>
<tag k="cuisine" v="burger"/>
</node>
<node id="3" version="1" timestamp="2019-04-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="50">
<tag k="amenity" v="bench"/>
</node>
<node id="3" version="2" timestamp="2019-06-01T00:00:00Z" lat="0.0" lon="0.0" user="Alice" uid="12" changeset="100" visible="false">
</node>
</osm>
NB: This programme cannot read XML files, only PBF. This file was converted to PBF with osmium cat example.osm.xml -o example.osm.pbf.
Running osm-tag-csv-history on it produces this CSV file (formatted here as a table with csvtomd):
| key | new_value | old_value | id | new_version | old_version | datetime | username | uid | changeset_id |
|---|---|---|---|---|---|---|---|---|---|
| name | Nice City | | n1 | 1 | | 2019-01-01T00:00:00Z | Alice | 12 | 2 |
| place | city | | n1 | 1 | | 2019-01-01T00:00:00Z | Alice | 12 | 2 |
| population | 1000000 | | n1 | 2 | 1 | 2019-03-01T12:30:00Z | Bob | 2 | 10 |
| amenity | restaurant | | n2 | 1 | | 2019-04-01T00:00:00Z | Alice | 12 | 20 |
| name | TastyEats | | n2 | 1 | | 2019-04-01T00:00:00Z | Alice | 12 | 20 |
| cuisine | regional | | n2 | 2 | 1 | 2019-04-01T02:00:00Z | Alice | 12 | 21 |
| cuisine | burger | regional | n2 | 3 | 2 | 2019-04-01T03:00:00Z | Alice | 12 | 22 |
| amenity | bench | | n3 | 1 | | 2019-04-01T00:00:00Z | Alice | 12 | 50 |
| amenity | | bench | n3 | 2 | 1 | 2019-06-01T00:00:00Z | Alice | 12 | 100 |
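As an illustration of how the old_value and new_value columns encode additions, changes and removals, the sketch below labels a few rows from the example output. The CSV lines were written out by hand from the table. classify_changes is a hypothetical helper that assumes the default column order, no header line, and no commas inside fields. It treats an empty string as "no tag", which cannot distinguish a tag whose value really is empty. That is the case the tag_count_delta column handles more robustly.

```sh
# classify_changes: read header-less CSV in the default column order on stdin
# and print "id version key action" for each row.
# Hypothetical helper for this sketch; it guesses the action from empty
# old_value/new_value fields, so tag_count_delta is the more robust source.
classify_changes() {
    awk -F, '{
        if ($3 == "" && $2 != "")      action = "added"    # no old_value
        else if ($2 == "" && $3 != "") action = "removed"  # no new_value
        else                           action = "changed"
        print $4, "v" $5, $1, action
    }'
}

classify_changes <<'EOF'
name,Nice City,,n1,1,,2019-01-01T00:00:00Z,Alice,12,2
cuisine,burger,regional,n2,3,2,2019-04-01T03:00:00Z,Alice,12,22
amenity,,bench,n3,2,1,2019-06-01T00:00:00Z,Alice,12,100
EOF
# prints:
# n1 v1 name added
# n2 v3 cuisine changed
# n3 v2 amenity removed
```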
Some things to note:
An empty old_version means there was no previous, or earlier, version.
When a tag is removed, including when the whole object is deleted, the previous value is in old_value, and the new_value is empty, as for n3 v2.
The following other tools might be useful:
xsv, a command line tool for slicing & filtering CSV data.
osmium, a programme to process OSM data. You can use this to filter an OSM history file to a certain area, or time range.
datamash, a command line CSV statistical tool.
Copyright 2020, GNU Affero General Public Licence (AGPL) v3 or later. See LICENCE.txt. Source code is on Sourcehut and GitHub.
The output file should be viewed as a Derived Database of the OpenStreetMap database, and hence falls under the ODbL 1.0 licence, the same as the OpenStreetMap copyright.