Crates.io | maps-address-completion-service |
lib.rs | maps-address-completion-service |
version | 0.2.0 |
source | src |
created_at | 2023-10-16 23:30:55.924285 |
updated_at | 2023-10-17 21:29:57.256641 |
description | Serve city names, zip codes, street names and house numbers for auto completion |
homepage | https://github.com/julianbuettner/maps-address-completion-service |
id | 1005342 |
size | 87,040 |
Serves autocompletions for addresses, such as city names, ZIP codes, street names and house numbers. Useful, for example, in web forms where a valid address has to be entered manually.
cargo install maps-address-completion-service
curl -s https://download.geofabrik.de/europe/greece-latest.osm.pbf |
macs parse |
macs compress > greece.world
macs serve --world greece.world
This application is split into three stages, which lets you see what is going on and modify the data in between.
This short guide assumes that you have installed the package (e.g. with cargo install --path . or cargo install maps-address-completion-service).
The first two steps read from stdin and write to stdout.
The first step converts OpenStreetMap data (*.osm.pbf) to JSON lines of the following format:
{"country":"ZA","city":"Pinelands","postcode":"7405","street":"La Provence","housenumber":"1"}
{"country":"ZA","city":"Pinelands","postcode":"7405","street":"Ringwood Drive","housenumber":"2"}
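Because the intermediate format is plain JSON lines, it is easy to inspect or post-process with a few lines of script. The following is a minimal Python sketch, not part of this project, that reads such lines into typed records. It assumes that every record carries all five fields shown above; the names Address and read_addresses are hypothetical.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class Address(NamedTuple):
    """One record of the JSON lines format (hypothetical helper type)."""
    country: str
    city: str
    postcode: str
    street: str
    housenumber: str


def read_addresses(lines: Iterable[str]) -> Iterator[Address]:
    """Yield one Address per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumption: all five fields are present in every record.
        yield Address(**{field: record[field] for field in Address._fields})


sample = [
    '{"country":"ZA","city":"Pinelands","postcode":"7405","street":"La Provence","housenumber":"1"}',
    '{"country":"ZA","city":"Pinelands","postcode":"7405","street":"Ringwood Drive","housenumber":"2"}',
]
addresses = list(read_addresses(sample))
```

The same function accepts an open file or sys.stdin, since both iterate line by line.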
You can download OSM extracts from the Geofabrik site (download.geofabrik.de).
Extracts are available for entire continents as well as for individual regions.
wget https://download.geofabrik.de/europe/great-britain-latest.osm.pbf -O great-britain.osm.pbf
cat great-britain.osm.pbf | macs parse > maps.jsonl
# Or compress it directly
cat great-britain.osm.pbf | macs parse | xz > maps.jsonl.xz
In the second step, everything gets sorted, street names and house numbers are deduplicated, etc.
The resulting file is pretty much the in-memory representation of the final structure,
so its size is a good indicator of how much memory the server will consume.
The building process requires between 3 GiB and 6 GiB of memory for the entire globe.
cat maps.jsonl | macs compress > great-britain.world
ls -lah great-britain.world
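To illustrate what sorting and deduplicating means here, the following Python sketch groups records into a nested, sorted structure. It is only an illustration: the country → city → postcode → street → house number hierarchy is inferred from the query endpoints, the actual world file layout of macs may differ, and build_world is a hypothetical name.

```python
from typing import Iterable, Tuple

Record = Tuple[str, str, str, str, str]  # country, city, postcode, street, housenumber


def build_world(addresses: Iterable[Record]) -> list:
    """Group records into nested sorted lists, dropping duplicates."""
    nested: dict = {}
    for country, city, postcode, street, housenumber in addresses:
        # Dicts and sets deduplicate every level while collecting.
        (nested.setdefault(country, {})
               .setdefault(city, {})
               .setdefault(postcode, {})
               .setdefault(street, set())
               .add(housenumber))

    def freeze(node):
        # Leaves are sets of house numbers; sort them as strings,
        # so "10" sorts before "2".
        if isinstance(node, set):
            return sorted(node)
        # Inner levels become sorted lists of (name, children) pairs.
        return [(name, freeze(child))
                for name, child in sorted(node.items(), key=lambda item: item[0])]

    return freeze(nested)


world = build_world([
    ("ZA", "Pinelands", "7405", "Ringwood Drive", "2"),
    ("ZA", "Pinelands", "7405", "La Provence", "1"),
    ("ZA", "Pinelands", "7405", "La Provence", "1"),  # duplicate, dropped
])
```

Sorted lists of pairs, rather than dictionaries, keep each level compact and ready for binary search.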
The server can be started with
macs serve -w great-britain.world
macs serve --world great-britain.world --port 3000 --ip 127.0.0.1
[2023-10-16T22:45:11Z INFO macs::serve] Loading from world file "gb.world"...
[2023-10-16T22:45:11Z INFO macs::serve] World loadded, containing 3 countries.
[2023-10-16T22:45:11Z INFO macs::serve] Serve on 127.0.0.1:3000...
Now we can query:
# First get cities
curl http://localhost:3000/cities --url-query "country_code=GB"
# Then ZIP codes
curl http://localhost:3000/zips \
--url-query "country_code=GB" \
--url-query "city_name=London"
# Then streets
curl http://localhost:3000/streets \
--url-query "country_code=GB" \
--url-query "city_name=London" \
--url-query "zip=WC2R 0JR"
# Then house numbers
curl http://localhost:3000/housenumbers \
--url-query "country_code=GB" \
--url-query "city_name=London" \
--url-query "zip=WC2R 0JR" \
--url-query "street=Strand"
# All requests support prefix searching
curl http://localhost:3000/cities --url-query "country_code=GB" --url-query "prefix=Lon"
# All requests support result limiting
curl http://localhost:3000/cities --url-query "country_code=GB" -H "max-items: 16"
The server does not log requests.
All results are one-dimensional JSON lists of strings.
Country codes follow the ISO 3166 standard.
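For backends that would rather call the service from code than from curl, a request can be sketched with the Python standard library alone. The endpoint names, query parameters and the max-items header are taken from the curl examples above; the base URL, the helper names build_request and query, and the assumption that the response body is a JSON array of strings are illustrative only.

```python
import json
from typing import List, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_request(base_url: str, endpoint: str,
                  max_items: Optional[int] = None, **params: str) -> Request:
    """Build a GET request such as /streets?country_code=GB&city_name=London."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    url = f"{base_url}/{endpoint}?{urlencode(params, quote_via=quote)}"
    headers = {"max-items": str(max_items)} if max_items is not None else {}
    return Request(url, headers=headers)


def query(base_url: str, endpoint: str,
          max_items: Optional[int] = None, **params: str) -> List[str]:
    """Send the request and decode the JSON list of strings."""
    with urlopen(build_request(base_url, endpoint, max_items, **params)) as response:
        return json.load(response)


# Build (but do not send) a request for streets in a London postcode.
request = build_request(
    "http://localhost:3000", "streets", max_items=16,
    country_code="GB", city_name="London", zip="WC2R 0JR",
)
```

With a server running locally, query("http://localhost:3000", "cities", country_code="GB", prefix="Lon") would then return the matching city names.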
Generally, all data is stored in sorted vectors, not in hash maps. This compact layout keeps memory usage low and allows for binary searching. The time complexity of a lookup is therefore O(log n) rather than O(1).
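The idea behind prefix searching on a sorted vector can be sketched as follows. This is a Python illustration of the general technique, not the project's Rust implementation; prefix_search is a hypothetical name, and matching is assumed to be case-sensitive. In a sorted list, all entries sharing a prefix are contiguous, and the first of them sits at the position where the prefix itself would be inserted.

```python
from bisect import bisect_left
from typing import List


def prefix_search(sorted_names: List[str], prefix: str, max_items: int) -> List[str]:
    """Return up to max_items entries starting with prefix, in O(log n + max_items)."""
    # Binary search for the first entry that is >= prefix.
    start = bisect_left(sorted_names, prefix)
    results = []
    for name in sorted_names[start:start + max_items]:
        if not name.startswith(prefix):
            break  # the contiguous run of matches has ended
        results.append(name)
    return results


cities = ["Leeds", "Liverpool", "London", "Londonderry", "Luton"]
matches = prefix_search(cities, "Lon", 16)
```

An empty prefix returns the first max_items entries, which mirrors a plain listing with a result limit.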
This service is intended to be used by backends rather than frontends. If frontends access it directly,
configure a reverse proxy accordingly: inject a low max-items header (e.g. max-items: 123) and enable rate limiting.
The small-request, big-response nature of the API might make it attractive for DoS attacks.
OSM has a lot of faulty data, like cities named "<format" or 1,2,3, quoted house numbers or similar things.
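Since the stages communicate via JSON lines, such records could be filtered out between parsing and compressing. The following Python sketch shows one possible filter; the heuristics and the names looks_valid and filter_lines are hypothetical and only cover the examples mentioned above, so real data will likely need more rules.

```python
import json
from typing import Iterable, Iterator


def looks_valid(record: dict) -> bool:
    """Heuristic check for the kinds of faulty records mentioned above."""
    city = record.get("city", "")
    housenumber = record.get("housenumber", "")
    if not city or city.startswith("<"):
        return False  # e.g. a city named "<format"
    if not any(ch.isalpha() for ch in city):
        return False  # e.g. a city named 1,2,3
    if housenumber.startswith(('"', "'")):
        return False  # quoted house numbers
    return True


def filter_lines(lines: Iterable[str]) -> Iterator[str]:
    """Pass through only the JSON lines whose record looks valid."""
    for line in lines:
        if line.strip() and looks_valid(json.loads(line)):
            yield line


kept = list(filter_lines([
    '{"country":"ZA","city":"Pinelands","postcode":"7405","street":"La Provence","housenumber":"1"}',
    '{"country":"ZA","city":"<format","postcode":"7405","street":"La Provence","housenumber":"1"}',
]))
```

Wired to stdin and stdout, a script like this could sit between macs parse and macs compress in the pipeline.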
There is still room for performance improvements, but it's doing pretty fine already.
These are some numbers I stumbled across while making this little project. They all relate to OSM data, which, for example, barely maps even the most important cities in Africa. It also contains faulty data. So please take these numbers with a big grain of salt.
* Again, based on OSM Data, which is faulty and incomplete
Contributions are very welcome. If you would like a new feature, feel free to open an issue.
When opening a pull request, please use cargo fmt and keep the code as simple as possible.