Fetching data from OpenStreetMap

The non-obvious task of downloading data from OSM has unexpected, highly programmable solutions.

Sep 5, 2019

I have spent some time exploring OpenStreetMap and its possibilities. This time I will quickly introduce you to what it is, what it contains and how you can use it.

Please note that I am nowhere near being an expert in OSM. I just have some experience with it and might save you some searching.

What is OSM?

OpenStreetMap is an equivalent of Google Maps when it comes to displaying, surprise, maps. It is an open source project, developed by people and companies from all around the world. That also makes it the biggest database of geographical data available to anyone.

In fact, that is my most common use case – I have no need for browsing OSM when Google Maps exists (sadly), but I find it irreplaceable when it comes to getting data for some processing.

Is it free?

OSM is licensed under the Open Data Commons Open Database License (ODbL). It roughly means that you can copy, distribute and share its data as long as you state that OpenStreetMap and its authors were your source. If you alter or build upon the data, you may distribute the result only under the same license.

You are also required to use the credit "© OpenStreetMap contributors". If it is a browsable map, the credit should go in the lower-right corner.

Consult OpenStreetMap's copyright notice for more information.

I am not a law expert, so don't quote me on that, but I think that as far as hobby projects are concerned, you are absolutely fine to use it as long as you credit OSM. For earning money, consult your lawyers (or some other lawyers).

How to query data from it?

Data in OSM seems to be kind of distributed. There are several sources where you can find it. Geofabrik.de, famous among people familiar with OSM, is a website serving really huge packages containing literally gigabytes of data.

Note

Fun fact: all the information stored by OSM has surpassed 50GB of data (and remember, we're talking about a binary format for plain data, not any images or renders).

There's also overpass turbo, a website which allows you to test Overpass QL queries and display the data they fetch. Personally, I found it the most useful, and it is the one I learned to use. Unfortunately, it uses its own query language that might be hard to grasp.

Making sense of Overpass QL

I haven't found any material for quickly getting the basics of Overpass QL. Instead, I found some waaay too complex articles on OpenStreetMap's wiki.

Settings

The query consists of statements ending with ;. If you want to specify some settings, they have to go first, in a single statement at the top of the query. Settings are provided one after another in the format [key:value] and, like all statements, end with a semicolon. So for example, if you want the response in JSON, limited to a specified bounding box (the format is [bbox:south,west,north,east]) and with a 25 second timeout:

[out:json][bbox:50.0,19.85,50.105,20.13][timeout:25];
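If you generate queries from a script, the settings statement can be assembled from key/value pairs. Here is a minimal Python sketch. Note that build_settings is a hypothetical helper name made up for this example, not part of any Overpass library.

```python
def build_settings(**settings):
    """Render Overpass QL settings as one statement, e.g. [out:json][timeout:25];

    Hypothetical helper: tuples and lists are joined with commas,
    which is the shape the bbox setting expects.
    """
    parts = []
    for key, value in settings.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append(f"[{key}:{value}]")
    # All settings form a single statement, so only one semicolon at the end.
    return "".join(parts) + ";"


# bbox order is south, west, north, east
header = build_settings(out="json", bbox=(50.0, 19.85, 50.105, 20.13), timeout=25)
print(header)
```

The printed line should be the same settings statement as the one shown above.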

Elements

Then you can query some actual data. A query statement names an element type followed by a tag filter, like this: way[building=yes]. There are three types of elements in OpenStreetMap:

  • nodes – defining points in space
  • ways – defining ordered lists of nodes (lines and, when closed, areas)
  • relations – defining more abstract concepts, often containing several ways

Unions

So it seems that for a given element type, we can query it, optionally filtering by some expected tag. Now how to combine more queries? Wrap them inside a union using ( and );.

So let's say we want to fetch motorways and primary roads and add rivers to it.

(
  way[waterway=river];
  way["highway"~"motorway|primary"];
);

As you can see, you can use one element type multiple times. Use = to match an exact value, or ~ to match the value against a regular expression, where | works as a kind of or operator. The quotes are optional for simple keys and values, but you need them as soon as special characters such as | show up.

How to find out what tags nodes, ways and relations can have? Well, I don't know any other way than a process of trial and error: coming up with a description for some property, googling it, going through the OSM wiki to find the proper tag, trying it with overpass turbo. Repeat.
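One thing worth knowing about the ~ filter: as far as I understand the Overpass documentation, the pattern is a POSIX extended regular expression that may match anywhere in the value, so "motorway|primary" also catches values like motorway_link. The sketch below only imitates that behaviour with Python's re module (a different regex engine, but equivalent for patterns this simple). The list of values is made up for illustration.

```python
import re

# A few highway values, picked by hand for this illustration.
values = ["motorway", "primary", "motorway_link", "primary_link", "residential"]


def matching(pattern, candidates):
    """Return the candidates in which the pattern matches anywhere (unanchored)."""
    return [v for v in candidates if re.search(pattern, v)]


loose = matching("motorway|primary", values)       # like ["highway"~"motorway|primary"]
strict = matching("^(motorway|primary)$", values)  # like ["highway"~"^(motorway|primary)$"]

print(loose)
print(strict)
```

If you only want the two exact values, anchoring the pattern with ^ and $ is the safer choice.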

Recursion operators

But in case it started to make some sense: time for recursion statements. Remember the way[...] queried above? Well, it's not enough, since the result contains just the ways. In order to get the nodes (which are crucial if we want to render a way) we have to tell our query to recursively get the elements that the queried ones refer to, which means the nodes referenced by the ways we found.

(._;>;);

Boom. That's the place where Overpass QL stops being human readable, in case it ever was for you. Let's divide it into some simpler components.

What really happened here?

The outer part is a union, as the ( and ); around it tell us. Inside, there are two separate statements:

._;
>;

The first one refers to a set. Sets are what ultimately store the queried data. If we do not specify a set to write to (and we didn't, since it would require us to write something like (way[building=yes];)->.a;), the default set, called _, is used. All sets are referred to as .name, so ._; simply means: take whatever is in the default set.

The second statement is a recurse operator. What does it do? It starts with the elements in the input set and recursively fetches all the elements they refer to until it reaches nodes. Because > returns only those referenced elements, we put it in a union with ._ to keep the ways as well. There are also other operators, like recurse up (<), just in case you need all the higher-level elements containing a selected node.
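To make the recursion less magical, here is a toy Python imitation of what the > operator does, run on a handful of made-up elements shaped like Overpass JSON. It is only my mental model of the semantics, not how the Overpass server is implemented, and recurse_down is a name invented for this sketch.

```python
def recurse_down(selected, index):
    """Imitate `>`: collect what the selected elements refer to, down to nodes.

    `index` maps (type, id) to an element. The selected elements themselves
    are NOT part of the result, which is why the query needs a union with ._
    """
    found = {}
    stack = list(selected)
    while stack:
        element = stack.pop()
        if element["type"] == "way":
            refs = [("node", ref) for ref in element["nodes"]]
        elif element["type"] == "relation":
            # Assumption: plain `>` follows node and way members only.
            refs = [(m["type"], m["ref"]) for m in element["members"]
                    if m["type"] in ("node", "way")]
        else:
            refs = []  # nodes refer to nothing
        for key in refs:
            if key in index and key not in found:
                found[key] = index[key]
                stack.append(index[key])
    return list(found.values())


# Made-up sample data: one triangle-shaped park and a relation containing it.
elements = [
    {"type": "node", "id": 1, "lat": 50.0, "lon": 19.9},
    {"type": "node", "id": 2, "lat": 50.1, "lon": 19.9},
    {"type": "node", "id": 3, "lat": 50.1, "lon": 20.0},
    {"type": "way", "id": 10, "nodes": [1, 2, 3, 1], "tags": {"leisure": "park"}},
    {"type": "relation", "id": 100,
     "members": [{"type": "way", "ref": 10, "role": "outer"}]},
]
index = {(e["type"], e["id"]): e for e in elements}

ways = [e for e in elements if e["type"] == "way"]  # way[leisure=park];
members = recurse_down(ways, index)                 # >;
result = ways + members                             # (._;>;);
```

Here members holds only the three nodes, and result is what the union hands over to the output statement.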

Putting it together

The last statement in a query is an out; statement. It outputs the result and is needed by virtually any query you'd want to write.

So to put it all together, let's query all parks in my city, Cracow:

[out:json][bbox:50.0,19.85,50.105,20.13];
way[leisure=park];
(._;>;);
out;

You should try this query on overpass turbo. You will see the result on the map to the right and can explore the data in the Data tab.

As an experiment, remove the line containing the recursion from the query and run it. Overpass turbo will warn you that the result is missing the nodes, which makes it impossible to render. I hope it makes it a bit clearer why that recursion was necessary here.

How to download it?

You can call the Overpass API (no turbo this time) with your query. It will analyze it and return the data. As simple as that. Here is an example using wget:

wget -O yayivegotdata.json http://overpass-api.de/api/interpreter\?data\=\[out:json\]\[bbox:50.0,19.85,50.105,20.13\]\;way\[building\=yes\]\;\(._\;\>\;\)\;out\;

This time the query fetches all buildings in Cracow, which gives 53MB of data.
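If you would rather not fight with shell escaping, the same call can be made from Python's standard library. This is a sketch under a few assumptions: it uses the endpoint from the wget example, sends the query as a data form field in a POST request, and the helper names build_query and fetch are my own.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the wget example above.
OVERPASS_URL = "http://overpass-api.de/api/interpreter"


def build_query(bbox, selector):
    """Compose settings, a selector, the recursion and the output statement."""
    south, west, north, east = bbox
    return (
        f"[out:json][bbox:{south},{west},{north},{east}];"
        f"{selector};"
        "(._;>;);"
        "out;"
    )


def fetch(query, url=OVERPASS_URL, timeout=60):
    """POST the query as the `data` form field and decode the JSON response."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(url, data=body, timeout=timeout) as response:
        return json.load(response)


query = build_query((50.0, 19.85, 50.105, 20.13), "way[leisure=park]")
print(query)

# Uncomment to actually hit the public server (please be gentle with it):
# data = fetch(query)
# print(len(data["elements"]), "elements")
```

Parks make for a much smaller response than buildings, so they are a friendlier first test.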

Different data formats

Now that you've downloaded data, time to learn about data formats. What you just got is called Overpass JSON (a JSON variant of the default OSM XML). Here are the popular choices for storing OSM data:

OSM XML

The default, classic OpenStreetMap format.

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="CGImap 0.0.2">
  <bounds minlat="54.0889580" minlon="12.2487570" maxlat="54.0913900" maxlon="12.2524800"/>
  <node id="298884269" lat="54.0901746" lon="12.2482632" />
  <!-- ... -->
  <way id="26659127">
    <nd ref="292403538"/>
    <nd ref="298884289"/>
    <!-- ... -->
    <nd ref="261728686"/>
    <tag k="highway" v="unclassified"/>
    <tag k="name" v="Pastower Straße"/>
  </way>
  <!-- ... -->
</osm>

As you can see, it first stores nodes, then encodes ways so that they only reference nodes by their IDs. The tags that you previously used for fetching will be available under the tag elements in each way and relation.
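Reading this format back is mostly a matter of indexing the nodes first and then resolving the nd references. Here is a small sketch with Python's built-in ElementTree. The sample document is made up (tiny IDs, invented coordinates) and only mirrors the structure shown above.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document with the same structure as the excerpt above.
SAMPLE = """\
<osm version="0.6">
  <node id="1" lat="54.0900" lon="12.2480"/>
  <node id="2" lat="54.0905" lon="12.2490"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="unclassified"/>
  </way>
</osm>
"""

root = ET.fromstring(SAMPLE)

# Step 1: index nodes by ID so that ways can look them up.
nodes = {
    n.get("id"): (float(n.get("lat")), float(n.get("lon")))
    for n in root.findall("node")
}

# Step 2: resolve each way's node references and collect its tags.
ways = []
for way in root.findall("way"):
    ways.append({
        "id": way.get("id"),
        "tags": {t.get("k"): t.get("v") for t in way.findall("tag")},
        "coords": [nodes[nd.get("ref")] for nd in way.findall("nd")],
    })

print(ways)
```

For a real multi-megabyte download you would probably want ET.iterparse instead of loading everything at once.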

Overpass JSON

Overpass JSON has more or less this format:

{
  "elements": [
    {
      "type": "node",
      "id": 273723506,
      "lat": 50.0571383,
      "lon": 19.9377451
    },
    {
      "type": "way",
      "id": 25442442,
      "nodes": [277279683, 277279684],
      "tags": {
        "building": "yes"
      }
    }
  ]
}

It almost exactly matches the OSM XML.
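A quick sanity check I find handy after downloading: count the elements by type and look for ways whose nodes did not make it into the file (which is what happens when the recursion is missing from the query). The sample below is made up and deliberately contains one incomplete way.

```python
import json
from collections import Counter

# Made-up sample: way 11 references node 3, which is not in the file.
SAMPLE = """
{
  "elements": [
    {"type": "node", "id": 1, "lat": 50.0571, "lon": 19.9377},
    {"type": "node", "id": 2, "lat": 50.0572, "lon": 19.9378},
    {"type": "way", "id": 10, "nodes": [1, 2], "tags": {"building": "yes"}},
    {"type": "way", "id": 11, "nodes": [2, 3], "tags": {"building": "yes"}}
  ]
}
"""

data = json.loads(SAMPLE)
elements = data["elements"]

# How many nodes, ways and relations did we get?
counts = Counter(e["type"] for e in elements)

# Which ways reference nodes that are missing from the response?
node_ids = {e["id"] for e in elements if e["type"] == "node"}
incomplete = [
    e["id"] for e in elements
    if e["type"] == "way" and not set(e["nodes"]) <= node_ids
]

print(dict(counts))
print("ways with missing nodes:", incomplete)
```

With a real download you would replace json.loads(SAMPLE) with json.load on the opened file.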

GeoJSON

GeoJSON takes a different approach:

{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "id": "way/25128757",
    "properties": {
      "building": "yes",
      "id": "way/25128757"
    },
    "geometry": {
      "type": "Polygon",
      "coordinates": [
        [
          [19.9380744, 50.0569556],

It stores information in features, grouping them in feature collections. Usually, the FeatureCollection will only serve as an object containing features, which is an array of Feature objects. One difference from the OSM format is that by convention, tag data is stored under properties. Also: IDs contain the element type in them.

As you can see, the points that build up a way are now contained in it. Just like usually in JSON, all the data is there. This can potentially lead to overhead and repetition of data, but that's not the case in the example from above. Actually, the file decreased in size by 2MB.
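To show how the two shapes relate, here is a deliberately naive Python sketch that turns a single Overpass JSON way into a GeoJSON Feature. The function name is made up, and the rule "closed way means Polygon" is a simplification of mine; real converters look at the tags as well. Note that GeoJSON coordinates go in [longitude, latitude] order.

```python
def way_to_feature(way, nodes_by_id):
    """Convert one Overpass JSON way into a GeoJSON Feature (naive sketch)."""
    # GeoJSON wants [lon, lat], the opposite of how we usually say it.
    line = [
        [nodes_by_id[ref]["lon"], nodes_by_id[ref]["lat"]]
        for ref in way["nodes"]
    ]
    # Simplification: treat every closed way as an area.
    closed = len(line) >= 4 and line[0] == line[-1]
    if closed:
        geometry = {"type": "Polygon", "coordinates": [line]}
    else:
        geometry = {"type": "LineString", "coordinates": line}
    feature_id = f"way/{way['id']}"
    return {
        "type": "Feature",
        "id": feature_id,
        "properties": {**way.get("tags", {}), "id": feature_id},
        "geometry": geometry,
    }


# Made-up sample: a triangular building.
nodes_by_id = {
    1: {"type": "node", "id": 1, "lat": 50.0, "lon": 19.9},
    2: {"type": "node", "id": 2, "lat": 50.1, "lon": 19.9},
    3: {"type": "node", "id": 3, "lat": 50.1, "lon": 20.0},
}
way = {"type": "way", "id": 10, "nodes": [1, 2, 3, 1], "tags": {"building": "yes"}}

feature = way_to_feature(way, nodes_by_id)
collection = {"type": "FeatureCollection", "features": [feature]}
```

It also makes the repetition visible: the first node's coordinates appear twice in the ring, and a node shared between two ways would be copied into both features.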

TopoJSON

And now comes another format – TopoJSON. It was created by Mike Bostock, the guy behind famous d3, observablehq and a lot of mindblowing visualisations of maps, mathematical properties of the world and so on.

The idea is to extract arcs from the nodes and build features from arcs and not nodes. Also: arcs are encoded in a smart way. Each point is stored as an offset from the previous one, and those offsets stay small even though geographical data typically deals with big numbers.

This way TopoJSON might seriously save you some bandwidth. But it's not for free, as decoding it at runtime takes noticeably more time than reading GeoJSON.
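The offset trick is easy to try out. Below is a minimal Python sketch of delta encoding for a single arc. It assumes the coordinates were already quantized to integers (TopoJSON does that step too, but I skip it here), and the function names and sample numbers are made up.

```python
def delta_encode(points):
    """Keep the first point absolute and store every next one as an offset."""
    encoded = []
    prev_x, prev_y = 0, 0
    for x, y in points:
        encoded.append([x - prev_x, y - prev_y])
        prev_x, prev_y = x, y
    return encoded


def delta_decode(arc):
    """Rebuild absolute positions by summing up the offsets."""
    points = []
    x, y = 0, 0
    for dx, dy in arc:
        x += dx
        y += dy
        points.append((x, y))
    return points


# Made-up, already quantized integer coordinates.
points = [(100000, 200000), (100003, 200001), (100001, 199998)]

arc = delta_encode(points)
print(arc)  # big numbers only once, then small offsets
```

The decoding loop is also where the runtime cost comes from: every point has to be rebuilt from all the offsets before it.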

Other notable examples

There is the PBF format. It is a highly compressed, optimized binary format similar to the OSM XML.

Level0L is a cool attempt to, I would say, bring YAML experience to the OSM world. Just look at it:

node 298884269.1: 54.0901746, 12.2482632
node 261728686: 54.0906309, 12.2441924
node 1831881213: 54.0900666, 12.2539381

way 26659127.5
  nd 292403538
  nd 298884289

What's next?

You have found the features that interest you on the overpass turbo website and downloaded them from the Overpass API. Now you can convert them from one of those formats to another.

Note

Fun fact: there's a TODO left in the OSM wiki article about file formats to provide a matrix of formats with links to the available converters between each of them.

It's likely that if you want to render the data, you will be most happy with GeoJSON. You can convert Overpass JSON (which is probably the one you have if you downloaded it as I've shown) to GeoJSON using this command line tool: osmtogeojson.

Cool resources

Overpass API – article about Overpass API. Describes the service, provides some information about the Query language.

Overpass API/Overpass QL – this one is dedicated solely to the query language.

Overpass API/Language Guide – wait, there's another one. This one compares QL to the XML version.

Geographic data mining and visualisation for beginners – Overpass turbo tutorial – a tutorial about using Overpass Turbo by an experienced OSM contributor.

OSM file formats – article about file formats used with OpenStreetMap.

© Tomasz Czajęcki 2018 – 2022. All Rights Reserved.