One of our applications uses geocoding extensively. When we started the project, we included the excellent Geocoder gem, and set Google as the geocoding backend. As the application scaled, its geocoding requirements grew and soon we were looking at geocoding bills worth thousands of dollars.
An alternative Geocoder
Our search for an alternative geocoder landed us on Nominatim. Written in C, with a PHP web interface, Nominatim was performant enough for our requirements. Once set up, Nominatim required 8GB of RAM to run, including the memory used by PostgreSQL (with PostGIS).
The rest of this post discusses how to set up Nominatim, the tips and tricks we learned along the way, and how it compares with the geocoding solution offered by Google.
Setting up Nominatim
Next, we went through the official installation document. We decided to give Docker a shot and found that there are many Nominatim Docker builds. We used https://github.com/merlinnot/nominatim-docker since it seemed to follow all the steps mentioned in the official installation guide.
Issues faced during Setup
Out of Memory Errors
The official documentation recommends 32GB of RAM for the initial import, but we needed to double the memory to 64GB to make it work.
Each run also generates a large amount of data, and Docker caches layers across builds. As a result, whenever a docker build failed, subsequent builds ran out of disk space.
Merging Multiple Regions
We wanted to geocode locations in the USA, Mexico, Canada and Sri Lanka. The USA, Mexico and Canada are included by default in the North America data extract, but we had to merge the Sri Lanka data with North America to get it into the format required for the initial import.
The following snippet pre-processes map data for North America and Sri Lanka into a single data.osm.pbf file that can be used directly by the Nominatim installer.
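A merge step of this kind could look roughly like the sketch below. It assumes the `osmium` command-line tool and `curl` are installed and that the extracts come from Geofabrik. The tool choice, URLs and helper names are assumptions for illustration, not necessarily the exact commands we ran.

```ruby
# Sketch only: builds the shell commands for downloading two extracts and
# merging them with the osmium command-line tool (an assumed tool choice).

GEOFABRIK = 'https://download.geofabrik.de' # assumed download host

# Extracts to merge, keyed by local file name (paths are assumptions).
EXTRACTS = {
  'north-america-latest.osm.pbf' => "#{GEOFABRIK}/north-america-latest.osm.pbf",
  'sri-lanka-latest.osm.pbf'     => "#{GEOFABRIK}/asia/sri-lanka-latest.osm.pbf"
}.freeze

# One curl command per extract.
def download_commands(extracts = EXTRACTS)
  extracts.map { |file, url| ['curl', '-L', '--fail', '-o', file, url] }
end

# A single osmium command that merges all inputs into one PBF file.
def merge_command(inputs, output = 'data.osm.pbf')
  ['osmium', 'merge', *inputs, '-o', output]
end

# Runs every command in order and stops at the first failure.
def run_all(commands)
  commands.each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end

# Usage (downloads several gigabytes, so it is left commented out):
# run_all(download_commands + [merge_command(EXTRACTS.keys)])
```

Keeping the commands as arrays avoids shell quoting problems and makes it easy to print them for a dry run before executing anything.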
Slow Search times
Once the installation was done, we tried running a simple location search, but the search timed out.
Usually Nominatim can provide a lot of diagnostic information from its web interface if you append &debug=true to the search query.
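As an illustration, a debug search URL can be assembled as in the sketch below. The host, port and `/search` path are assumptions about a local installation, and `search_url` is a hypothetical helper.

```ruby
require 'uri'

# Hypothetical address of a local Nominatim instance.
NOMINATIM_BASE = 'http://localhost:8080'

# Builds a search URL; debug: true appends debug=true to the query string.
def search_url(query, debug: false, base: NOMINATIM_BASE)
  params = { 'q' => query }
  params['debug'] = 'true' if debug
  uri = URI.join(base, '/search')
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

puts search_url('New York', debug: true)
```

Opening the resulting URL in a browser shows the debug output alongside the search results.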
The PostgreSQL query planner relies on statistics about table contents, gathered by PostgreSQL itself, to choose how to execute a query. In our case the installation was fresh, so no statistics had been collected yet, and the planner took an enormous amount of time on our queries.
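PostgreSQL's `ANALYZE` command collects these statistics on demand. A minimal sketch of triggering it from Ruby follows, assuming the database is named `nominatim` and `psql` is on the PATH. Both are assumptions, and the helper names are hypothetical.

```ruby
# Assumed database name; adjust to match the installation.
DB_NAME = 'nominatim'

# Builds a psql invocation that runs ANALYZE on the tables of the database.
def analyze_command(db = DB_NAME)
  ['psql', '-d', db, '-c', 'ANALYZE VERBOSE;']
end

# Executes the command; returns true only when psql exits successfully.
def refresh_statistics(db = DB_NAME)
  system(*analyze_command(db)) == true
end

# Usage (requires a running PostgreSQL server, so it is left commented out):
# refresh_statistics
```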
Comparing Nominatim and Google Geocoder
We compared the two geocoders on 2500 addresses. Google geocoded 99% of them, while Nominatim could only geocode 47%.
This meant we would still need to geocode roughly half of the addresses using the Google geocoder. We found that we could increase Nominatim's hit rate by normalizing the addresses we had.
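One way to combine the two services is to try Nominatim first and fall back to Google only when it returns nothing. The sketch below shows the idea with hypothetical stand-in lambdas and placeholder coordinates; real code would call the actual geocoding backends instead.

```ruby
# Each backend is a callable that takes an address and returns
# [lat, lon] or nil. Backends are tried in insertion order.
def geocode_with_fallback(address, backends)
  backends.each do |name, lookup|
    coords = lookup.call(address)
    return { source: name, coords: coords } if coords
  end
  nil
end

# Hypothetical stand-ins with placeholder coordinates, for illustration only.
nominatim = ->(addr) { addr.include?('Main St') ? [40.0, -75.0] : nil }
google    = ->(_addr) { [41.0, -74.0] }
backends  = { nominatim: nominatim, google: google }

p geocode_with_fallback('1 Main St', backends)
p geocode_with_fallback('Unknown Rd', backends)
```

With this ordering, only the addresses that Nominatim misses are sent to the paid service.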
Address Normalization using libpostal
Libpostal is an address normalizer that uses statistical natural-language processing to normalize addresses. Libpostal also has Ruby bindings, which made it quite easy to use for our tests.
Once libpostal and its Ruby bindings were installed (installation is straightforward and the steps are available on ruby-postal's GitHub page), we gave libpostal + Nominatim a go.
With this, the Nominatim + libpostal combination could geocode ~59% of the addresses, an improvement of roughly 12 percentage points over the 47% we got from Nominatim alone.
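The combination can be sketched as follows: expand the address into normalized variants with ruby_postal's `Postal::Expand.expand_address`, then try each variant against the geocoder until one succeeds. The `geocode` callable and the `geocode_normalized` helper are hypothetical, and this is one plausible way to wire the two together rather than our exact test harness.

```ruby
# Default expander uses ruby_postal; it is loaded lazily so the sketch can
# be exercised with a stub when libpostal is not installed.
DEFAULT_EXPANDER = lambda do |address|
  require 'ruby_postal/expand'
  Postal::Expand.expand_address(address)
end

# Tries the raw address first, then each normalized variant, and returns
# the first non-nil geocoding result (or nil if every candidate fails).
def geocode_normalized(address, geocode:, expand: DEFAULT_EXPANDER)
  candidates = [address] + expand.call(address)
  candidates.uniq.each do |candidate|
    result = geocode.call(candidate)
    return result if result
  end
  nil
end

# Usage with stubs standing in for libpostal and Nominatim:
stub_expand  = ->(a) { [a.downcase.sub('st.', 'street')] }
stub_geocode = ->(a) { a == '1 main street' ? [1.0, 2.0] : nil }
p geocode_normalized('1 Main St.', geocode: stub_geocode, expand: stub_expand)
```

Passing the expander and geocoder in as callables keeps the retry logic independent of any particular backend.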