Share it!

Search operations in databases are usually optimized using indexes as well as careful data modeling. This is also true for coordinate-based databases by using spatial indexes. Optimizing spatial searches is possible with the aid of a great variety of multi-dimensional indexing methods. The most typical of these are:

  • Point queries that search for all objects that contain a given point
  • Region queries that search for all objects that overlap a given region

For instance, MySQL uses R-Trees with quadratic splitting to index spatial columns.  Spatial indexes are built using the MBR (minimum bounding rectangle) of a geometry. In most geometries, the MBR is a minimum rectangle that surrounds the geometries. For a horizontal or a vertical linestring, the MBR is a rectangle degenerated into the linestring. Finally, for a point, the MBR is a rectangle degenerated into the point.

Is it always a good idea to use spatial indexes?

As shown in the previous post, composite indexes (multi-column) optimize a search query as long as the WHERE clause covers the left-most part of the indexed columns. Nevertheless, your table data can make a heck of a difference when it comes to performance.

For instance, if your data contains information on the coordinates of homes these values would be seemingly random floating numbers such as:

geolat geolng
20.1243 50.4521
 22.6456 51.1564
 13.5464 45.4562
 55.5642 166.5756
 24.2624 27.4564
 62.1564 24.2542

Since geolat and geolng values hardly repeat themselves, a composite index such as index (geolat, geolng) would be something like this:

index_id geolat geolng
1     20.1243 50.4521
2     20.1244 61.1564
3     20.1251 55.4562
4     20.1293 66.5756
5     20.1302 57.4564
6     20.1311 54.2542

In this case, the second column of the composite index is almost useless! The speed of your query with such an index is most likely going to be similar to an index on just the geolat column. In these cases, it’s better to use two separate indexes, In extreme cases, it might be better to use no indexes at all.

Understand the information you store

The simple example above shows that it’s important to understand your data in order to create the right indexes. Otherwise, queries will not be optimized and the database engine will waste time on maintaining indexes that are seldom (o never) used.

Using the wrong indexes not only ends up in a poorly optimized query but will also slow them down to a point where full-table scans perform better than using indexes. Also, if you take into account the overload of maintaining a composite index, it becomes clear why you should really look at your data and how it’s used before adding indexes “just because everyone uses them that way”.

Going back to the problem, what’s the right index for spatial data? When using coordinate-based values, MySQL provides support for spatial data, where spatial indexes are stored in a single column instead of two separate geolat, geolng columns.

The key idea of the spatial indexes data structure is to cluster nearby objects and represent them with their minimum bounding rectangle in the next higher level of the tree. So, since all objects lie within this bounding rectangle, a query that does not intersect the bounding rectangle cannot intersect any of the contained objects. In other words, all the rows inside this rectangle can be safely discarded during the search, thus speeding up SELECT operations.

Trade-offs of using spatial indexes

While spatial indexes speed up coordinate-based searches, the biggest trade-off is that a spatial point (and its index) consumes more memory than regular, non-spatial data. Spatial indexes use eight-byte double-precision numbers for storing coordinates so this must be watched for on low-memory systems. However, as resources availability increases (such as in cloud or clustered systems), spatial indexes become the index of choice for coordinate-based table queries.

Share it!