Back top previously visited page (this function is disabled)Back to homepage (P)Back to previous page (E)Jump to next page (V)Display terms (this function is disabled)List of terms (this function is disabled)Print this page (this function is disabled)Jump to sitemap (D)Search in current page (this function is disabled)Help (S)

Building Spatial Databases – Theory / Spatial indexing

Learners guide

Summary

To access the required data quickly is an elementary task in GIS technology. This task is as seriously important in the spatial technology as it is important in the classic alphanumerical database. In this chapter some spatial indexing method will be shown.

Requirement

To make the self-test successfully.

Spatial indexing

Digital maps, vector or raster-based, usually require a large amount of processing and memory resources. During graphical representation of digital data, a bidirectional correspondence is made between the database and the display device.

Figure 37. Scale representation of the actual proportion between a viewport and the database.

Grey area described in Figure 37 represents the whole area stored in the database, whitely coloured area represents the actual viewport displaying the desired area. The most straightforward way is to read the whole database into memory, then initiate the graphical drawing of data. However, this method is not convenient in many aspects. On the one hand, it is unnecessary to read data outside the viewport, and on the other hand, memory usage is limited; therefore it is practical to load data inside the viewport. Moreover, by reading data from the hard disk is much slower than from the random access memory; therefore it is practical to split-up data into pages, and read them page-by-page. An I/O operation is equivalent to a read/write of a page. Then again, the number of reads must be synchronized with the size of a viewport. Both of the above described conditions represent opposite interests, which can only be resolved by the indexing of data, reducing the query against the database on different viewports.

For rapid querying on data inside a viewport, we must exclude the rest of the data. Unfortunately, the traditional indexing methods like B-tree on relational databases do not support this type of indexing requirements. A solution for this is using spatial indexing algorithms. Before discussing the spatial indexing algorithms, let us get familiar with terms like centroid and bounding box.

Figure 38. Centroid of a convex and concave polygons.

A centroid is a replacement object, a point, which is inside a polygon and capable of representing a polygon's spatial orientation. It is obvious that in the case of a convex polygon a centroid is its centre of gravity, whereas for a concave polygon it is a point on the centroid line (as represented in Figure 38). A bounding box is the quad with smallest area containing the actual polygon (as represented in Figure 39).

Figure 39. A bounding box.

Both terms are valid for polygons and polylines. During spatial indexing these terms are needed to represent remarkably complex objects (polygons and polylines).

For indexing spatial data, there are two methods: spatial objects partitioning and splitting up the examined space by partitions. In the following we will discuss the grid indexing and quad-tree indexing methods, which can be classified as space partition algorithms for accelerating the query. Lastly, R-tree indexing algorithm will be discussed, which is a data-driven indexing structure.

Grid indexing

Let us split planar objects into regular grids as described in Figure 40.

Figure 40. Objects with different areas displayed on regular grids.

In order to search objects rapidly, the first step is the primary filtering by using coordinates of objects and sorting them into grids, where grids have their own identifier index for faster querying. Objects falling into the given grids are returned as query results. By index identifiers we are requesting objects related to this index; this is more efficient than reading the whole database (Figure 41). Secondary filtering does the processing of geometries on the found objects (e.g. in the case of polygons, the reading of vertices and displaying them).

Figure 41. Index table and its relation to the original table.

As described in Figure 40, the size of grids is not always proportional to the size of objects. If the grid size is too large, then too many objects are related to it, which leads to inefficiency in the case of extreme data. If the grid size is too small, then large objects may relate to too many grids, which is not very efficient either. The algorithm may be improved if we use altering sized grid levels.

Grid indexing algorithm is a simple and efficient way of spatial indexing. Many commercial software apply this type of indexing despite its shortcomings, especially when the spatial distribution of objects is not equidistributed. The optimistic scenario would be that every grid should contain closely the same number of related objects. However, the constant size of grids will not make it possible. In the case of inhomogeneous spatial distribution, the process leads to the inefficient secondary filtering. Therefore, we need new indexing algorithms with better conditions. For homogeneous data distribution like raster-based data structures, grid indexing can be applied. However, the quadtrees indexing algorithm is much more efficient and widely used.

Back to table of contents (J)

Új Széchenyi terv
A projekt az Európai Unió támogatásával, az Európai Szociális Alap társfinanszirozásával valósul meg.

A Társadalominformatika: moduláris tananyagok, interdiszciplináris tartalom- és tudásmenedzsment rendszerek fejlesztése az Európai Unió támogatásával, az Európai Szociális Alap társfinanszírozásával, az ELTE TÁMOP 4.1.2.A/1-11/1-2011-0056 projekt keretében valósult meg.
Generated with ELTESCORM framework