PostgreSQL Indexing: GIN, Django, and Trigram Similarity
PostgreSQL supports a special type of index called GIN – Generalized Inverted Index. This type of index is especially helpful for improving the performance of queries against composite fields, particularly for applications such as full-text search or any relatively static data.
GIN and GiST
If you are familiar with GiST indexes in PostgreSQL, then GIN should be familiar. Some of the main performance differences between the two index types follow from the documentation :
- GIN index lookups are about three times faster than GiST
- GIN indexes take about three times longer to build than GiST
- GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled
- GIN indexes are two-to-three times larger than GiST indexes
In summary, while GIN lookups are generally faster, they can be much slower to build and update than GiST ones and take up more space. This can be an advantageous tradeoff, however, depending on the application. Static data is particularly well suited for GIN.
In Django, there is support for GIN and related technology for querying against these indexes called Trigram Similarity. To enable support for this feature, one must use the extension TrigramExtension. In addition, if one is using model fields that are not natively supported by GIN such as character fields – anything that is not jsonb, array, and tsvector – the BtreeGinExtension must also be enabled .
These extensions can be placed in a Django migration to enable database support. Once in place, subsequent migration operations can add GIN indexes to fields using the GinIndex module. For example, if one were to index a name field - which should be relatively static in the database – with GIN, it may look like the following (assuming a Django model enclosing the Meta class):
class Meta: indexes = [ GinIndex(fields=['name']), ]
Note that GIN does not support NULL values, so the column must be null=False in the model definition.
Once in place, one can then efficiently query the data using a trigram similarity annotation:
results = MyModel.objects.annotate( similarity=TrigramSimilarity('name', ‘Joe’ ).filter(similarity__gt=0.3).order_by('-similarity')
In this post, we have only scratched the surface of what GIN indexing can provide in the context of a web application. If you have been considering alternative search technologies but are already using PostgreSQL, you may find that leveraging its native support for text search and indexing beneficial.