Full-text Search (FTS) with PostgreSQL and SQLAlchemy

If you are building a text-heavy web platform such as a blog or a documentation platform, your users expect a search functionality. One of the simplest and effective way to build search experiences is to leverage the full-text search (FTS) capabilities exposed by your database.

Photo by Loverna Journey on Unsplash

PostgreSQL and FTS

PostgreSQL is a very capable RDBMS with batteries included. It comes with full-text functionality baked-in with tsquery and tsvector constructs.

CREATE TABLE video (
id PRIMARY KEY BIGSERIAL,
title VARCHAR(100) NOT NULL,
description TEXT NOT NULL,
published_at TIMESTAMP DEFAULT NOW()
);
class Video(db.Model):
id = db.Column(db.Integer(), primary_key=True,
autoincrement=True)
title = db.Column(db.String(255), nullable=False)
description = db.Column(db.Text(), nullable=False)
thumbnail_url = db.Column(db.String(2048), nullable=False)
published_at = db.Column(db.DateTime(), nullable=False)
select id, title, description, published_at FROM video WHERE title LIKE %term% OR description LIKE %term%
results = Video.query.filter(Video.title.like(term) | Video.description.like(f'%{term}%')).all()
  • It is difficult to provide a ranking of search results, which renders them ineffective when thousands of matches are found.
  • They do not support indexes and hence are slow.

Enter tsquery and tsvector

A tsvector is a sorted list of distinct lexemes, which have been normalized to merge different variants of the same word. It is a form optimized for search. It is used to store the target contents in a search system.

select id, title, description, published_at FROM video WHERE to_tsvector(title || ' ' || description) @@ to_tsquery(term)
results = Videos.query.filter(Video.description.match(term)).all()

Inverted Indices, TSVector Columns, and Generated Columns

To speed up the search, we need to pre-compute the tsvectors and store them in a gin index. A GiN or Generalized Inverted Index is a type of index used for full-text search. Like the “glossary” pages at the end of the book, an inverted index says which term is present in which documents.

ALTER TABLE video ADD COLUMN __ts_vector__ tsvector GENERATED ALWAYS AS (to_tsvector('english', title || ' ' || description)) STORED;
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import TSVECTOR
class TSVector(sa.types.TypeDecorator):
impl = TSVECTOR
from sqlalchemy import desc, Indexfrom .ts_vector import TSVector
from ..db import db
class Video(db.Model):
id = db.Column(db.Integer(), primary_key=True,
autoincrement=True)
title = db.Column(db.String(255), nullable=False)
description = db.Column(db.Text(), nullable=False)
thumbnail_url = db.Column(db.String(2048), nullable=False)
published_at = db.Column(db.DateTime(), nullable=False)
__ts_vector__ = db.Column(TSVector(),db.Computed(
"to_tsvector('english', title || ' ' || description)",
persisted=True))
__table_args__ = (Index('ix_video___ts_vector__',
__ts_vector__, postgresql_using='gin'),)
results = Video.query.filter(Video.__ts_vector__.match(term)).all()

Computer Whisperer. Open-source contributor. Find me at https://amitosh.in/