
Introduction

Litestack, and its Litedb component in particular, provides powerful and flexible full text search capability via the Litesearch module. Any context that uses Litedb, whether directly through the driver or through the ActiveRecord or Sequel adapters, will have access to the Litesearch functionality.

Litesearch is built on top of SQLite’s FTS5 extension, so SQLite must be compiled with FTS5 support for it to function. Most common SQLite builds ship with FTS5 enabled, so this should not be an issue for most users.

In this post we will take a look at Litesearch, understand its features and cover some of the design decisions that went into the implementation. We will also compare its search performance against another full text search engine with a Ruby driver, Meilisearch.

A quick overview of FTS5

If you are only interested in learning about Litesearch itself, then you can skip this section entirely.

As mentioned earlier, Litesearch is built on top of SQLite’s FTS5 extension, which provides full text search functionality. FTS5 is implemented as a virtual table module. It is simple yet powerful, and we will quickly have a look at how you can create and query full text indexes.

Creating FTS5 tables

To create an FTS5 table you issue a SQL statement similar to the following:

-- a plain fts5 table
CREATE VIRTUAL TABLE email USING fts5 (sender, receiver, subject, body);

-- one that uses a specific tokenizer
CREATE VIRTUAL TABLE email USING fts5 (
  sender, 
  receiver, 
  subject, 
  body, 
  tokenize='porter'
);

-- one that doesn't store the textual content, only the index itself
CREATE VIRTUAL TABLE email USING fts5 (
  sender, 
  receiver, 
  subject, 
  body, 
  content='' -- an empty content option makes the table contentless
);

-- one that doesn't store the textual content but relies on another table for it
CREATE VIRTUAL TABLE email USING fts5 (
  sender, 
  receiver, 
  subject, 
  body, 
  content='message', -- name of the external content table
  content_rowid='id' -- name of the id field of the external content table
);

The examples above show different ways an FTS5 table can be created, along with some of the available creation options. We will quickly explain the three types of tables they cover: contentful, external content and contentless.

Contentful FTS5 tables

This is the default type, created when you don’t specify any of the content options. The table creates its own backing storage table to store the text content next to the index. If the data you are storing comes from another SQL table in the same database, it will be duplicated, as the FTS5 table will store its own copy of it.

External Content FTS5 tables

This type relies on another existing table to store the text and it will query it whenever it needs to show results.

For example, imagine an FTS5 table called emails_idx that is backed by the table emails. When you query emails_idx, what happens is as follows:

-- SQL match query (we will explain that in a bit)
SELECT subject FROM emails_idx WHERE emails_idx MATCH 'sqlite';
-- What really happens is the equivalent of the following
SELECT subject FROM emails WHERE id IN (SELECT rowid FROM emails_idx WHERE emails_idx MATCH 'sqlite');

As can be seen, the actual data was collected from the backing table, since the index only knows the row ids.

Contentless FTS5 tables

This type is similar to the external content type in that it doesn’t store the textual data, but it also has no backing store whatsoever. That means it cannot return any data from the index except the row ids. If you request any other column from the FTS5 table, you will get a NULL value.

Litesearch supports all three types of tables, but when it comes to ActiveRecord and Sequel support it relies on the contentless type. We will soon see why.

Querying FTS5 tables

FTS5 tables can be queried much like any other table, but they also have special operators that engage the FTS5 engine and allow the query to search the index, as we can see here:

-- these are FTS queries that will search the FTS5 index
SELECT * FROM idx WHERE idx MATCH 'circuit';
SELECT * FROM idx('circuit');
SELECT * FROM idx WHERE idx = 'circuit';

FTS5 has a simple but powerful query syntax:

SELECT * FROM emails_idx('subject:alert body: (excep* OR error AND "code red")')

As can be quickly seen from the example above, FTS5 syntax has:

  • Word matching
  • Phrase matching
  • Column filters
  • Prefix queries
  • AND/OR/NOT grouping

And many other features. Please have a look at the FTS5 documentation to see what’s supported. It’s worth noting that Litesearch passes queries to FTS5 as is, so all of the FTS5 search syntax rules apply.
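
Because queries reach FTS5 unmodified, raw user input that contains quotes, colons or operator words can raise a syntax error or act as an unintended column filter. Below is a minimal sketch, assuming you want user input to be treated as literal terms only. It wraps each term in double quotes and doubles any embedded quote, which is how FTS5 escapes quotes inside a string. The helper names fts5_quote and fts5_literal_query are hypothetical and are not part of Litesearch.

```ruby
# Hypothetical helpers, not part of Litesearch: turn raw user input into
# an FTS5 query made only of quoted string literals.

# Quote a single term as an FTS5 string. Inside an FTS5 string a double
# quote is escaped by doubling it.
def fts5_quote(term)
  '"' + term.gsub('"', '""') + '"'
end

# Split the input on whitespace and quote every term, so that words such as
# AND, OR, NOT and characters such as ':' or '*' lose their special meaning.
# FTS5 implicitly ANDs quoted terms that are separated by whitespace.
def fts5_literal_query(input)
  input.split.map { |term| fts5_quote(term) }.join(" ")
end

puts fts5_literal_query('body: error OR "code red"')
```

The result could then be passed to a search call, for example Article.search(fts5_literal_query(user_input)), at the cost of giving up the richer query syntax for that input.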

FTS5 Limitations

FTS5 is rigid when it comes to table structure: you cannot change any aspect of it (e.g. add, drop or rename a column) without dropping and recreating the whole index. This was the main challenge I faced while designing Litesearch, as I wanted to offer a more flexible data model that can evolve with the application without incurring a huge cost each time the schema changes.

Litesearch

Litesearch attempts to abstract away all the index types and index operations that are performed by FTS5. At the same time, through very low level manipulation of the FTS5 data structures, it is able to deliver a non-destructive schema evolution model.

We mentioned earlier that Litesearch works either directly with the Litedb SQLite connection or with higher level abstractions like ActiveRecord or Sequel. In this post we will focus on the ActiveRecord integration specifically. For information on the other methods, please refer to the Litesearch guide.

Creating a Litesearch Index

Litesearch indexes are tied to a specific table. In the ActiveRecord integration they are therefore tied to a specific model and are created within the model’s class, like so:

class Article < ApplicationRecord
  # include the litesearch model module 
  include Litesearch::Model
  
  # define the index's schema
  litesearch do |schema|
    # columns mapped directly
    schema.fields [:body, :summary]
    # column with a specific weight in search 
    schema.field :title, weight: 10 
    # a value from a column in a referenced table
    schema.field :author, target: "authors.name" 
    # a value from ActionText 
    schema.field :headline, rich_text: true
  end
end

The above shows an example schema for an Article model. After defining this schema the index will be created and it will automatically synchronize with the model’s table. Synchronization happens via database triggers, thus even if you apply changes manually outside of ActiveRecord (e.g. using the SQLite3 CLI) the index will always be in sync with the data.

It is important to note that Litesearch synchronizes directly with the data in the actual SQL table, rather than with whatever you define at the ActiveRecord level. Setting a field in the schema means Litesearch will extract values directly from the column with that name. It will not call the method with that name on the model.

Modifying a Litesearch Index

As mentioned earlier, Litesearch is able to adjust the index schema without having to rebuild the index. Of course some changes require rebuilding the index (like changing the tokenizer), but Litesearch supports a wide array of non-destructive changes. For example the schema above can directly be changed in the class file as follows:

class Article < ApplicationRecord
  # include the litesearch model module 
  include Litesearch::Model
  
  # define the index's schema
  litesearch do |schema|
    # a footer field was added
    schema.fields [:body, :summary, :footer]
    # title's weight was changed
    schema.field :title, weight: 5 
    # a value from a column in a referenced table
    schema.field :author, target: "authors.name" 
    # the headline field was removed, it has to appear with weight 0
    schema.field :headline, weight: 0
  end
end

Any field that is removed (weight set to zero) will no longer appear in search results. It will not be an error to query for that field specifically but it will always return no results. After any rebuild though, all traces of that field will disappear and querying specifically for it will return an error.

Querying the Index

ActiveRecord models that include Litesearch::Model will now have a new class method called search.

ActiveRecord::Base.search returns an ActiveRecord::Relation, so it can be part of a larger query chain.

The returned objects will be ActiveRecord objects, but they will have an extra property, search_rank, which is set to the rank of each record against the search query.

# returns the top 3 articles that match the search query ordered by search_rank
res = Article.search("title: tomato body: potato").limit(3)

res.first.search_rank >= res.last.search_rank # => true

All the SQLite FTS5 query syntax rules apply, as mentioned earlier.

Querying Multiple Models

Another method added to ActiveRecord::Base is search_all. It is accessible from any ActiveRecord::Base child class and it searches either all the indexes whose classes include the Litesearch::Model module or only those of a supplied list of classes.

# search all models that include litesearch
Book.search_all('1992') # could be any other model, or AR::Base itself

# search specific models
Book.search_all('Tim', {models: [Book, Author]})

In both cases, the results can be a mixed list of objects belonging to different model classes. It is also important to note that search_all does not return a relation and cannot be combined with other AR operations.
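
Since search_all hands back a plain mixed list, any further grouping or trimming happens in Ruby rather than in SQL. The sketch below uses two stand-in Struct classes (FakeBook and FakeAuthor, both hypothetical) in place of real models so that it runs on its own. It also assumes that each result exposes search_rank and that a higher value means a better match, as is the case for search.

```ruby
# Stand-ins for ActiveRecord models so the sketch is self-contained.
# In an application the list would come from something like Book.search_all('Tim').
FakeBook   = Struct.new(:title, :search_rank)
FakeAuthor = Struct.new(:name, :search_rank)

results = [
  FakeBook.new("Tim's Big Adventure", 2.1),
  FakeAuthor.new("Tim Smith", 3.4),
  FakeBook.new("Cooking with Tim", 1.2),
]

# Group the mixed list by model class, e.g. to render one section per model.
def group_by_model(results)
  results.group_by(&:class)
end

# Keep only the best n hits across all models, assuming a higher
# search_rank means a better match.
def top_hits(results, n)
  results.max_by(n, &:search_rank)
end

group_by_model(results).each do |klass, hits|
  puts "#{klass}: #{hits.size} hit(s)"
end
puts top_hits(results, 2).map(&:search_rank).inspect
```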

Similarity Search

Litesearch knows the term distribution within each row in the index and for the index overall. It is therefore able to do similarity search by extracting the most representative terms from a record (based on a simplified form of TF-IDF) and running a search with these terms to find similar records in the table. This can be very useful for implementing "similar items" features.

book = Book.find(1)
similar_books = book.similar
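
To make the idea concrete, here is a small pure-Ruby illustration of picking representative terms with TF-IDF and turning them into an FTS5 query. It is only a sketch of the general technique and not Litesearch's implementation, which works on the FTS5 index data. The function names, the smoothed IDF formula and the sample corpus are all assumptions made for this example.

```ruby
# A simplified illustration of TF-IDF term selection.
# NOT Litesearch's implementation; all names here are hypothetical.

# Lowercase the text and split it into alphanumeric word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Return the `limit` highest-scoring terms of `doc` relative to `corpus`.
def representative_terms(doc, corpus, limit: 3)
  # Count in how many documents each term appears.
  doc_freq = Hash.new(0)
  corpus.each { |d| tokenize(d).uniq.each { |t| doc_freq[t] += 1 } }

  scored = tokenize(doc).tally.map do |term, tf|
    # Smoothed IDF: terms that are rare across the corpus get a higher weight.
    idf = Math.log((1.0 + corpus.size) / (1 + doc_freq[term]))
    [term, tf * idf]
  end

  # Highest score first; ties are broken alphabetically for stable output.
  scored.sort_by { |term, score| [-score, term] }.first(limit).map(&:first)
end

corpus = [
  "the dark knight returns to gotham",
  "the joker returns to gotham city",
  "the art of cooking the perfect tomato",
]

terms = representative_terms(corpus[0], corpus)
# Joining the terms with OR gives an FTS5 query that matches similar rows.
puts terms.join(" OR ")
```

Common words such as "the" appear in every document, so their score drops to zero and they never make it into the query.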

Litesearch Compared to Meilisearch

We will do a quick performance comparison against another full text search engine, Meilisearch. The meilisearch-rails gem was used and the Meilisearch server was installed locally. A data set of 16K books was downloaded from kaggle.com and the following test script was run:

require 'active_record'
require 'litestack' # provides the litedb adapter and Litesearch::Model
require 'meilisearch-rails'

ActiveRecord::Base.establish_connection({adapter: "litedb", database: './books.db'})

class Book < ActiveRecord::Base
  include Litesearch::Model

  litesearch do |schema|
    schema.fields [:title, :details, :format, :author, :genres]
  end
end

MeiliSearch::Rails.configuration = {
  meilisearch_url: ENV.fetch('MEILISEARCH_HOST', 'http://localhost:7700'),
}

# the articles table is a copy of the books table
class Article < ActiveRecord::Base 
  include MeiliSearch::Rails

  meilisearch do
    attribute :title, :details, :format, :author, :genres
  end
end

def bench(msg, count=1000)
  t = Time.now
  count.times {|i| yield i }
  time = Time.now - t
  puts "Finished #{count} #{msg} iterations in #{time} seconds, #{count/time} ips"
end

bench("Litesearch Word") { Book.search('batman')  }
bench("Meilisearch Word") { Article.search('batman') }

bench("Litesearch Phrase") { Book.search('"batman and robin"') }
bench("Meilisearch Phrase") { Article.search('"batman and robin"') }

In both cases the count of returned records was identical, but the performance was drastically different:

Test                  Meilisearch latency (throughput)   Litesearch latency (throughput)
Single word search    3ms (326 ips)                      0.0364ms (27467 ips)
Phrase search         2ms (479 ips)                      0.0295ms (33890 ips)

It’s only fair to mention that Meilisearch does more work per search, since it implements features like typo tolerance and prefix search by default. Still, the performance gap is very large. On top of that, the Meilisearch index is more than twice as large as the Litesearch index, and the Meilisearch server process consumed close to 800MB of RAM, while the Litesearch test script stayed under 75MB after performing thousands of searches.
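
The latency and throughput figures in the table are two views of the same measurement: average latency in milliseconds is 1000 divided by the iterations per second. The short sketch below derives the latencies and the relative speedup from the throughput numbers reported above. The helper names are hypothetical.

```ruby
# Throughput figures (iterations per second) copied from the results table.
results = {
  "Single word search" => { meilisearch: 326, litesearch: 27_467 },
  "Phrase search"      => { meilisearch: 479, litesearch: 33_890 },
}

# Average latency in milliseconds for a given throughput.
def latency_ms(ips)
  1000.0 / ips
end

# How many times more iterations per second `fast` achieves than `slow`.
def speedup(fast, slow)
  fast.to_f / slow
end

results.each do |test, ips|
  puts format("%-20s Meilisearch %.2fms, Litesearch %.4fms, ratio %.0fx",
              test,
              latency_ms(ips[:meilisearch]),
              latency_ms(ips[:litesearch]),
              speedup(ips[:litesearch], ips[:meilisearch]))
end
```

On these numbers Litesearch comes out roughly 84 times faster for single word searches and roughly 71 times faster for phrase searches.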

Conclusion

Litesearch is a fast and lightweight full text search engine for Ruby and Rails applications that use Litestack. It brings all the goodies of SQLite’s FTS5 engine and builds a flexible schema model on top of it. If you are using SQLite for your Ruby/Rails application in production (which you should), then Litesearch is a great addition for enriching your application.
