I am studying fulltext search engines for django. It must be simple to install,

Question

0

Asked: May 14, 20262026-05-14T21:38:39+00:00 2026-05-14T21:38:39+00:00

I am studying fulltext search engines for django. It must be simple to install,

0

I am studying fulltext search engines for django.
It must be simple to install, fast indexing, fast index update, not blocking while indexing, fast search.

After reading many web pages, I put in short list :
Mysql MYISAM fulltext, djapian/python-xapian, and django-sphinx
I did not choose lucene because it seems complex, nor haystack as it has less features than djapian/django-spĥinx (like fields weighting).

Then I made some benchmarks, to do so, I collected many free books on the net to generate a database table with 1 485 000 records (id,title,body), each record is about 600 bytes long.
From the database, I also generated a list of 100 000 existing words and shuffled them to create a search list. For the tests, I made 2 runs on my laptop (4Go RAM, Dual core 2.0Ghz): the first one, just after a server reboot to clear all caches, the second is done juste after in order to test how good are cached results. Here are the “home made” benchmark results :

1485000 records with Title (150 bytes) and body (450 bytes)

Mysql 5.0.75/Ubuntu 9.04 Fulltext :
==========================================================================

Full indexing : 7m14.146s

1 thread, 1000 searchs with single word randomly taken from database : 
First run : 0:01:11.553524
next run : 0:00:00.168508

Mysql 5.5.4 m3/Ubuntu 9.04 Fulltext :
==========================================================================

Full indexing : 6m08.154s

1 thread, 1000 searchs with single word randomly taken from database : 
First run : 0:01:09.553524
next run : 0:00:20.316903

1 thread, 100000 searchs with single word randomly taken from database : 
First run : 9m09s
next run : 5m38s

1 thread, 10000 random strings (random strings should not be found in database) :
just after the 100000 search test : 0:00:15.007353

1 thread, boolean search : 1000 x (+word1 +word2) 
First run : 0:00:21.205404
next run : 0:00:00.145098

Djapian Fulltext : 
==========================================================================

Full indexing : 84m7.601s

1 thread, 1000 searchs with single word randomly taken from database with prefetch : 
First run : 0:02:28.085680
next run : 0:00:14.300236

python-xapian Fulltext :
==========================================================================

1 thread, 1000 searchs with single word randomly taken from database : 
First run : 0:01:26.402084
next run : 0:00:00.695092

django-sphinx Fulltext :
==========================================================================

Full indexing : 1m25.957s

1 thread, 1000 searchs with single word randomly taken from database : 
First run : 0:01:30.073001
next run : 0:00:05.203294

1 thread, 100000 searchs with single word randomly taken from database : 
First run : 12m48s
next run : 9m45s

1 thread, 10000 random strings (random strings should not be found in database) :
just after the 100000 search test : 0:00:23.535319

1 thread, boolean search : 1000 x (word1 word2) 
First run : 0:00:20.856486
next run : 0:00:03.005416

As you can see, Mysql is not so bad at all for fulltext search. In addition, its query cache is very efficient.

Mysql seems to me a good choice as there is nothing to install (I need just to write a small script to synchronize an Innodb production table to a MyISAM search table) and as I do not really need advanced search feature like stemming etc…

Here is the question : What do you think about Mysql fulltext search engine vs sphinx and xapian ?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-14T21:38:39+00:00

I haven’t tested Xapian but I did a presentation last year comparing fulltext solutions:
http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql

Sphinx is the fastest at searches. But it’s hard to index data that comes in incrementally, because adding data to an index is about as expensive as creating the whole index from scratch.

So some people maintain two Sphinx indexes: one large index with archived data, and one small index with recent data. Periodically (e.g. weekly) they merge the recent index into the archived index (merging two indexes is less expensive), and truncate the small index to prepare for the new week. This works great for something like a forum, but not as good for a wiki.

You might also check out Apache Solr. This is a wrapper for Lucene, and it makes using Lucene a lot easier and yet more featureful. I didn’t know about Solr when I designed that presentation.

The Washington Times is an example of a project that uses Solr together with Django:

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am studying fulltext search engines for django. It must be simple to install,

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply