
Set Up an Open Source Search Engine with Sphinx

Sphinx is a standalone full-text search engine, distributed under GPL version 2, meant to provide fast, size-efficient and relevant full-text search functions to other applications. It was specially designed to integrate well with SQL databases and scripting languages. The built-in data source drivers currently support fetching data either via a direct connection to MySQL or PostgreSQL, or from a pipe in a custom XML format.

What is great about Sphinx is that it comes with a pure-PHP searchd client API, which makes integrating it with PHP applications much easier. Together with its many other features, this makes Sphinx a very interesting open source alternative to SQL full-text search.

Key features of Sphinx:

  • high indexing speed (up to 10 MB/sec on modern CPUs)
  • high search speed (average query is under 0.1 sec on 2-4 GB text collections)
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
  • supports distributed searching (since v.0.9.6)
  • supports MySQL natively (MyISAM and InnoDB tables are both supported)
  • supports phrase searching
  • supports phrase proximity ranking, providing good relevance
  • supports English and Russian stemming
  • supports any number of document fields (weights can be changed on the fly)
  • supports document groups
  • supports stopwords
  • supports different search modes ("match all", "match phrase" and "match any" as of v.0.9.5)
  • generic XML interface which greatly simplifies custom integration
  • pure-PHP (i.e. NO module compiling etc.) searchd client API
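
As a rough illustration of the three search modes listed above ("match all", "match any" and "match phrase"), the Python sketch below applies them to a tiny in-memory collection. It is not Sphinx's implementation or API: the function names (`tokenize`, `match_all`, `match_any`, `match_phrase`, `search`) and the sample documents are hypothetical, and the sketch skips ranking, stemming and stopwords entirely.

```python
import re

# Hypothetical sample collection: document id -> text.
DOCS = {
    1: "Sphinx is a full-text search engine",
    2: "A search for the best engine oil",
    3: "Full-text indexing with MySQL",
}


def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_all(query, doc):
    """True when every query word occurs somewhere in the document."""
    words = set(tokenize(doc))
    return all(word in words for word in tokenize(query))


def match_any(query, doc):
    """True when at least one query word occurs in the document."""
    words = set(tokenize(doc))
    return any(word in words for word in tokenize(query))


def match_phrase(query, doc):
    """True when the query words occur adjacent and in order."""
    q = tokenize(query)
    d = tokenize(doc)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))


MODES = {"all": match_all, "any": match_any, "phrase": match_phrase}


def search(query, docs, mode="all"):
    """Return the sorted ids of the documents that match in the given mode."""
    matcher = MODES[mode]
    return [doc_id for doc_id, text in sorted(docs.items())
            if matcher(query, text)]


if __name__ == "__main__":
    for mode in ("all", "any", "phrase"):
        print(mode, search("search engine", DOCS, mode))
```

With these sample documents, a "match all" query for `search engine` selects documents 1 and 2, while "match phrase" keeps only document 1, where the two words are adjacent and in order.
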
The current Sphinx distribution includes the following software:
  • indexer: a utility to create full-text indices;
  • search: a simple (test) utility to query full-text indices from the command line;
  • searchd: a daemon to search through full-text indices from external software (such as Web scripts);
  • sphinxapi: a set of API libraries for popular Web scripting languages (currently, PHP).

Download

You can download Sphinx from http://www.sphinxsearch.com/downloads.html

Documentation is available at http://www.sphinxsearch.com/doc.html#installation

Sphinx is open source under the GPL license, but a commercial license is also available for embedded use.

Comments
1

You know what? Correct me if I'm wrong, but this will never be something that could be compared to a custom solution...

2


I can't think of any use for it, since advanced search engines already satisfy all my needs.

3

This is not necessarily meant to compete with Google; there are many uses for such solutions, in Web marketing for example.

4

Sphinx is being used in many big projects with success. Sure it's not a direct competitor of Google but think about more specific databases that need a search function. Take a look at a list of sites that use it: http://www.sphinxsearch.com/powered.html

5

In response to a previous comment - you'd be surprised what a generic custom search system can achieve (e.g. Zend_Lucene, and this one also). I dare say you might also be surprised how complex a search system can be, so a "custom solution" often represents far more work than it is worth.

6

Sphinx is a great piece of software. I use it on a website that has millions of records to index and search. Its performance is great, and so is the flexibility to set field weights dynamically. I'd also mention the possibility of setting sorting strategies externally.
Also, support from the developer is a great help when things just won't work.
