Sphinx (search engine)

Sphinx
Developer(s): Andrew Aksyonoff
Initial release: 2001
Stable release: 3.0.2 / 28 February 2018
Written in: C++
Operating system: Linux, Windows, Solaris, FreeBSD, NetBSD, Mac OS, AIX
Type: Search and index
License: GPLv2 and commercial
Website: www.sphinxsearch.com

Sphinx is a free and open-source full-text search engine that provides text search functionality to client applications.

Overview

Sphinx can be used either as a stand-alone server or as a storage engine ("SphinxSE") for the MySQL family of databases. When run as a stand-alone server, Sphinx operates similarly to a DBMS and can communicate with MySQL, MariaDB and PostgreSQL through their native protocols, or with any ODBC-compliant DBMS via ODBC. MariaDB, a fork of MySQL, is distributed with SphinxSE.[1]

SphinxAPI

If Sphinx is run as a stand-alone server, an application can connect to it through SphinxAPI. Official implementations of the API are available for PHP, Java, Perl, Ruby and Python. Unofficial implementations for other languages, as well as various third-party plugins and modules,[2] are also available. Data sources other than the supported databases can be indexed by piping them to the indexer in a custom XML format.[3]
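
The custom XML format mentioned above is the one documented under the name xmlpipe2.[3] As a minimal sketch, the following Python function produces such a stream. The element names (sphinx:docset, sphinx:schema, sphinx:field, sphinx:attr, sphinx:document) follow the xmlpipe2 documentation as best recalled here; the function name build_xmlpipe2 and the sample field and attribute names are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def build_xmlpipe2(fields, attrs, documents):
    """Return an xmlpipe2-style document stream as a string.

    fields    -- names of full-text fields
    attrs     -- mapping of attribute name -> type (e.g. "int", "timestamp")
    documents -- iterable of (doc_id, {name: value}) pairs

    Field and attribute names are assumed to be valid XML element names.
    """
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    for name in fields:
        out.append(f"<sphinx:field name={quoteattr(name)}/>")
    for name, attr_type in attrs.items():
        out.append(f"<sphinx:attr name={quoteattr(name)} type={quoteattr(attr_type)}/>")
    out.append("</sphinx:schema>")
    for doc_id, values in documents:
        out.append(f'<sphinx:document id="{int(doc_id)}">')
        for name, value in values.items():
            # Escape &, < and > so that arbitrary text stays well-formed.
            out.append(f"<{name}>{escape(str(value))}</{name}>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)


if __name__ == "__main__":
    print(build_xmlpipe2(
        fields=["title", "content"],
        attrs={"published": "timestamp"},
        documents=[(1, {"title": "Hello", "content": "a < b", "published": 1012325463})],
    ))
```

A script like this would typically be named in the source definition of the Sphinx configuration, so that the indexer runs it and reads the stream from its standard output.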

SphinxQL

The Sphinx search daemon supports the MySQL binary network protocol and can be accessed with the regular MySQL API and with ordinary MySQL clients. Sphinx supports a subset of SQL known as SphinxQL, which covers standard querying of all index types with SELECT, modification of RealTime indexes with INSERT, REPLACE, and DELETE, and more.
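
Because SphinxQL travels over the MySQL protocol, a query can be prepared like any other parameterised SQL statement. The sketch below only builds the statement and its parameters; the helper name build_sphinxql_search, the index name and the attribute names are hypothetical, and the "%s" placeholder style is the one used by common Python MySQL client libraries.

```python
def _check_identifier(name):
    # Identifiers cannot be sent as parameters, so accept only simple names.
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def build_sphinxql_search(index, query, filters=None, limit=20):
    """Return (sql, params) for a full-text SELECT with optional attribute filters."""
    sql = f"SELECT id, WEIGHT() AS relevance FROM {_check_identifier(index)} WHERE MATCH(%s)"
    params = [query]
    for attr, value in (filters or {}).items():
        sql += f" AND {_check_identifier(attr)} = %s"
        params.append(value)
    sql += " ORDER BY relevance DESC LIMIT %s"
    params.append(int(limit))
    return sql, tuple(params)


if __name__ == "__main__":
    sql, params = build_sphinxql_search("products", "laptop", {"vendor_id": 7}, limit=5)
    print(sql)
    print(params)
    # Assumption: with a third-party MySQL client such as PyMySQL and a searchd
    # listener speaking the MySQL protocol (9306 is the conventional port),
    # the statement could be executed roughly like this:
    #   conn = pymysql.connect(host="127.0.0.1", port=9306)
    #   with conn.cursor() as cur:
    #       cur.execute(sql, params)
    #       rows = cur.fetchall()
```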

SphinxSE

Sphinx can also provide a special storage engine for MariaDB and MySQL databases. This allows MySQL and MariaDB servers to communicate with Sphinx's searchd daemon to run queries and obtain results. Sphinx indices are treated like regular SQL tables.

Full-text fields and indexing

Sphinx examines a data set with its indexer. The indexer builds a full-text index (a special data structure that enables quick keyword searches) from the given text. Full-text fields are the content that Sphinx indexes; they can be searched quickly for keywords. Fields are named, and a search can be limited to a single field (e.g. "title" only) or to a subset of fields (e.g. "title" and "abstract" only). Sphinx's index format generally supports up to 256 fields. Note that the original text is not stored in the Sphinx index; it is discarded during indexing, and Sphinx assumes that the content is stored elsewhere.
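
The idea of a full-text index with named fields can be illustrated with a toy inverted index. This is a conceptual sketch only, not Sphinx's actual index format; the function names and sample documents are made up for the example. It maps each (field, keyword) pair to the set of matching document IDs and, like the description above, keeps no copy of the original text.

```python
from collections import defaultdict


def build_index(documents):
    """Map (field, keyword) -> set of document IDs; the text itself is not kept."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(doc_id)
    return index


def search(index, keyword, fields):
    """Return sorted IDs of documents containing keyword in any of the given fields."""
    hits = set()
    for field in fields:
        hits |= index.get((field, keyword.lower()), set())
    return sorted(hits)


docs = {
    1: {"title": "Sphinx search", "abstract": "full-text engine"},
    2: {"title": "MySQL storage", "abstract": "sphinx as an engine"},
}
index = build_index(docs)

print(search(index, "sphinx", ["title"]))              # only the title field
print(search(index, "sphinx", ["title", "abstract"]))  # a subset of fields
```

Restricting the lookup to ["title"] finds only document 1, while searching both fields finds documents 1 and 2.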

Attributes

Attributes are additional values associated with each document that can be used for filtering and sorting during search. Attributes are named, and their names are case-insensitive. Attributes are not full-text indexed; they are stored in the index as is. Currently supported attribute types are:

  • unsigned integers (1-bit to 32-bit wide);
  • UNIX timestamps;
  • floating point values (32-bit, IEEE 754 single precision);
  • string ordinals (specially computed integers);
  • strings (since 1.10-beta);
  • JSON (since 2.1.1-beta);[4][5]
  • MVA, multi-value attributes (variable-length lists of 32-bit unsigned integers).

JSON Attributes in Sphinx

Sphinx, like classic SQL databases, works with a so-called fixed schema, that is, a set of predefined attribute columns. A fixed schema works well when most documents have a value for every column, but mapping sparse data to static columns can be cumbersome. Suppose, for example, that you are running a price comparison or auction site with many different product categories. Some attributes, such as the price or the vendor, apply to all goods. Laptops, however, also need a weight, screen size, HDD type, RAM size, and so on, while shovels need a color, a handle length, and so on. The attributes are manageable within a single category, but the set of distinct attributes needed across all categories becomes very large.

A JSON attribute can be used to overcome this. Inside a JSON attribute there is no fixed structure: keys may or may not be present in any given document. When a query filters on one of these keys, Sphinx ignores documents that lack the key and works only with the documents that have it.
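
The filtering behaviour described above can be mimicked in a few lines of Python. This is an illustration of the semantics only, not of how Sphinx evaluates JSON attributes internally; the function name, the keys and the sample products are hypothetical.

```python
import json


def filter_by_json_key(documents, key, predicate):
    """Return IDs of documents whose JSON attribute has `key` and satisfies `predicate`.

    Documents that lack the key are skipped rather than treated as errors.
    """
    matches = []
    for doc_id, raw_json in documents:
        attrs = json.loads(raw_json)
        if key not in attrs:
            continue  # key absent: the document is ignored for this filter
        if predicate(attrs[key]):
            matches.append(doc_id)
    return matches


products = [
    (1, '{"price": 900, "weight_kg": 1.4, "ram_gb": 16}'),   # laptop
    (2, '{"price": 25, "color": "red", "handle_cm": 120}'),  # shovel
    (3, '{"price": 1200, "weight_kg": 2.3, "ram_gb": 32}'),  # laptop
]

# Only the laptops carry "weight_kg"; the shovel is skipped, not rejected with an error.
print(filter_by_json_key(products, "weight_kg", lambda w: w < 2.0))
# A key shared by every document behaves like an ordinary column.
print(filter_by_json_key(products, "price", lambda p: p < 1000))
```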

License

Sphinx is dual licensed:

  1. GNU General Public License version 2
  2. Commercial licensing, available for use cases that fall outside the terms of the GNU GPLv2.

Sphinx use examples

  • Craigslist.org[6]
  • Recruitment.aleph-graymatter.com[7]
  • Tradebit.com[8]
  • vBulletin.com[9]
  • Mediawiki Extension[10]
  • Boardreader.com[11]
  • OMBE.com[12]

Feature list

  • Batch and incremental (soft real-time) full-text indexing.
  • Support for non-text attributes (scalars, strings, sets, JSON).
  • Direct indexing of SQL databases. Native support for MySQL, MariaDB, PostgreSQL, MSSQL, plus ODBC connectivity.
  • XML document indexing support.
  • Distributed searching support out-of-the-box.
  • Integration via access APIs.
  • SQL-like syntax support via MySQL protocol (since 0.9.9).
  • Full-text searching syntax.
  • Database-like result set processing.
  • Relevance ranking utilizing additional factors besides standard BM25.
  • Text processing support for SBCS and UTF-8 encodings, stopwords, indexing of selected words without position information ("hitless" words), stemming, word forms, tokenizing exceptions, and "blended characters" (dual-indexing as both a real character and a word separator).
  • Support for user-defined functions (UDFs, since 2.0.1).
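
For readers unfamiliar with the BM25 baseline named in the feature list, the following is a sketch of the textbook Okapi BM25 formula, using the common "+1" variant of the inverse document frequency term. It is not Sphinx's ranker, which combines BM25 with additional factors; the function name, the parameter defaults (k1 = 1.2, b = 0.75) and the tiny corpus are illustrative assumptions.

```python
import math


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenised document against a query with textbook Okapi BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        doc_freq = sum(1 for doc in corpus if term in doc)
        if doc_freq == 0:
            continue  # the term occurs nowhere in the corpus
        # Inverse document frequency: rarer terms weigh more.
        idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
        term_freq = doc_terms.count(term)
        # Term frequency saturates with k1 and is normalised by document length via b.
        length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * term_freq * (k1 + 1) / (term_freq + length_norm)
    return score


corpus = [
    ["sphinx", "search", "engine"],
    ["mysql", "storage", "engine"],
    ["sphinx", "sphinx", "index"],
]
for doc in corpus:
    print(doc, round(bm25_score(["sphinx"], doc, corpus), 3))
```

In this corpus the document that mentions "sphinx" twice scores highest, the one that mentions it once scores lower, and the one without the term scores zero.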

Performance and scalability

  • Indexing speed of up to 10-15 MB/sec per core and HDD.
  • Searching speed of over 500 queries/sec against a collection of 1,000,000 documents (1.2 GB) on a 2-core desktop system with 2 GB of RAM.[13]
  • The biggest known installation using Sphinx, Boardreader.com, indexes 16 billion documents.[14]
  • The busiest known installation, Craigslist, serves over 300,000,000 queries/day[14] and more than 50 billion page views/month.[15]

References

  1. "AskMonty: About SphinxSE". http://kb.askmonty.org. Monty Program AB. Retrieved 2013-08-16.
  2. "Sphinx Wiki: Third Party Tools". http://sphinxsearch.com. Sphinx Search Wiki. Retrieved 2013-08-16.
  3. "xmlpipe2". http://sphinxsearch.com. Sphinx Search Documentation. Retrieved 2013-08-16.
  4. "JSON Attributes in Sphinx 2.1.1". http://sphinxsearch.com. Sphinx Search Blog. Retrieved 2013-08-16.
  5. "Full JSON Support in Trunk". http://sphinxsearch.com. Sphinx Search Blog. Retrieved 2013-08-16.
  6. "Sphinx at Craigslist". http://craigslist.org. Craigslist. Retrieved 2013-08-17.
  7. "GM Recruitment". http://www.aleph-networks.com. Aleph-networks. Retrieved 2012-10-01.
  8. "Lighting Fast PHP Site Search". http://tradebit.com. Tradebit. Retrieved 2013-08-17.
  9. "Sphinx Search beta for Vbulletin 4.0". http://vbulletin.com. Vbulletin. Retrieved 2013-08-17.
  10. "Sphinx Search Extension for MediaWiki". http://mediawiki.org. MediaWiki: Svemir Brkic, Paul Grinberg. Retrieved 2013-08-17.
  11. "Powered by Sphinx Search: Boardreader". http://sphinxsearch.com. Sphinx Search. Retrieved 2013-08-17.
  12. "Faster Searching on OMBE". https://www.ombe.com. Asay Media Network. Retrieved 2017-06-27.
  13. "About Sphinx". http://sphinxsearch.com. Sphinx Search. Retrieved 2013-08-16.
  14. "Powered by Sphinx". http://sphinxsearch.com. Sphinx Search. Retrieved 2015-05-10.
  15. "Craigslist: Factsheet". http://www.craigslist.org. Craigslist. Archived from the original on 2012-08-05. Retrieved 2013-08-16.

Further reading

  • Aksyonoff, Andrew (2011). Introduction to Search with Sphinx: From installation to relevance tuning. O'Reilly Media. ISBN 978-0-596-80955-3.
  • Ali, Abbas (2011). Sphinx Search Beginner's Guide. Birmingham, England: Packt Publishing. ISBN 978-1-84951-254-1.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.