Extension:CirrusSearch
CirrusSearch | Release status: stable |
---|---|
Implementation | Search, API, Hook |
Description | Implements searching for MediaWiki using Elasticsearch |
Author(s) | Nik Everett, Chad Horohoe, Erik Bernhardson |
Latest version | continuous updates |
Compatibility policy | Snapshots releases along with MediaWiki. Master is not backwards compatible. |
License | GNU General Public License 2.0 or later |
Download | README |
Translate the CirrusSearch extension if it is available at translatewiki.net |
Vagrant role | cirrussearch |
Issues | Open tasks · Report a bug |
The CirrusSearch extension implements searching for MediaWiki using Elasticsearch.
This page is for installation. After the install is working, see Help:CirrusSearch for usage.
Goals
- No native dependencies that would make this difficult to install
- The only dependencies are pure-PHP MediaWiki extensions and Elasticsearch itself
- Provide a near-real-time search index for wiki pages that's extendable by other MediaWiki extensions
- Provide all of the query options MWSearch has given users, and more
Dependencies
- PHP and cURL
- In addition to the standard MediaWiki requirements for PHP, CirrusSearch requires PHP to be compiled with cURL support.
- Elasticsearch
- You will need to install Elasticsearch.
- MediaWiki 1.29.x and 1.30.x require Elasticsearch 5.3.x or 5.4.x.
- MediaWiki 1.31.x and 1.32.x require Elasticsearch 5.5.x or 5.6.x.
- MediaWiki 1.33.x, 1.34.x and 1.35.x require Elasticsearch 6.5.x (6.5.4 recommended).
Note that Elasticsearch additionally requires a Java installation such as OpenJDK.
- Elastica is a PHP library to talk to Elasticsearch. Install Elastica per the instructions below.
- Other
- Because of the way the CirrusSearch extension handles jobs, it is advisable to set up the job queue in Redis. This prevents messages like "Notice: unserialize(): Error at offset 64870 of 65535 bytes in JobQueueDB.php" and subsequent errors like "Unsupported operand types". See task T157759.
Installation
Elastica
Even though the instructions below say to run Composer only when installing from git, it may be necessary to run it anyway in order to install all PHP dependencies.
- Download and place the file(s) in a directory called Elastica in your extensions/ folder.
- Only when installing from git, run Composer to install PHP dependencies by issuing composer install --no-dev in the extension directory. (See task T173141 for potential complications.)
- Add the following code at the bottom of your LocalSettings.php:
wfLoadExtension( 'Elastica' );
- Done – Navigate to Special:Version on your wiki to verify that the extension is successfully installed.
CirrusSearch
- Download and place the file(s) in a directory called CirrusSearch in your extensions/ folder.
- Add the following code at the bottom of your LocalSettings.php:
wfLoadExtension( 'CirrusSearch' );
- Now follow the setup instructions in the CirrusSearch README delivered with your extension, i.e. $IP/extensions/CirrusSearch/README. Note that not all of the information in it may apply to your version of the extension, especially the supported version of Elasticsearch.
- Configure as required.
- Done – Navigate to Special:Version on your wiki to verify that the extension is successfully installed.
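The README-driven setup above can be sketched roughly as a LocalSettings.php fragment. This is a minimal sketch, assuming the bootstrap sequence described in recent versions of the CirrusSearch README. The maintenance script names and flags shown in the comments come from that README and may differ in your version, and the localhost server is an assumption, so treat the README shipped with your copy as authoritative.

```php
// LocalSettings.php (fragment): a sketch, not a definitive setup.

wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );

// Step 1: keep search updates switched off while no index exists yet.
$wgDisableSearchUpdate = true;

// Assumption: Elasticsearch is reachable on localhost; list other hosts if not.
$wgCirrusSearchServers = [ 'localhost' ];

// Step 2 (shell): create the index configuration.
//   php extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php
//
// Step 3: remove the $wgDisableSearchUpdate line above, then bootstrap the index.
//   php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip
//   php extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse
//
// Step 4: once indexing has finished, route searches to CirrusSearch.
// $wgSearchType = 'CirrusSearch';
```

Leaving $wgSearchType commented out until the index has been built keeps the default search in place during the bootstrap.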
Upgrading
Please follow the upgrade instructions in the CirrusSearch UPGRADE file.
Configuration
The configuration parameters of CirrusSearch are documented in the "settings.txt" file. See also the documentation on CirrusSearch configuration profiles.
$wgCirrusSearchIndexBaseName is a configuration parameter which one needs to set, e.g. $wgCirrusSearchIndexBaseName = 'mywikidatabasename';.
Hooks
The CirrusSearch extension defines a number of hooks that other extensions can use to extend the core schema and modify documents. The following hooks are available:
- CirrusSearchAnalysisConfig - allows hooking into the configuration for analysis
- CirrusSearchMappingConfig - allows configuration of the mapping of fields
- CirrusSearchBuildDocumentParse - allows extensions to modify the Elasticsearch document produced from a page
- CirrusSearchBuildDocumentLinks - allows extensions to process incoming and outgoing links for the document
- CirrusSearchBuildDocumentFinishBatch - called when a batch of pages has been indexed
- CirrusSearchAddQueryFeatures - allows extensions to add query parser features
- CirrusSearchScoreBuilder - allows extensions to define rescore builder functions
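As an illustration of how one of the hooks listed above might be used, the following sketch registers a handler for CirrusSearchAnalysisConfig. It assumes the handler receives the analysis configuration array by reference as its first argument and that the array follows the usual Elasticsearch analysis layout with a filter key. The filter name mywiki_stop and its stop words are hypothetical. Check the hook documentation shipped with your version of the extension for the exact signature.

```php
// LocalSettings.php or an extension setup file (fragment).
// Assumption: the first hook argument is the analysis configuration array,
// passed by reference; any further arguments are ignored in this sketch.
$wgHooks['CirrusSearchAnalysisConfig'][] = function ( array &$config ) {
	// Hypothetical filter name; 'stop' is a standard Elasticsearch token filter type.
	$config['filter']['mywiki_stop'] = [
		'type' => 'stop',
		'stopwords' => [ 'foo', 'bar' ],
	];
	// Let other handlers of this hook run as well.
	return true;
};
```

A filter defined this way has no effect until an analyzer references it, and changes to the analysis configuration generally only apply once the index has been rebuilt.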
API
CirrusSearch features can be used in API queries.
Searching happens via the normal search API, action=query&list=search; you can use CirrusSearch-specific features, such as the morelike: special prefix to find pages related to Marie Curie and radium: api.php?action=query&list=search&srsearch=morelike:Marie_Curie%7Cradium&srlimit=10&srprop=size&formatversion=2
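The same query can be assembled programmatically. The sketch below builds the request URL with PHP's http_build_query(). The endpoint https://example.org/w/api.php is a placeholder, and format=json is an addition not present in the URL above.

```php
<?php
// Assumption: placeholder endpoint; replace with your own wiki's api.php.
$endpoint = 'https://example.org/w/api.php';

$params = [
	'action' => 'query',
	'list' => 'search',
	// morelike: takes one or more page titles separated by "|".
	'srsearch' => 'morelike:Marie_Curie|radium',
	'srlimit' => 10,
	'srprop' => 'size',
	'format' => 'json', // added here so the response is machine-readable
	'formatversion' => 2,
];

// http_build_query() percent-encodes "|" as %7C and ":" as %3A.
$url = $endpoint . '?' . http_build_query( $params );
echo $url, PHP_EOL;
```

The resulting URL can then be fetched with any HTTP client and the JSON response decoded with json_decode().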
Custom APIs and parameters are provided for querying CirrusSearch configuration and debug information:
- action=cirrusdump module: 2014?action=cirrusdump
- cirrusDumpQuery parameter to Special:Search or search API queries: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpQuery
- cirrusDumpResult parameter to Special:Search or search API queries: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult
- An additional parameter, cirrusExplain, can be passed with cirrusDumpResult to have the Lucene explanation of the score included with the result dump: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult&cirrusExplain It can also be used to get the explanation in a human-readable format, by giving it one of the values verbose, pretty or hot, such as: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult&cirrusExplain=pretty
- cirrus-config-dump, cirrus-settings-dump, cirrus-mapping-dump modules to obtain a dump from the CirrusSearch setup: api.php?action=cirrus-config-dump&formatversion=2
See also
- General links
- Usage help page - CirrusSearch usage documentation (needed after the install)
- Project page
- Info about Wikimedia Cirrus/Elastic setup
- Configuration help page - sets of tunable parameters that influence various aspects of the indexing
- Extension:AdvancedSearch - Enhances Special:Search by providing advanced parameters
- Debugging
Local development
An Elasticsearch service can be run with the Vagrant role (cirrussearch) and MediaWiki Vagrant.
For Docker, you can use a command like:
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.8.2
Then follow the installation and configuration directions.
If your web host is in a container, make sure the above container is on the same network, and reference elasticsearch as the host name in LocalSettings.php.
This will not have the WMF plugins but can be sufficient for basic testing.
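For the containerized case, the host name can be set as in the fragment below. It is a minimal sketch, assuming the Elasticsearch container is named elasticsearch as in the command above and shares a Docker network with the web container.

```php
// LocalSettings.php (fragment).
// Assumption: "elasticsearch" resolves to the Elasticsearch container
// from inside the web container (same Docker network).
$wgCirrusSearchServers = [ 'elasticsearch' ];
```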
This extension is being used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it's installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.