How to install the CirrusSearch extension for MediaWiki

From 清冽之泉

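MediaWiki's native on-site search is somewhat lacking, but once the CirrusSearch extension is installed it becomes close to perfect. CirrusSearch gets most of its search capability from Elasticsearch, an external piece of software. In my experience, a machine with only 2 GB of RAM is not worth the trouble: memory is exhausted within seconds of starting. On machines with somewhat more memory, though, the experience is excellent. Over repeated trials on a small site like this one, with CirrusSearch installed and running normally, CirrusSearch memory use held steady at about 1.3 GB and memory use for the whole site at about 2.6 GB.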

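Install dependencies

Make sure each of the following is at a version compatible with your MediaWiki version. In practice, this usually means the version in your Linux distribution's stable repository.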

  1. Install elasticsearch separately (outside apt) and start its service
  2. Install php via apt
  3. Install curl via apt
  4. Install openjdk via apt
  5. Install composer via apt
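Before moving on, it can help to confirm that the tools in the list above are actually on PATH. The following is a minimal sketch of one way to check; the function name missing_cmds is made up for illustration, and it assumes the openjdk package provides a java command.

```shell
# Print every command name given as an argument that is not found on PATH.
# missing_cmds is a hypothetical helper, not part of MediaWiki or apt.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Check the tools from the list above (assumption: openjdk installs "java").
missing_cmds php curl java composer
```

If nothing is printed, all four commands were found; any name that is printed still needs to be installed.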

Installing and running elasticsearch in detail:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-amd64.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-amd64.deb.sha512
shasum -a 512 -c elasticsearch-7.10.2-amd64.deb.sha512 
sudo dpkg -i elasticsearch-7.10.2-amd64.deb

sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable elasticsearch.service

sudo systemctl start elasticsearch.service

sudo systemctl status elasticsearch.service
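Elasticsearch can take a while after systemctl start before it answers HTTP requests, so a check made immediately afterwards may fail even though the service is fine. The sketch below polls until the service responds. The name wait_for_es, the default URL http://localhost:9200, and the retry counts are assumptions chosen for illustration.

```shell
# Poll a URL until it answers or the attempts run out.
# Arguments (all optional): URL, number of attempts, seconds between attempts.
# wait_for_es is a hypothetical helper name; the defaults are assumptions.
wait_for_es() {
  url=${1:-http://localhost:9200}
  tries=${2:-30}
  delay=${3:-2}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return a non-zero status on HTTP errors.
    if curl -fsS "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A possible use is `wait_for_es && echo "elasticsearch is up"`, which prints the message only once the service has answered.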

Check whether elasticsearch started successfully:

# Check command
curl -X GET "localhost:9200/?pretty"

# Reference example of a normal response (name, UUID, and build fields will differ on your machine)
{
  "name" : "Cp9sag6",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "AT78_T_DTp-1qgIasfxtQqA",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "f2733455d",
    "build_date" : "2016-03-30T09:51:41.449Z",
    "build_snapshot" : false,
    "lucene_version" : "8.5.0",
    "minimum_wire_compatibility_version" : "1.3.3",
    "minimum_index_compatibility_version" : "1.3.3"
  },
  "tagline" : "You Know, for Search"
}
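To confirm that the running service is the version you installed, the version number can be pulled out of that response. This is a rough sketch using sed rather than a real JSON parser, so it assumes the "number" field is formatted as in the sample above; es_version is a hypothetical helper name.

```shell
# Read an Elasticsearch root response on stdin and print version.number.
# es_version is a hypothetical helper; it relies on the "number" : "x.y.z"
# layout shown in the sample response, not on full JSON parsing.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (needs a running service):
#   curl -s "localhost:9200/?pretty" | es_version
```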

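Install the extensions

The extensions here are MediaWiki Extensions, and all of them can be downloaded from the official MediaWiki website.

Installing these extensions via git is not recommended, because the master branch is very likely incompatible with the MediaWiki stable branch. Instead, download from the extension page the extension version that matches your MediaWiki stable branch. Upstream has most likely already resolved compatibility there, so this route is convenient and less error-prone.

In plain terms: download from the web page, then extract with tar. This installs a compatible version.

The extension detail page should be familiar to everyone: pick the right version and compatibility is taken care of.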

Extraction command: tar -xzf Elastica-REL1_42-78f2f84.tar.gz -C /var/www/mediawiki/extensions
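The extraction step can be wrapped so that it also checks the result. The sketch below assumes the tarball unpacks into a top-level folder named after the extension and that the folder contains extension.json, which wfLoadExtension() looks for. install_ext is a hypothetical helper name.

```shell
# Unpack an extension tarball and confirm that extension.json landed in
# <extensions-dir>/<Name>/. Arguments: <tarball> <extensions-dir>.
# install_ext is a hypothetical helper written for this page.
install_ext() {
  tarball=$1
  ext_dir=$2
  # File names look like <Name>-<Branch>-<hash>.tar.gz; take the <Name> part.
  name=$(basename "$tarball" | cut -d- -f1)
  tar -xzf "$tarball" -C "$ext_dir" || return 1
  if [ -f "$ext_dir/$name/extension.json" ]; then
    echo "$name installed in $ext_dir/$name"
  else
    echo "extension.json not found under $ext_dir/$name" >&2
    return 1
  fi
}

# Usage, with the file name and path from the command above:
#   install_ext Elastica-REL1_42-78f2f84.tar.gz /var/www/mediawiki/extensions
```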

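Install Elastica

  1. Download the Elastica extension and extract it into the extensions/ folder
  2. Optional: inside the Elastica folder, run sudo composer install --no-dev --no-plugins --no-scripts. Using sudo is not ideal, but I have not found a better way so far
  3. Add wfLoadExtension( 'Elastica' ); to LocalSettings.php

If you downloaded the extension directly from the web page, you most likely do not need the composer step in 2, because compatibility has already been sorted out for the official MediaWiki stable release.

If you downloaded the extension via git, you most likely do need the composer step in 2, because the extension version, the master version, and the MediaWiki stable version are likely to be incompatible with one another, which causes the relevant API not to be found when generating the index.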

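Install CirrusSearch

  1. Download the CirrusSearch extension and extract it into the extensions/ folder
  2. Optional: inside the CirrusSearch folder, run sudo composer install --no-dev --no-plugins --no-scripts. Using sudo is not ideal, but I have not found a better way so far
  3. Add wfLoadExtension( 'CirrusSearch' ); to LocalSettings.php

If you downloaded the extension directly from the web page, you most likely do not need the composer step in 2, because compatibility has already been sorted out for the official MediaWiki stable release.

If you downloaded the extension via git, you most likely do need the composer step in 2, because the extension version, the master version, and the MediaWiki stable version are likely to be incompatible with one another, which causes the relevant API not to be found when generating the index.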

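Composer samples

If you still insist on installing the two extensions above via git, here are two samples of a successful composer run.

# Sample composer output after installing from git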
Do not run Composer as root/super user! See https://getcomposer.org/root for details
Continue as root/super user [yes]? 
No composer.lock file present. Updating dependencies to latest instead of installing from lock file. See https://getcomposer.org/install for more information.
Loading composer repositories with package information
Updating dependencies
Lock file operations: 45 installs, 0 updates, 0 removals
  - Locking symfony/string (v7.2.0)
  - Locking tysonandre/var_representation_polyfill (0.1.3)
  - Locking webmozart/assert (1.11.0)
Writing lock file
Installing dependencies from lock file
Package operations: 9 installs, 0 updates, 0 removals
  - Downloading nyholm/dsn (2.0.1)
  - Downloading elasticsearch/elasticsearch (v7.17.1)
  - Downloading ruflin/elastica (7.3.1)
  - Installing react/promise (v3.2.0): Extracting archive
  - Installing ezimuel/guzzlestreams (3.1.0): Extracting archive
9 package suggestions were added by new dependencies, use `composer suggest` to see details.
Generating autoload files
4 packages you are using are looking for funding.
Use the `composer fund` command to find out more!

# Sample composer output after installing from the web page
Do not run Composer as root/super user! See https://getcomposer.org/root for details
Continue as root/super user [yes]? 
Installing dependencies from lock file
Verifying lock file contents can be installed on current platform.
Nothing to install, update or remove
Generating autoload files
4 packages you are using are looking for funding.
Use the `composer fund` command to find out more!

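Size comparison

This section has no practical tutorial value; it is only a comparison.

Disk usage of the extensions by installation method

Installation method   State               Elastica   CirrusSearch
Web download          Before extraction   568 KB     13 MB
Web download          After extraction    7.3 MB     92 MB
git                   Before composer     1.6 MB     78 MB
git                   After composer      8.3 MB     118 MB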

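Generate the index

This is really just configuring CirrusSearch.

Change the configuration

First, make sure Elasticsearch is installed as described above and enabled to start at boot. Then make sure your LocalSettings.php contains these three lines; in effect, you are only adding the third: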

wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
$wgDisableSearchUpdate = true;

Generate the index

Next, generate the Elasticsearch index:

sudo php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php
# Successful output looks like this
Updating cluster ...
indexing namespaces...
	Indexing namespaces...done
content index...
	Fetching Elasticsearch version...7.10.2...ok
	Scanning available plugins...none
	Validating mappings...
		Validating mapping...ok
	Validating aliases...
		Validating some_your-wiki_content alias...ok
		Validating some_your-wiki alias...ok
		Updating tracking indexes...done
general index...
	Fetching Elasticsearch version...7.10.2...ok
	Scanning available plugins...none
	Validating aliases...
		Validating some_your-wiki_general alias...ok
		Validating some_your-wiki alias...ok
		Updating tracking indexes...done

Change the configuration again

Then remove the line you just added from LocalSettings.php:

$wgDisableSearchUpdate = true;
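If you prefer not to edit the file by hand, the line can be removed with sed. This is a sketch under the assumption that the setting sits on a line of its own; remove_disable_search_update is a hypothetical helper name, and the .bak suffix for the backup copy is an arbitrary choice.

```shell
# Delete every line that assigns $wgDisableSearchUpdate in the given file.
# A backup of the original is kept next to it with a .bak suffix.
# remove_disable_search_update is a hypothetical helper name.
remove_disable_search_update() {
  sed -i.bak '/^[[:space:]]*\$wgDisableSearchUpdate[[:space:]]*=/d' "$1"
}

# Usage (the path is an example; depending on ownership it may need sudo):
#   remove_disable_search_update /var/www/mediawiki/LocalSettings.php
```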

Populate the index

Next, populate the index:

sudo php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip
sudo php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse
# Successful output of the first command
[            some_your-wiki] Indexed 10 pages ending at 13 at 146/second
[            some_your-wiki] Indexed 10 pages ending at 25 at 235/second
[            some_your-wiki] Indexed 10 pages ending at 331 at 437/second
[            some_your-wiki] Indexed 10 pages ending at 341 at 440/second
[            some_your-wiki] Indexed 4 pages ending at 347 at 439/second
Indexed a total of 324 pages at 439/second
# Successful output of the second command
[            some_your-wiki] Indexed 10 pages ending at 25 at 235/second
[            some_your-wiki] Indexed 10 pages ending at 39 at 284/second
[            some_your-wiki] Indexed 10 pages ending at 331 at 437/second
[            some_your-wiki] Indexed 10 pages ending at 341 at 440/second
[            some_your-wiki] Indexed 4 pages ending at 347 at 439/second
Indexed a total of 324 pages at 439/second
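In the sample output above both passes end with the same total (324 pages). If you save each run's output to a file, a small filter can pull out that number so the two runs are easy to compare. This is only a sketch: indexed_total is a hypothetical helper name, and it assumes the summary line is worded exactly as in the sample.

```shell
# Read ForceSearchIndex.php output on stdin and print the page total from
# the final "Indexed a total of N pages" line.
# indexed_total is a hypothetical helper; the wording is taken from the
# sample output on this page.
indexed_total() {
  sed -n 's/^Indexed a total of \([0-9][0-9]*\) pages.*/\1/p' | tail -n 1
}

# Usage, assuming the output of a run was saved to pass1.log:
#   indexed_total < pass1.log
```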

Change the configuration once more

Finally, add to LocalSettings.php:

$wgSearchType = 'CirrusSearch';
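At this point LocalSettings.php should contain the two wfLoadExtension lines and the $wgSearchType line, and no longer contain $wgDisableSearchUpdate. The sketch below checks for that end state. check_cirrus_settings is a hypothetical helper name, and it matches the lines literally, so different spacing or quoting in your file would be reported as missing.

```shell
# Report anything in LocalSettings.php that differs from the end state
# described on this page. Returns 0 when everything matches.
# check_cirrus_settings is a hypothetical helper; matching is literal.
check_cirrus_settings() {
  file=$1
  status=0
  for needle in "wfLoadExtension( 'Elastica' );" \
                "wfLoadExtension( 'CirrusSearch' );" \
                "\$wgSearchType = 'CirrusSearch';"; do
    if ! grep -qF -- "$needle" "$file"; then
      echo "missing: $needle"
      status=1
    fi
  done
  # The temporary setting should have been removed before indexing pages.
  if grep -q 'wgDisableSearchUpdate' "$file"; then
    echo "still present: \$wgDisableSearchUpdate"
    status=1
  fi
  return $status
}

# Usage (the path is an example):
#   check_cirrus_settings /var/www/mediawiki/LocalSettings.php
```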

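With that, on-site search for your MediaWiki site is working. For example, searching for “清冽之泉” immediately returns every page and title on the site that contains those four characters.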