An elisp function to convert legal headers to wikitext format

From 清冽之泉

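The Emacs functions on this page make it easy to process legal texts downloaded from the National Laws and Regulations Database (国家法律法规数据库) and other sources, so that their headings are rendered as wikitext. For the resulting layout, see the statutes shared on this site by 罗拉快跑.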

General processing for legal texts downloaded from other sources

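(require 'subr-x) ; provides `string-trim' on older Emacs versions

(defun convert-legal-headers-to-wikitext ()
  "Convert legal-document headings in the current buffer to MediaWiki wikitext.
Lines beginning with “第…编”, “第…章” or “第…节” are converted:
- 编 (part)    -> level-2 heading (== title ==)
- 章 (chapter) -> level-3 heading (=== title ===)
- 节 (section) -> level-4 heading (==== title ====)
Runs of mixed whitespace are first collapsed to a single space, and the
half-width spaces in the title are finally converted to full-width spaces."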
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      (let ((line (buffer-substring-no-properties
                   (line-beginning-position) (line-end-position))))
        (cond
         ;; Handle 编 (part)
         ((string-match "^\\s-*\\(第[^[:space:]]*编\\)\\s-*\\(.*\\)$" line)
          (let* ((part1 (match-string 1 line))
                 (part2 (match-string 2 line))
                 (title (concat part1 " " part2)))
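            ;; Collapse runs of full-width and half-width spaces into one
            ;; half-width space, then trim leading and trailing whitespace.
            (setq title (replace-regexp-in-string "[ \u3000]+" " " title))
            (setq title (string-trim title))
            ;; Convert half-width spaces to full-width spaces (U+3000).
            (setq title (replace-regexp-in-string " " "\u3000" title))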
            (delete-region (line-beginning-position) (line-end-position))
            (insert (format "== %s ==" title))))
         ;; Handle 章 (chapter)
         ((string-match "^\\s-*\\(第[^[:space:]]*章\\)\\s-*\\(.*\\)$" line)
          (let* ((part1 (match-string 1 line))
                 (part2 (match-string 2 line))
                 (title (concat part1 " " part2)))
            (setq title (replace-regexp-in-string "[ \u3000]+" " " title))
            (setq title (string-trim title))
            (setq title (replace-regexp-in-string " " "\u3000" title))
            (delete-region (line-beginning-position) (line-end-position))
            (insert (format "=== %s ===" title))))
         ;; Handle 节 (section)
         ((string-match "^\\s-*\\(第[^[:space:]]*节\\)\\s-*\\(.*\\)$" line)
          (let* ((part1 (match-string 1 line))
                 (part2 (match-string 2 line))
                 (title (concat part1 " " part2)))
            (setq title (replace-regexp-in-string "[ \u3000]+" " " title))
            (setq title (string-trim title))
            (setq title (replace-regexp-in-string " " "\u3000" title))
            (delete-region (line-beginning-position) (line-end-position))
            (insert (format "==== %s ====" title)))))
        (forward-line 1)))))
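
The three branches above repeat the same normalization and differ only in the marker character and the heading depth. As a minimal sketch, that shared logic could be factored into a string-level helper that is easy to try out in the scratch buffer. The name `my/legal-heading-to-wikitext` is hypothetical and not part of the original page, and because all three markers are matched by one regexp, the result may differ from the original on unusual lines where several of 编, 章 and 节 appear before the first space.

```elisp
(require 'subr-x) ; `string-trim' on older Emacs versions

(defun my/legal-heading-to-wikitext (line)
  "Return LINE as a wikitext heading, or nil if LINE is not a heading.
Hypothetical helper for illustration; not part of the original page."
  (when (string-match
         "^\\s-*\\(第[^[:space:]]*\\([编章节]\\)\\)\\s-*\\(.*\\)$" line)
    (let* ((label (match-string 1 line))   ; e.g. 第一章
           (marker (match-string 2 line))  ; 编, 章 or 节
           (rest (match-string 3 line))    ; heading text after the label
           ;; 编 -> level 2, 章 -> level 3, 节 -> level 4
           (level (cond ((string= marker "编") 2)
                        ((string= marker "章") 3)
                        (t 4)))
           (equals (make-string level ?=))
           ;; Collapse mixed half-width/full-width spaces, then trim.
           (title (string-trim
                   (replace-regexp-in-string
                    "[ \u3000]+" " " (concat label " " rest)))))
      ;; Emit full-width spaces (U+3000) inside the title only.
      (format "%s %s %s"
              equals
              (replace-regexp-in-string " " "\u3000" title)
              equals))))
```

For example, `(my/legal-heading-to-wikitext "第一章 总则")` yields a level-3 heading in which 第一章 and 总则 are separated by a full-width space, while an article line such as "第一条 ..." returns nil and would be left untouched.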

Dedicated processing for legal texts downloaded from the National Laws and Regulations Database (国家法律法规数据库)

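(defun bzk-wikitext-format ()
  "Convert the buffer to wikitext format by performing, in order:
1. Call `convert-legal-headers-to-wikitext' to convert the headings.
2. Use `flush-lines' to delete every completely empty line (regexp \"^$\").
3. Append an extra newline to the end of every line, so that paragraphs
   are separated by one blank line.
4. Insert the given CSS at the top of the buffer to hide .tocnumber.

Note: this function depends on `convert-legal-headers-to-wikitext'
already being defined."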
  (interactive)
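  ;; 1. Convert the headings.
  (convert-legal-headers-to-wikitext)
  ;; 2. Delete empty lines.  Explicit bounds make `flush-lines' cover the
  ;;    whole buffer rather than only the text after point.
  (flush-lines "^$" (point-min) (point-max))
  ;; 3. Append an extra newline to every line.  A `forward-line' loop is
  ;;    used because `replace-regexp' is intended for interactive use.
  (goto-char (point-min))
  (while (not (eobp))
    (end-of-line)
    (insert "\n")
    (forward-line 1))
  ;; 4. Insert the CSS block at the top of the buffer.
  (goto-char (point-min))
  (insert "{{#css: \n.tocnumber { display: none; }\n}}\n"))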
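
Steps 2 to 4 can also be expressed on a plain string, which may be convenient for checking the expected layout outside a buffer. The following is only a sketch under that assumption: the name `my/wikitext-paragraphs` is hypothetical, heading conversion (step 1) is deliberately left out, and, like the regexp `^$`, it drops only completely empty lines, so whitespace-only lines are kept.

```elisp
(defun my/wikitext-paragraphs (text)
  "Return TEXT with empty lines dropped and a blank line after each line.
The CSS block that hides .tocnumber is prepended.
Hypothetical helper for illustration; not part of the original page."
  ;; The third argument t makes `split-string' omit empty strings,
  ;; which corresponds to deleting lines that match \"^$\".
  (let ((lines (split-string text "\n" t)))
    (concat "{{#css: \n.tocnumber { display: none; }\n}}\n"
            ;; Each remaining line is followed by one blank line.
            (mapconcat (lambda (line) (concat line "\n\n")) lines ""))))
```

Calling it on a short sample, for instance `(my/wikitext-paragraphs "a\n\n\nb\n")`, shows the CSS block followed by each non-empty line and exactly one blank line after it.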