User:MwZurp

From Meta, a Wikimedia project coordination wiki

HOWTO drive mySQL with elisp commands[edit]

Intended audience[edit]

If you use a local Wikipedia (that is, installed on your own PC) and you use Emacs, this writeup may be of substantial interest to you.

02:45, 18 Jun 2004 (UTC) Update: Thanks to a tip from Evan, I've learned that more advanced results exist for editing Wikipedia through Emacs. In particular, I am reorienting my efforts to built on top of Chong Yidong's wikipedia-mode.el.

However, my efforts can add something to Chong's code by:

  • creating an emacs-centric experience of Wikipedia (that is largely free of using a browser)
  • fostering faster, more reliable interaction with Wikipedia by having the ability to do fast browsing from a local copy of Wikipedia. For instance, I can access my local Wikipedia in 300-500ms even for larger articles but the online Wikipedia is subject to:
    • slowdowns and bottlenecks when traffic is heavy
    • periodic server downtimes for maintenance
    • 3 to 5 second response times in even the best circumstances
  • possibly decreasing the load on the Wikipedia servers (especially if Wikipedia is downloaded via P2P tools like BitTorrent)

Anyway, you may benefit from the following discussion in which I detail my discoveries. Further, I hope to gradually expand this material in the weeks to come.

A simple Wikipedia query in elisp[edit]

First you have installed MediaWiki and a copy of Wikipedia; in these examples, the English language Wikipedia is used. Once installed, you can start to issue direct mySQL commands such as

$ mysql --user=wikiuser --password=wikipass
mysql> USE wikidb;
mysql> SELECT cur_id FROM cur WHERE cur_namespace=0 AND cur_title='Apple';
+--------+
| cur_id |
+--------+
| 156319 |
+--------+
1 row in set (0.20 sec)

Note, these examples assume:

  • you installed the Wikipedia into mySQL as wikidb (the default during the MediaWiki install)
  • you installed wikidb under the mySQL password of wikipass

Now, let's reexpress this interaction to get a more compact result:

$ echo "use wikidb;SELECT cur_id FROM cur WHERE cur_namespace=0 AND cur_title='Apple';" | mysql --user=wikiuser --password=wikipass|tail -1
156319

Finally, let's issue this as a command in elisp:

(shell-command-to-string
 (concat "echo \"use wikidb;"
         "       SELECT cur_id FROM cur"
         "         WHERE cur_namespace=0 AND cur_title='Apple';\""
         " | mysql --user=wikiuser --password=wikipass"
         " | tail -1"))

Now, the result comes back as the handy, compact string "156319\n" which can be manipulated within elisp.

Several elisp functions that query Wikipedia[edit]

To make life more enjoyable, let's encapsulate the above, specialized code in a generic elisp function as follows:

(defun wkp-select (field condition)
  "Perform mySQL \"SELECT\" operation to get FIELD for Wikipedia articles satisfying CONDITION."
  (shell-command-to-string
   (format "echo \"use wikidb;SELECT %s FROM cur WHERE %s;\" | %s | tail -1"
           field condition
           "mysql --user=wikiuser --password=wikipass")))

So, now we can get "156319\n" back from:

(wkp-select "cur_id" "cur_namespace=0 AND cur_title='Apple'")

Finally, this article ID can be fed back into a second query as follows:

(substring (wkp-select "cur_text" "cur_id=156319") 3 66)

which, as of 20:55, 17 Jun 2004 (UTC), gives the result:

"This article is about the fruit; for other meanings of the word"

Again, let's encapsulate these results in a second elisp function:

(defun wkp-get-article (title)
  "Get text of Wikipedia article TITLE from mySQL database."
  (let ((id (string-to-int
             (wkp-select "cur_id"
                         (format "cur_namespace=0 AND cur_title='%s'"
                                 title)))))
    (wkp-select "cur_text" (format "cur_id=%d" id))))

So, the function invocation:

(substring (wkp-get-article "Apple") 3 66))

as before, will give:

"This article is about the fruit; for other meanings of the word"

Getting the current online Wikipedia article[edit]

However, if you want to work with the current online wiki-markup, the following command will dump the result on stardard-output:

$ wget -q -O - "http://en.wikipedia.org/w/wiki.phtml?title=Apple&action=raw"

Incorporating this command, `wkp-get-article' can be rewritten as:

(defun wkp-get-article (title &optional online-p)
  "Get text of Wikipedia article TITLE."
  (if online-p
      (shell-command-to-string
       (format "wget -q -O - \"%s?title=%s&action=raw\""
               "http://en.wikipedia.org/w/wiki.phtml"
               title))
    (let ((id (string-to-int
               (wkp-select "cur_id"
                           (format "cur_namespace=0 AND cur_title='%s'"
                                   title)))))
      (wkp-select "cur_text" (format "cur_id=%d" id)))))

One application of this boost to `wkp-get-article' is that the online version can be compared to the local version. That is, if you did some offline editing with your local Wikipedia (during another outage :-P), you would want to check the version that you based your efforts on against the live version currently on Wikipedia.

For MwZurp 14:56, 18 Jun 2004 (UTC), the "Apple" article has changed but not the "Accessory_fruit" article. Hence, the following code reports that "Accessory_fruit" is unchanged:

(defun replace-in-string (string regexp newtext)
  (let ((skip (length newtext))
       (start 0))
    (while (string-match regexp string start)
      (setq string (replace-match newtext t t string)
           start (+ skip (match-beginning 0)))))
  string)
(if (let ((local  (wkp-get-article "Accessory_fruit" nil))
          (online (wkp-get-article "Accessory_fruit" t)))
      (string= (replace-in-string
                (replace-in-string local "\\\\n" "\n") "\C-m" "")
               (concat online "\n"))
    (message "Local copy of Accessory_fruit is up-to-date"))

During testing, assorted newline/linefeed discrepancies turned up. Further exploration may uncover other tweaks necessary to compare local/online articles.

TODO

  • apply `emerge-buffers' on local/online versions. Even better, work on a tool to help with the easiest three-way merge between local version, edited version, and online version.
  • load/manipulate/display online version with an elisp sentinel which will run only once the result has arrived from wikipedia.org. By loading this Wikipedia article as a background operation, emacs will not be tied up during the downloading.

Displaying the article in Emacs[edit]

Now, attention turns to displaying the article in an Emacs buffer. Note: this code works on the raw Wikipedia markup that Wikipedians change in a browser's edit-box. On the other hand, one could use the processed HTML that MediaWiki serves up because Emacs can display HTML using packages such as w3.el or w3m.el. However, the raw Wikipedia markup offers interesting possibilities such as the use of Emacs's outline-mode.

The following function is an initial effort at displaying a Wikipedia article. Now that the article has been removed from mySQL, vanilla elisp can be used without any more knowledge of mySQL. However, one does need to know the format of wiki-markup. Anyway, for elisp beginners, more information on the functions used by `wkp-display-article' can be obtained from Emacs itself which is rich in self-documentating features.

(Again, as mentioned above, wikipedia-mode.el offers a much more comprehensive approach. Anyway, this simple code may prove instructive to readers.)

(defun wkp-display-article (title)
  "Display Wikipedia article TITLE."
  (interactive "sArticle title: ")
  (let ((text (wkp-get-article title)))
    ;; handle redirects
    (if (string-match "#REDIRECT \\[\\[\\(.*\\)\\]\\]" text)
        (setq title (buffer-substring (match-beginning 1) (match-end 1))
              ;; note: as per MediaWiki policy only allow one redirect
              ;;   so as to avoid infinite recursion
              text (wkp-get-article title)))
    ;; set up an empty buffer
    (switch-to-buffer (format "*Wikipedia: %s*" title))
    (if (view-mode 0) (View-exit))
    (erase-buffer)
    (insert "\C-l " title "\n\n== Intro (placeholder) ==\n\n" text)
    ;; clean up assorted whitespace to make more readable
    (wkp-replace "\\\\n" "\n")
    (wkp-replace "\r" "")
    (wkp-replace " +$" "")
    (wkp-replace "\n\n\n+" "\n\n")
    (goto-char (point-min))
    ;; highlight some standard Wikipedia markup (needs more work!)
    (highlight-regexp "\\[\\^*\\]\\]" 'hi-blue-b)
    (highlight-regexp "[^'\n]*\\($\\|\\)" 'hi-pink)
    (highlight-regexp "[^'\n]*\\($\\|\\)" 'hi-green)
    ;; set up some handy modes
    (outline-mode)
    (setq outline-regexp "\\(\C-l\\|==+\\)")
    (view-mode)
    (font-lock-mode)
    ;; use `outline-mode' to compress a long article
    (when (< 3000 (length text))
      (hide-subtree)
      (show-branches)
      (save-excursion
	(outline-next-visible-heading 1)
	(show-entry)))))
(defun wkp-replace (regexp newtext)
  "Replace REGEXP with NEWTEXT throughout buffer."
  (goto-char (point-min))
  (while (re-search-forward regexp nil t)
    (replace-match newtext)))

Future Writeups and Extensions[edit]

From this simple base, many amazing results can be generated especially `wkp-mode' which is a new elisp mode that I have created. This mode includes

  • emacs fontification
  • very fast interaction with Wikipedia (ie, 300-500ms response on a 350MHz PC)
  • more spontaneous/enjoyable usage of Wikipedia:
    • for emacs users who don't like switching to another application to use Wikipedia
    • for people with dialup connections who have to first connect to the internet
  • crude hyperlinking in which mouse-2 hops to other articles
  • querying your local Wikipedia with fulltext searching
  • article sections broken down into `outline-mode'
  • at some future time, emacs-driven Wikipedia editing (not yet coded)
    • promoting the edits to Wikipedia from within emacs
    • doing offline Wikipedia editing from your local Wikipedia
    • promoting the offline edits in batch operations
    • during batch promoting, automatically comparing the local copy (that you based your edits on) against the current Wikipedia version

Again, except for editing, I have already prototyped the above features in a rudimentary `wkp-mode'. At the moment, this elisp code is intertwined in with other material in my rather large "~/.emacs" file. For the time being, I'm taking a leisurely approach to extracting out useful snippets of code. If there is anyone out there who is actually interested in my extentions to Chong's wikipedia-mode.el, leave a note on my talk page and I'll pick up the pace to make a reasonably finished result in a few days.

However, as noted above, I've learned there is already a highly useful wikipedia-mode.el which allows for Emacs-driven editing instead of using the painfully inadequate edit-box provided by browsers. Personally, I loathe using "edit-boxes" for any extended editing. Also, hunting through the text in an edit-box is such a hassle for someone used to the incremental-search, occur-mode, and highlighting tools available in emacs. (Plus, cut-and-paste between the edit-box and emacs has lead to some disasters best forgotten.)

Anyway, I have achieved some results that are orthogonal/independent of wikipedia-mode.el. For me, the key prize will be to get offline emacs-based editing of Wikipedia-articles. Further, I'd like emacs to drive the entire process and dispense with using a web-browser for most Wikipedia interactions. In contrast, wikipedia-mode.el is invoked by your browser for editing but can't actually drive a browsing interaction. As of 15:36, 18 Jun 2004 (UTC), I am still considering how to merge my efforts into wikipedia-mode.el.