This is a memo on backing up MediaWiki instances, for example one deployed as part of a Web site.

Here are the concrete steps:

Change into the backup root directory on the local file system:

cd /Volumes/BACKUP/

Back up using the backup script

  1. Log in to the web server
  2. Update the VCS repository
  3. Back up using MediaWiki_Backup/
     # assuming the web directory is ~/
     # assuming the backup creates a subdirectory backup_YYYYMMDD under path/to/backup/
     # change to the home directory before starting
     # start the backup; this creates path/to/backup/backup_YYYYMMDD
     path/to/ -d $WIKI_BACKUP_PATH -w $WIKI_PATH
  4. Rsync the backup to a local hard drive:
    cd /Volumes/BACKUP/

    Back up the whole web site user's home directory, which includes the backup files created above, using rsync:

    rsync --exclude-from rsync_backup_exclusion.txt -thrivpbl rsync_backup/
  5. Ideally, upload the backup to cloud storage such as Dropbox.
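Steps 1-4 above can be sketched as a single script. Everything below is a hypothetical sketch: the host, account, and paths are placeholders, and the MediaWiki_Backup entry point is assumed to be backup.sh. It runs in dry-run mode (RUN=echo prints the commands) until you empty RUN.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-4. RUN=echo prints each command instead of
# executing it; set RUN= (empty) to run for real.
# REMOTE, WIKI_PATH, WIKI_BACKUP_PATH, and backup.sh are assumptions.
RUN=echo
REMOTE='wikiuser@example.com'               # hypothetical web site account
WIKI_PATH='~/'                              # web directory (per the memo)
WIKI_BACKUP_PATH='~/path/to/backup'         # where backup_YYYYMMDD is created
LOCAL_ROOT='/Volumes/BACKUP/rsync_backup'   # local mirror directory

# Steps 1-3: on the server, run the backup script, which creates
# $WIKI_BACKUP_PATH/backup_YYYYMMDD with the database dump and files.
$RUN ssh "$REMOTE" "cd ~ && MediaWiki_Backup/backup.sh -d $WIKI_BACKUP_PATH -w $WIKI_PATH"

# Step 4: mirror the account's home directory (including the fresh backup)
# to the local drive, honoring the exclusion list.
$RUN rsync -thrivpbl --exclude-from rsync_backup_exclusion.txt \
    "$REMOTE:~/" "$LOCAL_ROOT/"
```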

HTML backup using wget for immediate reading

Optionally, one can also keep a crawled copy of a MediaWiki instance. A set of static HTML files can be useful for immediate offline reading.

cd /Volumes/BACKUP/
# crawl the whole Web site
# wget -k -p -r -E
# crawl the pages of the MediaWiki instance excluding the Help and Special pages
wget -k -p -r --user-agent='Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36' -R '*Special*' -R '*Help*' -E
cd ..
7z a -mx=9 wget_backup_YYYYMMDD.7z wget_backup_YYYYMMDD


  • -k: convert links to suit local viewing
  • -p: download page requisites/dependencies (images, CSS) needed to display the pages
  • -r: download recursively
  • -E: save files with an .html extension (--adjust-extension)
  • -R: reject files whose names match the given patterns (here, the Special and Help pages)
  • --user-agent: set a "fake" user agent to emulate regular browsing, as some sites check the user agent string

As for the time cost of the wget-crawled backup: for reference, crawling a small MediaWiki installation with hundreds of user-created pages took about 30 minutes in an experiment I did.

If only a small set of pages needs to be backed up, curl may be used instead, for example:

# download multiple pages; curl expands the bracketed numeric range in the URL
curl -O[01-10]
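curl expands a bracketed numeric range into one request per value; the range belongs inside the URL, e.g. a hypothetical https://wiki.example.com/Page_[01-10]. Here is a self-contained local sketch using file:// URLs (the /tmp paths and page names are made up):

```shell
# Make three stand-in pages, then fetch them with one globbed curl call.
rm -rf /tmp/curl_demo && mkdir -p /tmp/curl_demo/src /tmp/curl_demo/out
for i in 01 02 03; do echo "page $i" > "/tmp/curl_demo/src/page$i.html"; done
cd /tmp/curl_demo/out
# -O saves each response under its remote file name; [01-03] expands to
# page01.html, page02.html, page03.html.
curl -s -O "file:///tmp/curl_demo/src/page[01-03].html"
ls
```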

