Very WIP: A parser / template expander for MediaWiki dumps.

The latest good dump of every Wikimedia site totals about 1.1TB. Pull with:
    curl -LO ftp://ftpmirror.your.org/pub/wikimedia/dumps/rsync-filelist-last-1-good.txt
    rsync --partial --progress -h -r --files-from=rsync-filelist-last-1-good.txt  rsync://ftpmirror.your.org/wikimedia-dumps/ /mirror/wikimedia

Or find a specific file (e.g. `enwiki-20170520-pages-articles-multistream.xml`) on one of the mirrors listed at https://dumps.wikimedia.org/mirrors.html
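
A middle ground is to narrow the rsync file list to a single wiki before pulling. The sketch below is untested against the real list and rests on an assumption: that each line of `rsync-filelist-last-1-good.txt` is a path shaped like `/enwiki/20170520/enwiki-20170520-pages-articles-multistream.xml.bz2`. The function name `filter_filelist` and the output name `enwiki-filelist.txt` are made up for illustration.

```shell
# Hypothetical helper: keep only the file-list entries for one wiki and one
# kind of dump file. Reads the list on stdin, writes matches to stdout.
# Assumes (not verified) that entries look like:
#   /enwiki/20170520/enwiki-20170520-pages-articles-multistream.xml.bz2
filter_filelist() {
    wiki="$1"   # directory name of the wiki, e.g. enwiki
    kind="$2"   # file name fragment, e.g. pages-articles-multistream.xml
    # -F matches fixed strings, so the dots in $kind are taken literally.
    # The slashes around $wiki keep enwiki from also matching enwikibooks.
    grep -F "/${wiki}/" | grep -F -- "-${kind}"
}

# Example use (commented out so that sourcing this file has no side effects):
#   filter_filelist enwiki pages-articles-multistream.xml \
#       < rsync-filelist-last-1-good.txt > enwiki-filelist.txt
```

The filtered list can then be passed to the same rsync command as above, with `--files-from=enwiki-filelist.txt` in place of the full list.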