From 9412f78fff6ec8df8cde8e10f39b008053c61f30 Mon Sep 17 00:00:00 2001
From: Nick Shipp
Date: Fri, 2 Jun 2017 23:01:00 -0400
Subject: Add README

---
 README | 7 +++++++
 1 file changed, 7 insertions(+)
 create mode 100644 README

diff --git a/README b/README
new file mode 100644
index 0000000..05ce67a
--- /dev/null
+++ b/README
@@ -0,0 +1,7 @@
+Very WIP: A parser / template expander for Mediawiki dumps.
+
+The latest version of all mediawiki sites is about 1.1TB. Pull with:
+  curl -LO ftp://ftpmirror.your.org/pub/wikimedia/dumps/rsync-filelist-last-1-good.txt
+  rsync --partial --progress -h -r --files-from=rsync-filelist-last-1-good.txt rsync://ftpmirror.your.org/wikimedia-dumps/ /mirror/wikimedia
+
+Or find the specific file (i.e. `enwiki-20170520-pages-articles-multistream.xml`) from https://dumps.wikimedia.org/mirrors.html
-- 
cgit v1.2.3-54-g00ecf