author Nick Shipp <nick@shipp.ninja> 2017-06-02 23:01:00 -0400
committer Nick Shipp <nick@shipp.ninja> 2017-06-02 23:01:00 -0400
commit 9412f78fff6ec8df8cde8e10f39b008053c61f30
tree 91d94109870edb2b4b9562b40fcef5ef94351cb8 /README
parent 3f2480a08cc7335dda1c50af7f018a5a4c46d49d
Add README
Diffstat (limited to 'README')
-rw-r--r-- README 7
1 files changed, 7 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..05ce67a
--- /dev/null
+++ b/README
@@ -0,0 +1,7 @@
+Very WIP: A parser / template expander for Mediawiki dumps.
+
+The latest dump of all MediaWiki sites is about 1.1TB. Pull with:
+ curl -LO ftp://ftpmirror.your.org/pub/wikimedia/dumps/rsync-filelist-last-1-good.txt
+ rsync --partial --progress -h -r --files-from=rsync-filelist-last-1-good.txt rsync://ftpmirror.your.org/wikimedia-dumps/ /mirror/wikimedia
+
+Or find a specific file (e.g. `enwiki-20170520-pages-articles-multistream.xml`) from https://dumps.wikimedia.org/mirrors.html
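
Note on the README above: a middle path between mirroring all 1.1TB and hunting for one file by hand is to trim the file list before handing it to rsync. The sketch below is not part of this commit. `filter_filelist` is a hypothetical helper name, and it assumes the file list contains one path per line with the wiki name somewhere in the path, which has not been checked against the real list.

```shell
# Hypothetical helper (not in the repository): print only the file-list
# entries whose path contains the given fixed substring.
#   $1 = path to the full file list (assumed: one path per line)
#   $2 = substring to match, e.g. a wiki name such as "enwiki"
filter_filelist() {
    grep -F -- "$2" "$1"
}

# Usage sketch (needs network access, so it is left commented out):
# filter_filelist rsync-filelist-last-1-good.txt enwiki > enwiki-filelist.txt
# rsync --partial --progress -h -r --files-from=enwiki-filelist.txt \
#     rsync://ftpmirror.your.org/wikimedia-dumps/ /mirror/wikimedia
```

`grep -F` treats the pattern as a literal string, so dots or dashes in a file name are not interpreted as regular-expression syntax.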