RecentChangesScraper

From i3Detroit

MediaWiki offers several ways to notify you of changes, but they all suck for one reason or another. You can "watch" a page, which gets you an email whenever it changes, but there's no way to subscribe to *all* changes, or to a digest thereof. Some extensions appear to fix that, but those suck too, thanks to version and library dependencies.

So, this awful piece of hackery runs on Nate B's old desktop, with Task Scheduler triggering it once each night:

C:\wiki>type midnight.bat
del out.html
echo ^<html^> > out.html

wget -k --base=https://www.i3detroit.org/ -O in.html "https://www.i3detroit.org/wiki/Special:RecentChanges&days=1&limit=500"

grep -U -A 999 "/fieldset" in.html | grep -U -B 999 "printfooter" >> out.html

sed -e "s@/wi/@https://www.i3detroit.org/wi/@g" <out.html >done.html

find /c "diff=" <out.html
if errorlevel 1 goto end

call send-email.bat

:end

The "fieldset" and "printfooter" strings happen to appear in the HTML of the RecentChanges page immediately before and after the meat of the page, so the two grep calls keep only what lies between them. The presence of "diff=" in the result shows that there were changes in the last day, so we should send the email -- if there were no changes, find sets errorlevel 1 and no email is sent. (This was a design requirement.)
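
To see the marker trick in isolation, here is a minimal sketch that runs the same grep and sed steps against a tiny made-up page. The sample HTML, the temp directory, and the variable names are assumptions for illustration only (the real RecentChanges markup is a lot messier). It uses a POSIX shell rather than cmd, with grep -c standing in for find /c.

```shell
#!/bin/sh
# Toy stand-in for in.html (hypothetical content, not real wiki output).
tmp=$(mktemp -d)
cat > "$tmp/in.html" <<'EOF'
<form><fieldset>options</fieldset>
<ul>
<li><a href="/wi/index.php?title=Foo&diff=2&oldid=1">diff</a> Foo</li>
</ul>
<div class="printfooter">footer</div>
<div id="footer">site footer</div>
EOF

# -A 999: keep the first marker line plus up to 999 lines after it.
# -B 999: of those, keep the second marker line plus up to 999 lines before it.
# (The -U flag in midnight.bat only matters on Windows, so it is dropped here.)
grep -A 999 "/fieldset" "$tmp/in.html" | grep -B 999 "printfooter" > "$tmp/out.html"

# Same rewrite as midnight.bat: turn the /wi/ links into absolute URLs.
sed -e "s@/wi/@https://www.i3detroit.org/wi/@g" < "$tmp/out.html" > "$tmp/done.html"

# Count lines containing "diff="; zero means nothing changed, so no email.
changes=$(grep -c "diff=" "$tmp/out.html")
if [ "$changes" -gt 0 ]; then
    decision="send"
else
    decision="skip"
fi
echo "$decision"
```

Note that both marker lines end up in the output, since grep prints the matching line along with its context. For an email digest, that extra bit of markup is harmless.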

send-email.bat is then just a one-liner that calls the mailer to deliver done.html to the specified email address (in this case, i3detroit@googlegroups).