robustify.js

robustify.js (https://github.com/renevoorburg/robustify.js) is a JavaScript add-on for web pages that counters link rot and content drift. It is an implementation of Herbert Van de Sompel's Memento Robust Links - Link Decoration specification (based in the Hiberlink project). With robustify.js active on a page, clicking a hyperlink triggers a check of whether the linked page is available online. If the page is not online, robustify.js redirects the user to a version in a web archive, by default using the Memento Time Travel service.

Because of JavaScript's security model, robustify.js needs a server-side helper script to determine whether a page is available. This script, statuscode.php, detects '404 File not found' errors, and it can also detect so-called soft-404 errors (error pages that do not supply the proper 404 status code). Soft-404 detection uses a technique known as fuzzy hashing.

Basic implementation

The easiest way to use robustify.js is to put this code at the bottom of your pages, preferably just before the </body> tag.

<script src="http://digitopia.nl/js/robustify-min.js"></script>
<script>
robustify({});
</script>

Live examples

Here are some examples of how robustify.js changes the behaviour of anchor links.
robustify.js will always present an alert before redirecting the user to a page other than specified in the original href-part of the anchor link.

A basic redirect

<a href="http://www.dds.nl/~krantb/stellingen/">DDS stellingen</a>
Following a redirect, this basic hyperlink leads to a 404, "File not found". Therefore, robustify.js will replace the link with one that might lead to a version of this page in the default web archive. The preferred archive and the preferred date of the archived version can be overridden when calling the script (or they can be specified using the data-versionurl or data-versiondate attributes).

Soft-404 redirection

By default, robustify.js calls the statuscode.php server-side helper with soft-404 detection enabled. The following example presents a link to a page that is essentially a soft-404, so a redirect to a web archive will follow:
<a href="http://www.trouw.nl/tr/nl/4324/Nieuws/archief/article/detail/1593578/2010/05/12/Een-hel-vol-rijstkoeken-en-insecten.dhtml">Een hel vol rijstkoeken en insecten</a>

The data-versiondate attribute

<a href="http://pedagogie.ac-toulouse.fr/histgeo/monog/comminge/sommaire.htm" data-versiondate="2009-01-01">Sommaire</a>
In this example the date of the desired archived version has been specified explicitly using the data-versiondate attribute. This date will be used when redirecting the user to the archive. Note that the archive might return a page from a date close to this date.

The data-versionurl attribute

<a href="http://www.heimatverein-butzweiler.de/index.php?option=com_content&view=article&id=59&Itemid=79&lang=de" data-versionurl="http://web.archive.org/web/20090811002411/http://www.heimatverein-butzweiler.de/index.php?option=com_content&view=article&id=59&Itemid=79&lang=de">Römische Langmauer</a>
For this page, which is not available online, a version from a web archive has been defined using the data-versionurl attribute. If the link has been decorated with both a data-versionurl and a data-versiondate, the former takes precedence.
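
The precedence rule can be sketched in a few lines. This is a minimal illustration and not the code robustify.js itself uses: pickRedirect is a hypothetical helper, and attrs stands in for an anchor's dataset, where data-versionurl and data-versiondate appear as versionurl and versiondate.

```javascript
// Hypothetical helper (not part of robustify.js): decides where a dead link should lead.
// `attrs` mimics anchor.dataset, e.g. { versionurl: "...", versiondate: "2009-01-01" }.
function pickRedirect(attrs) {
  // An explicitly supplied archived copy wins over any date-based lookup.
  if (attrs.versionurl) {
    return { type: "snapshot", url: attrs.versionurl };
  }
  // Otherwise fall back to an archive lookup, passing on the preferred date if one was given.
  return { type: "lookup", date: attrs.versiondate || null };
}

// Example: both attributes present, so the explicit snapshot URL is chosen.
const decision = pickRedirect({
  versionurl: "http://web.archive.org/web/20090811002411/http://example.org/",
  versiondate: "2009-01-01"
});
console.log(decision.type, decision.url);
```

Returning a small descriptor rather than a final URL keeps the precedence decision separate from the question of which archive is queried.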

Local links

robustify.js will not modify links local to the website on which the script has been implemented. Further, additional measures have been taken to prevent it from running inside the context of a web archive.
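
How robustify.js recognises a local link is not described here. As a rough sketch, assuming that "local" means "same host as the current page", a check could look like the following; isLocalLink is a hypothetical name, and the page URL is passed in explicitly so the function also runs outside a browser.

```javascript
// Hypothetical helper (assumption: a link is local when its host equals the page's host).
// Relative hrefs are resolved against the page URL before comparing.
function isLocalLink(href, pageUrl) {
  const page = new URL(pageUrl);
  const target = new URL(href, pageUrl);
  return target.host === page.host;
}

// In a browser, pageUrl would typically be window.location.href.
console.log(isLocalLink("/about.html", "http://example.org/index.html"));
console.log(isLocalLink("http://www.dds.nl/~krantb/stellingen/", "http://example.org/index.html"));
```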

Customized implementation

By default, robustify.js calls http://digitopia.nl/services/statuscode.php to obtain JSON-formatted header information about the URL of the clicked link. This can be customized, as can most of the script's behaviour. Here is an example of a customized call:

<script src="http://digitopia.nl/js/robustify-min.js"></script>
<script>
robustify({ "archive"        : "https://web.archive.org/web/{yyyymmddhhmmss}/{url}",
            "dfltVersiondate": "2010-01-01",
            "statusservice"  : "http://digitopia.nl/services/statuscode.php?url={url}",
            // Backslashes are doubled so that each \. survives the string literal.
            "ignoreLinks"    : [    "^http.?://[a-z]{2}\\.wikipedia\\.org",
                                    "^http.?://(www\\.)?wikidata\\.org"
                               ]
          });
</script>

Here is how this call has been customized:

"archive"
When a link returns a 404, robustify.js will redirect the user to a web archive. In determining where to send the user, the data-versionurl takes precedence over other options. If no data-versionurl has been given, the user will be redirected to the archive known to robustify.js. By default this is timetravel.mementoweb.org (actually an aggregator, not an archive). Using the "archive" option you may specify a URL pattern for another web archive.
"dfltVersiondate"
When a link is not available and no data-versionurl has been specified, the user will be redirected to the web archive known to robustify.js. When supplied, the value of the data-versiondate attribute of the link will be used as the preferred date for the version in the archive. When the link has not been decorated with a data-versiondate, the dateModified of the page, or in its absence the datePublished, will be used (following the schema.org serialization, for example <meta itemprop="dateModified" content="2014-12-19">). Otherwise the value of "dfltVersiondate" will be used. If no version date whatsoever is available, the current date will be used.
"statusservice"
Defines how the statuscode service is called. In this example, soft-404 detection is disabled. To enable it, add the parameter "soft404detect", as in "http://digitopia.nl/services/statuscode.php?soft404detect&url={url}". You can also run this service on your own server. Note that for soft-404 detection the statuscode.php script requires PHP to have the ssdeep extension loaded.
"ignoreLinks"
robustify.js will not alter the behaviour of local links, regardless of whether they are written as relative or absolute URLs. To make the script ignore additional links, patterns may be defined here.
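
The options above can be illustrated with a small, self-contained sketch. None of these functions are taken from robustify.js; the names are hypothetical, and several details are assumptions: dates are plain YYYY-MM-DD strings, the time part of the timestamp is set to midnight, the URL is percent-encoded for the status service but inserted unchanged into the archive pattern, and ignore patterns are applied as regular expressions.

```javascript
// Picks the preferred version date using the fallback order described above.
function resolveVersionDate(linkDate, pageMeta, dfltVersiondate, now) {
  return linkDate ||
    pageMeta.dateModified ||
    pageMeta.datePublished ||
    dfltVersiondate ||
    now.toISOString().slice(0, 10); // current date (UTC) as the last resort
}

// Converts "YYYY-MM-DD" into a 14-digit yyyymmddhhmmss value (midnight is an assumption).
function toTimestamp(isoDate) {
  return isoDate.replace(/-/g, "") + "000000";
}

// Fills {name} placeholders in a URL pattern; unknown placeholders are left untouched.
function expandTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match);
}

// Returns true when the href matches any ignore pattern (assumed to be regular expressions).
function isIgnored(href, patterns) {
  return patterns.some((pattern) => new RegExp(pattern).test(href));
}

// Usage with the option values from the customized call.
const target = "http://example.org/page?id=1";
const versionDate = resolveVersionDate(undefined, { datePublished: "2014-12-19" }, "2010-01-01", new Date());
const archiveUrl = expandTemplate("https://web.archive.org/web/{yyyymmddhhmmss}/{url}",
  { yyyymmddhhmmss: toTimestamp(versionDate), url: target });
const statusUrl = expandTemplate("http://digitopia.nl/services/statuscode.php?url={url}",
  { url: encodeURIComponent(target) });
// Backslashes are doubled so that "\." reaches the regular expression intact.
const skip = isIgnored("https://en.wikipedia.org/wiki/Link_rot", ["^http.?://[a-z]{2}\\.wikipedia\\.org"]);
console.log(archiveUrl, statusUrl, skip);
```

Because the page in this usage example supplies a datePublished and the link has no date of its own, the "dfltVersiondate" value is not reached.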

Internationalization

robustify.js has been designed to present alerts and dialogs in the preferred language of the browser. Currently, English and Dutch strings are supplied. Feel free to add support for more languages by modifying the code (https://github.com/renevoorburg/robustify.js).

In use

A real world implementation of robustify.js can be seen at Vici.org, Archaeological Atlas of Antiquity.

René Voorburg, updated February 2015