Saving Western Civilization Part 2
Yesterday I suggested that vendors of external hard drive backup products should include the full Wikipedia as part of their offering. I averred that it would be a shame to be deprived of this critical resource if the internet were destroyed by the crazies. I received a torrent of comments.
A torrent, i.e., more than one, is unusual for this blog, which often goes many issues with little more than a harrumph from the odd millireader. Even better, the replies revised and extended my suggestion. In particular, Kirk had this to say:
I like your Wikipedia idea a lot! I would even go so far as to encourage you to broaden your net to include other irreplaceable items. Some random suggestions:
Google might be a particularly good partner in this endeavor, as they could easily add a slight hack to search for "local" results. I'm getting more excited the more I think about it :)
To which I replied:
suggestions all! Actually, I've always wanted my own copy of the whole internet. If I had a program that watched where I browsed and transferred everything I saw and everything linked to it to a special hard drive, it would come close. Unfortunately, if everyone did that, that would succeed in shutting down the internet without cybercrimes.
As I indicated, there is no reason to limit the information to Wikipedia. Kirk's suggestions are good ones, and I would love to have the complete series of HP Journals so readily available. Of course, HP and the general public might feel differently about such a specialized and well-illustrated byte absorber, albeit for different reasons. The difficulty lies in winnowing the great mass of public domain material down to what is generally regarded as useful and is also conclusively in the public domain. I'm sure the vendors would be looking for commercial advantage, not an endless series of discussions, arguments, and even lawsuits. Wikipedia is available, free, and non-controversial in its usefulness. Another comment, from Sam, pointed to the on-line availability of Wikipedia dumps, although their format is so specialized that a normal PC user won't want to be bothered.
I ended yesterday's blogitem with a promise to go into the technical details of how this could work. I shall discuss that now that my blush of enthusiasm has morphed into a morass of practicality.
The Morass of Practicality
Although other sites are extremely popular, they don't suggest themselves for this application. Obviously, e-commerce sites such as eBay and Amazon and social sites such as Facebook and Twitter depend on a continuous connection to the internet to perform their function. News sites would turn into archives overnight, and the good ones would lose their ability to charge for searches, so they won't go for it. At the other end of the scale, relatively stable personal sites and blogs such as this one could usefully be archived, but each would be of value only to a small number of fans. One might argue that Wikipedia isn't stable enough to be archived, but that's only partly true. Of course the corpus of Wikipedia is changing by the second, but individual articles might be edited anywhere from daily to never, depending on their currency and general interest. Eventually, the archived Wikipedia will become stale. Which, of course, gives everyone the opportunity to sell and buy and donate to it all over again!
Let's say you have bought a hard drive with a complete, searchable "snapshot" of Wikipedia installed. It will also have a small program, perhaps a browser plug-in, that can divert Wikipedia URLs to the hard drive. Since Wikipedia users will, at least before the deluge, want the current version of an article if it is newer than the disk-based one, this program will:
I'm not sure how complex the software provided on the external hard drive must be. But since these drives are sold by the hundreds of thousands in chain stores, the development cost will be spread over enough units to make it minimal. The job of adding a hash or a byte-count header to each article falls to Wikipedia, but that can be accomplished by a script that takes a few milliseconds to run every time an article is updated. Now what about you, the owner of your own personal Wikipedia?
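For the technically curious, here is a minimal sketch in Python of the decision such a plug-in might make. It assumes, hypothetically, that each archived article is stored as one file on the drive and that Wikipedia publishes a SHA-256 hash of each article's current text. The mount point, file layout, and function names are illustrative inventions, not anything Wikipedia or a drive vendor actually provides.

```python
import hashlib
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlparse

# Hypothetical mount point of the backup drive's Wikipedia snapshot.
ARCHIVE_ROOT = Path("/Volumes/Backup/wikipedia")


def archive_path(url: str) -> Optional[Path]:
    """Map a Wikipedia article URL to its (hypothetical) file on the drive."""
    parts = urlparse(url)
    host = parts.netloc
    if not (host == "wikipedia.org" or host.endswith(".wikipedia.org")):
        return None  # not Wikipedia: let the browser handle it normally
    if not parts.path.startswith("/wiki/"):
        return None  # not an article page
    title = unquote(parts.path[len("/wiki/"):])
    # Assumed layout: one file per article, with "/" in titles escaped.
    return ARCHIVE_ROOT / (title.replace("/", "%2F") + ".html")


def local_hash(article: bytes) -> str:
    """SHA-256 hex digest of the archived copy of an article."""
    return hashlib.sha256(article).hexdigest()


def choose_source(local_article: Optional[bytes], remote_hash: Optional[str]) -> str:
    """Decide whether to serve the on-disk copy or fetch the live article.

    remote_hash stands in for a hypothetical per-article hash header;
    None means Wikipedia could not be reached.
    """
    if remote_hash is None:
        # Offline (say, after the deluge): the archive is all we have.
        return "local" if local_article is not None else "unavailable"
    if local_article is None:
        return "remote"  # article created after the snapshot was made
    if local_hash(local_article) == remote_hash:
        return "local"  # unchanged since the snapshot, so skip the download
    return "remote"  # edited since the snapshot, so fetch the current version
```

A byte count would work the same way, comparing `len(local_article)` with the published length, but two different edits can leave an article the same size, so a hash is the safer signal.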
Flaws and Energy Efficiency
There is one flaw in my scheme: in order to be transparent and appear instantaneous to the user, the external backup hard drive must remain on all the time. Normally these drives are powered up only when needed. Although this is a small increment of power, roughly equivalent to a cellphone charger or similar wall wart, it isn't nothing, and it adds up over hundreds of thousands of units. To be "green," it would be nice to keep the power drain low. Here are a few semi- or full solutions.
Personally, I'd pick the third solution. I'm impatient, but not terminally so.
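To put a rough number on how the always-on drain adds up, here is a back-of-the-envelope calculation. The 5-watt draw per drive and the 100,000-drive count are illustrative assumptions only, not measurements of any real product.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_kwh(watts_per_drive: float, drives: int) -> float:
    """Energy used in one year by `drives` units that are never powered down."""
    return watts_per_drive * drives * HOURS_PER_YEAR / 1000  # Wh -> kWh


# Illustrative assumptions: 5 W each, 100,000 drives left on around the clock.
print(f"{annual_kwh(5, 100_000):,.0f} kWh per year")  # prints: 4,380,000 kWh per year
```

Halving either the per-drive draw or the hours spent spinning halves the total, which is why any scheme that lets the drive sleep between lookups pays off at this scale.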
Although I promised "action" in the form of beseeching the external drive manufacturers to consider this, I've deferred epistolary activities until I had finished writing part two, whose end we have just reached. I'll do it shortly.