The Internet is disappearing before our eyes, and nobody seems to notice

Websites are deleting their older articles. Sites are closing down. And material is potentially being lost forever...

There’s a running joke that the majority of entertainment news websites now are, basically, screenshots of one of five other sites that continue to pump out stories. Everyone else passes it on second hand, ideally at least with some contribution or opinion on it. But usually with 400 words of Google-teasing guff before they get to the embedded Tweet.

It was not always so, but you could be forgiven for thinking otherwise. For whilst the world wide web was originally underpinned by ideals that included democratising writing and material, what’s actually happening is a slow erosion of its archive.

Don’t believe me? Go looking for a movie news story that was published in and around the turn of the millennium. The Wikipedia page for the superb animated film The Iron Giant actually is a good starting place. Head to the bottom of the page, to the long list of references. How many of those links there are still active? Quite a few. How many lead to a page that’s not the specific article that was originally there? Quite a few also. Notwithstanding attempts to archive old articles – that I’ll come to – quite a lot of the articles about the film have simply disappeared. Sometimes behind a paywall, but also, oftentimes, they’ve just gone. Kaput. Not to be found anywhere else.

Page Not Found error message

This has been happening a lot, but barely anybody is talking about it.

It’s no secret that websites, just like other publications and outlets, come and go. Furthermore, websites go through sizeable restructures and redesigns, the technology underpinning them evolves, and back material is sacrificed as it’s ‘not compatible’. These are the accidental erasures, when a redesign loses years of comments or material, with few discernible ways of getting any of it back.

The sites that vanish altogether also take a lot of work with them, without leaving even an archive behind. Digital publishing simply doesn’t have the permanence of print, and anyone who tells you otherwise should be looked at with untrusting eyes.

In my past, I edited a weekly computer magazine for just over a decade. Most of the material ended up on its website. Said magazine closed down a few years ago, and the entire archive of articles has gone with it. Were it not for the print copies in my garage, and some unreliable-looking archive discs, the material would be gone forever. Even so, without an appointment at my house, you won’t be able to read it. Even then, I’d want a nice coffee.

For web journalists, this is now the perilous nature of the beast. One relatively prominent American film reporter I know has lost work several times when differing websites have merged/gone under/gone in a different direction. With the click of a button, the result of some decision way, way out of his hands, the online presence of his work has gone, and he’s not been able to get a lot of it back.

Much of this, to a degree, is known. What’s less known is that some websites are increasingly deleting older material, sacrificing it at the altar of Google. The ever-changing Google search algorithm – surely the rotten heart in the midst of online journalism – prioritises fast-loading websites. One way to get a website shifting quicker is to shed material from it. To give a smaller database of articles for it to host.

You’ll be heartened to know there’s a term for this material: “thin content”.

In this case, it might be duplicated material. It might be stuff that’s been syndicated from elsewhere. But it also might be news stories about long-released films. Of little immediate value to anyone, perhaps, but nobody goes through old newspapers and deletes what’s on page 23 to save a bit of space. As archivists and historians will tell you, sometimes it’s what’s in the margins, or tucked away in small pieces that otherwise seem irrelevant, that leads to gold. At the very least, it’s part of the story of something. Yet generally algorithms are deciding what’s worth keeping and what isn’t. Even on sites that have been up and running for years, therefore, there are articles disappearing, to keep everything running that little bit quicker.

A fair trade off, some would argue. Heck, who hasn’t been cheesed off at the speed a site loads? Conversely, surely the reason to visit a website isn’t the pop-up banner, the request to send you alerts, the auto-playing video, the newsletter to sign up to, the advert that overlays your screen or the array of clickable ads inviting you to see what a woman who was once 18 now looks like now she’s 50. It’s for the article or video you wanted.

Is the answer to cut back on any of that? Is it heck. Instead, editorial archives are being pummelled, streamlined and stripped back, because that’s the stuff that’s harder to – here it comes – ‘monetise’.

The frontpage of the website archive.org

There is some fightback. The first time I discovered the site archive.org, it was a nice novelty. An Internet archive, that’s keeping snapshots of how things used to be online. Now it feels damn-near essential. It’s the major resource that’s fighting back against corporations deleting material, and at least giving a place for some of it to live on.

Why can it do so? Because it’s non-profit. Every decision isn’t being determined primarily by the pursuit of the pound or dollar. And it’s doing the lifting that far better resourced corporations should be doing.

Yet there’s only so much it can do. It can’t archive everything, and there’s a lot being lost. My own skin in the game is my own work: some has been lost to ‘thin content’, some to a site merger where the algorithm picked an article on the same topic from a different writer, and some because sites have simply disappeared. Conservatively, I reckon hundreds of thousands of my own words have gone. It’s not just happening to me. It’s happening to writers across the web, and in virtually all cases, the first these writers know about their work going is when it’s gone.

My ego isn’t buffed enough to suggest that my own words disappearing is much of a loss to the human race. Heck, that news story I wrote about who’s directing the first Guardians Of The Galaxy film inevitably is of little interest now.

But, to paraphrase the villainous Hans Gruber from the first Die Hard film, sooner or later they’ll get to someone or something you do care about. And it’ll be gone. By the time you notice the work has disappeared, the mighty corporate that hit the delete button will have moved on to its next pressing profit-maximising decision, and it’ll be left to a non-profit archiving website to act as a safety net. In the meantime, the same five websites keep getting screenshotted, at least until their time comes as well.

Even as the world wide web grows, it’s narrowing. And its own active history is being removed, with not an eyebrow being batted. And yeah, I’ve backed this piece up, just in case…

