The Challenges of Internet Storage – Marketplace

The internet is where much of what happens in our world is stored. But where is the internet itself stored?

There are projects all over the world, like the Internet Archive, to try to preserve some online content.

Marketplace’s Meghan McCarty Carino spoke to Kayla Harris, professor and director of the University of Dayton’s Marian Library, and asked whether the current archiving work is enough.

The following is an edited transcript of their conversation.

Kayla Harris: I’d say no, because it’s a mix of both the technical challenges of storing this huge amount of content, but also, I think, it’s about the human side of things and getting people to care about why we’d want to preserve that stuff in the first place. And I think there’s this very common misconception that, well, if it’s on the internet, it’s there forever. And so there’s no understanding that no, it’s not necessarily there forever, and someone or something has to save it if you want it to be there forever.

Meghan McCarty Carino: Can you think of a prime example of something that won’t be there forever on the internet, or something that’s already gone that you wish was still there?

Harris: I think part of it is, even though sometimes a website might still be there, websites often have dynamic content. So say it’s a news website that is constantly changing with the latest headlines: even if the website itself is still there, perhaps the flash-in-the-pan type of news is not. And, you know, especially during COVID, for example, a lot of institutions, a lot of organizations, you know, they were putting things on their website, like, “closed now,” “we’re closed indefinitely.” And then when things would open up, they would update that, right, because you want people to have the most up-to-date information on your website. But if the page wasn’t archived when it said something else, then that version is gone. And that part is a little bit harder, I think, for people to understand. Web pages and websites are so dynamic that the whole thing might still be there, but not the individual pieces.

McCarty Carino: I’ve actually thought about that a lot. When I think about, you know, historically documenting the pandemic and looking at, you know, Ken Burns documentaries, where there’s so much written material about these different passages of history, so much of our pandemic documentation is just digital material that might not always be around.

Harris: Yes, I mean, archivists and heritage professionals, especially during the pandemic, a lot of them were drawing comparisons to the 1918 pandemic and the kind of materials, documents and even personal accounts that we have from then. But how is that stuff communicated today? And if it’s online, we have to actively save it, otherwise it won’t be there for people in the future who want to draw comparisons to the 2020 pandemic.

McCarty Carino: What are some of the technical barriers to archiving internet content that you mentioned?

Harris: One that’s fairly simple is that websites are meant to be dynamic. Other heritage material that we archive or collect, whether it be books, artifacts, etc., is stable. It has a kind of “fixity,” whereas websites are constantly changing. This can be the home page content, it could be the design style. You know, the first websites were using really cool, flashy HTML, GIFs and that sort of thing. And then we update, and now we make our websites more accessible and adaptive. But there are also things like URLs. Part of the difficulty is that there really isn’t a clear consensus on what constitutes a website. Is it its content? Or is it the URL, the domain where it resides? And on the Internet Archive’s Wayback Machine, which people can use to browse and see these previous iterations of captured websites, it’s organized by domain. And sometimes those change. So something like, you know, CNN may have always been CNN. But if a URL was at some point owned by someone else, then that other content will be there under it, and it’s harder to track that history.

McCarty Carino: I can also see sort of a challenge of, you know, whose responsibility is this? Because one of the things that makes the internet what it is is that it’s just kind of an open network that nobody is responsible for. So who is in charge of archiving it?

Harris: Exactly. And here, I think, is where that human side comes in again, and unfortunately, or perhaps fortunately, the human side is also biased. So just like in a physical archive, there is an archivist or many archivists who sort through the materials and keep what they deem to be historically valuable, what in some way preserves the cultural heritage. And so there’s going to be a bias inherent in that, because what I think is important to future generations might not be exactly the same as what someone else thinks is important. There really isn’t the ability to archive everything, so what gets archived is often selective. And that could be selected by one person or by groups of individuals. But some bias about which cultural heritage is worth saving on the internet will be introduced.

McCarty Carino: What kind of ethical concerns does all of this raise?

Harris: I think because of the way we think of the web as something dynamic and changing, not everyone who creates web content, whether it’s a website or social media for example, expects it to be permanent. And so, you know, you get into some ethically dicey situations when you think of things that people intend to post online as some sort of ephemera, and then someone chooses to archive it without permission. It’s a whole other thing to, you know, find the creator, ask permission, etc. Is it really okay for that person to save someone else’s content, or for that organization to save another community’s content? I think this comes up in some protest movements. Sometimes people, archivists included, get caught up in this idea of “Well, we have to document, we have to preserve.” But for things like protests or rallies, people who are physically there in person don’t necessarily count on having their photo taken and put online and then archived forever.

McCarty Carino: What kind of content are you most concerned about losing forever?

Harris: There is a collective right now called Saving Ukrainian Cultural Heritage Online, or SUCHO. And when we think about the destruction of cultural heritage, sometimes it’s easy to see how, well, if somebody is bombing another country and destroying these world heritage sites, it’s tangible and easy to see. But what happens when websites are taken offline, along with the online things that make up that cultural heritage? Another one that also stands out for me is local journalism. There was a study done by the Tow Center for Digital Journalism at Columbia University a few years ago, and they called it “The Dire State of News Archiving in the Digital Age,” because they talked to these smaller news organizations about what they had in place for archiving their content. And most of them either a) didn’t know what that meant, or b) thought that if they had a Google Doc and they had it backed up somewhere, that was archiving. And so, I think that causes a lot of concern. Local news is really important to understanding a community. And again, back to the human question: people have to care about preserving that content.

McCarty Carino: So what do we lose if we lose that type of content?

Harris: I think, you know, our cultural heritage, our humanity. I’m sure everyone has written a social media post or tweet or something where you think, “Yeah, this doesn’t need to live on forever to document what it was like to live in this time and age.” But there are many other things on the internet that are worth saving. There is a quote from an academic, Megan Sapnar Ankerson. She said, “It’s a lot easier to find an example of a 1924 movie than a 1994 website.” And so with physical media, you know, the arts, the humanities, it’s all captured in this physical form, in books and movies, plays, operas and that kind of thing. That cultural heritage is happening online now, and if we don’t save it, what will future generations look back on?

Speaking of losing content, Google recently announced updates to its inactive accounts policy.

Basically, if an account has been inactive for two years or more, there’s a chance it will be deleted altogether.

This had people worried that the same would happen to inactive accounts on YouTube, which is owned by Google, meaning that videos more than a decade old, some of which defined YouTube’s early days, would also be deleted.

A Google rep later clarified that this rule wouldn’t apply to YouTube accounts, so for now, it seems those early genre-defining videos like “Zombie Kid Likes Turtles” or “Keyboard Cat” won’t be lost to posterity.
