Link rot after 8-10 years
Oct. 24th, 2012 01:08 am
What are the chances that a given <a href="..."> hyperlink will still be valid in the future?
Set 1: Mr. T vs. Everything
A site from 2002-2003, pulled from the Internet Archive.
393 total links
-214 completely dead
179 surviving in any form
-69 saved on the Internet Archive, 24.4% archival rate (69/283)
110 surviving links, 28.0% retention rate

143 total hosts (duplicates include Angelfire, Geocities, etc.)
-95 hosts with irretrievable content
48 hosts with no lost content

60 hosts with retrievable content
-22 saved on the Internet Archive, 21.0% archival rate (22/105)
38 surviving hosts, 26.6% retention rate
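The percentages for Set 1 can be re-derived from the raw counts. The sketch below is only a check of that arithmetic. The helper name `rate` and the variable names are mine, and it assumes that "completely dead" means gone from both the live web and the archive, which is the reading the subtraction implies.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_links = 393
completely_dead = 214   # assumed: not on the live web and not archived
archived_only = 69      # off the live web, but saved on the Internet Archive

surviving_any_form = total_links - completely_dead    # 179
surviving_live = surviving_any_form - archived_only   # 110
no_longer_live = total_links - surviving_live         # 283

retention = rate(surviving_live, total_links)    # share of links still live
archival = rate(archived_only, no_longer_live)   # share of non-live links that were archived

print(retention, archival)  # prints: 28.0 24.4
```

The same helper reproduces the host figures: `rate(38, 143)` gives 26.6 and `rate(22, 105)` gives 21.0.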
Set 2: My links page
Officially last updated in 2003, but I may have added links to it up until 2005.
872 total links
-403 dead
469 surviving links, 53.8% retention rate

769 total hosts
-368 hosts associated with at least one dead link
401 hosts with no dead links, 52.1% retention rate
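The host counts come from grouping links by hostname and marking a host as lost if any of its links is dead. Here is a rough sketch of that grouping, not the method actually used for these numbers. The function name `host_retention` and the example.com/.org/.net URLs are made up for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_retention(link_status: dict[str, bool]) -> tuple[int, int]:
    """Return (total hosts, hosts with no dead links).

    link_status maps each URL to True if the link still works.
    """
    has_dead_link = defaultdict(bool)
    for url, alive in link_status.items():
        host = urlparse(url).netloc.lower()  # hostnames are case-insensitive
        has_dead_link[host] = has_dead_link[host] or not alive
    clean_hosts = sum(1 for dead in has_dead_link.values() if not dead)
    return len(has_dead_link), clean_hosts


# Hypothetical sample data, not taken from either link set.
sample = {
    "http://example.com/a.html": True,
    "http://example.com/b.html": False,
    "http://example.org/": True,
    "http://EXAMPLE.org/page": True,
    "http://example.net/gone": False,
}

total_hosts, clean_hosts = host_retention(sample)
print(total_hosts, clean_hosts)  # prints: 3 1
```

Grouping by hostname lumps every site on a shared host such as Angelfire or Geocities into a single entry, so one dead page there marks the whole host as having a dead link.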
The Internet Archive was not checked for this larger data set, but I suspect that it will have most sites that were not configured to deny robot searches.
Summary notes
Link rot is a very serious problem. In these two sets, 46% and 72% of links were no longer live after 8-10 years, so you can expect roughly 40%-75% of links to be broken within ten years.
Novelty sites seem to have a lower retention rate than other sites. They're funny at one point in time, and then the owner forgets about them and lets the domain expire.
Links to individual articles often died after a website was reworked.
Redirections were counted as surviving links, even though they are more fragile. This includes old pages that only offered a plain HTML link to the page's current location, as long as the page was actually there at the new location.
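For anyone who wants to repeat the count automatically, the rule above (redirects count as surviving) might be sketched as follows. This is an assumption-laden sketch, not how these links were checked. The names `classify`, `is_surviving`, and `check` are hypothetical, and a script like this cannot spot a page that merely links to its new home, or a "not found" page served with a 200 status.

```python
import urllib.error
import urllib.request
from typing import Optional


def classify(status: Optional[int], final_url: Optional[str], original_url: str) -> str:
    """Label a link from its final HTTP status and the URL it ended up at."""
    if status is None or status >= 400:
        return "dead"        # no response at all, or an error status
    if final_url is not None and final_url != original_url:
        return "redirected"  # reachable, but at a different address
    return "alive"


def is_surviving(label: str) -> bool:
    """Redirected links count as surviving, as in the tallies above."""
    return label in ("alive", "redirected")


def check(url: str, timeout: float = 10.0) -> str:
    """Fetch url (urlopen follows HTTP redirects) and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status, resp.geturl(), url)
    except urllib.error.HTTPError as err:
        return classify(err.code, None, url)   # 4xx and 5xx responses
    except (urllib.error.URLError, OSError):
        return classify(None, None, url)       # DNS failure, timeout, refused connection
```

Calling `is_surviving(check(url))` on each link would give a first-pass tally; borderline cases would still need a look by hand.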