Currently running a task that checks the HTTP status of all the outbound URLs from my blog - all 1,482 of them (with obvious 200s removed, like GitHub or YouTube).
Once complete, I'll check for 404s and replace the URLs with links to the latest 200 copy from the archive.org project.
If it goes smoothly, I will write it up...
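A minimal sketch of what a status sweep like this could look like in Python, assuming the outbound URLs have already been extracted to a plain text file - the file names and the fallback logic are illustrative, not taken from the actual setup:

```python
# Sketch: check the HTTP status of a list of URLs and write a CSV report.
# "outbound-urls.txt" and "link-report.csv" are made-up names.
import csv
import requests

def check_url(url: str, timeout: float = 10.0) -> int | str:
    """Return the HTTP status code, or the exception name if the request fails."""
    try:
        # HEAD is cheaper; some servers mishandle it, so fall back to GET.
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code in (403, 405, 501):
            resp = requests.get(url, allow_redirects=True, timeout=timeout)
        return resp.status_code
    except requests.RequestException as exc:
        return type(exc).__name__  # e.g. ConnectionError for dead servers

if __name__ == "__main__":
    with open("outbound-urls.txt") as f:
        urls = [line.strip() for line in f if line.strip()]

    with open("link-report.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["url", "status"])
        for url in urls:
            writer.writerow([url, check_url(url)])
```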
The problem with URLs that are so old is that you still get 200s, like this one: http://chrisbewick.com/blog/events/full-frontal-2009-back-to-brighton/ (read the URL - it's clearly not supposed to be that!)
@rem yup, I had a WordPress plugin that checked for broken links and replaced them with archive.org links.
It worked perfectly for actually broken links (40X, 50X, dead servers), but there was no way to detect domain takeovers.
Sometimes only a manual check will do.
@Edent honestly, though it sucks, it's probably worth just getting the first 200 result from archive.org and _always_ using that…
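A sketch of that "earliest 200" lookup using the Wayback Machine CDX API - the endpoint and parameters below are an assumption to verify against the CDX documentation rather than anything confirmed in this thread:

```python
# Sketch: find the earliest archive.org capture of a URL that returned 200.
import requests

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def first_200_snapshot(url: str) -> str | None:
    """Return the Wayback URL of the earliest 200 capture, or None if there isn't one."""
    params = {
        "url": url,
        "output": "json",
        "filter": "statuscode:200",
        "limit": "1",          # results are oldest-first, so this is the first 200
    }
    rows = requests.get(CDX_ENDPOINT, params=params, timeout=30).json()
    if len(rows) < 2:          # row 0 is the header; no data rows means no capture
        return None
    timestamp, original = rows[1][1], rows[1][2]
    return f"https://web.archive.org/web/{timestamp}/{original}"

# e.g. first_200_snapshot("http://chrisbewick.com/blog/events/full-frontal-2009-back-to-brighton/")
```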
@rem, @Edent, love the archive.org approach, but I personally decided to point links at the archived version from just before the post or comment was published (not the most recent one at the time of broken-link detection).
It's harder to automate (and I haven't automated that part), but for old entries it seems to better reflect the vibe of the time.
@rem @j9t @Edent it's not too bad: you can construct a raw archive.org URL with a date in it, and it will give you the closest capture to that date.
For example, make a URL like
https://web.archive.org/web/20120914000000id_/http://kevinmarks.com//
where '20120914000000' is the date and time in YYYYMMDDhhmmss form (the 'id_' suffix asks for the raw capture), and it will redirect to the nearest capture (in this case https://web.archive.org/web/20120806141355id_/http://kevinmarks.com/)
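A small sketch of that date-anchored lookup, which also covers picking the snapshot nearest to a post's publish date as described above - the helper name and example date are illustrative, and it relies on the redirect behaviour Kevin describes:

```python
# Sketch: build a date-anchored Wayback URL and follow the redirect to the
# nearest actual capture.
from datetime import datetime
import requests

def nearest_capture(url: str, when: datetime) -> str:
    """Build a Wayback URL for `when` and return the URL it redirects to."""
    stamp = when.strftime("%Y%m%d%H%M%S")          # e.g. 20120914000000
    probe = f"https://web.archive.org/web/{stamp}id_/{url}"
    resp = requests.get(probe, allow_redirects=True, timeout=30)
    return resp.url                                 # URL of the closest capture

# e.g. nearest_capture("http://kevinmarks.com/", datetime(2012, 9, 14))
# -> https://web.archive.org/web/20120806141355id_/http://kevinmarks.com/
```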