Remember Standing Rock.
Remember the Muslim travel ban.
Remember Afghan and Iraqi interpreters.
Remember Charlottesville.
Remember Heather Heyer.
Remember Stephen Miller.
Remember Joe Arpaio.
Remember Puerto Rico.
Remember the lifting of the ban on military gear for police.
Remember Bears Ears.
Remember "shithole countries."
Remember the DREAMers.
Remember child separation.
Remember the Black Voters Matter bus in Louisville, Georgia.
Remember Citizens United.
Remember Cambridge Analytica.
Remember dominionism.
Remember Steve Bannon.
Remember Roger Stone.
Remember Merrick Garland.
Remember Obamacare repeal.
Remember Betsy DeVos.
Remember the tariffs and the trade war.
Remember net neutrality.
Remember "the enemy of the people".
Remember the emoluments clause.
Remember Paul Manafort.
Remember Michael Cohen.
Remember gerrymandering.
Remember the federal deficit.
Remember the tax fraud.
Remember NATO.
Remember Helsinki.
Remember Montenegro.
Remember Maria Butina.
Remember "I like people who weren't captured."
Remember "if we have them, why can't we use them?"
Remember Xinjiang.
Remember Rodrigo Duterte.
Remember Jared Kushner and Mohammed bin Salman.
Remember the Rohingya.
Remember refugees.
Remember HIV and HPV.
Remember Recep Tayyip Erdoğan.
Remember Trump "fell in love" with Kim Jong Un.
Remember Jamal Khashoggi.
Remember "grab 'em by the pussy".
Remember the global gag rule.
Remember Roy Moore.
Remember Roe v. Wade.
Remember Christine Blasey Ford.
Remember the Paris Agreement.
Remember Scott Pruitt.
Remember ANWR.
Remember the trans military ban.
Remember "he wants to hang them all!"
Remember Marjory Stoneman Douglas.
Remember the NRA.
New and improved version with additional methodology and additional pleading!
(Before the Trump administration can take it all down)
Let me say first that there are better ways to do this. If you're working with an organized team of people, or if you understand web and database architecture, this is not how you want to go about it.
On the other hand, if you're solo, in a hurry - if, say, news outlets are reporting that the EPA's climate change site may be taken down - and you basically just know how to use a browser, this is better than nothing.
(Interrupting with an edit at this point. Stuff is already disappearing, and it's not all environmental in nature. This is urgent.)
All you'll need is:
a browser
a spreadsheet program like Excel or Google Sheets
and maybe a text editor (something as simple as Notepad is fine)
If you've got those, you can start feeding content into the Internet Archive's Wayback Machine in minutes.
Step 1: Collect a batch of URLs for content you want to save. There are a few ways to do this.
NEW: Use a free webcrawler to compile a list for you. Xenu's Link Sleuth (and yes, the name, I know, but focus) is a free download that will start from a URL you select and compile a list of URLs linked by that page and by any subpages. Its express purpose is to find broken links, but it works fine for this. There are two things to be aware of, neither of which is really its fault. One, it can't crawl anything that has been set up to block crawlers, and some federal pages are in fact set up that way. Two, as fast as it is compared to a person, crawling a large site or subsection thereof will still take it a long time, and you won't get your list until it's done. For perspective, its current job for me, a large subsection of a .gov site, is 71% done after about 24 hours of running. So if you're concerned about an imminent threat to a specific set of pages or files, do something about those more quickly, and have this going in the background. It's very simple to use, though. Once it's installed and open, go to File > Check URL..., enter your starting point in the top text field, make sure Check external links is on, and click OK. When it's done, there'll be a pop-up asking if you want a report. You want a report. Click Yes (and click Cancel if you get a pop-up about FTP stuff after that) and you'll get a new browser tab (for a local temporary file, don't worry) displaying it. The URLs you want are in the section titled "List of valid URLs you can submit to a search engine:".
Google site:<whatever it is> sitemap and you may turn up either the sitemap itself or a robots.txt document that will tell you where the sitemap(s) is/are. Government sites don't tend to expose these, but if you're lucky enough to find one, just clean it up in your text editor of choice and you'll have your list of URLs.
Google site:<whatever it is> <relevant term> (like, say, climate), cut and paste the Google search results into column A of a new spreadsheet one page at a time, and then sort column A alphabetically. All of the URLs will be clumped together, since they'll all start with h. Delete the other rows. There's your list. (It's probably worth creating a throwaway Google login just so you can alter your search settings. Opting for 100 results per page instead of 20 will really speed this up. You should also keep in mind that longer URLs may be partially replaced with ellipses in the search results, and that URLs containing ellipses will not work for what we're doing. You'll have to search-and-replace them back to their original state, or accept that they won't work.)
If you're dealing with a relatively small page or site, you can collect the URLs manually by right-clicking every promising-looking link, selecting Copy Link Location or Copy Shortcut or the equivalent, and pasting the results into your spreadsheet, one per row. This is a pain in the ass but there may be no better way to be sure you're getting as much as you can, especially on sites like the EPA's that go in for blocking crawlers.
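If you happen to have Python handy, the cleanup for the sitemap method and the search results method can be roughly sketched like this. This is a sketch, not a tested tool: the function names are made up for illustration, the sitemap function assumes a standard XML sitemap that lists pages in <loc> elements, and the other function simply throws away any pasted line that isn't a complete URL, including the ones with ellipses.

```python
import re
import xml.etree.ElementTree as ET


def urls_from_sitemap(xml_text):
    """Return the text of every <loc> element in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Tags usually carry a namespace, e.g. "{http://...}loc",
        # so compare only the part after the closing brace.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


def urls_from_pasted_results(pasted_text):
    """Keep lines that look like complete URLs; drop everything else."""
    urls = []
    for line in pasted_text.splitlines():
        line = line.strip()
        if not re.match(r"https?://", line):
            continue  # titles, snippets, blank lines
        if "..." in line or "\u2026" in line:
            continue  # truncated by the search engine, won't resolve
        urls.append(line)
    return urls
```

Either function gives you a plain list of URLs, one per entry, which is all the later steps need.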
Step 2: Paste your list of URLs into a spreadsheet, all in column A. If you used the search results method you already have this.
Step 3: Deduplicate column A, if you're using Excel. Google Sheets doesn't make this easy, so skip this step if that's what you're using.
Step 4: Paste the following string into cell B1 in your spreadsheet, with no spaces on either side: http://web.archive.org/save/
Step 5: Enter the following formula in cell C1: =CONCATENATE($B$1,A1)
Step 6: Drag that formula all the way down column C, for as many rows as you have entries in column A. (So column C should be full of entries that look like http://web.archive.org/save/http://www.whateversiteimsaving.gov/somepage.html .)
What's the point of this? The point is that the command to save a page to the Wayback Machine can be communicated as part of a URL, so entering any of your column C entries in a browser will upload the page in question to the Internet Archive. (Go ahead and try it with a random page if you want to test it.) Now you just need to open all of these links in quick succession.
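For anyone who'd rather skip the spreadsheet, Steps 3 through 6 boil down to "remove duplicates, then stick the save prefix on the front of each URL." A minimal Python sketch of that, where the function name and the example URLs are my own placeholders and only the prefix comes from Step 4:

```python
SAVE_PREFIX = "http://web.archive.org/save/"  # the string from Step 4


def build_save_urls(urls):
    """Deduplicate URLs (keeping first-seen order) and prepend the save prefix."""
    cleaned = (u.strip() for u in urls)
    # dict.fromkeys drops repeats while preserving the original order.
    unique = list(dict.fromkeys(u for u in cleaned if u))
    return [SAVE_PREFIX + u for u in unique]


if __name__ == "__main__":
    for save_url in build_save_urls([
        "http://www.example.gov/a.html",
        "http://www.example.gov/a.html",  # duplicate, dropped
        "http://www.example.gov/b.pdf",
    ]):
        print(save_url)
```

The result is the same thing column C holds in the spreadsheet version.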
Step 7: Go to https://httpstatus.io/ in a new browser tab. The "real" purpose of this site is to check whether batches of URLs are actually resolving correctly, as opposed to running into a 404 or some other kind of error. But it does this by, as you might have guessed, opening all of the URLs automatically and invisibly and telling you what happened. There are other pages out there that do pretty much the same thing, though I haven't tested them all and urlitor in particular doesn't seem to work for Wayback Machine uploading.
Step 8: Paste up to 100 of your column C URLs into the field here and click Submit. You'll quickly (within seconds) get your results: anything with a green 200 badge has been uploaded to the Archive successfully. Anything with a red or yellow badge hasn't.
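If your list runs longer than 100 entries, you'll be pasting it in chunks. Here's a rough Python sketch of the chunking, plus a stand-in for what the status checker does, which is to open each save URL and report the status code. The function names are hypothetical, and I'm assuming the Archive treats a script's request the same way it treats a browser's. That may not hold, and it may slow you down or refuse you if you hit it too fast.

```python
import urllib.error
import urllib.request


def batches(items, size=100):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def request_save(save_url, timeout=60):
    """Open one save URL; return the HTTP status code, or None on a network failure."""
    try:
        with urllib.request.urlopen(save_url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 or 403: the page was not saved
    except OSError:
        return None  # timeout, DNS failure, connection refused, etc.
```

As with the badges, a 200 from `request_save` is the only result that suggests the page went in. Anything else is worth retrying later.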
If you want to make sure one of your uploads worked, paste the column A URL into the BROWSE HISTORY field at http://archive.org/web/web.php , click the button, and see what comes up. If your upload worked, you should see a calendar with a circle around today's date. Click on or mouse over the date to access the version(s) archived that day. You can use the same function to see if a certain page has already been archived recently, too.
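The same check can be scripted, with a caveat: this relies on the Internet Archive's JSON "availability" endpoint, which isn't part of the method above, and the response shape below is an assumption based on its documentation. The sketch only builds the query URL and reads a response you've already fetched. The function names and the sample URL are hypothetical.

```python
import json
import urllib.parse

# Assumed endpoint for the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "http://archive.org/wayback/available"


def availability_query(page_url):
    """Build the lookup URL for one page (a column A URL, not a save URL)."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) for the closest capture, or None if there isn't one."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("timestamp"), closest.get("url")
    return None
```

A result of None means no capture was reported, which is your cue to save the page.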
As for what to archive? Anything that looks useful. (NEW NOTE: Be proactive when you're reading the news. Those articles about rat poison at Standing Rock mention an EPA document, and a search for rozol standing rock site:epa.gov still turns it up here at the moment, but when I manually threw that URL into the Wayback Machine yesterday, as can be done one link at a time in the lower right corner of their homepage, it had never been archived before. This isn't me boasting. I'm not awesome. This is me finding that very alarming. This kind of search can be done any time an article mentions government documents that May Be Important.) PDFs can be uploaded just fine, and are a good bet on a government site; they're probably charts, forms, official publications, presentation slideshows, or transcripts. Excel files are an even better bet. If you figure Trump and his friends would hate it, save it.