i made a little web scraper in python, that can collect all of my posts tagged with a certain tag. like as a tumblr backup strategy!
and i used it on my #slugcat chronicles tag! you can see it here, on my website!! be careful scrolling it tho. all the images are full res png screenshots (you can use pinch-zoom to zoom it), and in total it takes 108MB to load all of them 0_0
also i didn't implement formatting or reblog trails yet, so the posts look super clean!
What is scraping? It's where you use a bot to cruise the internet for you and collect data into a spreadsheet. Bots can work a lot faster than you and don't get tired or distracted, so if you've got a lot of data to collect, scraping makes your life easier.
Now it's the end of 2019 and you've got a Yahoo group full of old fanfic files that you'd like to download before it's gone forever. Sure, you can download everything by hand or use a custom script to scrape it perfectly into folders quickly, but let's be honest, you ain't got time for that shit, mentally or physically. Your best bet is to use a free scraping tool and leave it going while you're at work so you can come home to 1,000s of ancient fandom fics. Sure, this inelegant solution will perform so slowly you'll wonder if it's a geriatric, overweight snail powered by farts, but f*ck it, at least you won't have to sit at the computer clicking on every single htm and txt file in your Yahoo group, right?
Here are some helpful steps below the break.
Step One:
Create a folder for your downloads. In Chrome's settings you can then switch your download folder to this new folder.
Trust me, you don't want your thousands of historical fic mixing in with all the other porn crap in your downloads folder. The scraping won't be maintaining subfolder file structure, so you'll have all your group's files in one heap of a mess. Possibly half of them helpfully named "Chapter_One.htm". The less sorting you have to do later the better.
Step Two:
Install a browser extension scraper. In this case we're using Chrome's Web Scraper extension.
Once installed, you can access it on any webpage by opening Chrome's developer tools using F12 and selecting "Web Scraper" from the tool's menu bar.
Note: You'll want to dock the developer tools window to the bottom of the screen using the buttons I've highlighted below. vvv
If you want more details on how to use Web Scraper, click on the blue web symbol that will appear to the right of your address bar in Chrome. You'll find tutorials and documentation. I didn't read any of it. Why any of you are listening to me is a mystery. But if you're happy with the blind leading the blind, jump to our next step.
Step Three:
Now we can import your Sitemap for scraping. Start by navigating in Chrome to the yahoo groups file page. Something like,
 https://groups.yahoo.com/neo/groups/awesomefanfic/files
Open the Web Scraper in developer tools (F12) for this page, if you haven't already, and click "Create new sitemap" > "Import Sitemap".
Then paste the following code into the "Sitemap JSON" field:
You're going to need to change where it says "YOUR FILES" to your Yahoo group's files page. Copy it from the address bar. It will look something like:
"startUrl": ["https://groups.yahoo.com/neo/groups/awesomefanfic/files"]
You may want to rename your Sitemap in the second field. Otherwise it will be named "testscrape1".
Once you've made your changes, hit the import button and you're ready to start scraping.
Note: This imported JSON code provides a Sitemap structure designed to download all the documents up to a level two subfolder and place them all into that download folder. It will also create a spreadsheet record of each file's location.
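If hand-editing JSON makes you nervous, the two changes from this step can be scripted. This is only a rough Python sketch, not part of the original method: the function name is made up, the `_id` key for the sitemap name is an assumption (check your own sitemap), and the template below is an empty stand-in, not the real sitemap with its selectors.

```python
import json

def customize_sitemap(sitemap_json, files_url, new_name=None):
    """Return sitemap JSON with the start URL (and optionally the name) swapped in."""
    sitemap = json.loads(sitemap_json)
    # "startUrl" is the field shown in this step; it holds a list of URLs.
    sitemap["startUrl"] = [files_url]
    # ASSUMPTION: the sitemap name is stored under "_id".
    if new_name:
        sitemap["_id"] = new_name
    return json.dumps(sitemap)

# Stand-in only: the real sitemap has a full "selectors" list, omitted here.
template = '{"_id": "testscrape1", "startUrl": ["YOUR FILES"], "selectors": []}'
print(customize_sitemap(
    template,
    "https://groups.yahoo.com/neo/groups/awesomefanfic/files",
    "awesomefanfic",
))
```

Paste whatever it prints into the "Sitemap JSON" field instead of editing by hand.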
Step Four:
Now we Scrape.
Once it starts scraping it will open a new window and slowly (SOOO SLOWLY) start downloading your old files. Note: Do NOT close the Web Scraper window while it is running a scrape or you will have to start again from the beginning. Again, it will not be downloading folders, so all the files will be dumped in one big heap. Luckily, the spreadsheet it creates, showing where every file came from, will help.
Step Five:
Once you've walked the dog, gone to work, eaten a feast for the elder gods, and danced with your coven at midnight, the scraping should be about done (results dependent on number of grizzled files to download). The Web Scraper window should close on its own. Your tomb-of-ancient-knowledge will be ready in the location you specified in step one. It's possible the scrape missed some hoary files because Yahoo was refusing to open a folder or the file was in a deeper subfolder than level two (the bot checking for subfolders is why this scrape is so slow, so I didn't go deeper). For best results you will need to review the scraped file data and compare it to what you downloaded. Use "Export data as CSV" and then use your new spreadsheet to rebuild your file structure and check for any missing files. NOTHING listed in the "dummy" column will be downloaded. You will have to download everything in that column (and deeper) manually.
With luck, only a handful of files will need to be downloaded manually after your review. Treasure your prize and, if you're a mod of the group, help save the files for the gen z fic readers by sharing your hoard on Hugo-award-winning AO3.
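The spreadsheet-versus-downloads check can also be done by a few lines of Python instead of by eyeball. This is a sketch under assumptions, not a tested recipe for Yahoo groups: `find_missing` is a made-up helper, you have to tell it which CSV column holds the file links (something like `file-href`, but check your own CSV header), and it assumes the last part of each link matches the downloaded file name.

```python
import csv
from pathlib import Path
from urllib.parse import unquote, urlsplit

def find_missing(csv_path, download_dir, url_column):
    """List file URLs from the CSV whose file name is not in download_dir."""
    downloaded = {p.name.lower() for p in Path(download_dir).iterdir() if p.is_file()}
    missing = []
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            url = (row.get(url_column) or "").strip()
            if not url:
                continue  # this row has no file link
            # ASSUMPTION: the last URL segment is the downloaded file's name.
            name = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
            if name.lower() not in downloaded:
                missing.append(url)
    return missing

# Hypothetical usage:
# print(find_missing("awesomefanfic.csv", "C:/fic_downloads", "file-href"))
```

It only matches on file names, so two different files both called Chapter_One.htm will look like one file to it. Anything that shares a name still needs a manual look.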
Bonus (Expert Mode):
Using the spreadsheet and batch file scripts you can recreate your file structure quickly and move the files into the new folders. A few changes will turn your list of urls into folder locations on your PC and bam! You've got everything organized again! A few things to watch for: 1. You will have to remove spaces and special characters from folder names. 2. File names should be in double quotes in your batch file or have all spaces removed. 3. Files with the same name from different folders (like chapter1.htm) should probably be done manually before running the batch to move the rest.
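If batch files aren't your thing, the same idea works in Python. To be clear, this swaps Python in for the batch scripts and is only a sketch: both function names are made up, and it assumes each folder URL in your spreadsheet is just the files page URL with the folder names tacked on the end. It strips spaces and special characters from folder names, and it leaves duplicate-named files alone for you to sort by hand.

```python
import re
import shutil
from pathlib import Path
from urllib.parse import unquote, urlsplit

def folder_url_to_path(folder_url, files_url):
    """Map a folder URL to a relative path with spaces/special characters removed."""
    def segments(url):
        return [unquote(s) for s in urlsplit(url).path.split("/") if s]
    base, parts = segments(files_url), segments(folder_url)
    # ASSUMPTION: folder URLs extend the files page URL, e.g. .../files/Folder/Sub/
    if parts[:len(base)] == base:
        parts = parts[len(base):]
    cleaned = [re.sub(r"[^A-Za-z0-9._-]", "", s) for s in parts]
    # Drop empty or dots-only names so nothing escapes the destination folder.
    return Path(*[c for c in cleaned if c.strip(".")])

def move_into_structure(download_dir, dest_root, rows, files_url):
    """Move each (file_name, folder_url) pair; return names left for manual sorting."""
    skipped = []
    for file_name, folder_url in rows:
        source = Path(download_dir) / file_name
        target_dir = Path(dest_root) / folder_url_to_path(folder_url, files_url)
        target = target_dir / file_name
        # Missing files and same-named files already moved are left for a human.
        if not source.is_file() or target.exists():
            skipped.append(file_name)
            continue
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
    return skipped
```

Feed it (file name, folder URL) pairs from your CSV. Whatever comes back in the skipped list is your manual pile.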
-------------------------------
Thank you for reading.
Don't like my jokes or my solution? Feel free to call me out and post better solutions here. Hope this helps!
Free & Easy Way to Get YouTube Video Transcripts: Step-by-Step Guide
Want to extract a transcript from a YouTube video without paying for tools? This guide walks you through the easiest free methods to do it, no tech skills needed!
Discover how ScrapingDog simplifies web scraping with its fast and reliable API. Learn to extract data from any website without dealing with proxies, CAPTCHAs, or complex setups.