Web scraper discussion

I've been looking to create a local database of cooking recipes for personal use, but doing it manually is quite tedious, to say the least. It takes maybe 5-ish minutes per recipe to navigate the various websites, copy the text, create a file, reformat the inevitably flawed text into readable ASCII-only form, and look over the result for spelling, grammar, and readability errors (one guy who wrote the recipes was seemingly barely literate and could hardly have passed a 3rd-grade English class).

Are there any utilities you're aware of that would make this easier? Obviously the more automated the better, but automating text extraction from a website, line sizing, indents, list formatting, and easy headers would be the minimum to be "worth it". Useful extras would be automated file creation and naming, savable config presets, and unified functionality (i.e., one utility that does everything, rather than a separate web API, a reformatter, and a file-writer). The recipes tend to be in a certain format (rich text? I'm not sure) that prevents much of the readability from being retained when copied and pasted manually.

I'm looking for one single utility if at all possible, since I don't want excessive headaches on my end. I'm running Linux Mint. Thanks in advance.
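
In case it helps frame what I'm after, here's a minimal sketch of the pipeline I have in mind, in Python (untested; it assumes the requests and beautifulsoup4 packages, and relies on the fact that many recipe sites embed their recipes as schema.org/Recipe JSON-LD):

```python
import json
import re
from pathlib import Path

import requests
from bs4 import BeautifulSoup


def extract_recipe(url: str) -> dict | None:
    """Fetch a page and pull the schema.org/Recipe object from its JSON-LD, if present."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue
        # JSON-LD may be a single object, a list, or objects wrapped in @graph.
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            continue
        for item in candidates:
            if isinstance(item, dict) and "Recipe" in str(item.get("@type", "")):
                return item
    return None


def write_recipe(recipe: dict, out_dir: Path) -> Path:
    """Write the recipe as a plain ASCII text file named after its title."""
    out_dir.mkdir(parents=True, exist_ok=True)
    title = recipe.get("name", "untitled")
    lines = [title, "=" * len(title), "", "Ingredients:"]
    lines += [f"  - {item}" for item in recipe.get("recipeIngredient", [])]
    lines += ["", "Instructions:"]
    for n, step in enumerate(recipe.get("recipeInstructions", []), 1):
        text = step.get("text", "") if isinstance(step, dict) else str(step)
        lines.append(f"  {n}. {text}")
    # Name the file after the title; drop anything that isn't plain ASCII.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "recipe"
    body = "\n".join(lines).encode("ascii", "ignore").decode()
    path = out_dir / f"{slug}.txt"
    path.write_text(body + "\n", encoding="ascii")
    return path
```

So something like recipe = extract_recipe(url) followed by write_recipe(recipe, Path("recipes")) would cover the fetch, reformat, file-creation, and naming steps in one go (the URL and output directory here are placeholders).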


I'm trying to teach myself web scraping using Python and would like some small projects/challenges on websites to scrape, ideally real sites that people actually use rather than ones built specifically to be scraped.
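
A minimal sketch of the kind of thing I mean, assuming requests and beautifulsoup4 (the URL is just an example of a real, widely used site; robots.txt and terms of use apply):

```python
import requests
from bs4 import BeautifulSoup


def list_links(url: str) -> list[tuple[str, str]]:
    """Fetch a page and return (text, href) pairs for every link on it."""
    resp = requests.get(url, headers={"User-Agent": "learning-scraper/0.1"}, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    # Hacker News is a real site people use daily, not a scraping sandbox.
    for text, href in list_links("https://news.ycombinator.com")[:10]:
        print(f"{text!r} -> {href}")
```

Working out which selectors isolate just the story titles on a page like that is exactly the kind of small challenge I'm after.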


I usually use fantoccini or curl via its Rust bindings. I try to stick to just one programming language to avoid dealing with interoperability. I have used Selenium with Python in the past, but it wasn't great.