Jan 24: Scrapy - effectively
Using Scrapy effectively mostly boils down to reading the official tutorial:
http://doc.scrapy.org/en/latest/intro/tutorial.html. The gist of it:
- the project directory and file structure
- the Scrapy configuration file
- the spider file
Once you have these bare minimums in place, the XPath fun begins. Scrapy lets you specify XPath selectors to pick out the parts of a page you care about. Some that I found useful:
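- //text() selects every text node at any depth below the current node, whereas /text() selects only the text nodes that are direct children, without the text of nested elements.
- .xpath('..') selects the parent of the matched node (older Scrapy releases used .select('..') instead).
- @href selects the href attribute, which is handy for anchor links.
- .xpath("//div[contains(@class,'jaja')]") selects all divs whose class attribute contains 'jaja'. This is a substring match, so a class such as 'jajaja' would match too.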
So you got the drift on that one.
Another handy way to try all this out is the Scrapy shell. Run scrapy shell 'URL' to launch it, and call shelp() inside it to see the available objects and shortcuts. You can experiment with all your selectors there.
Happy scraping!















