Scraping Movie Information From http://www.justdial.com
I created this blog to keep notes for myself and for anyone else interested in scraping information from websites and automating routine operations with various programming tools.
The first task I decided to tackle was scraping some information from the Justdial site. I have to grab all movies and showtimes, plus information on Chennai cinema theaters such as name, address and telephone number. To get this information, I need to analyze the first page of cinema theaters that justdial.com returns. The address of this page is http://www.justdial.com/Chennai/Cinema-Halls/ct-7451/page-1. Using the Firebug extension for the Firefox browser, I can find all the HTML tags that contain useful information.
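As a small sketch of how the listing pages could be addressed: the only URL I have actually looked at is the page-1 address above, so the idea that later pages follow the same page-N pattern is an assumption, and the helper names `listing_url` and `listing_urls` are made up for illustration.

```python
# Base address of the Chennai cinema hall listing, taken from the page-1 URL above.
BASE_URL = "http://www.justdial.com/Chennai/Cinema-Halls/ct-7451"


def listing_url(page):
    """Return the address of one listing page.

    Assumption: every page follows the same "page-N" pattern as page 1.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    return "{}/page-{}".format(BASE_URL, page)


def listing_urls(last_page):
    """Return the addresses of pages 1..last_page, in order."""
    return [listing_url(n) for n in range(1, last_page + 1)]


if __name__ == "__main__":
    for url in listing_urls(3):
        print(url)
```

How many pages really exist has to be read from the site itself; the `3` here is only a placeholder.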
Unfortunately, when I examined the page of a particular cinema theater thoroughly, I noticed that movie titles and showtimes are inserted into the page by JavaScript. Luckily, Pythonistas have Selenium, a great framework for loading, testing and analyzing web pages. So, before parsing a web page whose information is loaded by JavaScript, we have to render it with one of the webdrivers Selenium supports. For my task I chose the PhantomJS headless browser. BeautifulSoup is another great Python library; it helps me parse the DOM tree that the Selenium webdriver returns.
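Here is a rough sketch of that two-step workflow: Selenium renders the page, then BeautifulSoup parses the resulting HTML. Several things here are assumptions for illustration only. The tag and class names (`div.movie`, `span.title`, `span.showtime`) and the sample HTML are invented and are not Justdial's real markup, and the function names are hypothetical. The sketch also assumes a Selenium release older than 4.0, since newer releases no longer ship the PhantomJS driver.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load a page in a headless browser and return its HTML after scripts have run."""
    # Imported here so the parsing half can be tried without a browser installed.
    from selenium import webdriver

    # Assumption: Selenium older than 4.0 and a phantomjs binary on PATH.
    driver = webdriver.PhantomJS()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def parse_movies(html):
    """Extract movie titles and showtimes from rendered HTML.

    The tag and class names are placeholders, not Justdial's real markup.
    """
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for block in soup.find_all("div", class_="movie"):
        title = block.find("span", class_="title")
        if title is None:
            continue  # skip blocks that carry no title
        times = [t.get_text(strip=True)
                 for t in block.find_all("span", class_="showtime")]
        movies.append({"title": title.get_text(strip=True), "showtimes": times})
    return movies


# Invented sample standing in for a page that Selenium has already rendered.
SAMPLE_HTML = """
<div class="movie">
  <span class="title"> Movie A </span>
  <span class="showtime">10:00 AM</span>
  <span class="showtime">1:30 PM</span>
</div>
<div class="movie">
  <span class="title">Movie B</span>
</div>
"""

if __name__ == "__main__":
    for movie in parse_movies(SAMPLE_HTML):
        print(movie["title"], movie["showtimes"])
```

Keeping the fetching and the parsing in separate functions means the parser can be tested on saved HTML without starting a browser. On a real page, `page_source` may be read before the scripts have finished, so an explicit wait may be needed before grabbing it.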
Armed with this information, let's start coding.
http://blablup.com/posts/scraping-movies-information-from-justdial.html