Scraping For Car Prices
I’ve blogged before about using Python and BeautifulSoup to gather data on car prices. You could enter values for make, model, AWD/4WD or not, city, state, and zip code, and the script would display the highest, lowest, and average price of a new car within 100 miles of the specified city.
However, there were problems with that code:
First, the Python script was not easily portable between my Linux and Mac machines. On Linux the BeautifulSoup module was installed as bs4 (Beautiful Soup 4), while on the Mac it was installed as BeautifulSoup (the older Beautiful Soup 3). For example, on the Mac you would say:
from BeautifulSoup import BeautifulSoup
While on Linux you would say:
from bs4 import BeautifulSoup
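One way around the naming difference is to try each module name in turn and use whichever one imports. This is only a minimal sketch, not something the old script did, and import_first is a hypothetical helper name:

```python
import importlib

def import_first(candidates):
    """Return the first module in candidates that imports cleanly, else None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

# Try the Beautiful Soup 4 name first, then the older Beautiful Soup 3 name
soup_module = import_first(["bs4", "BeautifulSoup"])
if soup_module is not None:
    BeautifulSoup = soup_module.BeautifulSoup
```

If neither module is installed, soup_module is None and the script can print a friendlier message instead of crashing on the import line.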
Second, autotrader.com has somehow started blocking the Python requests module from fetching the HTML of its pages, so my old script no longer works. Time for an upgrade!
For the new tool, I am using the modules PycURL and re. PycURL does a cURL for the HTML contents of autotrader.com, and re is Python’s regex module, which I use to parse that HTML. BeautifulSoup parses HTML too, but PycURL hands back the page contents as a plain string, and I wasn’t able to get BeautifulSoup to parse that string. Hence the need for re.
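To illustrate the regex approach on its own, here is a minimal sketch that applies the two substitutions the script uses to a made-up HTML snippet. The sample markup is an assumption for illustration only (the real autotrader.com page will look different), and extract_prices is a hypothetical helper name:

```python
import re

def extract_prices(body):
    """Return the price text from every line mentioning price-range-label."""
    prices = []
    for line in body.split("\n"):
        if re.search('price-range-label', line):
            # Drop everything up to the last dollar sign, keeping the sign itself
            price = re.sub(r'^.*\$', '$', line)
            # Drop the closing span tag that trails the number
            price = re.sub(r'</span>', '', price)
            prices.append(price.strip())
    return prices

# Made-up markup for illustration; not copied from autotrader.com
sample = "\n".join([
    '<div class="summary">',
    '  <span class="price-range-label">$45,000</span>',
    '  <span class="price-range-label">$21,500</span>',
    '  <span class="price-range-label">$30,250</span>',
    '</div>',
])

print(extract_prices(sample))
```

This kind of line-by-line matching is fragile: it depends on each price sitting on its own line with the label, so any change to the page layout can break it.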
Here’s the code:
import pycurl
from StringIO import StringIO
import webbrowser
import re

def ask4awd():
    # Keep asking until the answer is yes or no
    while True:
        awd = raw_input("4wd/AWD?(type yes or no): ").strip().lower()
        if awd in ("yes", "no"):
            return awd
        print "You must say yes or no"

make = raw_input("Enter the make: ")
model = raw_input("Enter the model: ")
awd = ask4awd()
cityname = raw_input("Enter your city: ")
state = raw_input("Enter your state: ")
zipcode = raw_input("Enter your zip code: ")
print ""

# The site's URLs use + in place of spaces
city = cityname.replace(' ', '+')
carmodel = model.replace(' ', '+')

# Build the search URL; the drive filters are only added for AWD/4WD searches
url = "http://www.autotrader.com/cars-for-sale/New+Cars/"+make+"/"+carmodel+"/"+city+"+"+state+"-"+zipcode+"?"
if awd == "yes":
    url += "driveCode=AWD4WD&driveCodes=AWD4WD&driveGroup=AWD4WD&"
url += "endYear=2016&listingType=new&searchRadius=100&showcaseListingId=0&showcaseOwnerId=0&startYear=1981&Log=0"

# Fetch the page into an in-memory buffer
buffer = StringIO()
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEDATA, buffer)
c.perform()
c.close()

body = buffer.getvalue()

# The price-range-label lines come in the order highest, lowest, average
labels = ["Highest Price: ", "Lowest Price: ", "Average Price: "]
count = 0
for line in body.split("\n"):
    if re.search('price-range-label', line):
        price = re.sub(r'^.*\$', '$', line)
        price = re.sub(r'</span>', '', price)
        if count < len(labels):
            print labels[count], price
        count += 1

print ""

site = raw_input("Would you like to go to the site? ")
lowersite = site.lower()
if lowersite == "yes" or lowersite == "y":
    webbrowser.open(url, new=2)
else:
    print "Goodbye"
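The URL construction can also be pulled out into a small function, which makes it easy to check without hitting the network. The sketch below reuses the URL pattern and query parameters from the script; build_url and its argument names are hypothetical, and whether autotrader.com still accepts this URL format is not something the sketch can confirm:

```python
BASE = "http://www.autotrader.com/cars-for-sale/New+Cars/"
COMMON = ("endYear=2016&listingType=new&searchRadius=100"
          "&showcaseListingId=0&showcaseOwnerId=0&startYear=1981&Log=0")
AWD_FILTER = "driveCode=AWD4WD&driveCodes=AWD4WD&driveGroup=AWD4WD&"

def build_url(make, model, city, state, zipcode, awd=False):
    """Assemble the search URL the same way the script does."""
    # Spaces become + signs, as in the script
    path = "/".join([make,
                     model.replace(" ", "+"),
                     city.replace(" ", "+") + "+" + state + "-" + zipcode])
    query = (AWD_FILTER if awd else "") + COMMON
    return BASE + path + "?" + query

print(build_url("Subaru", "Outback", "Salt Lake City", "UT", "84101", awd=True))
```

Keeping the AWD filter as an optional prefix means the two long URL strings no longer have to be maintained separately.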
This will work with Linux or Mac, but you’ll need to make sure you’ve got the PycURL module installed. For Mac, I did:
sudo easy_install pycurl
But I don’t know if there were any prerequisites, so if you have trouble, try installing Homebrew and using it to install pip.
For Linux, you can probably try:
sudo apt-get install python-pycurl
Once pycurl is installed the script should run happily.

















