Data Mining With Python & Pandas - 2 of N - Why Indians Should Be Happy That Germany Won Against France
It's the 13th minute and Hummels just scored! The excitement!
In one of the earlier posts, I mentioned the data.gov.in website, which is a fantastic place to explore some interesting data.
I have taken this particular data source to decide whom to support in today's FIFA World Cup 2014 Quarter Finals.
Let's get started from where we previously left off.
df.head()
The data is loaded and ready to go.
Germany vs. France, let the game begin!
Before we start meddling with the data that we have in our hands, let's just look at this snippet from Wikipedia:
According to the Department of Commerce, the fifteen largest trading partners of India represent 62.1% of Indian imports, and 58.1% of Indian exports as of December 2010. These figures do not include services or foreign direct investment, but only trade in goods.
Well, that's one goal to Germany!
But what does the data really say? Well, let's find out...
#some assignments to speed up things later on
country = 'Country exporting to India'
value = 'Value (INR) - 2012-13'
#create a filter
fromGermans = df[country] == 'GERMANY'
#slice the dataframe
germany = df[fromGermans]
#sort by value, costliest first (sort_values replaces the older sort(columns=...))
germany.sort_values(by=value, ascending=False).head(10)
Interesting code, isn't it? I'll explain it line-by-line in just a moment, but now, let's take a look at what the above lines of code produce:
Those are the top 10 goods that India imports from Germany, sorted in descending order of how much India had to spend on each of them - i.e. costliest on top.
For some reason however, data.gov.in hasn't updated the quantity of import for most of the goods in the top 10. Weird!
Okay, let's get back to the code. Pandas does some really clever data indexing, so once you've loaded data into your DataFrame, it can be selected, sliced, drilled down, etc. in any manner you want (and in some really clever ways that you will find out exclusively on pythonplay.com - I couldn't resist a marketing pitch. The effects of late night blogging after watching the World Cup quarters, I suppose!)
Also, in another quarter-final, Federer won and moves on to the next round at Wimbledon.
What I'm doing here is basically called boolean indexing:
#create a filter
fromGermans = df[country] == 'GERMANY'
#slice the dataframe
germany = df[fromGermans]
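I create a filter (a criterion) for slicing the DataFrame. Notice that df[country] == 'GERMANY' is a vector operation: the comparison runs against every row and produces a column of True/False values, yet Pandas lets you write it as if you were comparing against a single scalar value. Passing that filter back into df[...] then keeps only the rows where it is True.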
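If you'd like to see the filter itself rather than just the sliced result, here is a minimal sketch on a tiny made-up table. The rows and the 'Commodity' column name are assumptions for illustration only; just the two column names stored in country and value come from the real data.

```python
import pandas as pd

country = 'Country exporting to India'
value = 'Value (INR) - 2012-13'

# Made-up rows, NOT the data.gov.in figures; 'Commodity' is an assumed column name
toy = pd.DataFrame({
    country: ['GERMANY', 'FRANCE', 'GERMANY', 'FRANCE'],
    'Commodity': ['A', 'B', 'C', 'D'],
    value: [300, 100, 200, 50],
})

# The comparison is applied to every row at once and returns a boolean Series
mask = toy[country] == 'GERMANY'
print(mask.tolist())  # [True, False, True, False]

# Indexing with the mask keeps only the rows where it is True
toy_germany = toy[mask]
print(toy_germany)
```

The mask is an ordinary Series, so it can be stored, inspected, or reused, which is exactly what fromGermans does above.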
Hold on, France is attacking...
fromFrench = df[country] == 'FRANCE'
france = df[fromFrench]
france.sort_values(by=value, ascending=False).head(10)
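Since the same filter-and-sort steps have now been written twice, they could be wrapped in a small helper. This is only a sketch: top_imports is a hypothetical name of my own, the toy rows are made up, and it uses sort_values, which is the spelling in pandas 0.17 and later (the older DataFrame.sort was removed in 0.20).

```python
import pandas as pd

def top_imports(df, exporter, n=10,
                country_col='Country exporting to India',
                value_col='Value (INR) - 2012-13'):
    """Return the n costliest rows for one exporting country (hypothetical helper)."""
    rows = df[df[country_col] == exporter]  # boolean indexing
    return rows.sort_values(by=value_col, ascending=False).head(n)

# Made-up rows just to show the call; not the real import figures
toy = pd.DataFrame({
    'Country exporting to India': ['GERMANY', 'FRANCE', 'GERMANY', 'FRANCE'],
    'Value (INR) - 2012-13': [200, 100, 300, 50],
})

print(top_imports(toy, 'GERMANY', n=1))
print(top_imports(toy, 'FRANCE', n=1))
```

With the real DataFrame, top_imports(df, 'GERMANY') and top_imports(df, 'FRANCE') should give the same two top-10 tables as the longer versions above.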
Ah, but how much more do we spend on German goods than on French goods? Turns out that number is 560723316290! I can't even comprehend this number at one look.
That's roughly 56,000 Crores.
germany[value].sum() - france[value].sum()
India imports stuff from Germany that is worth about INR 56K Crores more than what it imports from France!
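For anyone not used to the Indian numbering system: one crore is 10,000,000 (10^7), so converting rupees to crores is a single division. A minimal sketch, using the difference quoted above; to_crores is a hypothetical helper name of mine.

```python
CRORE = 10**7  # 1 crore = 10,000,000

def to_crores(amount_inr):
    """Convert an amount in rupees to crores."""
    return amount_inr / CRORE

difference_inr = 560723316290  # the Germany - France difference quoted above
difference_crores = to_crores(difference_inr)
print(f"INR {difference_crores:,.0f} Crores")  # INR 56,072 Crores
```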
So what if the Indian economy is influenced by all this? We just want a good game of football, don't we?
Germany 1 - 0 France.









