A talk by Gary Bernhardt from PyCon 2014

M. deHaan Ansible slides from PyCon2014 MTL
speakerdeck. And here is the actual talk.
Launching the evoluxdev tumblr with Alex Gaynor's talk Fast Python, Slow Python from PyCon 2014, covering the facts and myths about Python's performance and speed.
Kate Heddleston's PyCon 2014 presentation:
So you want to be a full-stack developer? How to build a full-stack python web application.
Kate Heddleston
Audience level: Intermediate
Category: Best Practices & Patterns
Description
This is a talk about building full-stack python web applications where you manage every part of the application yourself. I will walk through how to set up a production server with your web application code, a local development environment using vagrant, and how to deploy from your local environment to production. I will also walk through python and Django libraries that will make your life easier.
Abstract
Since I started doing freelance python web-applications work, I've been getting a lot of questions from friends about how I became a full-stack python web apps developer. Some of my friends are strong front-end engineers that want to do more API level programming, others are back-end/infrastructure engineers who want to know more about human computer interaction, front-end, and feature building. This talk will focus on how to build a full-stack web application and the python specific research that I have amassed over the past several years.
My goal with the talk would be to give people a conceptual understanding of all the different parts of a web-application first. Giving them the big picture will help them to organize the more specific technical information that I give them as I go on.
I will then drill into the different major parts of the application and present them with the research I have collected about how they can set things up, what tools they can use, what the trade-offs are, etc. The major sections include production server configuration, local development environments, deploying, and integration with third party sites. I will also touch on monitoring, exception handling and notification, and a few other things that are not necessary for individual projects but important as you start to scale.
Next, I will talk about the application code layer and many of the python libraries and tools they can use to make it feasible for them to build a full-stack web application on their own. An example is django-supervisor, which is a pip installation that provides a very simple interface that allows engineers to use supervisor without messing with the conf files on the machine.
I also plan to offer party favors in the form of starter code repositories and instructions. My goal is to make learning different parts of the python stack more accessible to people, giving them some toys to play with and a place to start. Ultimately, this can lead to more well-rounded engineers who understand how to go about researching and gaining proficiency in new parts of the stack on their own.
PyCon 2014
So, I went to PyCon US this year. The short of it was that PyCon is amazing. Easily the best organized convention I've ever been to. I haven't been to too many tech conventions, so that may not be saying much.
PyCon US is partitioned into Tutorials, Talks, and Sprints. This year there were two days of tutorials, three days of talks, and four days of sprints. The tutorials are two blocks of three-hour classes. The first two days of talks had a keynote from 9 AM until 10 AM and then five different speakers every 45 minutes from 11 AM to 5:40 PM. The topics covered everything from machine learning, to garbage collection, to security, to analyzing rap lyrics.
Here's the tutorial (https://us.pycon.org/2014/schedule/tutorials/) and talk (.pycon.org/2014/schedule/talks/) schedules if you want to get a feel for the topics at the conference.
My favorite part of PyCon, however, was the sprints (https://us.pycon.org/2014/community/sprints/). Sprint leaders (mostly project leads, but some weren't) would describe the state of their project and what they would like to accomplish by the end of the four days. A very abbreviated list of projects sprinted on: CPython, PyPy, IPython, Apache Allura, Twisted, SaltStack, Mercurial, Django. Many of the core developers for these projects are present at the sprints and are willing to teach or mentor in exchange for some grease work.
I feel that working on CPython and talking to the core developers was hugely beneficial to my personal development as a programmer. Just being in the same room as the Mercurial sprint taught me a lot about how revision control works and what trade-offs Mercurial has made in comparison to other revision control systems.
One thing I found pleasantly surprising was the number of not only female attendees at PyCon US, but also female speakers. According to the opening statement, PyCon US had a little over 100 talks and 33% of the speakers were female. Many tech conferences boast when they have 1-3 female speakers and a 5-10% female attendance. I was really struck with how open and inviting the conference felt as a whole.
As with all endeavours in my life, I made a couple of large (and in one case, costly) mistakes. The first of which is that I didn't have a working laptop going into PyCon. I thought (naively) that I could make it without a laptop. But I realized after my first afternoon of tutorials that in order to really learn most of what was being taught, I needed to work through and break the material given to the class. I immediately rented a laptop (which cost an arm and a leg) and never for a second felt like it was a bad investment.
The second mistake I made was not having any business cards. I had 34 business cards to go through when I returned home. Even just one week later I remember the people who gave me a card more than those who didn't.
I had a rude awakening in Canada when it came to data usage. In America I have a smart phone that can do things. Lots of things that I have gotten very used to having. I thought I could supplement my mobile usage with Wi-Fi. The PyCon Wi-Fi was barely fast enough to load static HTML files. It wasn't even fast enough to pip install something. I really need to think about my mobile data usage and prepare my laptop before going to any tech conference.
One general project tip I picked up from the development sprints was to always have the current project version as the first thing in the readme file (if not all project files). There were several times where I was working on the wrong version of code because projects either had version numbers in weird places or were missing them altogether. I also learned that I like it when people name their development branches after the version they're working towards.
From here on out I'm going to cover some specific topics and things I learned at PyCon. Some of this may seem very simple to some of you while other pieces are going to be incredibly specific and too detailed. These topics are self-contained units; you don't need to read them sequentially, or at all, to understand the rest.
CPython
One thing I picked up at PyCon is that virtualenv makes a set of symbolic links to the referenced Python installation. Anything installed into the virtualenv is only installed to that virtualenv. But anything installed to the linked Python instance (say the base instance) will also show up in the virtualenv. This can lead to a lot of confusion and make it very hard to figure out what version of a package is actually being used by Python.
Pythonbrew is a Mac-exclusive solution to this problem. Python 3.4 comes with a tool called pyvenv, which supports separate Python binaries as well as separate site and package directories.
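When it is unclear which copy of a package Python is picking up, the interpreter can be asked directly. The following is a minimal sketch, assuming a venv/pyvenv-style environment on Python 3.3 or newer (older virtualenv releases signal themselves differently, so the check may report False there); json is used only as a stand-in for whatever package is in question.

```python
import json
import sys

# sys.base_prefix exists on Python 3.3+; fall back to sys.prefix elsewhere.
base_prefix = getattr(sys, "base_prefix", sys.prefix)
in_virtual_env = sys.prefix != base_prefix

print("interpreter:", sys.executable)
print("environment prefix:", sys.prefix)
print("inside a virtual environment:", in_virtual_env)
# __file__ shows which copy of a module was actually imported.
print("json loaded from:", json.__file__)
```

Checking `__file__` on the imported module is usually the quickest way to tell whether a package came from the environment or from the base installation.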
One thing that surprised me was that pip doesn't install as the current user by default. If installing as the current user is desired then use the --user option.
One talk pointed out that import is, at its core, basically just an implicit namespace (the module name being the namespace unless you use the as syntax) with an exec of the files in the module. This actually helped me really understand a lot of my frustrations with import errors (or errors that happen in a module but are reported as a bad import). Also, this finally cleared up the difference between a Python module and a package. A module is a Python source file, a package is a directory of Python source files (modules).
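The "namespace plus exec" idea can be sketched in a few lines. This is a deliberately simplified illustration, not how CPython's import machinery is actually implemented; toy_import and the toy_greeter module source are hypothetical names made up for the example.

```python
import sys
import types


def toy_import(name, source):
    """Simplified illustration of import: make a namespace, then exec code in it."""
    if name in sys.modules:          # real imports are cached in sys.modules too
        return sys.modules[name]
    module = types.ModuleType(name)  # the module object is the namespace
    sys.modules[name] = module
    exec(source, module.__dict__)    # run the module body inside that namespace
    return module


# Hypothetical module source; normally this would be read from a .py file.
greeter = toy_import("toy_greeter", "def hello(who):\n    return 'hello ' + who\n")
print(greeter.hello("PyCon"))
```

Because the module body is simply executed, any exception raised while it runs surfaces at the import statement, which is why a bug inside a module can look like a bad import.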
Here's a cool import fact, though I don't know how useful it is. There's a built-in function called reload that will re-exec a module (in Python 3 it lives at importlib.reload). I heard about this when someone was talking about how Django renders server-side changes without an HTTP server restart.
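A small sketch of reloading in action, assuming Python 3.4 or newer where the function is importlib.reload. The module name reload_demo and the temporary directory are made up for the example, and bytecode caching is switched off so the edited source is always re-read.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep stale .pyc files out of the picture

# Write a throwaway module to a temporary directory (hypothetical name).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "reload_demo.py")
with open(path, "w") as f:
    f.write("VALUE = 1\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
reload_demo = importlib.import_module("reload_demo")
first = reload_demo.VALUE

# Edit the file on disk, then re-execute the module body in place.
with open(path, "w") as f:
    f.write("VALUE = 2  # edited on disk\n")
importlib.reload(reload_demo)
second = reload_demo.VALUE

print(first, second)
```

Note that reload re-executes the module in the same module object, so names bound elsewhere with `from reload_demo import VALUE` would keep their old values.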
Python (2.7+) comes with an OrderedDict (https://docs.python.org/2/library/collections.html), which is a dictionary that remembers the order in which elements were inserted. I'm assuming that an OrderedDict is just a list of (key, value) pairs that has its access and insert methods in a C module. The example use cases for an OrderedDict are XML/HTML processing libraries or parsing HTTP headers.
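A short sketch of the HTTP header use case; the header names and values are invented for illustration.

```python
from collections import OrderedDict

# Hypothetical request headers, inserted in the order they were received.
headers = OrderedDict()
headers["Host"] = "example.org"
headers["Accept"] = "text/html"
headers["User-Agent"] = "demo"

# Updating an existing key keeps its original position.
headers["Host"] = "example.com"

for name, value in headers.items():
    print(name + ": " + value)
```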
One thing I found interesting is that Python lists have a 94% space utilization on average. Dicts on the other hand have a minimum of 33% of their allocated space unused and a maximum of ~50%, which is what is expected from a generic key hash data structure.
In newer versions of CPython (2.6 with object slots, 3.2+ by default, and almost all versions of PyPy by default), objects have less time and space overhead than dicts. Use objects when an arbitrary number of references isn't required, i.e. if a struct would have worked in C, then use a namedtuple or an object. This one really caught me by surprise.
On a similar note, use the __slots__ attribute if possible (https://stackoverflow.com/questions/1336791/dictionary-vs-object-which-is-more-efficient-and-why/1336829#1336829).
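As a rough illustration of what __slots__ changes, the two hypothetical classes below differ only in that one declares its attributes up front. The slotted class has no per-instance __dict__, which is where the space saving comes from, and it rejects attributes that were not declared.

```python
class PointDict:
    """Ordinary class: each instance carries its own __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Slotted class: attributes are fixed, no per-instance __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PointDict(1, 2)
slotted = PointSlots(1, 2)

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

try:
    slotted.z = 3                    # not declared in __slots__
except AttributeError as exc:
    print("rejected:", exc)
```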
__dict__ is not necessarily a dictionary even if it exposes most of the same methods as a dictionary.
"Every problem in computer science can be solved with another layer of indirection." - David Wheeler
IPython
A large number of presentations, classes, and open spaces at PyCon came with an IPython notebook. I can't even begin to explain how excited I am about IPython. It's an enhanced interactive shell for CPython that comes with a large amount of convenience functionality. The big ones that I remember off the top of my head are the ?, ??, %timeit, %%time, %run, %edit, History, interpreter shell commands, and startup files.
My favorite examples are the ? and ?? operators. ? will present the docstring for any object, ?? will show the source code. Most of these operators are described here: http://ipython.org/ipython-doc/stable/interactive/tutorial.html
Probably one of the best features of IPython, however, is the browser-based IPython notebook. These are Python interpreters embedded in a web-browsable format. IPython notebooks come with a URL and can be hosted as a dynamic page either locally or via a mod_wsgi-enabled web server. GitHub will host a set of IPython notebooks on their network for free.
IPython notebooks have support for simple markdown, LaTeX, CPython code, and a large number of other DSL markups. These things are seriously awesome. They're so intuitive that it's just easier to run one on your own than listen to me prattle on about them.
http://nbviewer.ipython.org/github/ipython/ipython/blob/master/examples/Notebook/Running%20Code.ipynb
https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks
Probably my favorite part about this is that there is now work being done to have the IPython model work for many other languages, like IHaskell notebooks, IGo, IScala, IClojure, etc.
Sphinx and Hieroglyph
I didn't work with Sphinx or Hieroglyph enough to really get a feel for either tool, but I'm very interested in how both of them work. Sphinx is a documentation generation tool which is used to build HTML, LaTeX (PDF by extension), man pages, XML, or plain text documentation. By producing the first set of documentation all the other kinds of documentation come for free.
From what I've seen, Sphinx takes a little bit of configuration and getting used to, so it seems like it takes a decent-sized project to really see the benefits. Next time I write something that needs HTML and a man page I'll be trying out Sphinx.
I'll also be trying Hieroglyph next time I give some kind of presentation. It's a tool built on top of Sphinx which generates HTML presentations.
Twisted
I have heard of the Twisted project before, but I didn't really understand what it was or what it might be useful for. I still haven't used the library (much) but I now understand that out of the box it's able to parse and construct network protocol connections. It's useful for sending and receiving standard network protocols (TCP/IP, SMTP, POP3, IMAP) and comes with support for custom protocols.
This is not immediately useful to me personally, but given some of the projects I want to work on in the future this will become useful knowledge down the road.
And no, asyncio does not replace Twisted... or so I was forcefully told.
Distributed (network) Messaging
This is one topic that is very immediately useful to me. I've been looking into distributed messaging brokers with the intent of finding the system best suited to create a distributed job broker.
I don't know much about most of these tools, so I'm just going to list them off in the order in which I'm going to investigate them. The one message broker I've used before is RabbitMQ. Other than that: Gearman, ZeroMQ, Celery, ActiveMQ. I would love to hear people's thoughts on this topic.
Machine Learning
This is by far the topic that I spent the most time at PyCon learning about. I came into PyCon with almost no understanding of how machine learning works and now I feel like I know just enough to be dangerous to those sitting in close proximity. Basically, bring a large pinch of salt.
If you're interested, this is a link to an IPython Notebook which will walk you through the basics of machine learning. Absolutely wonderful tutorial. https://github.com/jakevdp/sklearn_pycon2014
To start off with, let's talk about some practical applications of machine learning, specifically projects that I think are worth pursuing. I would love to take the bugzilla database at my work as a test and training data set and then guess the chance that a new bug is a product bug, configuration bug, or any other number of useful categorizations. The next step would be, for all product bugs (known and new), to guess which section of code the bug is most likely found in. If I can get a program that has a reasonable amount of accuracy (say 75% for argument's sake) then this would be very useful. I find the concept of having computers accurately predict and categorize fascinating.
Cool story and all, but how does one actually turn a computer into a more accurate magic 8-ball? The basic idea behind machine learning can be summed up into 5 parts: Acquiring data, feature scaling (making data uniform), choosing an algorithm (sometimes called a model), applying the algorithm to the data (called data fitting), and validation.
The three hard parts are acquiring the data, feature scaling, and validation. Scikit-learn already comes with a large number of ML algorithms and a process to fit the data to the algorithm. Acquiring the data is exactly what it sounds like. I'd like to answer how one does data acquisition but the instructor glossed over that detail. Someone in the class asked "how do I learn to gather data" and the instructor's answer was "go to college."
Feature scaling is a form of normalization (in the statistical sense, not the database-schema sense). It is normally a preprocessing step that makes sure the data set the algorithm is going to work on follows the same assumptions throughout. The suggestions given in class were to make sure the data gathering has constraints on it which don't allow too little, too much, or malformed data. The other major alternative is to take an existing database and chop off the pieces which break the code.
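One common flavour of feature scaling is min-max scaling, which squeezes every value of a feature into the range 0 to 1. The sketch below is a hand-rolled illustration in plain Python, not the scikit-learn implementation; min_max_scale and the sample heights are made up for the example.

```python
def min_max_scale(values):
    """Rescale a list of numbers so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature column measured in centimetres.
heights_cm = [150.0, 175.0, 200.0]
scaled = min_max_scale(heights_cm)
print(scaled)
```

After scaling, features that were measured in very different units end up on a comparable footing, which is the "uniform data" assumption many algorithms rely on.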
The example that I was given at PyCon was attempting to do image analysis on a set of thumbnails that all have different resolutions. Running an ML algorithm against a set of images with different resolutions will produce garbage. In the IPython notebook I linked to they use PCA (principal component analysis) to drop sections of the example database.
The specific technologies we used in the tutorials were: scikit-learn, Pandas, NumPy, SciPy, and Matplotlib. Scikit-learn is built on top of NumPy, SciPy, and Matplotlib. It's open source and uses the BSD license. It was created to make data preprocessing, classification, regression, and clustering an easier proposition. Scikit-learn is the reason why choosing an algorithm and fitting the data to that algorithm is one of the easiest parts of machine learning. The developers of scikit-learn have taken the portion of ML that used to be the hard part and trivialized it. The example page for a Support Vector Machine shows just how simple the process of using the scikit-learn library is:
from sklearn import svm
X = [[0, 0], [1, 1]]
y = [0, 1]
clf = svm.SVC()
clf.fit(X, y)
clf.predict([[2., 2.]])
Taken from: http://scikit-learn.org/stable/modules/svm.html
NumPy and SciPy are two Python libraries built to address two of CPython's (the standard Python implementation) major faults: speed of computation and memory use. According to these benchmarks (https://stackoverflow.com/questions/7596612/benchmarking-python-vs-c-using-blas-and-numpy) there is very little to no difference between C++ and Python using NumPy. Both NumPy and SciPy use less memory than native Python data structures but I don't know what the difference is and I'm having trouble finding documentation that shows conclusive benchmarks. Topic for another time.
Matplotlib is a MATLAB-like 2D plotting framework. It makes it very easy to produce visualizations of all the datasets that scikit-learn and Pandas produce. I'm especially fond of the seaborn extension to Matplotlib http://www.stanford.edu/~mwaskom/software/seaborn/examples/index.html . Seaborn is simple and produces high quality visualizations with little to no effort on the programmer's part. I view it almost as the Twitter Bootstrap (http://getbootstrap.com/) of data visualization.
There are two major types of machine learning algorithms, supervised and unsupervised. Unsupervised ML algorithms are a set of algorithms that take in a dataset and attempt to group data which is similar. This SO answer does a pretty good job of explaining the differences (https://stackoverflow.com/questions/1832076/what-is-the-difference-between-supervised-learning-and-unsupervised-learning) but I'm going to try to put it into my own words anyway.
Supervised means a dataset that contains data and some kind of classification of the data. Normally in supervised learning a human has gone through each portion of the data and set the classification by hand (thus the supervised part). The dataset is split into a "training" and "test" set, normally a 75% and 25% split respectively. The algorithm "learns" from the training set and then attempts to guess what the right classification is on the test set. The assumption is that the human labels are "correct" often enough that there is something to test against. This makes it very easy to validate a model's fit of the data.
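The 75/25 split is simple enough to sketch by hand. The helper below is a plain-Python illustration with a made-up name (split_train_test) and a made-up labeled dataset; scikit-learn ships its own utility for this, which is what one would normally reach for.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle labeled rows and split them into (training, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # fixed seed keeps the split repeatable
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


# Hypothetical dataset of (sample, label) pairs labeled by a human.
labeled = [(i, i % 2) for i in range(20)]
train, test = split_train_test(labeled)
print(len(train), len(test))
```

Shuffling before splitting matters: if the rows are sorted by label or by date, an unshuffled split would train and test on systematically different data.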
Unsupervised means the dataset doesn't come pre-categorized. So the algorithm attempts to create a set of classifications or categories based on patterns it finds in the dataset. This requires much less effort on the part of the programmer but the results are much less trustworthy. In general unsupervised algorithms are much more susceptible to outliers, poor feature scaling, and ambiguity. These algorithms also tend to be more sensitive to certain kinds of patterns and less sensitive to others. As such the false positive and false negative rate is higher. The verification process is also more difficult with unsupervised algorithms.
Sadly the class that was supposed to cover model verification didn't get to the topic. This is something that I'll have to look into at a later time. Here's a short list of jargon that was thrown out (in very quick succession) of things one can use to verify a model: RMSE, precision, recall score, f-score, ROC curves, cost curves. What any of that means is a mystery to me (at the time of this writing at least).
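For reference, three of those terms have short standard definitions: precision is the share of predicted positives that really are positive, recall is the share of real positives that were found, and the F-score (F1) is their harmonic mean. The sketch below computes them by hand in plain Python; the function name and the sample labels are made up, and this is not what was shown in the class.

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Compute precision, recall and F1 for one class from two label lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical test-set labels and a model's guesses for them.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(actual, predicted)
print(precision, recall, f1)
```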
Here are some miscellaneous notes that I found useful but couldn't quite fit into the rest of this write-up:
sklearn
A trailing _ on an attribute name (for example coef_) means that the attribute is not set until after the data fit.
data.
supervised
Large amounts of uncertainty tend to break support vector machines.
NumPy
NumPy array slicing has been overridden. Don't expect it to act like normal list slicing.
NumPy slices return views that reference the original data, not copies.
Don't use strings. I learned this the hard way. Convert strings to integers if possible.
Matplotlib
I thought pyplot.loglog and pyplot.semilogx / pyplot.semilogy were awesome ways to display data.
Pandas
dropna() is a great way to remove null data.
pd.get_dummies is useful for converting categorical data into indicator (dummy) columns.
If you pass mixed-type data to pandas, it will store everything as the object dtype.
Conda
I've been informed that if I want to do any amount of scientific computing in Python I should use Conda as my package manager. Having never used it myself I don't know about the validity of this claim.
Partitioning is the heart of scaling
One thing many of the talks focused on (especially the ones that were web focused) is how important it is, for performance reasons, to be able to partition a technology stack. The classic web example of this concept is sticking a web server and database on different servers. The traditional software engineering way of stating this is that low coupling is very important.
For some reason I've always paid attention to the security benefits of separating a web server from the database server. I've also heard people talk about performance reasons but I've never understood that. Doesn't putting these services on different servers now mean they have to at least deal with network latency? But the thing I never really understood is that the network is not always the system bottleneck. It's only been my bottleneck because everything I've written has been so simple. It is becoming more important now that I want to do things like e-mail a list of users upon some action, change a page's URL and the thousands of references to that page, or build a categorization for the data a user is attempting to submit as a bug report.
And that made me realize that partitioning a system allows for isolation of hot spots. Partitioning allows me to focus system resources on the overloaded portions of the system. This is the same pattern I follow when I'm attempting to solve a performance problem in code (profile, isolate, refactor) just applied to a system or service rather than functions or classes.
Local outreach
There was an interesting keynote given by Jessica McKellar that talked about teaching Python in schools. Specifically the sorry state of programming education in the United States and her opinions on how the Python programming language could help that. https://www.youtube.com/watch?v=4QOoAw6Su7M
Flask
Flask is my web framework of choice when I do web development. I went to a talk called "Developing Flask Extensions" and came away with some interesting ideas (https://www.youtube.com/watch?v=OXN3wuHUBP0). Probably the biggest thing that I came away with is that the @app decorator is the heart of the Flask framework. Extensions to Flask will ultimately change the properties of the @app decorator.
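To make the decorator idea concrete, here is a toy sketch of a route-registering decorator. It is not Flask's code and leaves out almost everything Flask does; TinyApp, route, and dispatch are hypothetical names used only to show how a decorator on an application object can collect view functions.

```python
class TinyApp:
    """Toy application object that maps URL paths to view functions."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        # Returns a decorator that records the view under the given path.
        def decorator(view):
            self.routes[path] = view
            return view
        return decorator

    def dispatch(self, path):
        # Look up the registered view and call it, or report a miss.
        view = self.routes.get(path)
        if view is None:
            return "404 Not Found"
        return view()


app = TinyApp()


@app.route("/")
def index():
    return "hello"


print(app.dispatch("/"))
print(app.dispatch("/missing"))
```

In this picture an extension is anything that hooks into the application object, for instance by wrapping views before they are stored or by registering extra routes of its own.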
Some cool extensions that I got to play with are:
Flask-DebugToolbar: This is a Flask port of the Django debug toolbar. This thing is seven kinds of amazing. It displays a huge amount of debugging information and makes it very easy to see the state of a web application.
Flask-WTF: I'm a big fan of WTForms. This is a Flask extension that makes integrating with WTForms a very simple task. If you are writing web applications in Python that take in user data, and you don't know what CSRF stands for, you should probably be using WTForms. Another option is Flask-SeaSurf.
Flask-Admin: Another port of a Django feature. This gives many (not all) of the features that the default Django admin page has.
These are just the extensions that I played with at PyCon. A much larger (but still not exhaustive) list can be found at (http://flask.pocoo.org/extensions/).
Subprocessing
This is going to sound dumb to anyone that's used Python for scripting purposes, but there's a module called subprocess and it's awesome. The things one learns...
import subprocess
# git_url holds the URL of the repository to clone
subprocess.call(['git', 'clone', git_url])
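A slightly fuller sketch of the same module, showing how to capture a command's output and its exit code. To stay runnable anywhere, it launches the current Python interpreter as the child process instead of git; the commands passed to it are made up for the example.

```python
import subprocess
import sys

# check_output runs a command and returns whatever it wrote to stdout (as bytes).
output = subprocess.check_output(
    [sys.executable, "-c", "print('hello from a child process')"]
)
text = output.decode().strip()
print(text)

# call runs a command and returns its exit code.
exit_code = subprocess.call([sys.executable, "-c", "import sys; sys.exit(3)"])
print(exit_code)
```

Passing the command as a list, as in the git example, avoids going through a shell and the quoting problems that come with it.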
Security
I'm not a security expert and am definitely not a web security expert. But these videos were interesting to me. I'm just going to leave these links here on the chance that you want to know more about these topics.
ssl
https://www.youtube.com/watch?v=SBQB_yS2K4M
http headers
https://www.youtube.com/watch?v=T-5p5ewqhVw
https://www.youtube.com/watch?v=sL_syMmRkoU
https://www.youtube.com/watch?v=nQOahpei6kw
OWASP and Qualys SSL Labs
https://www.owasp.org/index.php/Main_Page
https://www.ssllabs.com/ssltest/
Dynamic Code Analysis
I've always been a fan of Pylint and coverage.py. I believe that they're very useful tools and produce cleaner code. But I was introduced to a tool called flake8, which is a combination of pep8, pyflakes, and a cyclomatic complexity checker. Pylint freaks out at mixins, and since Flask uses so many mixins I've been having a hard time using Pylint on Flask projects. I've been using flake8 as a fallback when Pylint is too strict.
Cool PyCon Talks
http://pyvideo.org/category/50/pycon-us-2014
Augie Fackler, Nathaniel Manista / Deliver Your Software In An Envelope
https://www.youtube.com/watch?v=mTj297sGzxw
Alex Gaynor / Fast Python, Slow Python
https://www.youtube.com/watch?v=7eeEf_rAJds
Jessica McKellar / Building and breaking a Python sandbox
https://www.youtube.com/watch?v=sL_syMmRkoU
Allison Kaptur / Import-ant Decisions
https://www.youtube.com/watch?v=aS5kXzbsLLQ

pycon 2014 Sweden: the bad bits
I'm just back from pycon 2014 Sweden.
And all the tweets and commentary will be positive, as they always are. If you want to say something negative, it's best to not say it at all?
Hmm, not me. Someone has to give this feedback; we can't sweep it under the carpet.