The Python Packages You Need For Machine Learning and Data Science
The open-source computer vision library OpenCV is your best friend when it comes to images and videos. It offers efficient, ready-made solutions to common image problems such as face detection and object detection.
Data visualization is your primary way to communicate with people who don't work with the data themselves. If you think about it, even apps are a way of visualizing the data interactions happening behind the scenes. Matplotlib is the foundation for data visualization in Python.
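A minimal Matplotlib sketch: it plots a sine curve with labeled axes and renders the figure to PNG bytes in memory. Using the non-interactive `Agg` backend is an assumption so the code also runs on a server without a display; in a notebook or desktop session you would typically call `plt.show()` or `fig.savefig("plot.png")` instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files/buffers, no window
import matplotlib.pyplot as plt
import numpy as np


def plot_sine_wave():
    """Create a figure with one labeled line plot of sin(x) over one period."""
    x = np.linspace(0, 2 * np.pi, 200)
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.set_xlabel("x (radians)")
    ax.set_ylabel("amplitude")
    ax.set_title("One period of a sine wave")
    ax.legend()
    fig.tight_layout()
    return fig


def figure_to_png(fig):
    """Render a figure to PNG bytes, then close it to free memory."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    plt.close(fig)
    return buffer.getvalue()


png_bytes = figure_to_png(plot_sine_wave())
```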
Python wouldn't be the most popular programming language without NumPy. It is the foundation on which nearly all Python data science and machine learning packages are built, and the essential package for any serious numerical work in Python. All the nasty linear algebra and fancy math you learned in college is handled very efficiently by NumPy.
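To make the linear algebra point concrete, here is a small sketch that solves a system of linear equations with `np.linalg.solve` and standardizes the columns of a data matrix using broadcasting, with no explicit Python loops. The matrices are made-up illustration values, and `standardize_columns` is a hypothetical helper that assumes no column is constant.

```python
import numpy as np

# Solve the system  3x + 1y = 9
#                   1x + 2y = 8
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = np.linalg.solve(A, b)  # exact answer: x = 2, y = 3


def standardize_columns(X: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance (assumes no constant columns)."""
    mean = X.mean(axis=0)  # shape (n_features,)
    std = X.std(axis=0)    # shape (n_features,)
    # (n_samples, n_features) minus/divided by (n_features,) broadcasts across rows.
    return (X - mean) / std


data = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]])
scaled = standardize_columns(data)
```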
If machine learning is your passion, the scikit-learn project has you covered. It is the best place to start and the first place to look for almost any algorithm you want to use for your predictions. It also has many handy evaluation methods and training helpers, such as grid search.
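A sketch of the grid search mentioned above, assuming scikit-learn is installed. It uses the bundled Iris dataset so nothing needs downloading; the choice of a support vector classifier and the parameter grid are illustrative, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small dataset bundled with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit a support vector classifier.
pipeline = make_pipeline(StandardScaler(), SVC())

# Grid keys follow the "<step name>__<parameter>" convention; make_pipeline
# names each step after its lowercased class name ("svc").
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__kernel": ["linear", "rbf"],
}

# 5-fold cross-validation on the training split for each of the 6 combinations.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate the refitted best model on data it has never seen.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Wrapping the scaler in the pipeline means it is refitted inside each cross-validation fold, so the held-out fold never leaks into the scaling statistics.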
SciPy provides the fundamental mathematical routines, such as optimization, integration, interpolation, and statistics, that more complex machine learning workflows build on. It's a bit surprising that it has only about 8,500 stars on GitHub.
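A short sketch of two SciPy building blocks that machine learning code leans on: numerical optimization with `scipy.optimize.minimize` (applied here to SciPy's built-in Rosenbrock test function, whose minimum is at (1, 1)) and numerical integration with `scipy.integrate.quad`. The BFGS method and the starting point are illustrative assumptions.

```python
import numpy as np
from scipy import integrate, optimize

# Unconstrained optimization: minimize the Rosenbrock function using its analytic gradient.
result = optimize.minimize(
    optimize.rosen,
    x0=np.array([-1.2, 1.0]),
    jac=optimize.rosen_der,
    method="BFGS",
)
print("minimum found at", result.x)

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_error_estimate = integrate.quad(np.sin, 0.0, np.pi)
print("area under sin(x):", area)
```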
Once a dataset grows past a certain size, into the terabytes, it can become difficult to use the off-the-shelf, standard implementations of machine learning algorithms that libraries usually offer. XGBoost is meant to save you from waiting weeks for calculations to complete. It is a highly scalable, distributed gradient boosting library designed to make your calculations run as efficiently as possible, and it is available in almost all common data science languages and stacks.
Since we're talking about Python packages, we should take a moment to talk about the package that manages them all: pip. Without it, you can't install any of the others. Its sole purpose is to install packages from places like the Python Package Index (PyPI) or GitHub, but you can also use it to install your own custom packages. Its 7,400 stars don't reflect how important it is to the Python community.
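pip is normally run from the command line, but when a script needs to install something, a common pattern is to invoke it as `python -m pip` through the interpreter that is currently running, so the package lands in the right environment. The helper names below are hypothetical, and the GitHub URL is a placeholder, not a real repository.

```python
import subprocess
import sys


def pip_install_command(requirement: str, editable: bool = False) -> list:
    """Build a pip command that targets the interpreter running this script."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if editable:
        # -e installs a local project in "editable" (development) mode.
        cmd.append("-e")
    cmd.append(requirement)
    return cmd


def pip_install(requirement: str, editable: bool = False) -> None:
    """Run pip install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(requirement, editable), check=True)


# Examples (not executed here):
# pip_install("numpy")                                     # from PyPI
# pip_install("git+https://github.com/<user>/<repo>.git")  # from GitHub (placeholder URL)
# pip_install(".", editable=True)                          # your own local package
```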
If you've ever worked with dates in Python, you know that doing so without dateutil is a pain. With it, you can easily calculate things like the current date, the same date next month, or the distance between two dates in seconds. Best of all, it handles time zones for you, which can be a huge pain if you've ever tried to do it without a library. Its 1,600 stars on GitHub suggest that, happily, many people don't have to go through the frustrating process of handling time zones by hand.
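A small sketch of the date arithmetic and time-zone handling described above, assuming `python-dateutil` is installed. The dates are arbitrary examples; `tz.gettz` resolves IANA zone names such as `"America/New_York"` from the system database or dateutil's bundled copy.

```python
from datetime import datetime

from dateutil import tz
from dateutil.relativedelta import relativedelta

# Calendar-aware arithmetic: one month after January 31, 2024 is the last day of February.
start = datetime(2024, 1, 31, 9, 30)
next_month = start + relativedelta(months=1)

# The calendar distance between two dates, broken into years, months, and days.
span = relativedelta(datetime(2024, 3, 15), datetime(2023, 1, 1))

# Time-zone conversion: noon in New York on March 15, 2024, expressed in London time.
new_york = tz.gettz("America/New_York")
london = tz.gettz("Europe/London")
meeting_ny = datetime(2024, 3, 15, 12, 0, tzinfo=new_york)
meeting_london = meeting_ny.astimezone(london)

# Aware datetimes in different zones are compared as absolute instants,
# so the same moment expressed in two zones is 0 seconds apart.
gap_seconds = (meeting_london - meeting_ny).total_seconds()

# Distance between two dates in seconds.
deadline = datetime(2024, 3, 16, 12, 0, tzinfo=new_york)
seconds_until_deadline = (deadline - meeting_ny).total_seconds()
```

On that date New York had already switched to daylight saving time (UTC-4) while London had not (UTC+0), which is exactly the kind of detail the library tracks for you.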
If you're wondering what my favorite Python package is, look no further: it's a small library called tqdm. All it does is give you a wrapper you can put around any loop, and it displays a progress bar showing how long each iteration takes on average and, more importantly, how much time the loop has left.
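Wrapping a loop's iterable in `tqdm(...)` is all it takes. In this sketch, `time.sleep` stands in for real work; `desc`, `unit`, and `total` are standard `tqdm` arguments, while the function names are hypothetical.

```python
import time

from tqdm import tqdm


def square_all(numbers):
    """Square each number while tqdm draws a progress bar on stderr."""
    results = []
    for n in tqdm(numbers, desc="Squaring", unit="item"):
        time.sleep(0.01)  # stand-in for real per-item work
        results.append(n * n)
    return results


def count_items(iterable, expected_total=None):
    """Count items; for generators (no len()), total= lets tqdm estimate time remaining."""
    count = 0
    for _ in tqdm(iterable, total=expected_total, desc="Counting"):
        count += 1
    return count
```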
statsmodels is your gateway to the classic world of statistics, as opposed to the fancy new world of machine learning. It includes many useful statistical tests and evaluations. By contrast, these classical methods are more stable, and any data scientist should reach for them from time to time. Its 6,600 stars are probably more a comment on the appeal of classic statistics versus deep learning.