Imagine a race car driver meticulously adjusting their vehicle before a competition. Just as fine-tuning a race car optimizes its performanc
Finding the right hyperparameters for your deep learning model can be a tedious process. It doesn’t have to.
With the right process in place, it will not be difficult to find state-of-the-art hyperparameter configuration for a given prediction task. Out of the three approaches — manual, machine-assisted, and algorithmic — this article will focus on machine-assisted. The article will cover how I do it, get to the proof that the method works, and provide the understanding of why it works. The main principle is simplicity.
A Few Words on Performance
The first point about performance concerns accuracy (and other, more robust metrics) as a way to measure model performance. Consider f1 score as an example. If you have a binary prediction task with 1% positives, then a model that predicts 0 for everything will get close to perfect accuracy and, depending on how the f1 score is averaged and how it treats undefined cases, a misleadingly high f1 score as well. This can be handled with some changes to the way f1 score deals with corner cases such as “all zeros,” “all ones,” and “no true positives.” But that’s a big topic and outside the scope of this article, so for now I just want to make it clear that this problem is a very important part of getting systemic hyperparameter optimization to work. We have a lot of research in this field, but the research focuses more on algorithms and less on the fundamentals. Indeed, you can have the fanciest algorithm in the world, often also a really complex one, making decisions based on a metric that does not make sense. That’s not going to be hugely useful for dealing with “real-life” problems.
Make no mistake; EVEN WHEN WE DO GET THE PERFORMANCE METRIC RIGHT (yes I’m yelling), we need to consider what happens in the process of optimizing a model. We have a training set, and then we have a validation set. As soon as we start to look at the validation results and make changes based on them, we start to create a bias towards the validation set. Now we end up with training results that are a product of the bias the machine has, and validation results that are a product of the bias we have. In other words, the resulting model does not have the properties of a well-generalized model. Instead, it’s biased away from being generalized. It is very important to keep this point in mind.
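To make the corner case concrete, here is a minimal sketch in plain Python. The dataset (1,000 samples, 1% positives), the helper `f1_for_class`, and the choice to score an undefined f1 as 0 are all assumptions made for illustration, not something taken from Talos or Keras. It shows how the same all-zeros predictor looks excellent or useless depending on which variant of the metric is reported.

```python
def f1_for_class(y_true, y_pred, label, zero_division=0.0):
    """F1 for one class, treating `label` as the positive class.

    `zero_division` is the value returned when the class never appears
    in either the labels or the predictions (an assumed convention).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        return zero_division
    return 2 * tp / denominator


# Hypothetical imbalanced task: 10 positives out of 1,000 samples.
y_true = [1] * 10 + [0] * 990
# A "model" that predicts 0 for everything.
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positive_f1 = f1_for_class(y_true, y_pred, label=1)
negative_f1 = f1_for_class(y_true, y_pred, label=0)

# Support-weighted average of the per-class scores.
weighted_f1 = (
    positive_f1 * y_true.count(1) + negative_f1 * y_true.count(0)
) / len(y_true)

print(f"accuracy={accuracy:.3f}")        # high, although nothing was learned
print(f"positive_f1={positive_f1:.3f}")  # f1 for the class we care about
print(f"weighted_f1={weighted_f1:.3f}")  # dominated by the majority class
```

An optimizer that is pointed at accuracy or at the weighted score here would happily converge on the all-zeros model, which is why the metric has to be settled before any search strategy is chosen.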
A more advanced, fully automated (unsupervised) approach to hyperparameter optimization involves first solving these two problems: the performance metric and the bias towards the validation set. Once these two are solved (and yes, there are ways to do that), the resulting metrics would need to be combined into a single score. That score then becomes the metric against which the hyperparameter optimization process is optimized. Otherwise, no algorithm in the world will help, as it will optimize towards something other than what we are after. What are we after again? A model that will do the task that the prediction task articulates. Not just one model for one case (which is often the case in the papers covering the topic), but all kinds of models, for all kinds of prediction tasks. That is what a solution such as Keras allows us to do, and any attempt to automate parts of the process of using a tool such as Keras should embrace that idea.
What Tools Did I Use?
For everything in this article, I used Keras for the models, and Talos, which is a hyperparameter optimization solution I built. The benefit is that it exposes Keras as-is, without introducing any new syntax. It allows me to do in minutes what used to take days, and to have fun instead of enduring painful repetition. You can try it for yourself:
pip install talos
Or look at the code and docs here.
But the information I want to share, and the point I want to make, is not related to a tool but to the process. You could follow the same procedure any way you like. One of the more prominent issues with automated hyperparameter optimization and related tools is that you tend to end up far away from the way you’re used to working. The key to successful prediction-task-agnostic hyperparameter optimization, as with all complex problems, is in embracing cooperation between man and the machine. Every experiment is an opportunity to learn more about the practice (of deep learning) and the technology (in this case Keras). That opportunity should not be sacrificed for the sake of process automation. At the same time, we should be able to take away the blatantly redundant parts of the process. Think of doing shift-enter in Jupyter a few hundred times and waiting a minute or two between each iteration.
In summary, at this point the goal should not be a fully automated approach to finding the right model, but minimizing the procedural redundancy that burdens the human. Instead of me mechanically operating the machine, the machine operates itself. Instead of analyzing the results of various model configurations one by one, I want to analyze them by the thousands or by the hundreds of thousands. There are over 80,000 seconds in a day, and a lot of parameter space can be covered in that time without me having to do anything about it.
cybernetic flowcycle symbiosis
PDF download: http://bit.ly/cyberflowsymbiosis
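The unattended sweep described above can be sketched in a few lines of plain Python. This is not the Talos API. The function `scan`, the parameter names, and `toy_evaluate` are hypothetical stand-ins. In a real run, the evaluation function would build and train a Keras model and return its validation score. The point is only the shape of the process: the machine walks the parameter space and logs every result, and the human analyzes the whole log afterwards.

```python
import itertools
import random


def scan(param_space, evaluate, fraction=1.0, seed=0):
    """Evaluate a (possibly down-sampled) grid and return all results, best first."""
    names = sorted(param_space)
    grid = [
        dict(zip(names, values))
        for values in itertools.product(*(param_space[name] for name in names))
    ]
    # Optionally cover only a random fraction of the grid.
    sample_size = max(1, int(len(grid) * fraction))
    chosen = random.Random(seed).sample(grid, sample_size)

    results = [{**params, "score": evaluate(params)} for params in chosen]
    return sorted(results, key=lambda row: row["score"], reverse=True)


def toy_evaluate(params):
    """Hypothetical stand-in for 'train a model, return a validation score'."""
    return -(
        abs(params["lr"] - 0.01)
        + abs(params["batch_size"] - 32) / 1000
        + abs(params["dropout"] - 0.25)
    )


# Hypothetical search space: 3 x 3 x 3 = 27 configurations.
param_space = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
    "dropout": [0.0, 0.25, 0.5],
}

results = scan(param_space, toy_evaluate)
for row in results[:3]:
    print(row)
```

Keeping every row, rather than only the winner, is the design choice that matters here: it is the full table of results that lets you learn something about the model and the task between rounds.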
Bayesian optimization
Looking at the NIPS papers (and workshops), I noticed many references (some hidden) to Bayesian optimization as a useful subroutine, so I decided to investigate. I found a great tutorial by Eric Brochu, Vlad Cora, and Nando de Freitas and decided to follow it to see what it's all about. Note that this is one of the techniques used in James Bergstra's paper on hyperparameter optimization, and he has optimized and tested code online to do this kind of optimization, so you should use that code rather than the one here.
Regardless, I thought it to be a useful exercise to implement it and see what happens.
Bayesian optimization is a really cool and simple way of globally optimizing a function which is very expensive to evaluate. So expensive, in fact, that you're not bothered by the fact that it involves optimizing over a Gaussian process with something like simulated annealing (or really whatever gradient-free global optimization algorithm strikes your fancy). One example of this is hyperparameter optimization, where your neural network has over 10 different tuning parameters which interact in odd ways and essentially make your research irreproducible. In that case, you really care about not running too many iterations of something silly such as simulated annealing over your real objective function, and you are willing to spend a computational budget of a few minutes to get a really good point to try out.
So how does it work? First, you assume you have evaluated your objective function on a bunch of arbitrarily chosen points of your space. Then, in each iteration, you fit a Gaussian process to your function evaluations and search over this Gaussian process for the best point to try out next. This search is a global optimization of something called an acquisition function, which can be as simple as the expected value of your objective at the point plus a multiple of the predicted standard deviation at that same point (the UCB acquisition function), or the expected improvement that evaluating your function at that point can have over your current best point. Then, you take the point that maximizes the acquisition function, evaluate your expensive objective function at it, and add the result to the list. Repeat until you're satisfied that your objective value is no longer improving all that much.
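The loop described above can be sketched with NumPy as follows. This is a simplified illustration rather than the code from the gist or from any of the papers mentioned, and it rests on several assumptions: a one-dimensional search space on [0, 1], a squared-exponential kernel with a fixed length scale, a UCB acquisition of the form mean plus `kappa` times the standard deviation, and a dense grid of candidates standing in for the global optimizer (such as simulated annealing) that searches the acquisition function. All function names and default values are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a GP at the query points."""
    y_mean = y_obs.mean()  # center the targets so a zero-mean prior is sensible
    k_obs = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_cross = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_obs, k_cross)  # K^-1 k(x_obs, x_query)
    mean = y_mean + solved.T @ (y_obs - y_mean)
    variance = 1.0 - np.sum(k_cross * solved, axis=0)  # prior variance is 1
    return mean, np.sqrt(np.clip(variance, 0.0, None))


def bayes_opt(objective, n_init=3, n_iter=15, kappa=2.0, seed=0):
    """Maximize `objective` on [0, 1] with a GP surrogate and UCB acquisition."""
    rng = np.random.default_rng(seed)
    # Step 1: evaluate the objective at a few arbitrarily chosen points.
    x_obs = rng.uniform(0.0, 1.0, size=n_init)
    y_obs = np.array([objective(x) for x in x_obs])
    candidates = np.linspace(0.0, 1.0, 501)

    for _ in range(n_iter):
        # Step 2: fit the surrogate to everything seen so far.
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        # Step 3: pick the candidate that maximizes the acquisition function.
        x_next = candidates[np.argmax(mean + kappa * std)]
        # Step 4: spend one expensive evaluation there and record it.
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))

    best = np.argmax(y_obs)
    return float(x_obs[best]), float(y_obs[best])


def toy_objective(x):
    """Cheap stand-in for an expensive function; its maximum is at x = 0.3."""
    return -(x - 0.3) ** 2


best_x, best_y = bayes_opt(toy_objective)
print(best_x, best_y)
```

A larger `kappa` favors exploring regions where the surrogate is uncertain, and a smaller one favors refining around the current best estimate. In more than one dimension the candidate grid stops being practical, which is where a proper gradient-free global search over the acquisition function comes in.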
Bayesian optimization also makes sense in an intuitive way: essentially you're replacing a worst-case assumption on your function (for example, doing grid search and hoping to find optimal points assumes that your function is Lipschitz with a constant related to the size of the grid) with an average-case assumption expressed as any kernel you want. Effectively, instead of saying that (f(x')-f(x))/||x'-x|| is always smaller than K, you say that f(x')-f(x) ~ Normal(0, k||x'-x||), which then allows you to fiddle with the norm (or even replace it with something entirely different), and also generalizes nicely to discrete or mixed discrete-continuous spaces (as long as you're still able to do a reasonable job of searching over your acquisition function).
https://gist.github.com/1402892
Edit: Ruben Cantin has an efficient multithreaded C++ implementation with python bindings.
narrative fractals are a story of an experience within a story of an experience. (experience is defined as a holographic projection of reality within multiple subsets of connected information, clustered as networked trends which have density to a symbolic state) (symbolic state is defined as a cultural influence which has embedded meaning and intention that grows as its usage is experienced) in a sense, narrative fractals are a system within a system which has scope to go as far as necessary to deliver a construct of its own existence, for interplay of certain() yet initially undefined variables. a narrative fractal is the unfolding (= able to be told or seen through instruction of its own deliverance) of timelines which expand (= reaching extensions of development) and contract (= condensing compounds into tangible substances) further than their foreseen instance (that which is derived through effort vs momentum from a certain state / mode). narrative fractals are connected to one another, involving multiple entities with their own instances portrayed as 'environmental construct influencers' to a vector (= a super variable point in the fluid hyperreality where a mind experiences a narrative fractal projection). (entity is an organism which is connected to any other organism and is not limited to a human mind).
- @entpm
