Automating Hyperparameter Tuning
The past decade has seen a surge in the field of machine learning. The rate of data generation has overwhelmed people and machines alike, and researchers around the world have taken it upon themselves to combat the resulting shortage of time and computing resources.
Machine learning works wonders today with its powerful algorithms and preprocessing techniques. However, with the ever-increasing volume of data and limited hardware, it often takes hours, days and sometimes even weeks to obtain optimal predictions. This is mostly due to the following reasons:
- Vague features in the dataset
- Manual feature engineering
- Brute-force techniques for preprocessing
- Computationally expensive grid searches for hyperparameter tuning
- Heavy brute-force methods for model selection
- High data volume
Notice that the limitations mostly arise due to manual
intervention and brute force methods. What if we could optimize the process by
automating machine learning?
This concept has recently surfaced in the realm of data science and promises to remove some stubborn obstacles.
First, what is automation of machine learning?
Simply put, it means leaving the task of tweaking algorithms to find the optimum configuration to the system itself. In other words, the system employs “algorithms to optimize algorithms”.
The scope of this article will be automation of
hyperparameter tuning of a particular model.
Second, what are hyperparameters?
Every machine learning model works with a few key parameters which are either fed to it externally or set by default. For instance, the Random Forest model has a hyperparameter called ‘n_estimators’, which is set to 10 by default and determines the number of decision trees the algorithm relies upon. A coder can alter the number of trees based on their preferred performance metric. The same holds true for every other hyperparameter.
The main question is: how does a coder decide the optimum value for a hyperparameter? And given n hyperparameters with m candidate values each, there are m^n possible combinations!
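To see how quickly this blows up, here is a tiny illustration; the parameter names are real scikit-learn Random Forest hyperparameters, but the candidate values are arbitrary choices for the example:

```python
# Three hyperparameters with four candidate values each -> 4^3 = 64 model fits.
from sklearn.model_selection import ParameterGrid

param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10, 20],
}

print(len(ParameterGrid(param_grid)))  # 64 combinations to evaluate
```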
It is a tedious, time-consuming process for any programmer to sit and try every combination to find the best fit. So we must automate this process! Below are a few of the best libraries, and the techniques they employ, which I found useful while implementing my own code.
Grid Search: For every combination, the machine fits the model and computes the scoring metric (say, accuracy). In other words, it is pure brute force. Even though this is a foolproof technique for obtaining the optimum combination of hyperparameters, and is definitely faster than manual labor, each fit itself takes a considerable amount of time, so grid search fails to overcome the barrier of time insufficiency.
Library: GridSearchCV (scikit-learn)
Recommendation: Try this library when the number of combinations is small.
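As a minimal sketch, this is roughly what a grid search looks like with scikit-learn's GridSearchCV; the dataset, estimator, grid values and scoring choice are illustrative assumptions:

```python
# Exhaustive search over a small, hand-picked grid of Random Forest settings.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=5,        # 5-fold cross-validation for each of the 9 combinations
    n_jobs=-1,   # use all available cores
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```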
Random Search: This is a naive approach, since it tries random combinations with no regard for previously obtained scores. The technique picks points from the combination set at random for a given number of iterations and reports the one which gives the best relative performance.
Even though it is a very uncertain technique, random search sometimes gives really good results, with the added advantage of reduced time, because it gets to sample from a wide range of combinations in very few iterations. However, it is a matter of chance whether the algorithm hits an optimum or near-optimum combination within the limited number of iterations.
Libraries you can explore to implement random search are listed below, with a short sketch using one of them after the list:
- BTB Uniform Tuner: https://github.com/HDI-Project/BTB/blob/master/examples/random_forest.py
- RandomizedSearchCV (scikit-learn): RandomizedSearchCV documentation
- Skopt dummy minimize: https://scikit-optimize.github.io/#skopt.dummy_minimize
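Here is a minimal sketch using RandomizedSearchCV from the list above; the distributions, iteration count and dataset are illustrative assumptions rather than tuned values:

```python
# Random search: sample 20 combinations from the distributions instead of
# exhaustively trying every point in a grid.
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

param_distributions = {
    "n_estimators": randint(10, 300),  # integers sampled uniformly from [10, 300)
    "max_depth": randint(2, 20),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=20,           # only 20 random combinations are evaluated
    scoring="accuracy",
    cv=5,
    random_state=0,
    n_jobs=-1,
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```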
Bayesian Optimization: This is the new hot technique, employing what can be called “intelligent randomization”. During the first few iterations it picks completely random points, and then, with the help of an acquisition function, determines the next best trial. The acquisition function manages a well-known balance called the exploration-exploitation tradeoff: it weighs the scores of previously selected combinations (the exploitation term) against a term which seeks to explore combinations beyond the vicinity of the ones already selected (the exploration term).
If the exploration term is weighted too heavily, Bayesian optimization essentially degenerates into a random search. However, if the exploitation term carries too much weight, the acquisition function may get stuck around a local optimum and miss the global optimum.
This might sound complicated, but with deeper reading, one
gets the hang of it.
There are a few library functions which save us from the complexities of the internal workings of Bayesian Optimization.
- HyperOpt: http://hyperopt.github.io/hyperopt/
- Skopt BayesSearchCV: https://scikit-optimize.github.io/notebooks/sklearn-gridsearchcv-replacement.html
- SkOpt GP minimize: uses a Gaussian process as the surrogate to propose the next combination in Bayesian Optimization, since a Gaussian process is very quick to evaluate. https://scikit-optimize.github.io/#skopt.gp_minimize
- Bayes_Opt: https://www.kaggle.com/paso84/xgboost-bayesian-optimization (example)
Refer to Scikit Optimize: Bayesian Hyperparameter Optimization in Python to understand Bayesian Optimization in further depth.
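As a rough sketch of how this looks in code, here is skopt's gp_minimize tuning the same Random Forest; the search space, objective and iteration counts are illustrative assumptions:

```python
# Bayesian optimization: a Gaussian process models the objective and an
# acquisition function proposes the next combination to try.
from skopt import gp_minimize
from skopt.space import Integer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

space = [Integer(10, 300, name="n_estimators"),
         Integer(2, 20, name="max_depth")]

def objective(params):
    n_estimators, max_depth = params
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )
    # gp_minimize minimizes, so return the negative cross-validated accuracy
    return -cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

result = gp_minimize(
    objective,
    space,
    n_calls=25,          # total evaluations of the objective
    n_random_starts=5,   # purely random points before the surrogate takes over
    random_state=0,
)

print(result.x, -result.fun)  # best [n_estimators, max_depth] and its accuracy
```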
Sequential Optimization using trees: A tree-based regression model is used to propose the next best point at which to evaluate the cost function, improving the surrogate model with every iteration and taking very little time to reach the minimum.
Library functions are listed below, with a short sketch after the list:
- SkOpt forest minimize: https://scikit-optimize.github.io/#skopt.forest_minimize
- SkOpt gbrt minimize: https://scikit-optimize.github.io/#skopt.gbrt_minimize
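A minimal sketch with forest_minimize, reusing the same illustrative objective and search space as the Bayesian example above:

```python
# Tree-based sequential optimization: an ensemble of regression trees (extra
# trees by default) models the objective and proposes the next point to try.
from skopt import forest_minimize
from skopt.space import Integer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

space = [Integer(10, 300, name="n_estimators"),
         Integer(2, 20, name="max_depth")]

def objective(params):
    n_estimators, max_depth = params
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    )
    return -cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

result = forest_minimize(objective, space, n_calls=25, random_state=0)

print(result.x, -result.fun)
```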
Even though they are not as certain as brute-force methods, these libraries promise to deliver near-optimum, and sometimes even optimum, combinations in the best time possible!
Happy automating folks!