Automating Hyperparameter Tuning


(Image courtesy: instascaler.com)

The past decade has seen a surge in the field of Machine Learning. The volume of data being generated has overwhelmed people and machines alike, and researchers around the world have taken it upon themselves to combat the resulting shortage of time and computing resources.

Machine learning works wonders today with its powerful algorithms and preprocessing techniques. However, with the ever-increasing volume of data and limited hardware, it often takes hours, days and sometimes even weeks to obtain optimal predictions. This is mostly due to the following reasons:
  • Vague features in the dataset
  • Manual feature engineering
  • Brute-force techniques for preprocessing
  • Computationally expensive grid searches for hyperparameter tuning
  • Heavy brute-force methods for model selection
  • Heavy data volumes

Notice that the limitations mostly arise due to manual intervention and brute force methods. What if we could optimize the process by automating machine learning?

This is a concept which has recently surfaced in the realm of data science and promises to help us break through some of these obstacles.

First, what is automation of machine learning?

Simply put, it means handing the task of tweaking an algorithm's settings, in order to find the optimal configuration, over to the system itself. In other words, the system will employ “algorithms to optimize algorithms”.

The scope of this article will be automation of hyperparameter tuning of a particular model.

Second, what are hyperparameters?

Every model in machine learning works with a few key parameters which are either externally fed to it or set by default. For instance, the Random Forest model has a hyperparameter called ‘n_estimators’, which is set to 10 by default and determines the number of decision trees the algorithm relies upon. A coder can alter the number of trees based on their preferred performance metric. This holds true for every other hyperparameter.
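As a minimal sketch (assuming scikit-learn's RandomForestClassifier and a small synthetic dataset purely for illustration), this is what tuning ‘n_estimators’ by hand looks like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# A small synthetic dataset purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Trying a few values of the 'n_estimators' hyperparameter by hand
for n in (10, 50, 100):
    model = RandomForestClassifier(n_estimators=n, random_state=42)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"n_estimators={n}: mean CV accuracy = {score:.3f}")
```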

The main question is: how should the coder decide the optimum value for a hyperparameter? Given n hyperparameters with m candidate values each, there can be m^n combinations! For example, just 4 hyperparameters with 5 candidate values each already means 5^4 = 625 model fits.
It is a tedious, time-consuming process for any programmer to sit and try every combination to find the best fit. So, we must automate this process! Mentioned below are a few of the best libraries, and the techniques they employ, which I found useful while implementing my code.

Grid Search: For every combination, the machine will fit the model to determine the scoring metric (say accuracy). In other words, it is purely brute force. Even though this is a foolproof technique to obtain the optimum combination of hyperparameters, and is definitely faster than manual labor, each fit itself takes a considerable amount of time and thus it fails to overcome the barrier of time insufficiency.

Library: scikit-learn’s GridSearchCV.
Recommendation: Try this library when the number of combinations is small.
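Below is a minimal sketch of what this looks like with scikit-learn's GridSearchCV; the parameter grid and dataset are illustrative assumptions, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Every combination in this grid (3 x 3 = 9) is fit and cross-validated
param_grid = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 5, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```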

Random Search: This is a naive approach since it tries random combinations with no regard for previously obtained scores. The technique picks points from the combination set at random for a given number of iterations, and reports the one which gives the best performance.
Even though it is a very uncertain technique, random search sometimes gives really good results, with the added advantage of reduced time. This happens because it gets to sample from a wide range of combinations in very few iterations. However, it is a matter of chance whether the algorithm hits an optimum or near-optimum combination within the limited iterations.

Libraries which you can explore to implement random search include scikit-learn’s RandomizedSearchCV.
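A minimal sketch with scikit-learn's RandomizedSearchCV, sampling 20 random combinations from illustrative parameter distributions (the ranges are assumptions for demonstration):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Sample 20 random combinations from these distributions instead of trying all
param_distributions = {
    "n_estimators": randint(10, 200),
    "max_depth": randint(2, 20),
}
search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            param_distributions, n_iter=20, cv=5,
                            random_state=42)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```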

Bayesian Optimization: This is the new hot technique and employs what can be called “intelligent randomization”. During the first few iterations it picks completely random points, and then, with the help of an acquisition function, determines the next best trial. The acquisition function manages a delicate balance known as the exploration-exploitation tradeoff: it weighs the scores of previously evaluated combinations (exploitation) against a term which encourages trying combinations beyond the vicinity of the ones already selected (exploration).

If the exploration term is weighted too heavily, Bayesian optimization essentially degenerates into a random search. However, if the exploitation term has too much weight, the acquisition function may keep sampling around a local optimum and miss out on the global optimum.
This might sound complicated, but with deeper reading, one gets the hang of it.

There are a few libraries which save us from the complexities of the internal workings of Bayesian Optimization, such as scikit-optimize (skopt), hyperopt and bayesian-optimization.

Refer to Scikit Optimize: Bayesian Hyperparameter Optimization in Python to understand Bayesian Optimization in further depth.
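For example, here is a minimal sketch using scikit-optimize's gp_minimize, where a Gaussian-process surrogate and an acquisition function pick each next trial; the search space and number of calls are illustrative assumptions:

```python
from skopt import gp_minimize
from skopt.space import Integer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

def objective(params):
    n_estimators, max_depth = params
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=42)
    # gp_minimize minimizes, so return the negative mean accuracy
    return -cross_val_score(model, X, y, cv=5).mean()

# A Gaussian-process surrogate chooses each next trial via an acquisition function
result = gp_minimize(objective,
                     [Integer(10, 200), Integer(2, 20)],
                     n_calls=20, random_state=42)
print(result.x, -result.fun)
```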

Sequential Optimization using trees: A tree-based regression model is used as the surrogate to suggest the next best point at which to evaluate the cost function. The surrogate improves with every iteration, so the minimum can be found while spending very little time deciding where to look next.

Library functions: scikit-optimize’s forest_minimize and gbrt_minimize, and hyperopt’s tree-structured Parzen estimator (tpe).
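A minimal sketch with scikit-optimize's forest_minimize, which has the same interface as gp_minimize but uses a random-forest regressor as the surrogate (the search space is again an illustrative assumption):

```python
from skopt import forest_minimize
from skopt.space import Integer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

def objective(params):
    n_estimators, max_depth = params
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth, random_state=42)
    # Return the negative mean accuracy so that minimizing it maximizes accuracy
    return -cross_val_score(model, X, y, cv=5).mean()

# Same interface as gp_minimize, but a tree-based regressor is the surrogate
result = forest_minimize(objective,
                         [Integer(10, 200), Integer(2, 20)],
                         n_calls=20, random_state=42)
print(result.x, -result.fun)
```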

Even though they are not as certain as brute-force methods, these libraries promise to bring forth near-optimum points, and sometimes even the optimum, in the best time possible!

Happy automating folks!
