Meta Features and Initial Seeding for Hyperparameter Tuning
Meta Feature Approach
A machine understands only data, irrespective of its source. If two datasets are similar, a machine learning algorithm will behave similarly on both, given the same criteria to judge by. Just as, in the human world, one person benefits from the productive labor of another, machines can transfer their learning from one use case to another, provided the underlying data shares a degree of equivalence.
What are meta features?
Metadata is data about data; in this case, information about the datasets themselves. Numerous pieces of meta information can be mined from a dataset. Listed below are the prime meta features that help characterize a dataset's behavior.
- Number of samples
- Number of features and/or dimensionality
- Mean variance score across features
- Mean information gain across features
- Maximal information gain across features
- Number of target classes
- Mean standard deviation across features
- Mean coefficient of variation
- Mean coefficient of correlation
- Tree-based meta features:
  - Number of nodes
  - Maximum depth of the tree
  - Mean of n_node_samples (the number of samples in every node)
  - Mean weighted_n_node_samples
  - Mean impurity
  - Mean threshold
The tree-based features can easily be obtained as a dictionary by using classifier.tree_.__getstate__(). They can also be read individually as attributes, for example classifier.tree_.n_node_samples.
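As a minimal sketch, the snippet below pulls these attributes out of a fitted scikit-learn decision tree; the iris dataset is only a stand-in for the datasets discussed here.

```python
# Extracting the tree-based meta features from a fitted decision tree.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = clf.tree_

state = tree.__getstate__()  # dict with 'max_depth', 'node_count', 'nodes', 'values'

tree_meta = {
    "n_nodes": tree.node_count,
    "max_depth": tree.max_depth,
    "mean_n_node_samples": np.mean(tree.n_node_samples),
    "mean_weighted_n_node_samples": np.mean(tree.weighted_n_node_samples),
    "mean_impurity": np.mean(tree.impurity),
    # Leaf nodes store a sentinel threshold of -2, which this naive mean includes.
    "mean_threshold": np.mean(tree.threshold),
}
print(tree_meta)
```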
Several other meta features can be extracted from the dataset in question. A detailed list is available here: https://www.fruct.org/publications/ainl-fruct/files/Fil.pdf
Once the features are obtained, they need to be stored in an appropriate format such as a list, tuple, or array.
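A rough sketch of assembling the non-tree meta features listed earlier into a single vector might look as follows. The helper name is illustrative, and mutual_info_classif is used here as one way to estimate per-feature information gain; the original post does not specify an estimator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

def meta_feature_vector(X, y):
    info_gain = mutual_info_classif(X, y, random_state=0)
    means, stds = X.mean(axis=0), X.std(axis=0)
    corr = np.corrcoef(X, rowvar=False)  # feature-feature correlation matrix
    return np.array([
        X.shape[0],                     # number of samples
        X.shape[1],                     # number of features / dimensionality
        X.var(axis=0).mean(),           # mean variance across features
        info_gain.mean(),               # mean information gain
        info_gain.max(),                # maximal information gain
        len(np.unique(y)),              # number of target classes
        stds.mean(),                    # mean standard deviation
        (stds / np.abs(means)).mean(),  # mean coefficient of variation (guard zero means in real data)
        np.abs(corr[np.triu_indices_from(corr, k=1)]).mean(),  # mean coefficient of correlation
    ])

X, y = load_iris(return_X_y=True)
print(meta_feature_vector(X, y))
```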
For comparing datasets, one can use algorithms like k-nearest neighbors or k-means clustering. However, since we chose a limited set of meta features and had a limited number of datasets for our use cases, our criterion is simply the Euclidean distance between two meta-feature vectors.
Thereafter, a threshold on the Euclidean distance needs to be decided, to classify which datasets are close enough to qualify for transfer of previous learning, and which are not and should instead start the training procedure from the default/base hyperparameters.
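A tiny sketch of this decision rule is given below. The threshold value is purely hypothetical and must be calibrated for the datasets at hand; since the raw meta features are unscaled, distances can be large, as the results table later in this post shows.

```python
import numpy as np

DISTANCE_THRESHOLD = 500_000  # hypothetical value; calibrate per use case

def close_enough(meta_a, meta_b, threshold=DISTANCE_THRESHOLD):
    distance = np.linalg.norm(np.asarray(meta_a) - np.asarray(meta_b))
    return distance <= threshold

# If close_enough(donor_meta, candidate_meta) holds, seed the candidate's
# tuner with the donor's hyperparameters; otherwise start from the
# default/base hyperparameters.
```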
Transfer Learning Approach
Transfer learning is a concept widely used in deep learning to avoid re-training the lower layers of a network, which saves a great deal of training time and memory. It is the process of applying the knowledge of an already-trained machine learning model to a similar or related problem.
We have used this approach in a very basic form for this idea.
When we find similar datasets, we use the largest one to train and tune the model. The set of hyperparameters obtained for that model is then used as a seed when tuning the model for the datasets in the vicinity of this donor dataset (on the basis of Euclidean distance). Previously a random seed was used, and experiments of limited scope show that the borrowed seed works better than a random one.
A possible explanation: since an algorithm understands only data, irrespective of its source, it is bound to produce similar predictions for datasets in each other's vicinity. Thus a set of hyperparameters that works well on dataset 1 will work nearly as well as the optimal hyperparameters on dataset 2. We therefore take the borrowed set of hyperparameters and feed it to a hyperparameter tuner for dataset 2.
For instance, with a Bayesian optimizer, the optimizer takes the borrowed seed instead of a random one and exploits it to find better points in the vicinity of the seeded point. This must be done alongside random exploration: if the optimizer's number of random initial calls is set to 10, one of those points should be the borrowed one and the remaining nine should be random. This makes way for both exploration and exploitation, in random and controlled directions.
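As a concrete sketch, scikit-optimize's gp_minimize accepts an x0 argument for initial evaluation points, which can carry the borrowed seed alongside random initial points. The search space, objective, and seed values below are illustrative placeholders, not the ones from the original experiments.

```python
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the recipient dataset

space = [Integer(2, 20, name="max_depth"),
         Real(1e-4, 1e-1, prior="log-uniform", name="min_impurity_decrease")]

borrowed_seed = [7, 1e-3]  # hyperparameters tuned on the donor dataset

def objective(params):
    max_depth, min_impurity = params
    clf = DecisionTreeClassifier(max_depth=max_depth,
                                 min_impurity_decrease=min_impurity,
                                 random_state=0)
    return 1 - cross_val_score(clf, X, y, cv=3).mean()  # loss to minimize

result = gp_minimize(
    objective,
    space,
    x0=[borrowed_seed],   # one initial point borrowed from the donor
    n_initial_points=9,   # the other nine initial points stay random
    n_calls=50,
    random_state=0,
)
print(result.x)  # best hyperparameters found for the recipient dataset
```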
Hyperparameter tuning is supported by various libraries, and any of them can be used to implement the above concept. A summary of the libraries: http://machinelearningtub.blogspot.com/2018/10/automating-hyper-parameter-tuning.html
The observations below support the claim that seeding the tuner with a borrowed model improves the score, compared to random initialization of points in the hyperparameter space, after the same number of iterations.
Donor: Dataset 1. Each cell lists the two scores reported for the recipient dataset; the last row gives each dataset's Euclidean distance from the donor.

| Model | Dataset 2 | Dataset 2 | Dataset 3 |
| --- | --- | --- | --- |
| Base | 0.56 / 0.98 | 0.91 / 0.98 | 0.02 / 0.99 |
| Base Model with Class Weight | 0.97 / 0.78 | 0.94 / 0.96 | 0.28 / 0.94 |
| Tuned Base | 0.91 / 0.86 | 0.95 / 0.93 | 0.51 / 0.77 |
| Borrowed Model | 0.87 / 0.88 | 0.97 / 0.85 | 0.18 / 0.90 |
| Tuned Borrowed Model | 0.96 / 0.82 | 0.96 / 0.91 | 0.49 / 0.78 |
| Euclidean Distance | 486841 | 564415 | 551532 |
Further experimentation is required to put the concept on a firmer foundation.