Meta Features and Initial Seeding for Hyperparameter Tuning


Meta Feature Approach


A machine understands only data, irrespective of its source. If two datasets are similar, machine learning algorithms will behave similarly on them, given the same criteria to judge by.
Just as, in the human world, one person benefits from the productive labor of another, machines can transfer their learning from one use case to another, provided the underlying data shares a degree of equivalence.

What are meta features?

Metadata is data/information about data; in this case, information about the datasets themselves. Numerous pieces of meta information can be mined from the data. The prime meta features that help determine the behavior of a dataset are:

• Number of samples
• Number of features and/or dimensionality
• Mean variance score across features
• Mean information gain across features
• Maximal information gain across features
• Number of target classes
• Mean standard deviation across features
• Mean coefficient of variation
• Mean coefficient of correlation
• Tree-based meta features:
    ◦ Number of nodes
    ◦ Maximum depth of the tree
    ◦ Mean of n_node_samples: the number of samples in every node
    ◦ Mean of weighted_n_node_samples
    ◦ Mean impurity
    ◦ Mean threshold

The tree-based features can easily be obtained in the form of a dictionary by using classifier.tree_.__getstate__().
They can also be accessed individually, in the format classifier.tree_.n_node_samples.
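As a rough illustration, here is a minimal sketch that collects the meta features listed above with scikit-learn and NumPy. The dataset is a stand-in, and mutual information is used as a proxy for information gain; the exact feature set is an assumption, not a fixed recipe.

```python
import numpy as np
from sklearn.datasets import load_iris  # stand-in dataset for illustration
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fit a plain decision tree so its structure can supply the tree-based meta features.
tree = DecisionTreeClassifier(random_state=0).fit(X, y).tree_

# Mutual information between each feature and the target, as a proxy for information gain.
info_gain = mutual_info_classif(X, y, random_state=0)

meta_features = [
    X.shape[0],                                       # number of samples
    X.shape[1],                                       # number of features / dimensionality
    np.var(X, axis=0).mean(),                         # mean variance across features
    info_gain.mean(),                                 # mean information gain
    info_gain.max(),                                  # maximal information gain
    len(np.unique(y)),                                # number of target classes
    np.std(X, axis=0).mean(),                         # mean standard deviation
    (np.std(X, axis=0) / np.mean(X, axis=0)).mean(),  # mean coefficient of variation
    np.corrcoef(X, rowvar=False).mean(),              # mean coefficient of correlation
    tree.node_count,                                  # number of nodes
    tree.max_depth,                                   # maximum depth of tree
    tree.n_node_samples.mean(),                       # mean samples per node
    tree.weighted_n_node_samples.mean(),              # mean weighted samples per node
    tree.impurity.mean(),                             # mean impurity
    tree.threshold.mean(),                            # mean threshold
]
```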

There are several other meta features that can be obtained from the dataset in question. A detailed list is available here: https://www.fruct.org/publications/ainl-fruct/files/Fil.pdf

Once the features are obtained, they need to be stored in an appropriate format such as a list, tuple, or array.
For comparison, one can use algorithms like k-nearest neighbors or k-means clustering. However, since we chose a limited set of meta features and had a limited number of datasets for our use cases, our criterion for judgment is the Euclidean distance between two given vectors (lists/tuples of metadata).

Thereafter, a threshold on the Euclidean distance needs to be decided, to classify which datasets are close enough to qualify for transfer of previous learning, and which are not close enough and should instead begin the training procedure with the default/base hyperparameters.
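A minimal sketch of that decision rule, assuming two meta-feature vectors like the ones built above; the threshold value below is purely illustrative and must be calibrated on real dataset pairs:

```python
import numpy as np

DISTANCE_THRESHOLD = 500_000  # hypothetical value; calibrate empirically

def qualifies_for_transfer(meta_a, meta_b, threshold=DISTANCE_THRESHOLD):
    """Return True when two meta-feature vectors are close enough
    (by Euclidean distance) to justify borrowing hyperparameters."""
    distance = np.linalg.norm(np.asarray(meta_a, dtype=float)
                              - np.asarray(meta_b, dtype=float))
    return distance <= threshold

# qualifies_for_transfer(meta_features_1, meta_features_2)
# True  -> seed the tuner with the donor's hyperparameters
# False -> start from the default/base hyperparameters
```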

Transfer Learning Approach


Transfer learning is a concept widely used in deep learning to avoid re-training the lower layers of a network. This saves a lot of training time along with memory resources.

It is the process of applying the knowledge of a machine learning model that has already been trained to a similar or related problem.

This approach has been applied here in a very basic form.

When we find similar datasets, we use the largest dataset (in terms of size) to train and tune the model. The set of hyperparameters obtained for that model is then used as a seed for tuning the model on the datasets in the vicinity of the donor dataset (on the basis of Euclidean distance). Previously a random seed was used, and experiments with a limited scope show that the borrowed seed works better than any random seed.

A possible explanation:

Since an algorithm understands only data, irrespective of the source, it is bound to produce similar predictions for datasets in each other's vicinity. Thus, a set of hyperparameters that works well for dataset 1 will perform nearly as well as the optimal hyperparameters on dataset 2. Therefore, we carry the borrowed set of hyperparameters forward and feed it to a hyperparameter tuner for dataset 2.

For instance, consider a Bayesian optimizer: instead of a random seed, the optimizer borrows the initial point and exploits it further to obtain better points in the vicinity of the seeded point. This must be done alongside random exploration. If the number of random calls for the optimizer is set to 10, one of the points should be the borrowed one and the rest should be seeded randomly. This makes way for both exploration and exploitation, in random as well as controlled directions.
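One concrete sketch of this seeding scheme with scikit-optimize: the search space, the borrowed values, and the stand-in dataset are all illustrative assumptions; only the x0/n_initial_points mechanics are the point.

```python
from skopt import gp_minimize
from skopt.space import Integer
from sklearn.datasets import load_breast_cancer  # stand-in for "dataset 2"
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X2, y2 = load_breast_cancer(return_X_y=True)

# Illustrative two-dimensional search space.
space = [Integer(10, 500, name="n_estimators"),
         Integer(2, 30, name="max_depth")]

def objective(params):
    n_estimators, max_depth = params
    clf = RandomForestClassifier(n_estimators=int(n_estimators),
                                 max_depth=int(max_depth),
                                 random_state=0)
    # skopt minimizes, so negate the cross-validated accuracy.
    return -cross_val_score(clf, X2, y2, cv=3).mean()

# Hyperparameters previously tuned on the donor dataset (illustrative values).
borrowed_seed = [200, 12]

result = gp_minimize(
    objective,
    space,
    x0=[borrowed_seed],   # one borrowed point: controlled exploitation
    n_initial_points=9,   # nine random points: exploration
    n_calls=50,
    random_state=0,
)
print(result.x, -result.fun)  # best hyperparameters and their score
```

With x0 supplied and y0 omitted, skopt evaluates the borrowed point first and then draws the remaining initial points at random, which matches the exploration-plus-exploitation split described above.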

Hyperparameter tuning is supported by various libraries, and any of them can be used to implement the above concept. A summary of the libraries: http://machinelearningtub.blogspot.com/2018/10/automating-hyper-parameter-tuning.html

The observations below support the claim that seeding the tuner with a borrowed model improves the score, compared to random initialization of points in the hyperparameter space, after the same number of iterations. Each experiment reports two scores per model.
Donor → Recipient             Dataset 1 → Dataset 2   (unlabeled)   Dataset 2 → Dataset 3
Base                          0.56 / 0.98             0.91 / 0.98   0.02 / 0.99
Base Model with Class Weight  0.97 / 0.78             0.94 / 0.96   0.28 / 0.94
Tuned Base                    0.91 / 0.86             0.95 / 0.93   0.51 / 0.77
Borrowed Model                0.87 / 0.88             0.97 / 0.85   0.18 / 0.90
Tuned Borrowed Model          0.96 / 0.82             0.96 / 0.91   0.49 / 0.78
Euclidean Distance            486841                  564415        551532




Further experimentation is required to put the concept on a firmer foundation.
