Optimization Functions
· Constant Learning Rate Algorithms
Challenges:
· Choosing a proper learning rate can be difficult. A learning rate that is too small leads to painfully slow convergence, while a learning rate that is too large can hinder convergence and cause the loss function to fluctuate around the minimum or even to diverge (see the small numerical sketch after these challenges).
· The same learning rate applies to all parameter updates. If our data is sparse and our features have very different frequencies, we might not want to update all of them to the same extent, but perform a larger update for rarely occurring features.
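To make the first challenge concrete, here is a minimal sketch (not from the original post) that runs plain gradient descent on the one-dimensional function f(x) = x², whose gradient is 2x; the starting point, step counts and the three learning rates are purely illustrative choices.

# Gradient descent on f(x) = x^2 (minimum at x = 0, gradient 2x) to show
# how the learning rate alone decides between slow convergence and divergence.
def descend(lr, x0=5.0, steps=50):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(descend(lr=0.001))  # too small: after 50 steps x is still ~4.5, barely moved
print(descend(lr=0.5))    # well chosen: jumps to 0 almost immediately
print(descend(lr=1.1))    # too large: |x| grows every step, the iterates diverge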
Stochastic Gradient Descent: Performs a parameter update for each training example; computationally inexpensive per step, though the noisy updates may keep it from settling exactly at the optimum.
Batch gradient descent: Computes the gradient of the cost function w.r.t. the parameters θ for the entire training dataset.
Mini-batch gradient descent: Takes the best of both worlds and performs an update for every mini-batch of n training examples.
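As a rough illustration of how the three variants differ only in how much data feeds each update, here is a minimal NumPy sketch for linear regression with a mean-squared-error gradient; the function names (compute_gradient, batch_gd, sgd, minibatch_gd) and the hyperparameter defaults are illustrative, not from the original post.

import numpy as np

# Gradient of the mean-squared error for linear regression, used only to
# illustrate the three update schemes.
def compute_gradient(theta, X, y):
    return 2.0 / len(y) * X.T @ (X @ theta - y)

def batch_gd(theta, X, y, lr=0.01, epochs=100):
    # Batch GD: one update per epoch, using the entire dataset.
    for _ in range(epochs):
        theta -= lr * compute_gradient(theta, X, y)
    return theta

def sgd(theta, X, y, lr=0.01, epochs=100):
    # SGD: one update per training example, visited in shuffled order.
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            theta -= lr * compute_gradient(theta, X[i:i+1], y[i:i+1])
    return theta

def minibatch_gd(theta, X, y, lr=0.01, epochs=100, batch_size=32):
    # Mini-batch GD: one update per mini-batch of batch_size examples.
    for _ in range(epochs):
        idx = np.random.permutation(len(y))
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            theta -= lr * compute_gradient(theta, X[batch], y[batch])
    return theta

The only difference between the three loops is the slice of (X, y) used for each gradient, which is exactly the trade-off between the accuracy of the gradient estimate and the cost per update.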
· Adaptive Learning Rate Algorithms
Advantage: The learning rate is adapted for each parameter as iterations proceed, so manual intervention is minimal.
Adagrad: Gives good results. Adagrad adapts updates to each individual parameter, performing larger or smaller updates depending on that parameter's importance.
Disadvantage: Computationally expensive, since a squared-gradient accumulator must be stored and updated for every parameter.
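A minimal sketch of the Adagrad update just described, assuming NumPy arrays for the parameters, gradient and accumulator; the helper name adagrad_update and the default values are illustrative.

import numpy as np

def adagrad_update(theta, grad, cache, lr=0.01, eps=1e-8):
    # Per-parameter learning rates: the step shrinks for parameters whose
    # squared gradients have accumulated a lot in `cache`.
    cache += grad ** 2
    theta -= lr * grad / (np.sqrt(cache) + eps)
    return theta, cache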
Adam: Adaptive Moment Estimation is faster than Adagrad, gives good results, and is the most popular optimizer today.
Disadvantage: Computationally expensive.
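And a corresponding sketch of a single Adam step, again assuming NumPy arrays; the function name and the defaults (the commonly cited beta1=0.9, beta2=0.999) are illustrative choices, not taken from the post.

import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Combines a momentum-like first moment (m) with an RMS-scaled second
    # moment (v), both bias-corrected; t is the step count starting at 1.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

Keeping both moment estimates per parameter is what makes Adam adaptive like Adagrad while avoiding Adagrad's ever-shrinking step size.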
For a more in-depth and mathematical explanation, refer to the following link:
https://towardsdatascience.com/