Random Forest in the Parliament!


The Random Forest Classifier is one of the most widely used and talked-about algorithms in Data Science and Machine Learning, and justly so!


Firstly, the algorithm is so intuitive that users and even laypeople (non-technical associates or stakeholders) understand how it works with ease. This is because the components of a Random Forest, which are essentially decision trees, closely mirror the decision-making technique of humans: the traditional if-else process!

It can be stated with a high degree of confidence that most of us have used a decision tree in our lives, with or without code.

Secondly, the Random Forest, being a collection of several independent classification models (decision trees), serves as an extremely robust method for arriving at a decision.

To understand how a random forest works, one need only be slightly imaginative.

Imagine a group of voters in a parliament house scenario. The speaker proposes a question or bill, and the objective is to decide whether the bill should pass or not (a typical classification problem).




Now, consider that each member of the parliament represents a decision tree. Each has been trained differently, based on the political situation in his/her respective state. In other words, the decision that comes from each member is based on a different section of the data, and these sections may or may not intersect ('bags' of data, a method called Bagging; note that while overlapping bags are fine for data, overlapping regions are not ideal for the leaders who administer them).

Once every member has voted, the votes are counted and the decision with the majority count wins!
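For the curious, here is a minimal sketch of that process in Python, assuming scikit-learn and NumPy are available; the parliament-flavoured names (`members`, `new_bill`) and all the numbers are my own illustrative choices, not part of any standard API.

```python
# A sketch of the parliament analogy: each "member" is a decision
# tree trained on its own bootstrap sample (bag) of the data, and
# the final decision is a majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Toy data: rows are past bills, columns are features,
# y is 1 (pass) / 0 (terminate).
X, y = make_classification(n_samples=500, n_features=5, random_state=42)

n_members = 25  # size of the parliament
members = []
for _ in range(n_members):
    # Bagging: sample rows with replacement, so every member sees a
    # different (possibly overlapping) slice of the data.
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(max_features="sqrt")  # random feature subset per split
    tree.fit(X[idx], y[idx])
    members.append(tree)

# A new bill arrives: every member votes, and the majority wins.
new_bill = X[:1]
votes = np.array([m.predict(new_bill)[0] for m in members])
print("Pass" if np.bincount(votes).argmax() == 1 else "Terminate")
```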




Yes, it is that simple.

Note that each member has been trained separately, in their own time and region, and the test point (the new bill) simply passes through the already-trained members of the parliament at decision time. Each member votes based on a list of conditions drawn from what they have witnessed in their respective regions with respect to any number of factors (or features), such as economic growth, employment rate, literacy rate, sanitary awareness, and so on.
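In practice, the whole parliament usually comes prebuilt. Below is a sketch using scikit-learn's RandomForestClassifier, which handles the bagging and voting internally; the feature names and every value in the table are invented purely for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Invented historical data: each row is a past bill described by its features.
past_bills = pd.DataFrame({
    "economic_growth": [2.1, 0.4, 3.5, 1.2, 2.8, 0.1],
    "employment_rate": [0.93, 0.81, 0.95, 0.88, 0.94, 0.79],
    "literacy_rate":   [0.88, 0.70, 0.91, 0.83, 0.90, 0.68],
})
passed = [1, 0, 1, 0, 1, 0]  # 1 = bill passed, 0 = bill terminated

# Train the parliament: 100 members, each on its own bag of the data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(past_bills, passed)

# A new bill passes through every trained member; the majority decides.
new_bill = pd.DataFrame({"economic_growth": [2.5],
                         "employment_rate": [0.90],
                         "literacy_rate":   [0.85]})
print("Pass" if forest.predict(new_bill)[0] == 1 else "Terminate")
```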



This is it. This is all you need to know to have a strong high-level intuition of the Random Forest. This is why I suggest imagining real-life instances whenever you jump into problems. Through this simple example, we covered all these concepts in one sitting:

- Classification problem intuition (Pass or Terminate a Bill)
- Identifying features (Employment rate, literacy rate, etc.)
- Bagging (Division of the available data)
- if-else decision making (Condition-based decisions from every member)
- Ensemble of classification models (Bill passed or terminated based on majority voting)

But the most important concept that we learned, even without mentioning it once, was BIAS resulting from unbalanced data: a member trained in a region where almost every bill of a certain kind has passed will lean toward passing such bills, no matter what the new bill says.
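If that imbalance lives in the data itself (say, 90% of historical bills passed), one common remedy, and this is a general scikit-learn option rather than anything specific to our parliament, is to reweight the classes inversely to their frequency:

```python
from sklearn.ensemble import RandomForestClassifier

# class_weight="balanced" reweights each class inversely to its
# frequency, so the rare "terminate" outcome is not drowned out.
forest = RandomForestClassifier(n_estimators=100,
                                class_weight="balanced",
                                random_state=0)
```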

Similarly, the Random Forest can be illustrated with a ton of other real-life examples, something that is rarely possible with such ease for most other classification algorithms. But you can count on me, because I will still try.

In spite of a series of innovations across the data science community, the Random Forest remains one of the most intuitive techniques, which is reason enough to celebrate simplicity at its best!

Keep a lookout on this space for a series of machine learning concepts, delivered to you like you already know them!
