One Stop Guide For Data Visualization Using Matplotlib
Data Visualization for Data Science: What and Why?
Data visualization is the act of
taking data and placing it into a visual form such as a map or
graph. Visualizations make both large and small datasets easier for the human
brain to comprehend, and they also make it easier to spot
patterns, trends and outliers in groups of data.
Data Visualization is important
because visually represented numbers are more appealing when presented to
business owners or stakeholders. According to Tableau, “[data visualization is]
one of the most useful professional skills to develop. The better you can
convey your points visually, the better you can leverage that information.”
Data Visualization Packages
Python has three main packages for this:
- Matplotlib- This is the most basic package,
used to plot simple and standard graphs like bar, pie etc. Plotting
here is fast.
- Seaborn- This is a package built on top of
matplotlib and supports many complex graphs like box plot, pair plot etc
- Plotly- This is an advanced package which
helps us get some cool features related to graphs.
This article covers visualizations using the basic matplotlib
library, which is widely used in the Data Science field.
Import the Libraries
The first step in working on data visualization with matplotlib is
to import the correct packages, along with the numpy and pandas
libraries. See the picture below.
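In text form, the imports typically look like the sketch below. The aliases `np`, `pd` and `plt` are the usual community conventions, and the small `df` table is made-up sample data used only to confirm that the three libraries work together.

```python
import numpy as np               # numerical arrays and sample data
import pandas as pd              # tabular data (DataFrame / Series)
import matplotlib.pyplot as plt  # the plotting interface used in this article

# Hypothetical sample data, only to check that the imports work together.
df = pd.DataFrame({"x": np.arange(5), "y": np.arange(5) ** 2})

fig, ax = plt.subplots()         # one figure containing one set of axes
ax.plot(df["x"], df["y"])        # draw y against x as a line
ax.set_xlabel("x")
ax.set_ylabel("y")
# plt.show()  # uncomment to display the figure in an interactive session
```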
What kind of Graphs can be created using Matplotlib?
- Line Plot- This
plot is mostly used to show a relationship between two data values, where
one value is treated as dependent on the other. The picture below
shows the relationship between x and y.
- Scatter Plot- Scatter
plots are sometimes called correlation plots. It's a 2-D data
visualization used to show the relationship between two variables.
- Bar Plot- This chart is
used when data is classified into nominal or ordinal categories. It is
mostly used to compare data and is one of the most used plots in data
visualization.
- Pie Plot- A pie chart is used when we have categorical data in our data set. It is really helpful when we want to see what share of the whole each category makes up.
- Histogram Plot- It
is similar to a bar graph and is mostly used to assess a probability
distribution. The data is grouped into bins, and the height of each bar
shows how many values fall in that bin (the frequency distribution).
- Box Plot- It is a visual
representation of the statistical five-number summary of a given dataset. It
usually shows the minimum, first quartile, median, third quartile and maximum
values. It is used to see the nature of the data and its skewness.
It is also used to see outliers in a given dataset.
- Density Plot-
A Density Plot visualises the distribution of data over a continuous
interval or time period. This chart is a variation of a Histogram that
uses kernel smoothing to plot values, allowing for smoother distributions
by smoothing out the noise. The peaks of a Density Plot help display where
values are concentrated over the interval. An advantage Density Plots have
over Histograms is that they're better at determining the distribution
shape because they're not affected by the number of bins used.
- Area Plot- An area chart
is a good way to demonstrate trends over time to the viewer. This chart is
based on the line chart. The filled area can give a greater sense of the
trends in a particular dataset.
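
The chart types described above can be sketched together in a single figure. This is a minimal, illustrative example and not the article's original code: the sample data (`x`, `y`, `values`, `categories`, `counts`) is made up, and `gaussian_kde_curve` is a hypothetical helper written in plain NumPy so that the density plot needs no extra libraries. Its rule-of-thumb bandwidth is an assumption, and a different bandwidth would give a smoother or rougher curve.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)                    # fixed seed: repeatable sample data
x = np.linspace(0, 10, 50)
y = 2 * x + 1                                     # y depends on x
values = rng.normal(loc=50, scale=10, size=500)   # made-up numeric sample
categories = ["A", "B", "C", "D"]                 # made-up category labels
counts = [23, 17, 35, 29]                         # made-up value per category


def gaussian_kde_curve(sample, grid):
    """Hypothetical helper: Gaussian kernel density estimate on `grid`."""
    # Assumed rule-of-thumb bandwidth; other choices are equally valid.
    h = 1.06 * sample.std(ddof=1) * sample.size ** (-1 / 5)
    z = (grid[:, None] - sample[None, :]) / h
    kernel_sum = np.exp(-0.5 * z ** 2).sum(axis=1)
    return kernel_sum / (sample.size * h * np.sqrt(2 * np.pi))


grid = np.linspace(values.min() - 20, values.max() + 20, 200)
density = gaussian_kde_curve(values, grid)

fig, axes = plt.subplots(2, 4, figsize=(16, 7))
ax = axes.ravel()                                 # flatten the 2x4 grid for indexing

ax[0].plot(x, y)                                  # line plot
ax[0].set_title("Line Plot")

ax[1].scatter(x, y + rng.normal(scale=2, size=x.size))  # noisy points around the line
ax[1].set_title("Scatter Plot")

ax[2].bar(categories, counts)                     # one bar per category
ax[2].set_title("Bar Plot")

ax[3].pie(counts, labels=categories, autopct="%1.0f%%")  # share of the whole
ax[3].set_title("Pie Plot")

ax[4].hist(values, bins=20)                       # frequency per bin
ax[4].set_title("Histogram Plot")

ax[5].boxplot(values)                             # quartiles, median and outliers
ax[5].set_title("Box Plot")

ax[6].plot(grid, density)                         # smoothed distribution shape
ax[6].set_title("Density Plot")

ax[7].plot(x, y)                                  # area plot = line plus filled region
ax[7].fill_between(x, y, alpha=0.4)
ax[7].set_title("Area Plot")

fig.tight_layout()
# plt.show()  # uncomment to display the figure in an interactive session
```

Changing `bins` in the histogram call alters the apparent shape of the distribution, while the density curve stays the same for a given bandwidth, which is the advantage of density plots mentioned above.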
Summary!
So far we have learnt how to use some of the basic and most used
graphs with matplotlib, the basic data visualization package.