## Introduction

Exploratory data analysis (EDA) is a method of evaluating or understanding data in order to obtain insights or identify key features. EDA can be divided into two categories: graphical analysis and non-graphical analysis.

EDA is an important component of any data science or machine learning process. To build reliable and valuable output, you must explore the data, understand the relationships between variables, and understand its underlying structure.

This tutorial will cover the EDA steps using the Python programming language.

## Dataset

In this article, we will work on customer churn prediction. When customers stop doing business with a company, it is known as customer churn or customer attrition.

Since the cost of acquiring a new customer is usually higher than the cost of keeping an existing one, understanding customer churn is critical to a company’s success. As a result, churn analysis is the first step in gaining a better understanding of your customers.

To gain a deeper understanding of our data, we will perform exploratory data analysis (EDA) on it. The dataset is available here.

## Import Python libraries

First, we need to import all the libraries needed for the analysis, namely *pandas* to handle data, *NumPy* for numerical calculations, and *matplotlib* and *seaborn* for visualization.
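
The imports could look like the sketch below. The aliases (`np`, `pd`, `plt`, `sns`) are the conventional ones and are an assumption here, since the original import cell is not shown.

```python
# Conventional aliases for the four libraries named above (assumed, not
# taken from the original article).
import numpy as np               # numerical calculations
import pandas as pd              # data handling
import matplotlib.pyplot as plt  # plotting
import seaborn as sns            # statistical visualization
```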


**Loading the dataset in Python**

Now, load the dataset into a pandas dataframe.
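
A minimal sketch of the loading step, assuming the data comes as a CSV file. The file name in the comment and the column names are assumptions; to keep the example runnable without the download, it reads a small inline sample with made-up values instead.

```python
import io

import pandas as pd

# Inline sample standing in for the real CSV file (illustrative values only;
# the column names are assumed, not confirmed by the article).
sample_csv = io.StringIO(
    "customerID,tenure,Contract,MonthlyCharges,TotalCharges,Churn\n"
    "A-001,1,Month-to-month,29.85,29.85,No\n"
    "A-002,34,One year,56.95,1889.50,No\n"
    "A-003,2,Month-to-month,53.85,108.15,Yes\n"
)

# With the downloaded file you would pass its path instead, e.g.
# df = pd.read_csv("customer_churn.csv")  # hypothetical file name
df = pd.read_csv(sample_csv)
print(df)
```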

## Structure-based data exploration

This is the first part of EDA, where the dataframe is evaluated for its structure, columns, and data types. The goal of this step is to get a general understanding of the dataset.

**Display first 5 observations**

df.head() returns the first five rows of the dataframe.

**Display last 5 observations**

df.tail() returns the last five rows of the dataframe.
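
Both previews can be sketched on a toy dataframe. The eight rows below are hypothetical stand-ins for the churn data.

```python
import pandas as pd

# Toy dataframe with 8 rows standing in for the churn data (made-up values).
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2, 8, 22, 10],
    "Churn": ["No", "No", "Yes", "No", "Yes", "Yes", "No", "No"],
})

first_rows = df.head()  # first 5 rows by default
last_rows = df.tail()   # last 5 rows by default
print(first_rows)
print(last_rows)
```

Both methods accept a row count, so `df.head(10)` would show ten rows instead.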

**Display the number of observations and variables**

This can be done with df.shape, which returns a tuple containing two values. The first value is the number of data points (rows) and the second is the number of features (columns) in the dataset.

This dataframe has 7043 rows and 21 columns.
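
As a small illustration, here is the same attribute on a toy dataframe. The values are hypothetical; the real dataframe would report the 7043 rows and 21 columns mentioned above.

```python
import pandas as pd

# Toy dataframe (made-up values) with 4 rows and 3 columns.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Churn": ["No", "No", "Yes", "No"],
})

n_rows, n_cols = df.shape  # shape is an attribute, not a method
print(f"{n_rows} rows, {n_cols} columns")
```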

**Display variable names and their data types**
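
One way to list the column names with their types is the `dtypes` attribute. This is a sketch on a toy dataframe; the column names are assumptions, and the original cell may have used a different call.

```python
import pandas as pd

# Toy dataframe (made-up values) mixing integer, float, and text columns.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Churn": ["No", "No", "Yes", "No"],
})

column_types = df.dtypes  # Series indexed by column name
print(column_types)
```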

**Count the number of non-missing values for each variable**

df.count() counts the number of non-missing values in each column. Comparing these counts with the total number of rows gives an idea of how many values are missing in our dataset.
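
The comparison with the row count can be sketched as follows, on a toy dataframe with one deliberately missing value (hypothetical data and column names).

```python
import numpy as np
import pandas as pd

# Toy dataframe (made-up values); one TotalCharges entry is missing.
df = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": [29.85, np.nan, 108.15, 1840.75],
})

non_missing = df.count()         # non-missing values per column
missing = len(df) - non_missing  # difference from the total row count
print(non_missing)
print(missing)
```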

**Descriptive statistics**

To learn more about the features of the dataset, we will use df.describe(), which by default returns statistical information for all the numerical features in our dataframe.

df.describe() gives some basic statistical details such as the count, mean, and standard deviation, together with a five-number summary (minimum, first quartile, median, third quartile, and maximum) for each numerical feature.
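
A short sketch of the default behaviour on a toy dataframe (hypothetical values). Only the numerical column appears in the result.

```python
import pandas as pd

# Toy dataframe (made-up values) with one numerical and one text column.
df = pd.DataFrame({
    "tenure": [1, 2, 3, 4, 5],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
})

summary = df.describe()  # numerical columns only by default
print(summary)
```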

**What about categorical features?**

We can also get a summary of the categorical attributes by passing the include argument with the value ‘all’ to df.describe().
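
The effect of the include argument can be sketched on a toy dataframe (hypothetical values and column names). Categorical columns gain `unique`, `top`, and `freq` rows, while rows that do not apply to a column are filled with NaN.

```python
import pandas as pd

# Toy dataframe (made-up values) with one numerical and one categorical column.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
})

full_summary = df.describe(include="all")
print(full_summary)
```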

**Display a complete summary of the dataset**

**df.info()** summarizes the dataframe, including each column’s data type, its non-null count, and the dataframe’s memory usage.
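
A minimal sketch on a toy dataframe (hypothetical values). `df.info()` prints its report rather than returning it, so the example writes it into a buffer first; calling `df.info()` with no arguments is enough in a notebook.

```python
import io

import pandas as pd

# Toy dataframe (made-up values).
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45],
    "Churn": ["No", "No", "Yes", "No"],
})

buffer = io.StringIO()
df.info(buf=buffer)  # write the summary into the buffer instead of stdout
info_text = buffer.getvalue()
print(info_text)
```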

## Handling missing values

**Missing values** are unknown values in the dataset. It is important to understand the concept of missing values in order to manage data successfully. The first step is to find the missing values in the dataset and then treat them using an appropriate method.

**Finding missing values**

We have 11 missing values in the ‘Total Charges’ column. Now, we will look at different ways of dealing with them.
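
A per-column count like this one can be obtained with `isnull().sum()`. The sketch below uses a toy dataframe with two missing entries; the data and the column name `TotalCharges` are assumptions.

```python
import numpy as np
import pandas as pd

# Toy dataframe (made-up values); two TotalCharges entries are missing.
df = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": [29.85, np.nan, 108.15, np.nan],
})

missing_per_column = df.isnull().sum()  # True counts as 1 when summed
print(missing_per_column)
```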

**Missing value treatment**

Missing values can be treated in several ways, for example by dropping the affected rows or by imputing a replacement value.

Only 11 values are missing for the variable ‘Total Charges’. Since these records are few compared to the total dataset, we can discard them.

*Done. Let’s check!*
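
Discarding the rows and checking the result can be sketched as follows. This uses a toy dataframe with hypothetical values; `dropna` with `subset` is one reasonable way to drop the rows and may differ from the original cell.

```python
import numpy as np
import pandas as pd

# Toy dataframe (made-up values); one TotalCharges entry is missing.
df = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70],
    "TotalCharges": [29.85, 1889.50, np.nan, 1840.75, 151.65],
})

# Drop only the rows where TotalCharges is missing, then renumber the index.
df_clean = df.dropna(subset=["TotalCharges"]).reset_index(drop=True)

remaining = df_clean.isnull().sum()  # should be zero for every column
print(remaining)
```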


## Data visualization

Let’s visualize the target variable, i.e., churn. It has two categories: yes or no.

**Display the frequency of each class**

The plot shows the class imbalance in the data between churned and non-churned customers. To address this, resampling would be an appropriate approach.
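
A count plot of this kind could be drawn with seaborn as sketched below. The toy data is hypothetical and only mimics an imbalanced target; the column name `Churn` is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy target column (made-up values) with more "No" than "Yes".
df = pd.DataFrame({"Churn": ["No", "No", "Yes", "No", "No", "Yes", "No", "No"]})

class_counts = df["Churn"].value_counts()  # the numbers behind the bars
print(class_counts)

fig, ax = plt.subplots(figsize=(5, 4))
sns.countplot(data=df, x="Churn", ax=ax)   # one bar per class
ax.set_title("Churn class frequency")
```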

The total charges are the sum of the monthly charges, so let’s look at the relationship between the two.

- Here we can see that total charges and monthly charges are highly correlated.
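
The relationship can be checked numerically and visually, as in this sketch. The toy values are hypothetical, so the resulting coefficient says nothing about the real dataset; the column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy dataframe (made-up values) in which the two columns rise together.
df = pd.DataFrame({
    "MonthlyCharges": [20.0, 40.0, 60.0, 80.0, 100.0],
    "TotalCharges": [240.0, 400.0, 900.0, 1500.0, 2100.0],
})

# Pearson correlation coefficient between the two columns.
corr = df["MonthlyCharges"].corr(df["TotalCharges"])
print(f"correlation: {corr:.2f}")

# Scatter plot to inspect the relationship visually.
fig, ax = plt.subplots()
ax.scatter(df["MonthlyCharges"], df["TotalCharges"])
ax.set_xlabel("MonthlyCharges")
ax.set_ylabel("TotalCharges")
```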

Here we visualize the churn rate with respect to the contract type.
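
One way to compute and plot such a rate is a row-normalized cross-tabulation, sketched below. The toy data and the contract labels are assumptions, and the original chart may have been built differently.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Toy dataframe (made-up values; contract labels are assumed).
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
})

# Share of churned and retained customers within each contract type.
churn_by_contract = pd.crosstab(df["Contract"], df["Churn"], normalize="index")
print(churn_by_contract)

# Stacked bars: each bar sums to 1 and is split by churn status.
ax = churn_by_contract.plot(kind="bar", stacked=True)
ax.set_ylabel("Share of customers")
```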

This is a visualization of the payment method. It has four categories.
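
A pie chart is one plausible way to show the four categories. In the sketch below, the category labels and counts are hypothetical, as is the column name `PaymentMethod`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy column (made-up values; the four labels are assumed).
df = pd.DataFrame({
    "PaymentMethod": [
        "Electronic check", "Electronic check", "Electronic check",
        "Mailed check", "Mailed check",
        "Bank transfer", "Bank transfer",
        "Credit card",
    ]
})

method_counts = df["PaymentMethod"].value_counts()  # customers per category
print(method_counts)

fig, ax = plt.subplots()
ax.pie(method_counts.values, labels=method_counts.index, autopct="%1.1f%%")
ax.set_title("Payment method")
```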

This graph shows the churn rate with respect to dependents.

This graph shows the churn rate with respect to partners.
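
The churn rates for dependents and partners can be computed the same way for both columns, as in this sketch. The toy data and column names are assumptions, and the rates it produces are not results from the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy dataframe (made-up values; column names are assumed).
df = pd.DataFrame({
    "Partner":    ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No"],
    "Dependents": ["No", "No", "No", "Yes", "No", "Yes", "No", "Yes"],
    "Churn":      ["No", "Yes", "Yes", "No", "Yes", "No", "No", "No"],
})

churned = df["Churn"].eq("Yes")  # boolean flag: True if the customer churned

# The mean of the boolean flag within each group is that group's churn rate.
rates = {col: churned.groupby(df[col]).mean() for col in ["Dependents", "Partner"]}

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, (col, rate) in zip(axes, rates.items()):
    rate.plot(kind="bar", ax=ax, title=f"Churn rate by {col}")
    print(rate)
```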

## Conclusion

In this article, we analyzed customer behavior. First, we explored the dataset at a basic level. We looked for missing values and treated them by discarding the affected rows. We then used pandas DataFrames to perform exploratory data analysis on the sample data by plotting various graphs such as count plots, pie charts, line plots, and histograms. This gave us some useful information, such as “customers with month-to-month contracts churned the most” and “total charges and monthly charges were highly correlated”. In this way, EDA lets us explore the dataset and draw insights from it that can help in model building and better decision making.

However, this was only a basic overview of how EDA works; you can go deeper and try these steps on larger datasets.

You can contact me on LinkedIn.