Data Mining Solutions for Direct Marketing Campaigns: Project Analysis


Added on  2022/08/10

Project
AI Summary
This project presents a comprehensive data mining solution for optimizing direct marketing campaigns, specifically focusing on a Portuguese bank's phone-based marketing efforts. The analysis utilizes a dataset of 45,211 customer records to predict whether a customer will subscribe to a term deposit, aiming to enhance campaign efficiency and reduce costs. The project begins with descriptive studies and data visualization to understand customer profiles and identify key trends. Subsequently, it employs three machine learning techniques: Decision Tree, Logistic Regression, and Support Vector Machine (SVM). The models are evaluated, and Logistic Regression is identified as the most accurate, achieving 90% accuracy. The findings highlight the significant roles of job profile, education, marital status, contact duration, contact month, and prior campaign results in influencing customer decisions. The project concludes with recommendations for the bank to refine its strategies by considering these factors, thereby increasing the success rate of future marketing campaigns. The analysis is performed using R software, and relevant codes are included in the appendix.
Running Head: DATA MINING SOLUTIONS FOR DIRECT MARKETING CAMPAIGNS
DATA MINING SOLUTIONS FOR DIRECT MARKETING CAMPAIGNS
Name of the Student:
Name of the University:
Author Note:
Executive Summary
Banks adopt a range of strategies to grow their business, and marketing campaigns are among
the most common. In this case, a bank has decided to conduct a phone-based marketing campaign
and needs a cost-efficient way to predict, from a client's profile, whether the client will
accept the offer. A dataset from a Portuguese bank covering 45211 customers has been provided.
The main goal is to predict whether a customer subscribes to a term deposit, which requires
identifying the factors that drive the success of a campaign. Descriptive studies were carried
out first. The graphical representations of the variables show that clients with a low balance
and a higher level of education are more interested in a term deposit. If a client is contacted
more than 10 times, the offer is almost certain to be rejected. In the data analysis part, three
machine learning techniques, Decision Tree, Logistic Regression and SVM, are applied to the
data (Lantz 2013). The accuracies of the models are very close to each other; however, Logistic
Regression gives the best fit to the data, with 90% accuracy. It also shows that job profile,
education, marital status, duration of contact, month of contact and the result of the previous
campaign play a significant role in a customer's decision on the bank's offer. The bank's
strategies are therefore likely to be more successful if it takes these factors into account.
Table of Contents
Introduction
Discussion
Data Description
Data Validation
Descriptive Study
Data Visualization
Data Analysis
Decision Tree
Logistic Regression
Support Vector Machine (SVM)
Comparison of the Models
Conclusion
References
Appendix
Introduction
A bank has decided to conduct a marketing campaign based on phone calls. For this,
previous records of a Portuguese bank have been provided (Moro 2014). The main objective is to
find a cost-efficient solution that supports the campaign based on a customer’s profile. In other
words, the task is to predict whether a client will accept the bank's offer of a term deposit.
To achieve this goal, a thorough analysis has been carried out. First, descriptive studies
were performed to observe the nature of the variables. Graphical representations were then
produced to reveal the trends and patterns in the data. In the analysis part, three machine
learning techniques were applied: Decision Tree, Logistic Regression and Support Vector
Machine (SVM). The best model for the data is chosen by comparing these three models. The
results show the factors that significantly influenced the outcome of the campaign. If the bank
takes these factors into account, the proposed programme should achieve a higher success rate.
In other words, if the bank acts on these factors, more customers are likely to accept the
proposal.
The necessary calculations are performed using R software (Cirillo 2016). The relevant R
code is given in the Appendix.
Discussion
Data Description
The data come from the marketing campaigns of a Portuguese banking institution. The
campaigns were conducted through phone calls. The same client was often contacted more than
once to determine whether he or she would open a term deposit. There are 45211 observations
of 17 variables. The variables are:
Nominal Variables:
Job: Occupation of the client
Marital: Marital status of the client (single, married, divorced)
Education: Education level of the client
Default: Does the client have credit in default? (Yes or No)
Housing: Does the client have a housing loan? (Yes or No)
Loan: Does the client have a personal loan? (Yes or No)
Contact: Contact communication type (“cellular”, “telephone”, “unknown”)
Month: Last contact month of the year
poutcome: Outcome of the previous campaign (“failure”, “non-existent”, “success”)
y: Has the client subscribed to a term deposit? (“yes”, “no”)
Numerical Variables:
Age: Age of the client
Balance: balance of the client
Day: last contact day of the month
Duration: duration of the last contact (seconds)
Campaign: number of contacts performed for a client during this campaign
pdays: number of days since the client was last contacted in a previous campaign
previous: number of contacts performed for a client before this campaign
The relevant R code is given in the Appendix.
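As a minimal sketch of how the nominal and numerical variables listed above could be told apart after loading, the R snippet below reads a tiny inline stand-in for the data. The three rows, the reduced set of columns, the file name `bank-full.csv` and the semicolon separator are all assumptions made for illustration; this is not the code from the Appendix.

```r
# Tiny inline stand-in for the bank data (illustrative rows, not the real file).
# With the real data one would call something like
# read.csv("bank-full.csv", sep = ";") -- file name and separator are assumed.
raw <- "age;job;marital;balance;y
58;management;married;2143;no
44;technician;single;29;no
33;entrepreneur;married;2;yes"

# Read text columns as factors so nominal variables are easy to spot
bank <- read.csv(text = raw, sep = ";", stringsAsFactors = TRUE)

# Number of observations and variables
data_dim <- dim(bank)

# Split the column names by storage class
var_types    <- sapply(bank, class)
nominal_vars <- names(var_types)[var_types == "factor"]
numeric_vars <- names(var_types)[var_types %in% c("integer", "numeric")]

print(data_dim)
print(nominal_vars)
print(numeric_vars)
```

On the full data, `dim()` would be expected to report the 45211 observations and 17 variables stated above.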
Data Validation
First, the data must be checked for missing values and duplicate rows. The data has no
missing values, so no data cleaning tools were required for this study.
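The validation step described above can be sketched in R as follows. The toy data frame is hypothetical and deliberately contains one missing value and one duplicate row so that both checks have something to report; on the actual bank data the missing-value count is reported as zero.

```r
# Hypothetical toy data with one NA and one duplicated row (rows 2 and 4)
bank <- data.frame(
  age     = c(58, 44, 33, 44),
  balance = c(2143, 29, NA, 29),
  y       = c("no", "no", "yes", "no")
)

# Missing values per column
missing_per_column <- colSums(is.na(bank))

# Number of rows that repeat an earlier row exactly
n_duplicates <- sum(duplicated(bank))

# If cleaning were needed: drop incomplete rows, then drop repeated rows
clean <- unique(na.omit(bank))

print(missing_per_column)
print(n_duplicates)
print(nrow(clean))
```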
Descriptive Study
An overall descriptive analysis is performed on the bank data (Plonsky 2015). The results
are given below.
Table 1.1 shows that the average age of a client is 40.9441 years. The median shows
that 50% of the clients are younger than 39 years. The first and third quartiles show that 25% of
the clients are below 33 years and 25% are above 48 years, respectively. The minimum age is
18 years and the maximum is 95 years.
The mean balance is 1362. The first and third quartiles show that the balance is below 72
in 25% of cases and above 1428 in 25% of cases.
Out of 45211 clients, 815 have credit in default, 25130 have housing loans and 7244 have
personal loans. Only 5289 clients have subscribed to a term deposit (Table 1.8). In the
previous campaign, there were 1511 successful cases and 4901 failed cases (Table 1.14).
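The kind of summary reported above can be reproduced with base R, as in the sketch below. The `ages` vector is made up for illustration, while the counts are the ones quoted in the text; converting them to percentages makes the class imbalance in the outcome easier to see.

```r
# Made-up ages, used only to show the calls (not the real data)
ages <- c(18, 25, 33, 39, 41, 48, 60, 95)

# Minimum, quartiles, median, mean and maximum in one call
age_summary <- summary(ages)

# Quartiles on their own
age_quartiles <- quantile(ages, probs = c(0.25, 0.5, 0.75))

# Counts quoted in the text, expressed as a percentage of all clients
n_clients <- 45211
counts <- c(default = 815, housing = 25130, loan = 7244, subscribed = 5289)
shares <- round(100 * counts / n_clients, 1)

print(age_summary)
print(age_quartiles)
print(shares)
```

With the counts above, roughly 11.7% of clients subscribed to a term deposit, so the "yes" class is much smaller than the "no" class.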
Data Visualization
Data visualization tools provide a convenient way to see and understand the patterns in the
data (Visser 2017). It is therefore appropriate to represent the variables through visual
elements such as graphs, charts and maps (Wickham 2016).
Graph 1: Age Distribution of the Clients
[Figure: histogram titled "Age Distribution"; x-axis: Age (15 to 100), y-axis: Count (0 to 7500)]
The graph shows that most clients are between 25 and 60 years old. Very few clients are
above 70 years.
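A histogram like Graph 1 can be sketched with base R as shown below. The ages are hypothetical, the 5-year bins are an assumption read off the axis labels, and base `hist()` is used here for self-containment; the original graphs may well have been drawn with a different plotting package.

```r
# Hypothetical ages (not the real data)
ages <- c(19, 24, 27, 31, 33, 35, 38, 41, 44, 47, 52, 58, 63, 72, 88)

# 5-year bins from 15 to 100 (assumed bin width)
breaks <- seq(15, 100, by = 5)

# Compute the bin counts without drawing anything
h <- hist(ages, breaks = breaks, plot = FALSE)
age_counts <- setNames(
  h$counts,
  paste(head(breaks, -1), tail(breaks, -1), sep = "-")
)

# Share of clients aged 25 to 60, the range highlighted in the text
share_25_60 <- mean(ages >= 25 & ages <= 60)

print(age_counts)
print(share_25_60)

# Draw the histogram only in an interactive session
if (interactive()) {
  plot(h, main = "Age Distribution", xlab = "Age", ylab = "Count")
}
```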
Graph 2: Age vs Marital Status of client having term deposit
It can be observed that most clients are either single or married, whether or not they
have a term deposit subscription. The number of divorced clients, with or without a term
deposit, is comparatively very small.
Graph 3: Age Distribution of the clients who have subscribed term deposit
[Figure: histograms titled "Age Distribution by Subscription", one panel each for "no" and "yes"; x-axis: Age (15 to 100), y-axis: Count (0 to 8000)]
Most clients who do not have a term deposit are between 25 and 55 years old.
Similarly, most subscribed clients are between 25 and 55 years old.
Graph 4: Balance vs Term Deposit
[Figure: histograms titled "Balance Histogram", one panel each for "no" and "yes"; x-axis: Balance (0 to 100000), y-axis: Count (0 to 30000)]
It can be observed that most clients who have subscribed to a term deposit have a low
balance. On the other hand, few clients with a high balance have a term deposit subscription.
Graph 5: Education vs Term Deposit
[Figure: bar chart titled "Term Deposit Subscription based on Education Level"; x-axis: Education Level (primary, secondary, tertiary, unknown), y-axis: count (0 to 20000); bars split by Subscription of Term Deposit (no, yes)]
Most of the clients who have subscribed to a term deposit have a secondary or tertiary
level of education. Clients with only primary education are less interested in the
subscription.
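A comparison like the one in Graph 5 can be backed by a cross-tabulation, sketched below in R on a hypothetical eight-row data frame. Because raw counts depend on how many clients fall in each education level, the sketch also computes the subscription rate within each level, which is one possible way to compare the groups on an equal footing.

```r
# Hypothetical records (not the real data)
bank <- data.frame(
  education = c("primary", "primary", "secondary", "secondary",
                "secondary", "tertiary", "tertiary", "tertiary"),
  y         = c("no", "no", "no", "no", "yes", "no", "yes", "yes")
)

# Counts of subscription outcome by education level
counts <- table(bank$education, bank$y)

# Share of "yes" within each education level (row proportions)
rates <- prop.table(counts, margin = 1)[, "yes"]

print(counts)
print(round(rates, 2))
```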