SIT742 Modern Data Science Report: Portuguese Bank Campaign Analysis

Added on 2022/11/28

AI Summary

This report analyzes a Portuguese bank's marketing campaign data to improve future strategies. It begins with data wrangling, including cleaning, removing irrelevant columns, and handling missing data. The report then explores supervised and unsupervised learning methods, converting categorical variables into dummy variables for machine learning algorithms. It examines the effects of various variables on customer deposit decisions, highlighting the importance of statistical variables. The report emphasizes the benefits of group work for error detection and improved analysis results. It references relevant publications on data wrangling and machine learning.

Modern 1
Modern Data Science
Name of Author
Name of Class
Name of Professor
Name of School
State and City of School
Date

Attribute Distribution
The data provided come from the marketing campaign of a Portuguese bank and are analysed with
the aim of improving future campaigns. Customers were to be persuaded to subscribe to a term
deposit, which is recorded in the last variable of the bank.csv file provided. The dataset has
seventeen (17) variables that were the focus of the campaign: age, job, marital, education,
default, balance, housing, loan, contact, day, month, duration, campaign, pdays, previous,
poutcome and deposit.
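As a rough illustration of how the distribution of such attributes could be inspected, the sketch below uses pandas on a few invented rows. The rows, and the choice of job, balance and deposit as the columns to summarise, are assumptions made for the example and are not taken from bank.csv.

```python
import pandas as pd

# Hypothetical rows that mimic part of the bank.csv layout; the values are
# invented for illustration and do not come from the real file.
sample = pd.DataFrame({
    "age": [59, 41, 33, 28],
    "job": ["admin.", "technician", "services", "admin."],
    "balance": [2343, 1270, 184, -50],
    "deposit": ["yes", "yes", "no", "no"],
})
# With the real file one would load it instead: sample = pd.read_csv("bank.csv")

# Distribution of a categorical attribute: how often each category occurs.
job_counts = sample["job"].value_counts()

# Distribution of a numeric attribute: count, mean, spread and quartiles.
balance_summary = sample["balance"].describe()

# Share of customers in each class of the target variable.
deposit_share = sample["deposit"].value_counts(normalize=True)

print(job_counts)
print(balance_summary)
print(deposit_share)
```

The same three calls apply unchanged to the full dataset once it has been read from the file.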
Algorithms for Data Wrangling and Processing
Data wrangling is usually the step that takes the longest in data science and data analysis,
longer than the actual analysis of a data set. Wrangling covers the cleaning of data: removing
outliers, filling or removing missing data, removing malicious, erroneous, irrelevant and
inconsistent data, and formatting the data better. Some columns are irrelevant, in that they
would not change the analysis of the campaign dataset much, and therefore have to be dropped.
This leaves a total of thirteen (13) columns. Further columns that would not have much impact
on the machine learning algorithms were dropped as well. The actual code for dropping these
columns is in the IPython notebook (Kazil and Jarmul, 2016).
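A minimal sketch of the column-dropping step follows. The report does not say which columns were treated as irrelevant, so the four names in to_drop are purely hypothetical. They were picked only so that the seventeen columns reduce to thirteen, and the single data row is invented.

```python
import pandas as pd

# The seventeen variable names listed in the report.
columns = ["age", "job", "marital", "education", "default", "balance",
           "housing", "loan", "contact", "day", "month", "duration",
           "campaign", "pdays", "previous", "poutcome", "deposit"]

# One invented row, present only so the frame has the right columns.
row = [59, "admin.", "married", "secondary", "no", 2343, "yes", "no",
       "unknown", 5, "may", 1042, 1, -1, 0, "unknown", "yes"]
bank = pd.DataFrame([row], columns=columns)

# ASSUMPTION: the report does not name the dropped columns; these four are
# a hypothetical choice used to show the mechanics of DataFrame.drop.
to_drop = ["contact", "day", "month", "pdays"]
reduced = bank.drop(columns=to_drop)

print(bank.shape[1], "columns before,", reduced.shape[1], "columns after")
```

DataFrame.drop returns a new frame, so the original bank frame stays available for comparison.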
The data were checked for missing entries, but none of the 11162 observations contains a
missing cell, so there are no cells to be filled or dropped. In summary, this cleaning is the
normalization stage that any dataset goes through. The variables recorded with 'yes' and 'no'
entries must be changed into dummy variables of zeros (0s) and ones (1s) before proper machine
learning analysis can be performed on them.
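The missing-value check and the yes/no conversion could be sketched as below. The three rows are invented, and treating default, housing, loan and deposit as the yes/no columns is an assumption based on the variable names, not something the report states.

```python
import pandas as pd

# Invented rows; only the yes/no style of the entries matters here.
sample = pd.DataFrame({
    "default": ["no", "no", "yes"],
    "housing": ["yes", "no", "yes"],
    "loan": ["no", "no", "yes"],
    "deposit": ["yes", "no", "no"],
})

# Count missing cells per column, then in total.
missing_per_column = sample.isnull().sum()
total_missing = int(missing_per_column.sum())

# ASSUMPTION: these four columns are taken to be the yes/no variables.
binary_columns = ["default", "housing", "loan", "deposit"]

# Map 'no' to 0 and 'yes' to 1, leaving the original frame untouched.
encoded = sample.copy()
for col in binary_columns:
    encoded[col] = encoded[col].map({"no": 0, "yes": 1})

print(total_missing)
print(encoded)
```

Any entry other than 'yes' or 'no' would become NaN under this mapping, which makes unexpected values easy to spot.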
Supervised and Unsupervised learning
Machine learning is the scientific study of algorithms and statistical models that computers
use to perform a task by relying on patterns and inference instead of explicit instructions.
There are three types of machine learning: supervised learning, unsupervised learning and
reinforcement learning. This study focuses on supervised and unsupervised learning.
In supervised learning there is a target variable and one or more predictor variables, and the
task is to classify or predict the target from the predictors. Because machine learning
algorithms do not run on string variables, the categorical variables are changed to dummy
variables using one-hot encoding. For this dataset, supervised learning gives the best
results, with plots that are easy for a data scientist to understand and easy to explain to
an individual of a different discipline (Müller and Guido, 2016).
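One-hot encoding followed by a supervised model might look like the sketch below. The report does not name the algorithm it used, so the decision tree from scikit-learn is an assumed stand-in, and the six rows are invented.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Invented rows: one numeric predictor, one string predictor, and a target
# that has already been converted to 0/1.
sample = pd.DataFrame({
    "age": [25, 47, 35, 52, 29, 61],
    "job": ["student", "management", "services", "retired", "student", "retired"],
    "deposit": [1, 0, 0, 1, 1, 1],
})

# One-hot encode the string column: each job category becomes its own
# indicator column, while the numeric column is passed through unchanged.
X = pd.get_dummies(sample[["age", "job"]], columns=["job"])
y = sample["deposit"]

# ASSUMPTION: a decision tree is used only as an example classifier.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)

# Predictions on the training rows; this shows the mechanics and is not a
# measure of how well the model would generalise.
predictions = model.predict(X)

print(list(X.columns))
print(predictions)
```

For a real evaluation the data would be split into training and test sets before fitting.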

Objective effects
All the variables chosen for study have, in one way or another, some effect on a customer's
decision to make a deposit in an account at the bank. The R-value is above 0.5, which
indicates that the relationship is a positive one.
To improve the analysis results further, more statistical variables should be included, so
that the target variable is explained by many determinants and not only by one or a few
predictor variables.
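The report does not say how its R-value was computed. Assuming it is a Pearson correlation coefficient between a predictor and the 0/1 deposit variable, it could be obtained as in the sketch below. The numbers are invented and duration is used only as an example predictor.

```python
import numpy as np

# Invented values: call duration in seconds and the 0/1 deposit outcome.
duration = np.array([100, 200, 300, 400, 500, 600])
deposit = np.array([0, 0, 1, 0, 1, 1])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry
# is the Pearson correlation between the two variables.
r = np.corrcoef(duration, deposit)[0, 1]

print(round(r, 3))
```

A value above 0.5 on this scale means the outcome tends to be 1 more often as the predictor increases.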
Working in groups
Working in a group is one of the best experiences a student can have, as it exposes one to
the different talents and insights of the group members. It also makes errors easier to
detect, which helps the actual analysis process and improves the results.

References
Kazil, J. and Jarmul, K., 2016. Data wrangling with Python: tips and tools to make your life
easier. O'Reilly Media, Inc.
Müller, A.C. and Guido, S., 2016. Introduction to machine learning with Python: a guide for data
scientists. O'Reilly Media, Inc.
