Project in R - Cardio Good Fitness : Exploratory Data Analysis

Added on - 21 Jan 2022

Project in R - Cardio Good Fitness Data Analysis
Table of Contents
1 Project Objective
2 Assumptions
3 Exploratory Data Analysis – Step by step approach
3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
3.1.2 Set up working Directory
3.1.3 Import and Read the Dataset
3.2 Variable Identification
3.2.1 Variable Identification – Inferences
3.3 Univariate Analysis
3.4 Bi-Variate Analysis
3.5 Missing Value Identification
3.6 Outlier Identification
3.7 Variable Transformation / Feature Creation
4 Conclusion
5 Appendix A – Source Code
1. Project Objective
The objective of this report is to explore the cardio dataset (“CardioGoodFitness”) in R and to
generate insights about it. The exploration consists of the following:
- Importing the dataset in R
- Understanding the structure of the dataset
- Graphical exploration
- Descriptive statistics
- Insights from the dataset
2. Assumptions
After analysing the data, we can say that its purpose is to identify the profile of the typical
customer for each treadmill product offered by CardioGood Fitness, and to investigate whether
customer characteristics differ across the product lines. To that end, data were collected on
individuals who purchased a treadmill at a CardioGoodFitness retail store during the prior three
months. The data are stored in the CardioGoodFitness.csv file.
The dataset contains the following customer variables: product purchased (TM195, TM498, or
TM798); gender; age, in years; education, in years; relationship status (single or partnered);
annual household income; average number of times the customer plans to use the treadmill each
week; average number of miles the customer expects to walk/run each week; and self-rated fitness
on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.
The 180 observations in the dataset relate to 180 unique customers of the treadmill products. The
characteristics in the dataset relate to the fitness level and treadmill usage of the customers.
It is also assumed that the data provided are accurate as per the survey/data collected by the
company. The following data dictionary is used for the 9 variables in the dataset:
Sl. No.  Dimension       Detail Description
1        Product         Model of treadmill product (TM195 / TM498 / TM798)
2        Age             Age of the customer (Years)
3        Gender          Gender of the customer (Male & Female)
4        Education       Education of the customer (Years)
5        Marital Status  Marital status of the customer (Single & Partnered/Married)
6        Usage           Weekly average number of times the customer plans to use the treadmill
                         (No. of times per Week)
7        Fitness         Self-rated fitness of the customer on a 1-to-5 scale,
                         5 being “very fit” and 1 being “very unfit”
8        Income          Annual income of the customer (assumed to be in US$)
9        Miles           Weekly average number of miles the customer expects to walk/run on the
                         treadmill (Miles per Week)
3. Exploratory Data Analysis – Step by step approach
A typical data exploration activity consists of the following steps:
1. Environment Set up and Data Import
2. Variable Identification
3. Univariate Analysis
4. Bi-Variate Analysis
5. Missing Value Treatment (Not in scope for our project)
6. Outlier Treatment (Not in scope for our project)
7. Variable Transformation / Feature Creation
8. Feature Exploration
We shall follow these steps in exploring the provided dataset.
Although Steps 5 and 6 are not in scope for this project, a brief description of these steps (and
of the other steps) is given, as they are important steps in the data exploration journey.
3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
Use this section to install the necessary packages and invoke the associated libraries. Keeping
all package installation and library calls in one place improves code readability.
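As a minimal sketch of this step, the snippet below installs any missing packages and then attaches them all in one place. The report does not name the packages it uses, so the package list and the helper name `load_packages` are assumptions made purely for illustration.

```r
# Hypothetical package list: base-R packages stand in here because the report
# does not say which packages it needs. Replace with your own.
required_packages <- c("stats", "graphics", "utils")

# Hypothetical helper: install whatever is not yet available, then attach everything.
load_packages <- function(pkgs) {
  missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # library() needs character.only = TRUE when the name is held in a variable.
  invisible(lapply(pkgs, library, character.only = TRUE))
  pkgs
}

loaded <- load_packages(required_packages)
```

Keeping the list in a single vector at the top of the script means a reader can see every dependency at a glance.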
3.1.2 Set up working Directory
Setting a working directory at the start of the R session makes importing and exporting data files
and code files easier. The working directory is the location/folder on the PC where the data, code
and other files related to the project are kept.
Please refer to Appendix A for the Source Code.
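A minimal sketch of setting and checking the working directory follows. The folder name `cardio_project` is hypothetical, and a temporary directory is used only so that the example runs on any machine; in practice the path would point at the actual project folder.

```r
# Hypothetical project folder inside R's temporary directory (assumption for
# illustration). In practice, use the folder that holds your data and code.
project_dir <- file.path(tempdir(), "cardio_project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# setwd() changes the working directory and returns the previous one,
# which can be kept in order to switch back later with setwd(old_dir).
old_dir <- setwd(project_dir)

# Confirm where R will now resolve relative file paths.
print(getwd())
```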
3.1.3 Import and Read the Dataset
The given dataset is in .csv format. Hence, the command ‘read.csv’ is used for importing the file.
For example: Cardio <- read.csv("CardioGoodFitness.csv")
Please refer to Appendix A for the Source Code.
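The import step can be sketched as below. So that the example runs without the real file, it first writes a few made-up rows to a temporary CSV and then reads that back with `read.csv`. The rows, the file name `CardioGoodFitness_sample.csv` and the exact column names (for example `MaritalStatus`) are assumptions based on the data dictionary, not values from the actual dataset.

```r
# Made-up rows for illustration only (NOT taken from the real dataset).
# Column names are assumed from the report's data dictionary.
sample_rows <- data.frame(
  Product = c("TM195", "TM498", "TM798", "TM195"),
  Age = c(25L, 31L, 28L, 40L),
  Gender = c("Male", "Female", "Male", "Female"),
  Education = c(14L, 16L, 18L, 16L),
  MaritalStatus = c("Single", "Partnered", "Partnered", "Single"),
  Usage = c(3L, 4L, 6L, 2L),
  Fitness = c(3L, 3L, 5L, 2L),
  Income = c(40000L, 52000L, 90000L, 48000L),
  Miles = c(75L, 95L, 200L, 50L)
)
csv_path <- file.path(tempdir(), "CardioGoodFitness_sample.csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE reads text columns as factors; since R 4.0.0 the
# default is FALSE, so it is set explicitly here.
Cardio <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

# Look at the first rows to confirm the import worked.
print(head(Cardio))
```

With the real file in the working directory, only the `read.csv` call is needed, with the path replaced by "CardioGoodFitness.csv".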
3.2 Variable Identification
The following R functions are used during the analysis:
- dim(): Shows the dimensions (number of rows and number of columns) of the data frame.
- names(): Shows the feature names of the dataset.
- str(): Displays the internal structure of an R object, to identify the classes of the features.
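The three functions listed above can be tried on a small data frame, as in the following sketch. The rows are made up and the column names are assumptions; with the real data, `Cardio` would be the imported data frame.

```r
# Made-up rows for illustration only; column names are assumptions.
Cardio <- data.frame(
  Product = factor(c("TM195", "TM498", "TM798")),
  Age = c(25L, 31L, 28L),
  Gender = factor(c("Male", "Female", "Male"))
)

dims <- dim(Cardio)        # c(number of rows, number of columns)
features <- names(Cardio)  # column (feature) names

print(dims)
print(features)
str(Cardio)                # class of each column plus a preview of its values
```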
3.2.1 Variable Identification – Inferences
a. str(Cardio): Provides the structure of the object “Cardio”. It reports the object as a
data.frame whose variables are of the “Factor” and “int” data types.
b. summary(<data frame>): Provides a summary of the dataset. summary(Cardio) summarises all 9
variables, for example the frequency of each level of the categorical variables.
c. colSums(is.na(<data frame>)): Checks for missing values. There are no missing values in the
dataset.
Dimension
No. of Observations   No. of Variables
180                   9

No. of Females   No. of Males
76               104

Marital Status (Partnered)   Marital Status (Single)
107                          73
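The summary, the missing-value check and counts like those above can be obtained along the following lines. This is a sketch on made-up rows with assumed column names, so its counts differ from the figures reported for the real dataset.

```r
# Made-up rows for illustration only; column names are assumptions.
Cardio <- data.frame(
  Gender = factor(c("Male", "Female", "Male", "Female", "Male")),
  MaritalStatus = factor(c("Single", "Partnered", "Partnered", "Single", "Partnered")),
  Age = c(25L, 31L, 28L, 40L, 35L)
)

# Level frequencies for factor columns, five-number summary and mean for numeric ones.
print(summary(Cardio))

# Number of missing values in each column; all zeros means nothing is missing.
missing_per_column <- colSums(is.na(Cardio))
print(missing_per_column)

# Frequency tables for the categorical variables.
gender_counts <- table(Cardio$Gender)
marital_counts <- table(Cardio$MaritalStatus)
print(gender_counts)
print(marital_counts)
```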
3.3 Univariate Analysis
Univariate analysis can be carried out on both the categorical variables and the numeric variables.
The categorical variables are Product, Gender and Marital Status.
The distribution of each of these three categorical variables is represented using a bar plot and
a pie chart.
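A bar plot and a pie chart for one categorical variable can be sketched in base R as follows. The rows are made up, the column name `Product` is assumed from the data dictionary, and the titles and labels are illustrative choices rather than the report's own.

```r
# Made-up rows for illustration only; the column name is an assumption.
Cardio <- data.frame(
  Product = factor(c("TM195", "TM195", "TM498", "TM798", "TM195", "TM498"))
)

# Count how many customers bought each product.
product_counts <- table(Cardio$Product)

# Bar plot of the number of customers per product.
barplot(product_counts, main = "Customers by Product",
        xlab = "Product", ylab = "Count")

# Pie chart of the same counts, labelled with each product's percentage share.
shares <- round(100 * prop.table(product_counts), 1)
pie(product_counts,
    labels = paste0(names(product_counts), " (", shares, "%)"),
    main = "Product Share")
```

The same two calls can be repeated for Gender and Marital Status by swapping the column passed to `table()`.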