Data Import and Read the Dataset in R - Cardio Good Fitness Data Analysis Project

29 pages, 7418 words

Added on  2022-01-21

Project in R - Cardio Good Fitness Data Analysis
Table of Contents
1 Project Objective
2 Assumptions
3 Exploratory Data Analysis – Step by step approach
3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
3.1.2 Set up working Directory
3.1.3 Import and Read the Dataset
3.2 Variable Identification
3.2.1 Variable Identification – Inferences
3.3 Univariate Analysis
3.4 Bi-Variate Analysis
3.5 Missing Value Identification
3.6 Outlier Identification
3.7 Variable Transformation / Feature Creation
4 Conclusion
5 Appendix A – Source Code
1. Project Objective
The objective of the report is to explore the cardio data set (“CardioGoodFitness”) in R and generate
insights about the data set. This exploration report will consist of the following:
- Importing the dataset in R
- Understanding the structure of dataset
- Graphical exploration
- Descriptive statistics
- Insights from the dataset
2. Assumptions
After analysing the data, we assume that the purpose of the study is to identify the profile of the
typical customer for each treadmill product offered by CardioGood Fitness, and to investigate
whether customer characteristics differ across the product lines. To that end, data were collected
on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three
months. The data are stored in the CardioGoodFitness.csv file.
The dataset contains the following customer variables: product purchased (TM195, TM498 or
TM798); gender; age (in years); education (in years); relationship status (single or partnered);
annual household income; average number of times the customer plans to use the treadmill each
week; average number of miles the customer expects to walk/run each week; and self-rated fitness
on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.
The 180 observations in the dataset correspond to 180 unique customers of the treadmill products.
The variables describe the customers' profile, fitness level and planned treadmill usage. It is also
assumed that the data provided are accurate, as per the survey/data collected by the company. The
following data dictionary is used for the 9 variables in the dataset:
Sl. No.  Variable        Description
1        Product         Model of treadmill purchased (TM195 / TM498 / TM798)
2        Age             Age of the customer (years)
3        Gender          Gender of the customer (Male / Female)
4        Education       Education of the customer (years)
5        Marital Status  Marital status of the customer (Single / Partnered, i.e. Married)
6        Usage           Average number of times per week the customer plans to use the treadmill
7        Fitness         Self-rated fitness on a 1-to-5 scale (1 = poor shape / "very unfit", 5 = excellent shape / "very fit")
8        Income          Annual household income of the customer (assumed to be in US$)
9        Miles           Average number of miles the customer expects to walk/run on the treadmill each week
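The data dictionary above can also be written down as a few sanity checks that run right after import. The sketch below is illustrative only: it assumes the CSV header uses the names Product, Age, Gender, Education, MaritalStatus, Usage, Fitness, Income and Miles, and the level labels Male/Female and Single/Partnered. check_dictionary() is a hypothetical helper, and the two-row data frame is made up rather than taken from the real dataset.

```r
# Assumed column names (from the CSV header); adjust if your copy differs
expected_cols <- c("Product", "Age", "Gender", "Education", "MaritalStatus",
                   "Usage", "Fitness", "Income", "Miles")

# Hypothetical helper: stop with an error if a data frame breaks the dictionary
check_dictionary <- function(df) {
  stopifnot(identical(names(df), expected_cols))
  # Categorical variables may only take the levels listed in the dictionary
  # (%in% works for both character and factor columns)
  stopifnot(all(df$Product %in% c("TM195", "TM498", "TM798")),
            all(df$Gender %in% c("Male", "Female")),
            all(df$MaritalStatus %in% c("Single", "Partnered")))
  # Fitness is a self-rated score on a 1-to-5 scale
  stopifnot(all(df$Fitness %in% 1:5))
  invisible(TRUE)
}

# Hypothetical two-row example (not rows from the real dataset)
toy <- data.frame(Product = c("TM195", "TM798"), Age = c(25, 40),
                  Gender = c("Male", "Female"), Education = c(14, 18),
                  MaritalStatus = c("Single", "Partnered"),
                  Usage = c(3, 5), Fitness = c(3, 5),
                  Income = c(40000, 90000), Miles = c(75, 180))
check_dictionary(toy)
```

Encoding the dictionary as checks means a mislabelled level or an out-of-range Fitness score is caught before any plotting starts.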
3. Exploratory Data Analysis – Step by step approach
A typical data exploration activity consists of the following steps:
1. Environment Set up and Data Import
2. Variable Identification
3. Univariate Analysis
4. Bi-Variate Analysis
5. Missing Value Treatment (Not in scope for our project)
6. Outlier Treatment (Not in scope for our project)
7. Variable Transformation / Feature Creation
8. Feature Exploration
We shall follow these steps in exploring the provided dataset.
Although Steps 5 and 6 are not in scope for this project, a brief description of these steps (and
of the other steps) is given, as they are important parts of the data exploration journey.
3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
This section is used to install the necessary packages and load the associated libraries. Keeping
all the package calls in one place improves code readability.
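One common pattern for this step is to install only the packages that are missing and then attach them all in one place. The sketch below assumes packages come from CRAN. load_packages() is a hypothetical helper, and the package names in the example are base packages used as placeholders. The report's actual package list is in Appendix A.

```r
# Hypothetical helper: install any missing packages, then attach all of them
load_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) for packages that are not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) install.packages(missing)  # needs a CRAN mirror
  # Attach each package by name
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(pkgs)
}

# Placeholder list: base packages that ship with every R installation
load_packages(c("stats", "utils", "graphics"))
```

Checking with requireNamespace() first avoids re-installing packages on every run. install.packages() is only reached when something is missing, and it then needs network access.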
3.1.2 Set up working Directory
Setting a working directory at the start of the R session makes importing and exporting data and
code files easier. The working directory is the folder on the PC that holds the data, code and other
files related to the project.
Please refer to Appendix A for the source code.
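A minimal sketch of setting the working directory follows. The project folder is a hypothetical placeholder. tempdir() stands in for it so the snippet runs on any machine.

```r
# Hypothetical project folder; replace with the folder holding
# CardioGoodFitness.csv. tempdir() is used here only so the code runs anywhere.
project_dir <- tempdir()

# setwd() changes the working directory and returns the previous one,
# which can be used to switch back later with setwd(old_wd)
old_wd <- setwd(project_dir)

# Confirm where R will now look for files that are read by name
print(getwd())
```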
3.1.3 Import and Read the Dataset
The given dataset is in .csv format, so the read.csv() function is used to import the file, for
example: Cardio <- read.csv("CardioGoodFitness.csv")
Please refer to Appendix A for the source code.
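The import step can be sketched as a self-contained example. The code first writes a tiny hypothetical CSV with two made-up rows to a temporary file. The header names are an assumption about the real file. With the actual data, only the read.csv() call is needed. Since R 4.0.0, read.csv() keeps text columns as character by default, so stringsAsFactors = TRUE is passed to obtain the Factor columns discussed in 3.2.1.

```r
# Write a tiny hypothetical CSV (made-up rows, assumed header) to a temp file
csv_path <- file.path(tempdir(), "CardioGoodFitness.csv")
writeLines(c("Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles",
             "TM195,25,Male,14,Single,3,3,40000,75",
             "TM798,40,Female,18,Partnered,5,5,90000,180"),
           csv_path)

# Import the file; text columns become factors, whole numbers become integers
Cardio <- read.csv(csv_path, stringsAsFactors = TRUE)
head(Cardio)
```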
3.2 Variable Identification
The following R functions are used during the analysis:
- dim(): shows the dimensions (number of rows / number of columns) of the data frame.
- names(): shows the feature names of the dataset.
- str(): displays the internal structure of an R object, used here to identify the class of each feature.
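The three calls above can be tried on a small hypothetical data frame that mimics the assumed column layout of Cardio. The values below are made up and not drawn from the dataset.

```r
# Hypothetical stand-in for Cardio (made-up values, assumed column names)
Cardio <- data.frame(
  Product = factor(c("TM195", "TM498", "TM798")),
  Age = c(25L, 31L, 40L),
  Gender = factor(c("Male", "Female", "Male")),
  Education = c(14L, 16L, 18L),
  MaritalStatus = factor(c("Single", "Partnered", "Partnered")),
  Usage = c(3L, 4L, 5L),
  Fitness = c(3L, 3L, 5L),
  Income = c(40000L, 52000L, 90000L),
  Miles = c(75L, 100L, 180L)
)

dim(Cardio)    # number of rows and columns: 3 9
names(Cardio)  # the nine feature names
str(Cardio)    # class of each column (Factor or int)
```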
3.2.1 Variable Identification – Inferences
a. summary(<data frame>): provides a summary of the dataset.
str(Cardio) shows the structure of the object “Cardio”: it is a data.frame whose variables are of
the “Factor” and “int” data types. summary(Cardio) summarises all 9 variables, giving the frequency
of each level for the factor variables and the minimum, quartiles, mean and maximum for the
numeric variables:
b. colSums(is.na(Cardio)): counts the missing values in each column. There are no missing values in the dataset.
Dimension:       180 observations, 9 variables
Gender:          76 females, 104 males
Marital Status:  107 partnered, 73 single
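The checks above can be sketched on a small hypothetical stand-in for Cardio. One value is deliberately left missing so that colSums(is.na()) has something to count. On the real data, these calls produce the figures reported above.

```r
# Hypothetical stand-in (made-up values, one NA added on purpose)
Cardio <- data.frame(
  Gender = factor(c("Male", "Female", "Male", "Male")),
  MaritalStatus = factor(c("Single", "Partnered", "Partnered", "Single")),
  Income = c(40000, 52000, NA, 90000)
)

summary(Cardio)             # level counts for factors, quartiles for numbers
colSums(is.na(Cardio))      # number of missing values per column
dim(Cardio)                 # number of observations and variables
table(Cardio$Gender)        # Females vs Males
table(Cardio$MaritalStatus) # Partnered vs Single
```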
3.3 Univariate Analysis
Univariate analysis is carried out separately for the categorical variables and the numeric
variables.
The categorical variables are Product, Gender and Marital Status. The distribution of each of these
three variables is shown using a bar plot and a pie chart:
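A minimal sketch of these categorical plots with base R graphics follows. Cardio is again a hypothetical stand-in with made-up values. The output file name and the 2 x 3 panel layout are assumptions, not the report's own choices.

```r
# Hypothetical stand-in (made-up values, assumed column names)
Cardio <- data.frame(
  Product = factor(c("TM195", "TM195", "TM498", "TM798")),
  Gender = factor(c("Male", "Female", "Male", "Male")),
  MaritalStatus = factor(c("Single", "Partnered", "Partnered", "Single"))
)

# Draw into a PDF file (assumed name) so the code also runs without a screen
out_file <- file.path(tempdir(), "categorical_univariate.pdf")
pdf(out_file, width = 9, height = 6)
par(mfrow = c(2, 3))  # top row: bar plots, bottom row: pie charts

vars <- c("Product", "Gender", "MaritalStatus")
counts <- lapply(Cardio[vars], table)  # frequency table per variable
for (v in vars) barplot(counts[[v]], main = v, ylab = "Count")
for (v in vars) pie(counts[[v]], main = v)
dev.off()
```

Both chart types are built from the same frequency tables. The bar plot makes absolute counts easy to compare, and the pie chart shows each level's share of the total.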
