
Data Science Practice Assignment 2022

   

Added on  2022-10-17

Data Science Practice
My Name
Course Title
Professor name
Date
Introduction
Shape
Description
Covariance
Correlation
Aggregation
Visualization
Machine Learning Algorithm
Collaborative filtering
Logistic Regression
K-Means
Conclusion
References
Introduction
In this assignment, we apply a machine learning algorithm to identify the core features of
rental properties that attract the most potential renters. To achieve this, we used the dataset
provided by DataCamp in abid to train the machine learning algorithm. PySpark was the main
Python library used to train the model and obtain the results.
Before implementing the machine learning algorithm, we conducted an exploratory data
analysis of the data set to form some initial hypotheses about it. The steps are explained in
the following subsections.
Shape
To get a better understanding of our dataset, we used the shape attribute to get the number of
rows and columns we are dealing with. This helps inform the choice of an algorithm that can
run the machine learning step efficiently. The result is shown below.
Machine Learning Implementation.
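As a minimal sketch of the shape check described above, the snippet below builds a small pandas DataFrame and reads its dimensions. The DataFrame name listings, its columns, and its values are hypothetical and only stand in for the rental dataset, which is not reproduced here.

```python
import pandas as pd

# Hypothetical rental listings; column names and values are illustrative only.
listings = pd.DataFrame({
    "price": [1200, 950, 1800, 1400],
    "bedrooms": [2, 1, 3, 2],
    "city": ["A", "B", "A", "C"],
})

# In pandas, shape is an attribute that holds a (rows, columns) tuple.
n_rows, n_cols = listings.shape
print(f"{n_rows} rows, {n_cols} columns")
```

A PySpark SQL DataFrame does not expose shape, so there the same two numbers would typically come from df.count() and len(df.columns).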
Description
To get a summary of the data set, we used the pandas describe method, which returns key
statistical measurements for the data points in the data set.
This was helpful in understanding the mathematical representation of the data set for better
exploration (Callaghan et al., 2019). The method was able to return to us the
To get the summary for multiple columns at once, we applied the same method to several
columns; the resulting summary descriptions for the data set are shown below.
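The single-column and multi-column summaries described above might look like the following sketch. It assumes a pandas DataFrame; the name listings and the columns price and bedrooms are hypothetical placeholders, not the columns of the actual dataset.

```python
import pandas as pd

# Hypothetical rental listings; column names and values are illustrative only.
listings = pd.DataFrame({
    "price": [1200, 950, 1800, 1400],
    "bedrooms": [2, 1, 3, 2],
})

# Summary for one column: count, mean, std, min, quartiles, max.
single = listings["price"].describe()

# Summary for several columns: select them with a list of column names.
multi = listings[["price", "bedrooms"]].describe()

print(single)
print(multi)
```

Selecting with a list of names returns a DataFrame, so describe produces one summary column per selected variable.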
Covariance
Covariance is an important exploratory data analysis result that shows how two variables
change with respect to each other. A positive covariance means that, in general, as one
variable increases, so does the other (Dahbur, Mohammad and Tarakji, 2011). A negative
covariance means that as one variable increases, the variable it is being compared to tends to
decrease (Vouros et al., 2019). This statistical measure was computed as shown below.
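To make the sign interpretation concrete, here is a small sketch that computes the sample covariance by hand. The function sample_covariance and the three example lists are hypothetical and are not taken from the assignment's dataset or code.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from each mean.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Illustrative values only.
bedrooms = [1, 2, 3, 4]
price = [900, 1200, 1500, 1800]   # rises with bedrooms
distance = [8, 6, 4, 2]           # falls as bedrooms rise

cov_up = sample_covariance(bedrooms, price)       # positive
cov_down = sample_covariance(bedrooms, distance)  # negative
print(cov_up, cov_down)
```

The pandas DataFrame.cov method also divides by n - 1 by default, so it should agree with this hand computation on the same columns.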
