Advanced Analytic Capability of Different Issues Using Python & R

Added on 2021/12/11

AI Summary

This project provides a comparative analysis of Python and R programming languages for addressing various challenges in artificial intelligence and machine learning. It begins with an introduction to the languages, highlighting their features, advantages, and applications in areas such as data science, data mining, and web development. The project then delves into specific analytical techniques, including analytical tools, classification, association analysis, correlation, regression, clustering, and anomaly detection. The assignment provides code examples and comparative discussions of how each technique can be implemented using both Python and R, along with the libraries and tools used for data analysis and visualization. The project emphasizes the strengths of each language in different contexts, particularly highlighting R's capabilities in statistical analysis and Python's versatility in diverse AI applications. The project aims to provide insights into the practical application of Python and R in tackling complex data-driven problems within the domain of artificial intelligence.


Table of Contents
Introduction
Python
R
Machine Learning
Analytical tool
Association
Correlation
Regression
Clustering
Anomaly Detection
Reference list

Introduction
In the current era of development, we face many challenges related to the automation of
systems. The world moves at a very fast pace, and every task is expected to be carried out
precisely, which humans can rarely sustain by hand. Programming and algorithms are therefore
used to do such work. In everyday life we see many intelligent applications on our computers and
mobiles, including automated assistants, speech recognizers, face recognizers, map and route
assistants, product assistants, product comparison analyzers and many more (Dincer et
al. 2017). Such applications have become an unavoidable requirement of modern life.
They are written in different programming languages, chosen according to the developer's
ability and the needs of the task.
Intelligent applications, or more specifically Artificial Intelligence (AI) applications, require
more than a conventional programming architecture, so we have to choose languages
that meet these criteria. Many languages are available for this purpose, such as C, C++, Java,
Python, MATLAB, Ruby, PHP and R. Each has its own library structure and its own
procedures. For AI applications we should choose languages that have rich libraries,
support a wide variety of coding styles and are human friendly. For these reasons, many
practitioners choose Python and R (Volk et al. 2017).
We first look at the features of these two languages in a comparative manner.
Python
Python was developed by Guido van Rossum and first released in 1991. Its reference
implementation, CPython, is written in C. Python is a general-purpose language with strong
support for object-oriented programming and is widely used for scientific and analytic
purposes. Data visualization is straightforward in Python, so tasks that involve data analysis
and visualization can be performed with it.
• Features

The important features and advantages of Python are (Antony et al. 2017):
1. Python is a dynamically typed language: variables are not declared with a type; the type
is determined at run time from the value assigned.
2. Python is an interpreted language.
3. Code is easy to execute in Python.
4. The code and syntax are easy to read.
5. It is a very expressive language, as much of its syntax is based on meaningful English
words.
6. Python is an open-source language, and most of the packages commonly added to it are
also open source.
7. It is a high-level language that supports object-oriented programming.
8. It is portable: code written on one OS can generally be run on any other platform or OS
that has a Python interpreter.
9. It is extensible: Python code can be combined with code written in other languages such as
C and MATLAB.
10. Python supports embedded devices, for example through MicroPython and on the Raspberry Pi.
11. It has a large standard library.
12. GUI programming is comparatively easy.
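The first feature, dynamic typing, can be illustrated with a small sketch. The names `value` and `describe` are chosen here only for illustration and do not come from the assignment.

```python
# Dynamic typing: a name is bound to an object, and the object carries the type.
value = 42
first_type = type(value).__name__      # type of the object currently bound to value

value = "forty-two"                    # rebinding the same name to a string is allowed
second_type = type(value).__name__


def describe(x):
    """Return a short description of x based on its run-time type."""
    if isinstance(x, (int, float)):
        return f"number {x}"
    return f"text {x!r}"


print(first_type, second_type)
print(describe(3.5), describe("R"))
```

No type is declared anywhere; the same function accepts numbers and strings and decides what to do at run time.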
• Applications:
Python is applicable in many areas like:
• Numerical Computation
• Scientific Computation
• Machine Learning
• Data Science
• Data Mining
• Web Development
• Artificial Intelligence

R
Another programming language of interest is R. R is widely used for scientific and
statistical computation. It has one of the richest collections of libraries among all the
languages (Nelson, 2016). Because of this richness and wide availability, many data scientists
and machine learning researchers prefer R. R was introduced by Ross Ihaka and Robert
Gentleman in 1993.
• Features
The features of R are:
1. It is a well-developed, simple and effective programming language.
2. It has effective data handling and storage facilities.
3. It provides a range of operators for calculations on arrays, lists, vectors and matrices.
4. It provides a large, coherent and integrated collection of tools for data analysis.
5. It provides graphical facilities for data analysis and display.
• Applications
R is applicable in many different areas like:
• Data Analysis
• Case matching
• Statistical Analysis
• Online data mining
• Face and Tag detection
• Data Science
• Machine Learning
• Artificial Intelligence

Let us discuss one application that can be implemented in both Python and R. One of the
most in-demand topics today is Machine Learning.
Machine Learning
Machine Learning is a set of algorithms to which the programmer assigns an objective
and which then learn from data how to meet it. The behaviour is not programmed
explicitly; rather, the algorithm is designed so that it is capable of learning from its
environment without an explicit definition of each case (Prakash, 2015).
Machine Learning brings together several components: data, algorithms, analysis tools and a
platform on which to execute them. When analysis comes into the picture, the use of
statistics is unavoidable. Statistics helps the algorithm analyze data in depth through
statistical calculations.
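As a minimal sketch of what "learning without an explicit definition of each case" can mean, the example below picks a cut-off on a single feature from labelled examples instead of hard-coding it. The function `learn_threshold` and the training values are made up for illustration; this is a toy decision rule, not a method taken from the assignment.

```python
def learn_threshold(samples):
    """Pick the cut-off on one feature that misclassifies the fewest samples.

    samples: list of (feature_value, label) pairs with label 0 or 1.
    The learned rule is: predict 1 when feature_value >= threshold.
    """
    values = sorted({x for x, _ in samples})
    # Candidate thresholds are the midpoints between consecutive distinct values
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(samples) + 1
    for t in candidates:
        errors = sum((x >= t) != bool(y) for x, y in samples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Made-up training data: small values belong to class 0, large values to class 1
training = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
threshold = learn_threshold(training)


def predict(x):
    """Apply the learned rule to a new value."""
    return int(x >= threshold)


print(threshold, predict(2.5), predict(9.0))
```

The programmer supplies only the objective (fewest errors) and the data; the threshold itself is found by the algorithm.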
Here we make a comparative discussion of the following topics, with analysis and
simulation in both Python & R. The topics are:
• Analytical tool
• Classification
• Association
• Correlation
• Regression
• Clustering
• Anomaly Detection
Analytical tool
An analytical tool performs analysis on given data. A dataset is provided, and we
analyze it with Python and with R and then compare the two.

Data analysis with Python:
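import pandas as pd
import matplotlib.pyplot as plt

# Load the fruit dataset (path as given in the source)
file = pd.read_csv('//fruit.csv')
print(file.head())

name = file['Name']
col = file['Color Score']
puri = file['Purification']

# Plot both columns against the row index
plt.title('Plotting Color Score and Purification of Fruit')
plt.plot(col, label='Color Score')
plt.plot(puri, label='Purification')
plt.legend()
plt.show()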
Data analysis with R:
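library(tidyverse)
# Load the fruit dataset (assumed to be the same file as in the Python example)
fruits <- read_csv("//fruit.csv")

# Bar chart of row counts per Name (assumes Name is the categorical column)
ggplot(data = fruits) +
  geom_bar(mapping = aes(x = Name))

# Histogram of Purification
ggplot(data = fruits) +
  geom_histogram(mapping = aes(x = Purification), binwidth = 0.5)

# Keep only rows with a Color Score below 0.5 and plot them with narrower bins
smaller <- fruits %>%
  filter(`Color Score` < 0.5)
ggplot(data = smaller, mapping = aes(x = Purification)) +
  geom_histogram(binwidth = 0.1)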
Output:

Comparative Discussion:
In Python, data analysis relies on libraries such as pandas and matplotlib. The pandas
library is used to load and manipulate tabular data, whereas matplotlib is used for data
visualization. With these two libraries we can easily analyze and visualize the data. In R,
ggplot2 is the library used for visualization; it is loaded here as part of the tidyverse
library, which also supplies the data-manipulation functions. The histogram plot shows how
many observations fall within each range of values.
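To make the histogram idea concrete, the following sketch counts how many values fall into each bin of a fixed width, in the spirit of geom_histogram(binwidth = 0.5). The function `histogram_counts` and the Purification values are made up for illustration, and the bin edges are simple multiples of the bin width, which is an assumption and not necessarily how ggplot2 places its bins.

```python
import math
from collections import Counter


def histogram_counts(values, binwidth):
    """Count how many values fall into each bin [k*binwidth, (k+1)*binwidth)."""
    counts = Counter(math.floor(v / binwidth) for v in values)
    # Key each bin by its left edge
    return {round(k * binwidth, 10): counts[k] for k in sorted(counts)}


# Made-up Purification values, for illustration only
purification = [0.2, 0.4, 0.7, 0.9, 1.1, 1.3, 1.4, 2.6]
bins = histogram_counts(purification, binwidth=0.5)

for left_edge, count in bins.items():
    print(f"[{left_edge}, {left_edge + 0.5}): {count}")
```

Each bar of a histogram corresponds to one entry of this mapping: the left edge gives its position and the count gives its height.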
• Classification
Classification is a technique, or more precisely a family of algorithms, for assigning the
records of a dataset to classes. Several algorithms are available, such as Logistic Regression,
Naive Bayes, Stochastic Gradient Descent classifiers, K-Nearest Neighbor, Decision Tree and
Random Forest. They can be implemented in Python as well as in R. Classification separates
the data with respect to some parameter (Moshfeq et al. 2017). It is a form of supervised
learning, in which the class labels of the training data are known to us. Here we choose a
parameter by which to separate and segment the data as required by the dataset. Let us look
at the code structure.
Using Python:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

link = "<path>.csv"
file = pd.read_csv(link)
print(file.head())

# Distinct values in each of the first three columns
collist = file.columns.tolist()
hd1 = np.array(file[collist[0]])
hd1u = np.unique(hd1)
print(hd1u)
hd2 = np.array(file[collist[1]])
hd2u = np.unique(hd2)
print(hd2u)
hd3 = np.array(file[collist[2]])
hd3u = np.unique(hd3)
print(hd3u)

# Row indices for each category in the second column.
# np.unique returns sorted values, so hd2u[0]..hd2u[8] are assumed to be
# the nine categories in alphabetical order.
Burglary = []
CriminalDamage = []
Drugs = []
FraudorForgery = []
OtherNotifiableOffences = []
Robbery = []
SexualOffences = []
TheftandHandling = []
ViolenceAgainstthePerson = []
for i in range(len(hd1)):
    if hd2u[0] == hd2[i]:
        Burglary.append(i)
    elif hd2u[1] == hd2[i]:
        CriminalDamage.append(i)
    elif hd2u[2] == hd2[i]:
        Drugs.append(i)
    elif hd2u[3] == hd2[i]:
        FraudorForgery.append(i)
    elif hd2u[4] == hd2[i]:
        OtherNotifiableOffences.append(i)
    elif hd2u[5] == hd2[i]:
        Robbery.append(i)
    elif hd2u[6] == hd2[i]:
        SexualOffences.append(i)
    elif hd2u[7] == hd2[i]:
        TheftandHandling.append(i)
    elif hd2u[8] == hd2[i]:
        ViolenceAgainstthePerson.append(i)
    else:
        pass

print("Burglary:\n", Burglary, "\n")
print("CriminalDamage:\n", CriminalDamage, "\n")
print("Drugs:\n", Drugs, "\n")
print("FraudorForgery:\n", FraudorForgery, "\n")
print("OtherNotifiableOffences:\n", OtherNotifiableOffences, "\n")
print("Robbery:\n", Robbery, "\n")
print("SexualOffences:\n", SexualOffences, "\n")
print("TheftandHandling:\n", TheftandHandling, "\n")
print("ViolenceAgainstthePerson:\n", ViolenceAgainstthePerson, "\n")

# Values from the fourth column onward for the first Drugs row
list1 = file.iloc[Drugs[0]].tolist()[3:]
plt.figure(figsize=(30, 10))
plt.title("Drugs Damage Comparative Analysis", fontsize=28)
# Append the values of a later Drugs row to those of the first row;
# the loop bound is capped so that Drugs[i] always exists
for i in range(min(len(list1) - 1, len(Drugs))):
    list3 = list1 + file.iloc[Drugs[i]].tolist()[3:]
plt.plot(list3, "r", label='Drugs (2016-2018)')
plt.xlabel("Damage quantity")
plt.ylabel("Type of Damage")
plt.legend(loc='upper right', prop={'size': 16})
plt.grid()
plt.show()
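The if/elif chain above collects, for each category, the row positions at which it occurs. A more compact way to build the same kind of mapping is sketched below with a dictionary. The function `group_indices` and the sample list are hypothetical, and the category names are taken from the variable names in the code above.

```python
from collections import defaultdict


def group_indices(categories):
    """Map each distinct category to the list of row positions where it occurs."""
    groups = defaultdict(list)
    for position, category in enumerate(categories):
        groups[category].append(position)
    return dict(groups)


# Made-up category column, for illustration only
major_category = ["Burglary", "Drugs", "Robbery", "Drugs", "Burglary", "Drugs"]
groups = group_indices(major_category)

for category, positions in groups.items():
    print(category, positions)
```

With this structure a new category needs no extra branch, and `groups["Drugs"]` plays the role of the `Drugs` list.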
Using R:
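library(caret)
library(e1071)       # naiveBayes classifier
library(naivebayes)  # naive_bayes with kernel density estimation

# mydata is assumed to be a data frame already loaded in the workspace,
# with the columns prog, science, socst and sesf
set.seed(7267166)
# 70/30 stratified split on the outcome prog
trainIndex = createDataPartition(mydata$prog, p = 0.7)$Resample1
train = mydata[trainIndex, ]
test = mydata[-trainIndex, ]

# Class distribution in the full data and in the training set
print(table(mydata$prog))
print(table(train$prog))

# Naive Bayes with e1071
NB = naiveBayes(prog ~ science + socst, data = train)
print(NB)

# Naive Bayes with kernel density estimates for the numeric predictors
newNB = naive_bayes(prog ~ sesf + science + socst, usekernel = TRUE, data = train)
print(newNB)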
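For comparison with the R code, the Naive Bayes idea can be sketched in plain Python. This is a simplified Gaussian Naive Bayes written from scratch under stated assumptions: the functions `fit_gaussian_nb` and `predict_gaussian_nb`, the scores and the class labels are all made up for illustration, and the variance is the population variance with a small constant added, so the numbers will not match the e1071 or naivebayes packages exactly.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        stats = []
        for column in zip(*members):
            mean = sum(column) / n
            # Population variance plus a small constant to avoid division by zero
            var = sum((v - mean) ** 2 for v in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one feature row."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))


# Made-up (science, socst) scores and programme labels
scores = [(65, 60), (70, 66), (68, 63), (45, 40), (50, 47), (48, 42)]
programme = ["academic", "academic", "academic", "vocation", "vocation", "vocation"]
model = fit_gaussian_nb(scores, programme)

print(predict_gaussian_nb(model, (66, 62)))
print(predict_gaussian_nb(model, (47, 45)))
```

The "naive" part is the sum inside `log_posterior`: each feature contributes independently, given the class.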