
Principles of Data Science for Business: Assessment of Verdilan Horticulture Outline

   

Added on  2022-10-10

Running head: Principles of Data Science for Business
1. Assessment of Verdilan Horticulture Outline
In a business that runs a large number of operations, such as Verdilan Horticulture, it is
crucial that activities are conducted with adequate precision and with as much autonomy as
possible. The advent of data collection and the introduction of data science into modern business
practice provide a foundation on which such autonomy and precision can be built, thus improving
overall business performance.
Flower Classification
The overall business objective of this paper is to propose a feasible flower classification based on
the original paper by Fisher in 1936. To that end, it uses a similar dataset obtained from
https://www.kaggle.com/abhijeetupadhyay/classifying-flowers-using-kNN (Guru, et al., 2010).
In particular, the paper uses XLSTAT, an Excel add-in, to conduct k-nearest neighbours (kNN)
classification of the flowers.
2. Overview of Investigation
Our investigation follows the process of importing the data into the analysis environment before
drawing any insights from the results obtained. Since one of the main objectives was to evaluate
whether Excel can be used as a classification tool, we researched the available options and found
XLSTAT™, an Excel add-in for statistical analysis that includes machine learning methods, which
fall under data science.
When implementing the kNN algorithm, we adopted a cross-validation option, which trains several
models (in our case several kNN models) on different subsets of the data. This helps to detect and
guard against overfitting, and so gives a more reliable estimate of the model's accuracy.
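The idea of cross-validated kNN can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is not how XLSTAT implements the method, the function names `knn_predict` and `cross_val_error` are hypothetical, and the toy measurements are made up rather than taken from the Iris data.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """Predict the majority label among the k training rows nearest to query."""
    # Each training row is (features, label); distance is Euclidean.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_val_error(rows, k, folds, seed=0):
    """Pooled misclassification rate over `folds` held-out folds."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    errors = 0
    for i in range(folds):
        # Every folds-th row (starting at i) is held out; the rest train the model.
        held_out = shuffled[i::folds]
        training = [r for j, r in enumerate(shuffled) if j % folds != i]
        errors += sum(knn_predict(training, x, k) != y for x, y in held_out)
    return errors / len(rows)

# Made-up toy measurements: two features, two well-separated classes.
toy = [((1.0 + 0.1 * i, 0.2), "a") for i in range(10)] + \
      [((5.0 + 0.1 * i, 1.5), "b") for i in range(10)]

cv_error = cross_val_error(toy, k=3, folds=5)
print(cv_error)
```

Each observation is predicted by a model that never saw it during training, which is why the resulting error is a fairer estimate than the error measured on the training data itself.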
3. Results of Analysis
Our objective is to evaluate the feasibility of flower classification as a basis for the firm's
decisions on whether or not to repeat orders. To this end, we used a dataset of 150 observations
with 4 variables. Our analysis produced the following results:
Descriptive Analysis
Table 1
Summary statistics:
Training set:
Variable   Categories   Frequencies   %
species    setosa       41            34.167
           versicolor   41            34.167
           virginica    38            31.667
The table above shows the descriptive statistics of the training data used to train the model.
The training set contains 41 Iris-Setosa, 41 Iris-Versicolor and 38 Iris-Virginica flowers, which
is a reasonably representative sample. Further, the average sepal and petal lengths are 5.803 and
3.711 respectively, while the average sepal and petal widths are 3.039 and 1.194.
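As a quick check, the percentages in Table 1 can be recomputed from the reported class counts. The snippet below is a minimal sketch that uses only the three frequencies given in the table; the variable names are illustrative.

```python
# Training-set class counts as reported in Table 1.
counts = {"setosa": 41, "versicolor": 41, "virginica": 38}

total = sum(counts.values())  # number of training observations
percentages = {name: round(100 * n / total, 3) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} ({pct}%)")
```

The counts sum to 120, which is 80% of the 150 observations in the dataset.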
Given the data, our aim is to classify each Iris flower into one of the three classes. The
results of the classification are given below:
Table 2
Cross-validation:
Loss estimate using cross-valid
Results by class:
Class     setosa      versicolor
Objects   9           10
          PredObs1    PredObs5
          PredObs3    PredObs11
          PredObs4    PredObs15
          PredObs10   PredObs18
          PredObs14   PredObs19
Earlier, the dataset was split in the ratio 80:20 into training and test data. When the
classification model was applied to the test data, 9 flowers were classified as Setosa, 10 as
Versicolor and 11 as Virginica.
Interpretation of the kNN algorithm results
The error rate of the kNN algorithm is 0.058, as given in Table 3 below. This is relatively low,
so our model performs well (Srivastava, 2018).
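For readers unfamiliar with the metric, the error rate is the share of observations whose predicted class differs from the actual class, and accuracy is its complement. The sketch below uses hypothetical labels for illustration only; just the value 0.058 comes from the paper.

```python
def error_rate(actual, predicted):
    """Fraction of predictions that differ from the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

# Hypothetical labels for illustration only, not the paper's predictions.
actual = ["setosa", "versicolor", "virginica", "virginica"]
predicted = ["setosa", "versicolor", "versicolor", "virginica"]
print(error_rate(actual, predicted))  # one wrong prediction out of four

reported_error = 0.058  # loss estimate quoted in the text
reported_accuracy = round(1 - reported_error, 3)
print(reported_accuracy)
```

An error rate of 0.058 therefore corresponds to an accuracy of roughly 94%.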
Table 3
Loss estimate using cross validation of the model.
Prediction set:
Variable     Observations   Obs. with missing data
sepal_leng   30             0
sepal_widt   30             0
Our model further indicates that Iris-Virginica is the most frequently predicted variant in the
prediction set (11 of 30 flowers); hence, it has a higher probability of being ordered in the
event that the firm deals with Iris flowers only.
Based on our results and with reference to the study objectives, we conclude that a simple plant
classification is possible, although it will require extensive knowledge of statistics and of
implementing data science. To support the use of classification, further suggestions regarding
the firm's objectives, as listed in points 3 and 4, are given in the subsequent sections.
4. Ethical and Security Considerations
Given that this project will most likely involve the use of third-party software, the main
ethical concerns lie in the integrity of the collected data and in the security of that data.
Data Integrity
Collecting data on a large scale will require the firm to integrate a data collection system.
The integrity of the data will therefore depend on how reliable that system is. It is crucial
that the firm evaluates the suitability of such systems because, if the firm adopts the kNN
classification method, the outcome of the analysis will rely on the collected data and ultimately
influence the firm's decision-making.
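One simple way to evaluate incoming data is to screen each record for missing or impossible measurements before it reaches the classifier. The sketch below is only an assumption about what such a check might look like; the function `find_invalid_rows`, the column names and the sample records are all hypothetical.

```python
def find_invalid_rows(rows, fields):
    """Return indices of rows with a missing or non-positive measurement."""
    bad = []
    for i, row in enumerate(rows):
        for field in fields:
            value = row.get(field)
            if value is None or value <= 0:
                bad.append(i)
                break  # one problem is enough to flag the row
    return bad

fields = ["sepal_length", "sepal_width"]  # hypothetical column names
records = [
    {"sepal_length": 5.1, "sepal_width": 3.5},
    {"sepal_length": None, "sepal_width": 3.0},  # missing value
    {"sepal_length": 6.2, "sepal_width": -1.0},  # impossible value
]
invalid = find_invalid_rows(records, fields)
print(invalid)
```

Flagged rows can then be corrected or excluded before the kNN model is trained or applied.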
Security of Data
Excel, the software being evaluated for usability, does not in itself support machine learning.
Using XLSTAT as an add-in to Excel might pose a security threat to the data used for analysis.
Hence, the firm should put security measures in place to ensure that the data cannot be
manipulated.
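One possible measure, offered here as an assumption rather than something the paper prescribes, is to record a cryptographic checksum of the dataset when it is collected and to compare it again before analysis. The sketch uses Python's standard `hashlib` module; the sample file contents are hypothetical.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical file contents, for illustration only.
original = b"sepal_length,sepal_width\n5.1,3.5\n"
recorded_digest = sha256_of(original)  # stored when the data is collected

tampered = b"sepal_length,sepal_width\n5.1,3.6\n"  # one value altered

unchanged_ok = sha256_of(original) == recorded_digest
tampered_ok = sha256_of(tampered) == recorded_digest
print(unchanged_ok, tampered_ok)
```

A checksum only detects that the data has changed; it does not prevent the change, so it would complement access controls rather than replace them.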
5. Next Steps and Potential Solutions
In data science, the choice of machine learning tools often depends on how well those tools
perform; different tools may perform differently under different circumstances. Either way,
integrating data science into the business requires that a number of factors be taken into
consideration, including data collection systems, the implementation and evaluation of machine
learning algorithms, and the interpretation of the results before they are adopted for
decision-making. For the integration of data science as a classification tool for Verdilan
Horticulture, we therefore recommend that the executive consider the following:
Data collection
Machine learning, the main component of data science that can be used for plant classification,
depends fundamentally on data. Thus, the firm should integrate a reliable data-collection