
Principles of Data Science for Business


Added on  2023/01/13

Table of Contents
Section 1: Assessment of Itineract Travel Co briefing note
Section 2: Overview of investigation
Section 3: Analysis and results
Section 4: Ethical and security considerations
Section 5: Data Science in Next Steps and Potential Solutions
Report Appendix: Statistics and Methodology
REFERENCES
ITINERACT TRAVEL CO – SEARCHABILITY CHALLENGE:
REPORT & RECOMMENDATIONS
Section 1: Assessment of Itineract Travel Co briefing note:
Data science has become one of the disciplines that help organisations turn data into results that support effective decisions. Effective data professionals now recognise that they need conventional expertise in data analysis, data collection and large-scale coding (Data science, 2020). Data scientists need to oversee the full data science lifecycle and have enough independence and understanding to maximise the return at every step, so that they uncover valuable intelligence for the companies they serve. They must be knowledgeable and results-focused, combining strong industry-specific expertise with the interpersonal skills needed to explain technical findings to non-technical colleagues. Our team has a strong theoretical grounding in mathematics and linear algebra, together with computing expertise in data storage, data mining and software modelling.
Customer data for the last six months has been extracted from the company database and shared in the attached Excel file. Every day, companies handle gigabytes to yottabytes of structured and binary data in an increasingly digital environment.
Emerging technologies cut costs and provide greater storage capacity for critical information. For each customer, the extract contains their age, gender, favourite cause, number of experiences purchased, total revenue from the customer, and whether they were selected for the pilot.
Itineract Travel Co gives people the chance to visit travel destinations and engage with causes that help make the world a better place. Established only five years ago, it offers over 200 travel experiences to a large number of students, so it must strike a fine balance to maintain a good customer experience on the Itineract website.
It is important to highlight the most appropriate interactions for each client, while making the less suitable ones nearly invisible to visitors. As the offering grows, finding the product that matches each customer's needs and wishes becomes more complex. If a clear data set can be confirmed, Itineract Travel Co should establish and manage a state-of-the-art recommendation framework and build an internal data science team. Given the company's exposure to numerous political causes, any recommendation system would need to be carefully designed and monitored.
Decision-making approaches weigh a variety of parameters when comparing choices, and strategic decision-making is also essential to company performance. For a data scientist in a digital marketing and analytics consultancy, the decision policy must follow the principles, risk attitudes and expectations of future results of the decision-makers (Provost and Fawcett, 2013). The elements of decision-making, such as the decision-maker, the decision context and the problem-solving procedure, each have particular characteristics. In this case, the central property these elements share is the customer's willingness to travel to a specific place. The method has three key features: a set of options or preferences, decision criteria, and a collection of selection methods.
The work involves data preparation, exploratory data analysis and inferential statistical analysis. The main findings are presented in the body of this report, methodological details are given in the appendix, and the report ends by proposing potential solutions based on the findings. The dataset provided by Itineract Travel Co is useful for producing a number of results, and the key findings are strong enough to inform practical steps towards the desired outcome. To show where data science techniques add value, several tests are performed later in the report, including descriptive statistics and correlation tables.
Section 2: Overview of investigation
The analysis started by preparing the data so that exploratory data analysis (EDA) could be carried out. This included reorganising the data and handling any processing steps that we thought might introduce bias. As Itineract Travel Co's growth strategy focuses on bringing more tourists to the website and serving thousands of people, matching the right interactions to each future customer will become extremely complex, and time will become crucial for meeting the organisation's development targets.
EDA is the standard way of representing all variables through data visualisation. Using EDA, the consultancy's managers identified trends showing how customers' travel needs had shifted, along with possible explanations for what these customers want to explore (Provost and Fawcett, 2013). The result was a large number of visualisations showing how the traffic problem evolved over time. The study then focused on whether the observed counts matched an established statistical distribution on which further analyses could be based. The results were not normally distributed; the values were closer to a Poisson distribution. This told the consultancy which kinds of inferential statistics managers should use to make suitable decisions. Inferential statistics were then carried out using bootstrapping, which let managers quantify confidence intervals and decide whether observed differences in customer preferences and tourist destinations were statistically significant or could have arisen by chance. This gives managers more confidence in the improvements observed, so that they can rely on them when making decisions. Finally, these findings were used to consider possible approaches to the problem and to propose data science methods that could be adapted for implementation and effectiveness.
The data set contains 1,000 observations of customer data, including age, favourite cause and gender. It also includes the ID code assigned to each customer who visited specific places during the period according to their wishes and requirements (Larson and Chang, 2016). Each observation also records the total revenue generated from the customer and whether the customer was selected for the pilot. To make the tests more informative, the Itineract data were first split into dependent and independent variables. Descriptive statistics and correlations were then computed for experiences purchased, age, ID and total revenue, and a further correlation test was performed between pilot, age and total revenue.
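The correlation tests described above can be sketched in plain Python. This is a minimal illustration, not the SPSS procedure the report used: `pearson_r` computes the sample Pearson coefficient, `t_statistic` converts it into the t value for testing whether the correlation is zero, and `one_tailed_p_normal` approximates the one-tailed p-value with the standard normal distribution, which is only reasonable for large samples such as the 1,000 rows here. All function names are our own.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def one_tailed_p_normal(t):
    """Upper-tail p-value from a normal approximation (adequate only for large n)."""
    return 0.5 * math.erfc(t / math.sqrt(2))


# Rounded r = .382 between experiences_purchased and total_revenue, n = 1000
t = t_statistic(0.382, 1000)
print(round(t, 2), one_tailed_p_normal(t) < 0.001)
```

The resulting t of about 13.06 is close to the Approx. T of 13.055 that SPSS reports for this pair in Section 3; the small gap comes from using the rounded coefficient.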
Section 3: Analysis and results
To analyse the data properly, regression models are used to help determine whether customers visiting a particular place are satisfied. Linear regression is useful for estimating the relationships that support making recommendations.
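The regressions in this section were run in SPSS. As a hedged sketch of what ordinary least squares computes, the hypothetical function `ols_fit` below solves the normal equations (X'X)b = X'y with Gauss-Jordan elimination in plain Python. It is illustrative only: it returns coefficients but not the standard errors, t values or significance levels shown in the SPSS tables, and dedicated statistics libraries use more numerically stable methods.

```python
def ols_fit(X, y):
    """Ordinary least squares coefficients via the normal equations (X'X) b = X'y.

    X is a list of rows of predictor values (without an intercept column);
    an intercept column of 1s is prepended. Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]
    k = len(rows[0])
    # Build X'X (k x k) and X'y (length k)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination with partial pivoting
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("X'X is singular: predictors are collinear")
        aug[col] = [v / p for v in aug[col]]
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical toy data generated exactly from y = 1 + 2*x1 + 3*x2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, 4, 6, 14]
print([round(b, 6) for b in ols_fit(X, y)])
```

On the toy data the function recovers the coefficients 1, 2 and 3. For the report's first model the call would look like `ols_fit(list(zip(total_revenue, id_, age)), experiences_purchased)`, where the column lists are hypothetical names for the extract's columns.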
Descriptive Statistics
Variable                 Mean     Std. Deviation   N
experiences_purchased    1.66     2.037            1000
age                      65.97    54.317           1000
id                       499.50   288.819          1000
total_revenue            96.20    268.240          1000
Correlations
Pearson Correlation      experiences_purchased   age     id      total_revenue
experiences_purchased    1.000                   .011    -.023   .382
age                      .011                    1.000   .030    -.012
id                       -.023                   .030    1.000   -.012
total_revenue            .382                    -.012   -.012   1.000
Sig. (1-tailed)          experiences_purchased   age     id      total_revenue
experiences_purchased    .                       .368    .229    .000
age                      .368                    .       .169    .355
id                       .229                    .169    .       .357
total_revenue            .000                    .355    .357    .
N                        1000 for every pair of variables
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .383 (a)   .146       .144                1.885
a. Predictors: (Constant), total_revenue, id, age
b. Dependent Variable: experiences_purchased
ANOVA (a)
Model 1       Sum of Squares   df    Mean Square   F        Sig.
Regression    607.447          3     202.482       56.975   .000 (b)
Residual      3539.657         996   3.554
Total         4147.104         999
a. Dependent Variable: experiences_purchased
b. Predictors: (Constant), total_revenue, id, age
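The Model Summary and ANOVA tables are linked by standard formulas: R Square = SS_regression / SS_total, adjusted R Square = 1 - (1 - R Square)(n - 1)/(n - k - 1), F = (SS_regression / k) / (SS_residual / (n - k - 1)), and the standard error of the estimate is the square root of the residual mean square. The short check below, a sketch with a function name of our own, recomputes those figures from the reported sums of squares with n = 1000 observations and k = 3 predictors.

```python
import math


def model_summary(ss_reg, ss_res, n, k):
    """Derive R, R^2, adjusted R^2, F and the standard error of the estimate
    from the regression and residual sums of squares of an ANOVA table."""
    ss_tot = ss_reg + ss_res
    r2 = ss_reg / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    ms_reg = ss_reg / k             # mean square, regression
    ms_res = ss_res / (n - k - 1)   # mean square, residual
    return {
        "R": math.sqrt(r2),
        "R2": r2,
        "adj_R2": adj_r2,
        "F": ms_reg / ms_res,
        "SEE": math.sqrt(ms_res),
    }


# Values from the ANOVA table: experiences_purchased on total_revenue, id and age
summary = model_summary(607.447, 3539.657, n=1000, k=3)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

This prints R 0.383, R2 0.146, adj_R2 0.144, F 56.975 and SEE 1.885, matching the two tables. Applied to the pilot model's ANOVA (1.806 and 220.638, k = 2), the same function gives R Square of about .008 and F of about 4.08, which agrees with the reported 4.079 up to rounding of the sums of squares.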

Coefficients (a)
Model 1         B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.
(Constant)      1.415                .140                               10.119   .000
age             .001                 .001         .016                  .537     .591
id              .000                 .000         -.020                 -.666    .506
total_revenue   .003                 .000         .382                  13.043   .000
a. Dependent Variable: experiences_purchased
Residuals Statistics (a)
                       Minimum   Maximum   Mean   Std. Deviation   N
Predicted Value        1.29      14.79     1.66   .780             1000
Residual               -10.795   17.659    .000   1.882            1000
Std. Predicted Value   -.475     16.839    .000   1.000            1000
Std. Residual          -5.726    9.367     .000   .998             1000
a. Dependent Variable: experiences_purchased
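The analysis above shows a significant positive relationship between experiences purchased and total revenue (Pearson r = .382, one-tailed p < .001), and total revenue is the only significant predictor in the regression (t = 13.043, p < .001). Age and ID show no significant relationship with experiences purchased (correlation p = .368 and .229; regression p = .591 and .506). The model as a whole is significant (F = 56.975, p < .001) but explains only about 14.6% of the variance in experiences purchased (R Square = .146).
Regression analysis between pilot, age and total revenue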
Descriptive Statistics
Variable        Mean    Std. Deviation   N
pilot           .33     .472             1000
age             65.97   54.317           1000
total_revenue   96.20   268.240          1000
Correlations
Pearson Correlation   pilot   age     total_revenue
pilot                 1.000   .040    .080
age                   .040    1.000   -.012
total_revenue         .080    -.012   1.000
Sig. (1-tailed)       pilot   age     total_revenue
pilot                 .       .105    .005
age                   .105    .       .355
total_revenue         .005    .355    .
N                     1000 for every pair of variables
Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .090 (a)   .008       .006                .470
a. Predictors: (Constant), total_revenue, age
b. Dependent Variable: pilot
ANOVA (a)
Model 1       Sum of Squares   df    Mean Square   F       Sig.
Regression    1.806            2     .903          4.079   .017 (b)
Residual      220.638          997   .221
Total         222.444          999
a. Dependent Variable: pilot
b. Predictors: (Constant), total_revenue, age
Coefficients (a)
Model 1         B (Unstandardized)   Std. Error   Beta (Standardized)   t        Sig.
(Constant)      .297                 .024                               12.346   .000
age             .000                 .000         .041                  1.288    .198
total_revenue   .000                 .000         .081                  2.565    .010
a. Dependent Variable: pilot
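The tables above show that the means of pilot, age and total revenue are .33, 65.97 and 96.20 respectively, with standard deviations of .472, 54.317 and 268.240. Pilot has by far the smallest standard deviation, as expected for a 0/1 indicator of whether a customer was selected for the pilot. Total revenue has a small but significant positive relationship with pilot selection (r = .080, one-tailed p = .005; regression t = 2.565, p = .010), while age does not (p = .198), and the model explains less than 1% of the variance in pilot selection (R Square = .008). The company is expected to gain most of its revenue from customers in the 18 to 45 age group.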
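The small standard deviation of pilot follows from its being a 0/1 indicator: for a proportion p of customers selected, the sample standard deviation is sqrt(p(1 - p) * n/(n - 1)), which can never exceed about 0.5. The sketch below, with a function name of our own, illustrates this. The proportion p = .334 is not printed in the tables; it is inferred from the pilot ANOVA total sum of squares, since for a 0/1 variable that total equals n * p(1 - p), and 1000 * .334 * .666 = 222.444.

```python
import math


def binary_sd(p, n):
    """Sample standard deviation (n - 1 denominator) of a 0/1 variable with mean p."""
    return math.sqrt(p * (1 - p) * n / (n - 1))


# p = .334 of n = 1000 customers selected for the pilot
print(round(binary_sd(0.334, 1000), 3))
# The largest possible value occurs at p = 0.5
print(round(binary_sd(0.5, 1000), 3))
```

The first line reproduces the .472 in the descriptive statistics table, which is why a small standard deviation for pilot says nothing about its importance relative to age or total revenue.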
Classification analysis
Case Processing Summary
Cases          Valid            Missing        Total
               N      Percent   N   Percent    N      Percent
age * pilot    1000   100.0%    0   0.0%       1000   100.0%
Chi-Square Tests
                               Value         df    Asymp. Sig. (2-sided)
Pearson Chi-Square             173.357 (a)   177   .563
Likelihood Ratio               202.130       177   .095
Linear-by-Linear Association   1.572         1     .210
N of Valid Cases               1000
a. 290 cells (81.5%) have expected count less than 5. The minimum expected count is .33.
Symmetric Measures
                                              Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Interval by Interval   Pearson's R            .040    .032                    1.254           .210 (c)
Ordinal by Ordinal     Spearman Correlation   .044    .032                    1.401           .161 (c)
N of Valid Cases                              1000
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
Case Processing Summary
Cases                                   Valid            Missing        Total
                                        N      Percent   N   Percent    N      Percent
total_revenue * experiences_purchased   1000   100.0%    0   0.0%       1000   100.0%
Chi-Square Tests
                               Value          df    Asymp. Sig. (2-sided)
Pearson Chi-Square             5655.041 (a)   476   .000
Likelihood Ratio               901.403        476   .000
Linear-by-Linear Association   145.720        1     .000
N of Valid Cases               1000
a. 497 cells (95.2%) have expected count less than 5. The minimum expected count is .00.
Symmetric Measures
                                              Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
Interval by Interval   Pearson's R            .382    .058                    13.055          .000 (c)
Ordinal by Ordinal     Spearman Correlation   .393    .028                    13.502          .000 (c)
N of Valid Cases                              1000
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.
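The analysis above shows no significant association between age and pilot selection (Pearson chi-square = 173.357, df = 177, p = .563; Pearson's R = .040, p = .210). Total revenue and experiences purchased, by contrast, are significantly and positively related (Pearson's R = .382 and Spearman correlation = .393, both p < .001). The hypothesis of a positive relationship is therefore accepted for total revenue and experiences purchased, because the p-value of the Pearson correlation is below .05. Because most cells in both cross-tabulations have expected counts below 5 (81.5% and 95.2%), the chi-square p-values are unreliable, and the correlation measures are the better guide.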
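The chi-square figures and the SPSS footnotes about expected counts can be illustrated with a small sketch of Pearson's chi-square test of independence. The function below (a hypothetical name of our own) computes the statistic, its degrees of freedom and the share of cells whose expected count is below 5. A common rule of thumb is that no more than 20% of cells should fall below that level. Turning the statistic into a p-value needs the chi-square distribution (for example SciPy's `scipy.stats.chi2.sf`), which is left out to keep the sketch dependency-free; the sketch also assumes every row and column total is non-zero.

```python
def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table (list of rows of counts).

    Returns (statistic, degrees_of_freedom, share_of_cells_with_expected_count_below_5).
    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    small_cells = 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected < 5:
                small_cells += 1
            statistic += (observed - expected) ** 2 / expected
    n_cells = len(table) * len(table[0])
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, small_cells / n_cells


# Hypothetical 2 x 2 table: rows = pilot (no, yes), columns = two age bands
stat, dof, share_small = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), dof, share_small)
```

Treating age or total revenue as categories with hundreds of distinct values spreads 1,000 customers thinly over many cells, which is why most expected counts in the report's tables fall below 5.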
Section 4: Ethical and security considerations
Ethical and security considerations determine how the personal data of clients and the organisation are protected and handled. Personal data is defined as information relating to an identifiable individual. As technologies change constantly, governments implement strict policies on how personal information about individuals may be collected (Van Der Aalst, 2016). Each traveller provides personal details to the company when booking flights and hotels, and it is the company's primary responsibility to store all client information properly, whether in electronic or written form.
Under ethical principles, personal information should not be collected without the individual's consent; doing so is unethical. Under the GDPR, organisations may process personal data only where they have a lawful basis, such as consent, and only to the extent necessary for the stated purpose; they cannot pressure people into handing over personal information.
In addition, organisations are entitled to obtain personal data only when their project is ethical. Cyber crime is also increasing, and to control hacking, theft and other cyber crime, governments have established councils that help people resolve cyber crime issues.
Councils in Europe need to implement strong hardware and software systems that collect and store personal data in a way that keeps individuals' data safe and secure (Doan, Halevy and Ives, 2012). To do this, they need to update their cloud policies and adopt strong cloud security software that helps reduce the rate of cyber crime.
Section 5: Data Science in Next Steps and Potential Solutions:
During this phase, a statistical process is used to collect, clean, verify and examine the data. Descriptive analytics are produced through simple, consistent analyses, data views and variable summaries. Predictive modelling is then completed by defining a range of strategies and methods, identifying the best-fitting model, analysing it, and finally sharing the results with all stakeholders, using machine learning and other advanced algorithms implemented in R for the company's data analytics project.
Lifecycle Consideration                 Potential Options
Objectives defined                      Develop the company's growth plan.
Data preparation                        Counts are summarised within specific intervals.
Data collection techniques to be used   Data mining.
EDA                                     EDA is performed in Excel, applying data gathered through the device network.
The analytical modelling process        Applying APIs to assess sales and growth.
Communication and use of results        Agreeing on defined objectives with key stakeholders and growth outcomes with the regional community (Teo, 2012).
Practical deployment of solutions       Customise selling strategies according to customer preferences.
How to evaluate success                 Percentage increase in sales.
Definition of failure                   Percentage decline in sales.
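The success and failure definitions in the table can be expressed as a simple calculation. The sketch below assumes that sales are compared between two periods and, as the table implies, treats any decline as failure; the optional `target_pct` threshold and the function names are our own hypothetical additions for a plan that sets a minimum growth target.

```python
def percentage_change(before, after):
    """Percentage change from a baseline period to a later period."""
    if before == 0:
        raise ValueError("baseline sales must be non-zero")
    return (after - before) / before * 100


def evaluate_period(before, after, target_pct=0.0):
    """Classify a period using the table's definitions (any decline counts as failure)."""
    change = percentage_change(before, after)
    if change < 0:
        return "failure", change
    if change > target_pct:
        return "success", change
    return "below target", change


# Hypothetical sales totals for two periods
print(evaluate_period(1000, 1100))
print(evaluate_period(1000, 950))
```

Keeping the threshold as a parameter lets the same rule serve both the table's simple definition (any increase) and a stricter growth target.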
5.1 Assessing sales according to favourite cause:
Assessing revenue by favourite cause supports the development of a growth plan. Classifying sales by favourite cause (Migrants, Environment, Workers' Rights, Democracy and Gender Justice) is necessary for a systematic growth plan because it shows how each cause affects the company's sales. For better analysis, these groups can be further broken down by age and gender, which helps the company assess average sales for each cause.
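Average sales per favourite cause, optionally split further by gender or age, amounts to a grouped mean. The sketch below is a plain-Python illustration; the record field names `favourite_cause` and `gender` are assumptions based on the extract's description (only `total_revenue` appears verbatim in the tables), and the sample rows are hypothetical.

```python
from collections import defaultdict


def mean_by_group(records, key, value_field):
    """Average of value_field within each group defined by key(record)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += rec[value_field]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


customers = [  # hypothetical rows
    {"favourite_cause": "Migrants", "gender": "F", "total_revenue": 120.0},
    {"favourite_cause": "Migrants", "gender": "M", "total_revenue": 80.0},
    {"favourite_cause": "Environment", "gender": "F", "total_revenue": 50.0},
]

# Average revenue per favourite cause
print(mean_by_group(customers, lambda r: r["favourite_cause"], "total_revenue"))
# Further split by gender, as the paragraph suggests
print(mean_by_group(customers, lambda r: (r["favourite_cause"], r["gender"]), "total_revenue"))
```

Passing the grouping as a function keeps one helper for both the single-cause view and the cause-by-gender (or cause-by-age-band) breakdown.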
Other independent variables, such as experiences_purchased and pilot, can also be included for a more comprehensive analysis, with each factor analysed against each variable to assess percentage change and growth in sales. The company should also research new advertising techniques to raise current sales. Advanced advertising can boost performance and is critical to increasing overall revenue, so the organisation should invest in research and development of advertising techniques; adopting new, distinctive advertising technology in future will provide a competitive advantage.
Report Appendix: Statistics and Methodology:
Statistics is a branch of mathematical analysis applied to empirical data, producing quantified models, representations and summaries. It provides techniques for collecting, interpreting and analysing data and for drawing conclusions. Common summary measures include the mean, median, mode and variance. Researchers use statistics to summarise and describe a data set. When a study draws on a sample from a broader population, the researcher can draw conclusions about the population based on the statistical findings from the sample. Statistical analysis covers both the data collection and measurement phase and the statistical summarisation of results. Methodology is a formal review of the techniques used in a research area, requiring an analytical review of the body of knowledge-based approaches and concepts.
A1. Pre-processing and EDA:
In analytics, EDA is the study of data sets to summarise their main features, often using graphical methods. Whether or not a formal statistical model is used, EDA looks at what the data can tell us beyond formal modelling or hypothesis-testing tasks. It is the critical process of preliminary data analysis used to identify patterns, detect anomalies, test hypotheses and check assumptions with the help of descriptive statistics and graphical representations. Exploratory data analysis refers to a collection of methods, originally developed by John Tukey, for displaying data in ways that reveal interesting features. Whereas traditional methods typically begin with an assumed data model, EDA lets the data suggest an appropriate model. Itineract Travel Co started by processing the data to make the analysis possible.
EDA was mainly carried out through pivot tables, summarising the data using averages or cumulative counts (Gitlin, Hayes and Weinstein, 2012). The resulting charts, and related charts, are provided in Section 3.
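The pivot-table summaries described above, together with the lifecycle table's step of summarising counts within specific intervals, can be sketched in Python as grouping by an age band and reporting the sum, count and mean of a count variable. This is an illustration rather than the Excel workbook actually used; the 10-year band width, the function names and the sample rows are assumptions.

```python
from collections import defaultdict


def age_band(age, width=10):
    """Return the (low, high) age interval containing age, e.g. 27 -> (20, 29)."""
    low = (int(age) // width) * width
    return (low, low + width - 1)


def pivot(records, row_key, value_field):
    """Sum, count and mean of value_field per row label, like a simple pivot table."""
    cells = defaultdict(lambda: {"sum": 0, "count": 0})
    for rec in records:
        cell = cells[row_key(rec)]
        cell["sum"] += rec[value_field]
        cell["count"] += 1
    # Sorting the (low, high) tuples keeps the intervals in numeric order
    return {label: {**c, "mean": c["sum"] / c["count"]} for label, c in sorted(cells.items())}


rows = [  # hypothetical customers
    {"age": 23, "experiences_purchased": 2},
    {"age": 27, "experiences_purchased": 4},
    {"age": 41, "experiences_purchased": 1},
]
for band, stats in pivot(rows, lambda r: age_band(r["age"]), "experiences_purchased").items():
    print(band, stats)
```

Using numeric tuples as band labels avoids the text-sorting problem where an interval such as "100-109" would sort before "20-29".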
A2. Statistical Distribution Investigation:
A probability distribution is a statistical function that gives the probabilities of the different possible outcomes of an experiment. Probability distributions are used to describe different kinds of random variables and to make decisions based on such models. Random variables come in two kinds: discrete and continuous. Depending on which kind a random variable is, a statistician chooses the corresponding formulas to calculate its mean, median, variance, probabilities and other statistics. A discrete distribution describes a discrete random variable and gives the probability of each possible outcome. In this case, for total revenue, the Poisson distribution provides a way to describe the non-uniform flow of outcomes when counting random events. The Poisson distribution is discrete: occurrences are counted in whole numbers, and fractional events are not part of the model. The data were compared with a Poisson distribution (using Excel's POISSON.DIST function), taking the population mean to be either the sample mean or the sample median, by testing the observed frequency of each count against the frequency expected under that distribution.
When the sample median is used as the population mean, the results follow a shape similar to the Poisson distribution; when the sample mean is used, the fit is distorted to the left. One explanation for the different shapes obtained with the sample mean is that age, gender and favourite cause do not meet the Poisson requirement that events are independent, because total sales are linked to age, gender and favourite cause.
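The comparison with a Poisson distribution can be sketched as follows. `poisson_pmf` returns the same probability as Excel's POISSON.DIST(k, mean, FALSE), and `compare_to_poisson` sets observed frequencies beside the frequencies expected when the Poisson mean is taken to be either the sample mean or the sample median, mirroring the two choices described above. The dispersion index (variance divided by mean) is included as an extra quick check: a Poisson variable has a ratio near 1, whereas for experiences_purchased the descriptive statistics give 2.037² / 1.66, roughly 2.5. The function names and the counts below are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median, variance


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam (cf. Excel POISSON.DIST(k, lam, FALSE))."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def compare_to_poisson(counts, lam):
    """List of (k, observed frequency, expected frequency under Poisson(lam))."""
    n = len(counts)
    observed = Counter(counts)
    return [(k, observed[k], n * poisson_pmf(k, lam)) for k in range(max(counts) + 1)]


def dispersion_index(counts):
    """Sample variance divided by the mean; close to 1 for Poisson data."""
    return variance(counts) / mean(counts)


counts = [0, 1, 1, 2, 0, 3, 1, 0, 1, 2]  # hypothetical experiences_purchased values
for label, lam in (("mean", mean(counts)), ("median", median(counts))):
    print(f"lambda = sample {label} = {lam}")
    for k, obs, expected in compare_to_poisson(counts, lam):
        print(f"  k={k}: observed {obs}, expected {expected:.2f}")
print("dispersion index:", round(dispersion_index(counts), 2))
```

A ratio well above 1 suggests the counts are more spread out than a Poisson model with the same mean would predict, which is consistent with the independence concern raised above.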
A3. Bootstrapping
According to the central limit theorem, the distribution of the mean of a random sample is approximately normal, regardless of how the population is distributed, provided the sample is large enough. The statistic on the right is based on the bootstrapped samples and shows that, in our results, the resampled means are spread much more tightly than the individual observations. A 95% confidence interval was constructed, and a test was carried out to check whether the confidence intervals for total sales overlap. The word bootstrapping is also used for many other self-starting processes: it describes building complex software programs in sequential, interrelated phases, and the phrase "boot up", meaning to start a device's operating system, is derived from bootstrapping. The hypothesis was that mean total revenue differs between favourite causes, so the entire data set was split by each customer's favourite cause.
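The bootstrap procedure can be sketched with the Python standard library. The function below computes a percentile bootstrap confidence interval for the mean by resampling with replacement, and `intervals_overlap` checks whether two groups' intervals overlap, as in the comparison of total revenue across favourite causes. The function names, revenue values, 2,000 resamples and fixed seed are assumptions for illustration. Non-overlapping 95% intervals indicate a significant difference, but overlapping intervals do not by themselves prove that there is none.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    boot_means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return boot_means[lower_idx], boot_means[upper_idx]


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


revenue_by_cause = {  # hypothetical total_revenue values per favourite cause
    "Migrants": [40, 55, 60, 80, 95, 120, 150, 60, 70, 85],
    "Environment": [200, 240, 180, 260, 300, 220, 210, 250, 230, 270],
}
cis = {cause: bootstrap_ci(vals) for cause, vals in revenue_by_cause.items()}
for cause, (lo, hi) in cis.items():
    print(f"{cause}: 95% CI for mean revenue = ({lo:.1f}, {hi:.1f})")
print("overlap:", intervals_overlap(cis["Migrants"], cis["Environment"]))
```

The percentile method was chosen because it needs no distributional assumption beyond the resampling itself, which suits revenue data that are not normally distributed.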
A4. Sampling Error and Bias:
Sampling error arises when an analyst examines a random sample rather than every individual in the population. It is the statistical error that occurs because a sample does not perfectly represent the entire population, so the survey findings differ from the results the whole population would give (Brous, Janssen and Vilminko-Heikkinen, 2016). Sampling selects a number of items from a wider population, and both sampling errors and non-sampling errors can occur in that selection. Sampling bias is the difference between sample values and the actual population values that arises because the sample does not represent, or only partly represents, the population. Even random samples have sampling error, because a sample is only an estimate of the population: researchers select different subjects from the same group, and individual subjects vary. Any sample is just a subset of the entire population, so results will vary from sample to sample. When sampling is not representative, the result is systematic bias, in which survey outcomes differ greatly from those of the population as a whole; logically, if a survey does not reflect the population, its outcomes are likely to differ from the population's. For two similar studies using the same sampling procedure on the same population, the study with the larger sample has less sampling error. As sample size increases, the sample covers more of the population and captures more of its features, reducing sampling error (Evergreen and Metzner, 2013).
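The point that larger samples reduce sampling error can be demonstrated with a small simulation. The sketch below draws repeated random samples of different sizes from a hypothetical, skewed population of revenue values and measures how widely the sample means spread; that spread should shrink roughly in proportion to 1/sqrt(n). The population, its exponential shape, the function name and the seeds are assumptions. The simulation illustrates random sampling error only: a biased sampling procedure would stay biased however large the sample.

```python
import random
from statistics import mean, stdev


def spread_of_sample_means(population, n, trials=500, seed=1):
    """Standard deviation of the means of repeated random samples of size n."""
    rng = random.Random(seed)
    sample_means = [mean(rng.sample(population, n)) for _ in range(trials)]
    return stdev(sample_means)


pop_rng = random.Random(42)
# Hypothetical skewed population of 10,000 revenue values with mean about 100
population = [pop_rng.expovariate(1 / 100) for _ in range(10_000)]
for n in (10, 100, 1000):
    print(f"n = {n:4d}: spread of sample means = {spread_of_sample_means(population, n):.2f}")
```

Sampling without replacement from a finite population also shrinks the spread slightly at n = 1000, because each sample covers a tenth of the population.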
REFERENCES
Books and Journals:
Provost, F. and Fawcett, T., 2013. Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media.
Provost, F. and Fawcett, T., 2013. Data science and its relationship to big data and data-driven decision making. Big Data, 1(1), pp. 51-59.
Larson, D. and Chang, V., 2016. A review and future direction of agile, business intelligence, analytics and data science. International Journal of Information Management, 36(5), pp. 700-710.
Van Der Aalst, W., 2016. Data science in action. In Process mining (pp. 3-23). Springer, Berlin, Heidelberg.
Doan, A., Halevy, A. and Ives, Z., 2012. Principles of data integration. Elsevier.
Teo, B. K., 2012. EXAFS: basic principles and data analysis (Vol. 9). Springer Science & Business Media.
Brous, P., Janssen, M. and Vilminko-Heikkinen, R., 2016, September. Coordinating decision-making in data management activities: a systematic review of data governance principles. In International Conference on Electronic Government (pp. 115-125). Springer, Cham.
Evergreen, S. and Metzner, C., 2013. Design principles for data visualization in evaluation. New Directions for Evaluation, 2013(140), pp. 5-20.
Gitlin, R. D., Hayes, J. F. and Weinstein, S. B., 2012. Data communications principles. Springer Science & Business Media.
Online:
Data science, 2020. [Online] Available through: <https://datascience.berkeley.edu/about/what-is-data-science/>