Data Analysis Project

Added on  2022/11/26

AI Summary
This document is a data analysis project that covers topics such as probability, mean, standard deviation, correlation coefficient, regression model, and hypothesis test. It provides solutions to different problems and includes numerical summaries, box plots, histograms, scatter diagrams, and regression outputs. The aim is to analyze and interpret data related to stock prices and their relationships. The document also discusses the normality of stock returns and the prediction of future stock prices.
DATA ANALYSIS PROJECT
STUDENT ID:
[Pick the date]
Problem 1
(a) Probability of success (p) = 0.15 and n = 150
Mean = np = (150)(0.15) = 22.50
Variance = np(1 − p) = (150)(0.15)(1 − 0.15) = 19.13
The discrete and cumulative probabilities of the binomial distribution are shown in the
spreadsheet.
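The calculation in part (a) can be cross-checked outside the spreadsheet. The following is a minimal sketch that uses only the Python standard library; the function and variable names are illustrative and are not taken from the project's workbook.

```python
from math import comb

n, p = 150, 0.15  # values stated in part (a)

mean = n * p
variance = n * p * (1 - p)


def binom_pmf(k: int) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Discrete probabilities for k = 0, 1, ..., n.
pmf = [binom_pmf(k) for k in range(n + 1)]

# A running total of the pmf gives the cumulative probability P(X <= k).
cdf = []
running = 0.0
for prob in pmf:
    running += prob
    cdf.append(running)

print(f"mean = {mean:.2f}, variance = {variance:.3f}")
print(f"P(X <= {n}) = {cdf[-1]:.6f}")
```

The last entry of the cumulative list covers every possible outcome, so it should come out at 1 up to floating-point rounding.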
(b) Probability distribution Curve
Based on the above, the cumulative probability can be seen to converge towards 1.
Problem 2
Mean μ=50
Standard deviation σ =3.5
X ranges from 40 to 80 in steps of 0.1.
Density Probability Plot of the Normal Distribution is shown below.
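The density values behind such a plot could be generated along the following lines. This is a sketch under the stated parameters (mean 50, standard deviation 3.5, grid from 40 to 80 in steps of 0.1); the names are illustrative and the plotting step itself is left out.

```python
from math import exp, pi, sqrt

mu, sigma = 50.0, 3.5  # parameters stated in Problem 2


def normal_pdf(x: float) -> float:
    """Density of the Normal(mu, sigma) distribution at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# Grid from 40 to 80 inclusive in steps of 0.1.
# An integer index is used so that rounding error does not accumulate.
xs = [40 + i / 10 for i in range(401)]
density = [normal_pdf(x) for x in xs]

peak_x = xs[density.index(max(density))]
print(f"{len(xs)} grid points, density peaks at x = {peak_x}")
```

The density is highest at the mean and falls off symmetrically, so most of the visible curve sits in the left part of the 40 to 80 range.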
Problem 4
The two stocks selected are Netflix (NFLX) and Amazon. Both companies have seen
stupendous growth over the last decade or so. A common feature linking them is that
their growth is based on widespread penetration of the internet. Also, their recent and
future growth depends on international markets, since the US market is quite saturated
for both players.
Numerical summary.
(a) Box plot of the stocks, with the lowest and highest stock prices
                       Amazon     NFLX
Lowest Stock Price     429.23    89.15
Highest Stock Price   2012.71   391.43
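The quantities a box plot is drawn from (minimum, quartiles, maximum) could be computed as sketched below. The price list is hypothetical and is not the project's data; the inclusive quartile method is chosen on the assumption that it matches the spreadsheet's quartile convention.

```python
import statistics

# Hypothetical closing prices, NOT the project's data.
prices = [89.2, 97.5, 120.3, 145.8, 160.1, 187.4, 201.9, 265.7, 320.6, 391.4]

# Quartiles with the "inclusive" method (assumed to mirror the spreadsheet).
q1, median, q3 = statistics.quantiles(prices, n=4, method="inclusive")

summary = {
    "min": min(prices),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(prices),
}

for label, value in summary.items():
    print(f"{label:>6}: {value:.2f}")
```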
(b) Histogram for each stock
Scatter diagram for each stock
(c) The key difference between the two stock prices is visible in the box plot, where it is
apparent that Amazon stock is priced significantly higher than Netflix stock. As a
result, the mean and standard deviation of the Amazon stock price exceed those of the Netflix
stock price. However, it is noteworthy that both stocks have recently shown a stellar run,
owing to which the price distribution of both stocks is skewed. The extent of skew is higher
for Amazon stock than for Netflix stock.
Problem 5
(a) The correlation coefficient of the two stocks has been determined using the CORREL
function in Excel.
Clearly, the correlation coefficient is significant, which highlights that there is a very strong
positive linear relationship between the two stock price movements.
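CORREL returns the Pearson correlation coefficient, which could be reproduced as sketched below. The two price lists are hypothetical placeholders rather than the project's data, and the function name is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical prices, NOT the project's data.
nflx = [100.0, 150.0, 200.0, 250.0, 300.0]
amzn = [650.0, 870.0, 1100.0, 1340.0, 1560.0]

r = pearson_r(nflx, amzn)
print(f"r = {r:.4f}")
```

A value close to +1 indicates that the two series rise and fall together almost linearly, which is the pattern described above.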
(b) Regression Model
Amazon stock price ($) = 184.737 + (4.596 * NFLX stock price)
(c) The slope coefficient is +4.60. It implies that a $1 increase in the NFLX stock price is
associated with an increase of $4.60 in the corresponding Amazon stock price. The positive
sign indicates that the two variables change in the same direction.
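The slope interpretation can be checked by evaluating the fitted equation at two NFLX prices that are $1 apart. In this sketch the intercept and slope come from part (b); the NFLX price of 200 is an arbitrary illustrative input and the function name is hypothetical.

```python
INTERCEPT = 184.737  # from the fitted model in part (b)
SLOPE = 4.596        # from the fitted model in part (b)


def predict_amazon(nflx_price: float) -> float:
    """Estimated Amazon stock price ($) for a given NFLX stock price ($)."""
    return INTERCEPT + SLOPE * nflx_price


base = predict_amazon(200.0)    # 200.0 is an arbitrary example price
bumped = predict_amazon(201.0)  # the same price plus $1

print(f"{base:.3f} -> {bumped:.3f} (change {bumped - base:.3f})")
```

Whatever starting price is used, the difference between the two estimates equals the slope, because the model is linear.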
(d) In the given model, the Amazon stock price is the dependent variable, and the best estimate
of this variable can be obtained by substituting the median Netflix stock price into the
regression equation. The median has been preferred over the mean because the prices are
skewed, and hence the mean can be distorted.
Problem 6
Hypothesis Test
Null and alternative hypotheses
Null hypothesis H0: σ²(NFLX) − σ²(Amazon) = 0
Alternative hypothesis Ha: σ²(NFLX) − σ²(Amazon) ≠ 0
F-Test Two-Sample for Variances
                          NFLX        Amazon
Mean                   201.544      1110.943
Variance             11096.095    246411.430
Observations                50            50
df                          49            49
F                        0.045
P(F<=f) one-tail             0
F Critical one-tail      0.622
P value = 0.00
Significance level = 0.05
It is apparent from the above that the p value (0.00) is less than the significance level.
Therefore, the null hypothesis is rejected in favour of the alternative hypothesis.
Hence, it can be concluded that there is a statistically significant difference between the
variance of NFLX stock prices and that of Amazon stock prices.
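The F statistic in the table is the ratio of the two sample variances, which can be verified with a short sketch. The variances, sample sizes and critical value are taken from the table above; the variable names are illustrative, and the p value is not recomputed here because that would need an F-distribution routine outside the standard library.

```python
# Sample variances and sizes as reported in the F-test table.
var_nflx, var_amzn = 11096.095, 246411.430
n_nflx, n_amzn = 50, 50

f_stat = var_nflx / var_amzn            # smaller variance in the numerator
dof = (n_nflx - 1, n_amzn - 1)          # numerator and denominator df

F_CRITICAL_LOWER = 0.622                # one-tail critical value from the table

# With F below 1, the rejection region is the lower tail.
reject_h0 = f_stat < F_CRITICAL_LOWER

print(f"F = {f_stat:.3f}, df = {dof}, reject H0: {reject_h0}")
```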
Problem 7
The aim is to determine whether the returns of the two selected stocks are normally distributed
or not. To this end, the stock returns and their z scores have been computed, which are then
used to calculate the Jarque-Bera (JB) test statistic to check the normality of the stock returns.
Null hypothesis H0: The returns are normally distributed.
Alternative hypothesis Ha: The returns are not normally distributed.
JB statistic
JB statistic (For NFLX returns) = 11.44
JB statistic (For Amazon returns) = 9.08
Both of the above test statistics exceed the critical value of the JB statistic at the 5%
significance level.
Hence, the null hypothesis is rejected in favour of the alternative hypothesis. The stock
returns are not normally distributed.
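The comparison with the critical value can be made explicit. Under the null hypothesis the JB statistic is approximately chi-square distributed with 2 degrees of freedom, for which the upper-tail probability is exp(-x/2). The sketch below uses the textbook JB formula and the two statistics reported above; the function names are illustrative, and the exact spreadsheet steps used in the project are not known.

```python
from math import exp, log


def jarque_bera(returns):
    """Textbook JB statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


def jb_p_value(jb: float) -> float:
    """Upper-tail probability of a chi-square(2) variable at jb."""
    return exp(-jb / 2)


ALPHA = 0.05
critical = -2 * log(ALPHA)  # chi-square(2) critical value, about 5.99

# JB statistics as reported in Problem 7.
reported = {"NFLX": 11.44, "Amazon": 9.08}
for name, jb in reported.items():
    print(f"{name}: JB = {jb}, p = {jb_p_value(jb):.4f}, "
          f"reject H0: {jb > critical}")
```

Both reported statistics lie above the critical value of about 5.99, which agrees with the decision to reject normality for both return series.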
Problem 8
For the prediction of future Netflix stock prices, the following regression model has been
run to analyse whether past and present prices provide a reliable estimate of future
prices or not.
The above regression output clearly highlights that the slope associated with P(t) is significant, as
its p value is 0 and hence does not exceed the assumed level of significance. The significance of the
model can also be established from the Significance F in the ANOVA table, which is 0 and thereby
indicates that at least one slope coefficient is significant. The other slope coefficients, pertaining to
P(t-1) and P(t-2), are not significant, as their p values are higher than the significance level. Further,
the R² value is almost 1, which also highlights the high predictive capability of the model and the fact
that the model is a good fit.
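A regression on P(t), P(t-1) and P(t-2) needs the price series arranged into lagged rows first. The sketch below shows one way to do that. It assumes the dependent variable is the next price P(t+1), which the regression output does not state explicitly, and it uses a hypothetical price list; fitting the model is left to a spreadsheet or statistics package.

```python
def build_lagged_rows(prices):
    """Pair predictors [P(t), P(t-1), P(t-2)] with the target P(t+1).

    The choice of P(t+1) as the target is an assumption.
    """
    rows = []
    for t in range(2, len(prices) - 1):
        predictors = [prices[t], prices[t - 1], prices[t - 2]]
        target = prices[t + 1]
        rows.append((predictors, target))
    return rows


# Hypothetical daily prices, NOT the project's data.
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0]

rows = build_lagged_rows(prices)
for predictors, target in rows:
    print(predictors, "->", target)
```

Each row loses three observations to the lags and the lead, so a series of n prices yields n - 3 usable rows.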