Statistical Analysis of Movie Downloads from an Internet Site
Added on 2023/06/04
FINANCIAL STATISTICS
STUDENT ID:
[Pick the date]
TABLE OF CONTENTS
Executive Summary
Introduction
Task 1
Task 2
Task 3
Task 4
Task 5
Task 6 (Conclusion)
Appendix
STUDENT ID:
STUDENT NAME:
Executive Summary
The aim of this report is to present the results of a statistical analysis of data on movie
downloads from an internet site. A random sample of 50 customers has been drawn from the
population data, and a summary of the variables in the sample has been presented using numerical
and graphical techniques. Further, 95% confidence intervals have been estimated for the average
number of purchases by customers whose first choice is Sci-Fi and for the average dollar amount
spent; these are then compared with the actual population means. Hypothesis tests have been used
to examine two claims: that the average amount spent by customers whose first choice is Comedy
exceeds that of customers whose first choice is Drama, and that average purchases differ by
gender. Neither claim is supported by the sample data. Finally, correlation and regression
analysis indicate that customer age is not a significant variable influencing the dollar amount
spent by customers.
Introduction
Data have been provided on the types of movies downloaded from an internet website during a
given year. The population data comprise 4815 customers; based on the specific random customers
assigned to me, a sample of 50 customers has been formed. A range of descriptive and inferential
statistical techniques has been applied to the sample data in order to describe the sample and to
draw conclusions about population characteristics. This report presents the findings of that
analysis.
Task 1
The random sample of 50 customers allocated to my student ID has been extracted from the
population data provided and is shown in the Appendix.
Task 2
The descriptive statistics for each variable are presented below.
State
The distribution of customers across the six states is uneven: the largest number of customers
from a single state is 11 and the smallest is 2. Most of the website's customers come from FL and
CA, while the fewest come from IN.
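The state, city, gender and genre summaries in this task are frequency counts of a categorical variable. A minimal Python sketch of that tally follows; the list of state codes is hypothetical and is not the report's sample, and ties are reported together because more than one category can share the highest or lowest count (as FL and CA appear to here).

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]


def most_and_least_common(values):
    """Return sorted lists of the categories tied for the highest and the lowest counts."""
    counts = Counter(values)
    top, bottom = max(counts.values()), min(counts.values())
    most = sorted(cat for cat, n in counts.items() if n == top)
    least = sorted(cat for cat, n in counts.items() if n == bottom)
    return most, least


# Hypothetical state codes for illustration only (not the report's sample).
states = ["FL", "CA", "FL", "IN", "CA", "TX", "FL", "CA"]
print(frequency_table(states))
print(most_and_least_common(states))  # (['CA', 'FL'], ['IN', 'TX'])
```

The same two functions apply unchanged to the city, gender, first-choice and second-choice columns.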
City
The number of customers also varies by city: New Orleans has the highest representation with 9
customers, while Orlando and Phoenix have just 2 customers each. However, the website does not
depend heavily on any particular city, as customers are reasonably well distributed across cities.
Gender
From the above, female customers outnumber male customers in the given sample. If the sample is
representative of the population of 4815 customers, this suggests that the majority of the
website's customers are female.
First Choice
From the above, the first-choice genre is split roughly equally between Action, Comedy and
Sci-Fi, with Drama the only outlier on the low side. If the sample is representative of the
population, Drama appears to be the least popular genre as a first choice.
Second Choice
Drama is the most common second-choice genre, followed by Comedy and then Sci-Fi, while Action
accounts for the smallest share.
Age
The age distribution is clearly non-normal: the histogram is asymmetric and shows skew. Most
movie downloads come either from young customers aged 15-24 or from customers approaching
retirement, aged 55-64. A plausible explanation is that these two segments have more free time to
watch movies.
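The age groups quoted above (15-24 and 55-64) look like ten-year histogram classes starting at 15. Assuming that class layout (the exact bins behind the report's histogram are not stated), a minimal Python sketch of the binning follows; the ages are hypothetical.

```python
from collections import Counter


def age_group(age, start=15, width=10):
    """Map an integer age to a class label such as '15-24' (assumed 10-year classes from 15)."""
    if age < start:
        raise ValueError(f"age {age} is below the first class boundary {start}")
    lower = start + ((age - start) // width) * width
    return f"{lower}-{lower + width - 1}"


def age_histogram(ages, start=15, width=10):
    """Count customers per age class, ordered from the youngest class to the oldest."""
    counts = Counter(age_group(a, start, width) for a in ages)
    return dict(sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])))


# Hypothetical ages for illustration only (not the report's sample).
ages = [18, 22, 24, 31, 47, 56, 60, 63]
print(age_histogram(ages))  # {'15-24': 3, '25-34': 1, '45-54': 1, '55-64': 3}
```

Peaks in the first and last classes, with thinner middle classes, are the pattern the paragraph describes.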
Purchases
The variable is not normally distributed, as shown by the shape of the histogram and by the fact
that the measures of central tendency (mean, median and mode) do not coincide. The variation in
the data is also high, as indicated by the coefficient of variation.
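The argument above rests on comparing the mean, median and mode, and on the coefficient of variation (sample standard deviation divided by the mean). A minimal standard-library Python sketch of those summaries follows; the data list is hypothetical.

```python
import statistics


def describe(values):
    """Summary measures used in the discussion: centre, spread and coefficient of variation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return {
        "n": len(values),
        "mean": mean,
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
    }


# Hypothetical purchase counts for illustration only (not the report's sample).
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

A mean that sits away from the median and mode is only a rough sign of skew, not a formal normality test.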
Dollar Amount
The dollar amount spent is also not normally distributed, as shown by the shape of the histogram
and by the fact that the measures of central tendency do not coincide. The variation in the data
is also high, as indicated by the coefficient of variation.
Task 3
We are 95% confident that the average number of purchases by customers whose first choice is
Sci-Fi lies between 28.65 and 39.49. The corresponding population average is 25.64. Since this 95%
confidence interval does not contain the actual population mean, the sample selected in this case
does not appear to be a fair representation of the population of 4815 customers.
We are 95% confident that the average dollar amount spent by the population of 4815 customers on
all types of movies lies between $147 and $188. The corresponding population average is $166.71.
Since this 95% confidence interval contains the actual population mean, the sample appears to be a
fair representation of the population of 4815 customers in this respect.
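The intervals above have the usual form mean ± t(0.025, n-1) × s/√n. Assuming that is how the Excel output in the Appendix was produced, a standard-library Python sketch follows. The standard library has no Student t distribution, so the hypothetical helpers `t_cdf` and `t_quantile` integrate the t density numerically (Simpson's rule) and invert it by bisection. In practice a statistics library routine would normally be used.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Upper quantile for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(values, confidence=0.95):
    """t-based confidence interval for a population mean."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t_crit = t_quantile(1 - (1 - confidence) / 2, n - 1)
    xbar = statistics.mean(values)
    return xbar - t_crit * se, xbar + t_crit * se


def contains(interval, value):
    """True if value lies inside the closed interval (lo, hi)."""
    lo, hi = interval
    return lo <= value <= hi


# Population means checked against the intervals reported in Task 3.
print(contains((28.65, 39.49), 25.64))  # False: Sci-Fi first-choice purchases
print(contains((147, 188), 166.71))     # True: dollar amount spent
```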
Task 4
A hypothesis test is an inferential statistical technique used to draw conclusions about the
characteristics of a population from sample data. The test sets up a null hypothesis together
with an alternative hypothesis, and a suitable test statistic is then used to determine whether
the null hypothesis can be rejected. Rejecting the null hypothesis leads to acceptance of the
alternative hypothesis; failing to reject it means only that the sample does not provide
sufficient evidence against it.
The results of the first hypothesis test (shown in the Appendix) do not support the claim that the
average amount spent by customers whose first choice is Comedy is greater than that spent by
customers whose first choice is Drama. The second test (also shown in the Appendix) indicates no
significant difference between the average purchases of the two genders.
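Both tests compare two group means. The Appendix does not say which Excel t-test variant was used, so the sketch below assumes Welch's unequal-variances test; a pooled-variance test may give slightly different p-values. It returns the one-tailed p-value for H1: μA > μB, as in the Comedy versus Drama claim, and the two-tailed p-value for H1: μA ≠ μB, as in the gender claim. `t_cdf` and `decide` are hypothetical standard-library helpers, and the sample values are illustrative only.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer, as in Welch's test)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def welch_t_test(a, b):
    """Welch's test: t, df, one-tailed p (H1: mean_a > mean_b), two-tailed p (H1: means differ)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_upper = 1 - t_cdf(t, df)
    p_two = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p_upper, p_two


def decide(p_value, alpha=0.05):
    """Decision rule used in the Appendix: reject H0 only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical amounts for illustration only (not the report's sample).
comedy = [5, 6, 7, 8]
drama = [1, 2, 3, 4]
t, df, p_one, p_two = welch_t_test(comedy, drama)
print(round(t, 3), round(df, 2), decide(p_one), decide(p_two))
```

Applied to the reported p-values, `decide(0.4043)` and `decide(0.0853)` both return "fail to reject H0", matching the conclusions above.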
Task 5
The scatter plot of dollar amount spent against customer age is shown below.
The scatter plot indicates a weak, positive relationship between customer age and the dollar
amount spent. This is confirmed by the correlation coefficient, which is computed as 0.2549. The
coefficient of determination has come
out as 0.0650, which implies that variation in customer age accounts for only 6.5% of the
variation in the dollar amount spent by customers.
As a result, the regression model leaves 93.5% of the variation in the dollar amount spent
unexplained, so factors other than customer age drive most of it. The hypothesis test supports
this conclusion, since the linear relationship between the two variables is not significant at
the 5% level.
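In simple linear regression the coefficient of determination is the square of the correlation coefficient. The slope test uses t = r√(n-2)/√(1-r²) with n-2 degrees of freedom, and its two-tailed p-value equals the ANOVA Significance F because F = t². The standard-library Python sketch below applies this to the reported r = 0.2549 and n = 50. `pearson_r`, `slope_test_from_r` and the numerically integrated `t_cdf` are illustrative helpers, not functions from the report or from a particular library.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] (steps is even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_test_from_r(r, n):
    """R-squared, t statistic and two-tailed p-value for H0: slope = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    p = 2 * (1 - t_cdf(abs(t), n - 2))
    return r * r, t, p


r2, t, p = slope_test_from_r(0.2549, 50)
print(f"R^2 = {r2:.4f}, t = {t:.3f}, p = {p:.4f}")
```

With the reported figures this gives R² ≈ 0.0650 and t ≈ 1.83, with p close to the 0.0740 Significance F in the Appendix (r is rounded to four decimals, so the last digit may differ).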
Task 6 (Conclusion)
In line with the above computations and discussion, the confidence interval for the average dollar
amount spent by customers contains the actual population mean. However, this is not the case for
the average purchases by customers whose first choice is Sci-Fi. Further, the hypothesis tests on
the sample data do not support the claim that average spending is higher for customers whose
first choice is Comedy than for those whose first choice is Drama, nor do they show a significant
difference in average purchases between genders. In addition, the dollar amount spent by
customers is not driven by customer age, since age is not a significant predictor variable. The
key limitation of this analysis is the small sample size, as suggested by the Sci-Fi confidence
interval failing to contain the actual population mean. A larger sample should therefore be
chosen, as it would be more representative of the population and lead to more reliable results.
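The case for a larger sample can be made concrete using the margin of error of the t-based interval from Task 3. Assuming the sample standard deviation s stays roughly the same, the half-width shrinks roughly in proportion to 1/√n because the critical value changes little once n is moderately large. Quadrupling the sample therefore roughly halves the interval, while its nominal 95% coverage is unchanged:

```latex
E = t_{0.025,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad
\frac{E_{4n}}{E_{n}} = \frac{t_{0.025,\,4n-1}}{t_{0.025,\,n-1}} \cdot \frac{1}{2} \approx \frac{1}{2}
```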
Appendix
Task 1
The sample data extracted from the population are shown below.
Task 3
The relevant Excel output for the computation of the 95% confidence intervals is shown below.
a) 95% confidence interval for the average number of purchases by customers whose first choice is Sci-Fi
b) 95% confidence interval for the average dollar amount spent on all types of movies
Task 4
Hypothesis testing
1) Claim: The average amount spent by customers whose first choice is Comedy is higher than the
average amount spent by customers whose first choice is Drama.
H0: μComedy = μDrama
H1: μComedy > μDrama
The relevant Excel output for the hypothesis test is shown below.
Level of significance = 0.05
The one-tailed p-value is 0.4043, which is greater than the level of significance. There is
therefore insufficient evidence to reject the null hypothesis, and it cannot be concluded that the
average amount spent by customers whose first choice is Comedy is higher than that spent by
customers whose first choice is Drama.
2) Claim: Average purchases differ between males and females.
H0: μMale = μFemale
H1: μMale ≠ μFemale
The relevant Excel output for the hypothesis test is shown below.
Level of significance = 0.05
The two-tailed p-value is 0.0853, which is greater than the level of significance. There is
therefore insufficient evidence to reject the null hypothesis, and it cannot be concluded that
average purchases differ between males and females.
Task 5
Regression model
1) Coefficient of Correlation (R) = 0.2549
Coefficient of Determination = R² = (0.2549)² = 0.0650
2) Hypothesis testing
H0: β = 0, i.e. the slope is zero
H1: β ≠ 0, i.e. the slope is not zero
Based on the ANOVA table, Significance F is 0.0740, which is higher than the level of significance
(0.05). There is therefore insufficient evidence to reject the null hypothesis: the slope is not
significantly different from zero, and no significant linear relationship between age and the
dollar amount spent by customers is detected.