Statistical Analysis of UK Netflix Viewers: Descriptive and Inferential Statistics
Verified
Added on 2023/06/12
|10
|1411
|89
AI Summary
This coursework provides a statistical analysis of UK Netflix viewers using descriptive and inferential statistics. It includes mean, range, standard deviation, coefficient of variation, median, correlation matrix, regression equation, network diagram, critical path, and challenges faced during the analysis.
Contribute Materials
Your contribution can guide someone’s learning journey. Share your
documents today.
INDIVIDUAL COURSEWORK
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
TASK 1 1. a) Mean = Sum / Number of observation = 4702 / 115 = 40.8 Interpretation: The above mean value indicates the average and most common age of UK Netflix viewers in the collection of data. b) Range = Maximum – Minimum 65 = 80 – Minimum Minimum = 80 – 65 = 15 Interpretation: The above value indicates that minimum of 15 age of subscribers are UK Netflix viewers. c) Standard deviation = √ variance = √163.4 = 12.8 Interpretation: This indicate 12.8 dispersion of dataset in relative to its means and average. d) Coefficient of variation = Standard deviation / mean = 12.8 / 40.8 = 0.3 Interpretation: The above value indicates lower coefficient of variation which further means that there is lower level of dispersion around the mean. The coefficient of variation below 5% sounds good (Clifto and Clifton, 2019).
2. After comparing the mean and median value of ungrouped frequency distribution, it is analysed and identified that the average number of UK Netflix viewers is 40.8 while the median value of UK Netflix viewers is 43 (Calle, 2019). As in the given case, the mean is less than the median so the distribution of data is skewed to the left. TASK 2 1. a) Table 1 Age CategoryFrequency % Frequencies Cumulativ e frequencies % Cumulativ e frequencies Under 251513%1513% 25 to 342522%4035% 35 to 443026%7061% 45 to 542522%9583% 55 and over2017%115100% Total Count115 b) Table 2 Age Category Frequency (f) midpoint (x)fx (x- mean) (x- mean)2 f(x- mean)2 Under 251512.5187.5-43.9131928.35528925.33 25 to 3425421050-14.413207.73585193.396 35 to 44305717100.5869570.34451810.33554 45 to 542572180015.58696242.95326073.83 55 and over2087174030.58696935.561918711.24 Total Count1156487.558914.13 Mean (6487.5 / 115)56.4 Variance (58914.13/ 115-1)58800.1304 Standard deviation (√58800.1304)242.487382
Paraphrase This Document
Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
2. The mean and standard deviation calculated from raw data i.e., ungrouped data is more accurate as compared to grouped data. It is because mean and standard deviation from grouped data is computed based on assumption that all the observation within a class interval have a common value equal to the mid-point of class interval. This includes some error. 3. Position of median (Q2) = Sum of frequency / 2 = 115 / 2 = 57.5 value Q2 – 35 / 57.5 – 40 = 44 – 35 / 70 – 40 Q2 = 10 / 30 * 17.5 + 35 = .33 * 17.5 + 35 = 5.775 + 35 = 40.775 So, estimation of median as per linear interpolation is 40.8 4. Cumulative frequency curve
TASK 3 1. Job Separation* (Probability )Age Gross Hourly Pay (£) Commuting Time (Minutes) Average Weekly Hoursof Work Gender (1= Male; 2 = Female) JobSeparation* (Probability)1 Age - 0.8509335081 GrossHourly Pay (£) - 0.7804254440.5261562571 Commuting Time (Minutes)0.862105058 - 0.782546734 - 0.6824701541 AverageWeekly Hours of Work - 0.180338317 - 0.0295514260.3774666620.0173148341 Gender(1= Male;2= Female)0.2259669980.007549337 - 0.4197573530.004531325 - 0.9649485441 Interpretation: On the basis of above correlation matrix, the variable with highest correlation is consider as best predictor. After analysing the above table, it is identified that the variable with highest correlation is commuting time (minutes) and Job separation (probability) with the value of 0.8621 (Sun, Zhu and Zhou, 2020). Thus, the commuting time is a variable which is best predictor as the relation between this two variable is highest amount the all other variable correlation.
2. 3. The correlation coefficient value of variable job separation and commuting time is 0.8621 which is highest among the all. The positive and highest correlation coefficient between this two variable indicate that there is a strong relation between the commuting time and job separation variable. While on the other hand, the coefficient of determination between the two variable is 0.7432 which indicate that if the commuting time of workers increases or decreases, the value of job separation variable will also increase or decreases (Lachin, 2020). It is because of the positive coefficient of determination between the two variable whose correlation are highest among the all. 4. The equation of regression: Y = a + bX Where, Y is dependent variable X is independent variable b is the slope of the line and a is the y intercept
Secure Best Marks with AI Grader
Need help grading? Try our AI Grader for instant feedback on your assignments.
The regression equation of current variables is Y = 75.097x + 1.356 which state that there is a positive relation between the dependent and independent variable. It means, with the change in the commuting time variable (i.e., increase in the commuting time), the dependent variable such as job separation will also change or increase (Marees and et.al., 2018). TASK 4 1. Network diagram 2. Path A = 1 + 2 + 4 + 6 Duration = 7 + 4 + 6 + 4 = 21 weeks Path B = 1 + 3 + 5 + 6 = Duration = 7 + 9 + 3 + 4 = 23 weeks Hence, the path with higher duration is known as critical path. Thus, in the given case the critical path is Path B with duration 23 weeks. 3. The difference between critical and non-critical activities is such that critical activities start and finish time is strictly defined while start time of non-critical activities are freely deciding.
TASK 5 1. I am able to perfectly create network diagram and identify critical path in this project. I also do well in computing missing value of ungrouped descriptive statistics. 2. I faced challenge in computing the mean and standard deviation of the grouped descriptive statistics. It is because of my lack of attention during the learning session. 3. When I will get the same project in future, I will use the excel function to make sure that my end result of the data is right or wrong.
REFERENCES Books and journals Calle, M. L., 2019. Statistical analysis of metagenomics data.Genomics & informatics.17(1). Marees, A. T. and et.al., 2018. A tutorial on conducting genome‐wide association studies: Quality control and statistical analysis.International journal of methods in psychiatric research.27(2). p.e1608. Sun, S., Zhu, J. and Zhou, X., 2020. Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies.Nature methods.17(2). pp.193-200. Lachin, J. M., 2020. Nonparametric statistical analysis.Jama.323(20). pp.2080-2081. Clifton,L.andClifton,D.A.,2019.Thecorrelationbetweenbaselinescoreandpost- intervention score, and its implications for statistical analysis.Trials.20(1). pp.1-6. 1