Parametric Statistics: Descriptive Tools for Sun Coast Data
Added on 2022/11/03
AI Summary
This article describes the Sun Coast data using descriptive tools of parametric statistics. The data are analyzed with the Excel Analysis ToolPak, and each result is interpreted. The article covers correlation, simple linear regression, multiple linear regression, the independent t test, the paired sample t test, and the one-way ANOVA test.
MBA 5652 Unit IV Scholarly Activity
Student Name
Institution Name
Introduction

Parametric statistics is a branch of data analysis that assumes the sample data are drawn from a population that can be adequately modeled by a probability distribution with a fixed set of parameters. The objective of this task is to describe the Sun Coast data using descriptive tools. For data to be summarized with parametric tools, they must satisfy a number of assumptions (Murphy, 2012). Some of these assumptions are listed below:

- The observations were sampled randomly and are independent of each other
- The data are normally distributed
- The sample data contain no outliers
- The data have a homogeneous variance
- The sample size is large enough to warrant a parametric test

The data will be described using the Excel Analysis ToolPak. For each tab of the Excel workbook, an appropriate tool is used to describe the observations, and the result is interpreted.

Correlation

Correlation analysis examines the association between two sets of data (Mahdavi, 2013). Here the Excel Analysis ToolPak is used to study the relationship between microns and the mean annual sick days per employee. The result is summarized in the table below.

Correlation
 | microns | mean annual sick days per employee
microns | 1 |
mean annual sick days per employee | -0.715984185 | 1

From the table, the correlation coefficient is -0.71598. This is a strong negative correlation: higher micron values are associated with fewer mean annual sick days per employee.

Simple linear regression
Simple linear regression is a model used to predict the value of a dependent variable from an independent variable. In this study the independent variable is lost time hours and the dependent variable is safety training expenditure (Rencher & Christensen, 2012). The objective is to evaluate how changes in lost time hours relate to safety training expenditure. The model produced by the Excel Analysis ToolPak is shown in the tables below.

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.939559324
R Square | 0.882771723
Adjusted R Square | 0.882241279
Standard Error | 161.302987
Observations | 223

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 43300521.43 | 43300521.43 | 1664.210687 | 7.6586E-105
Residual | 221 | 5750122.451 | 26018.65362 | |
Total | 222 | 49050643.88 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 1753.602133 | 30.36296223 | 57.75464594 | 2.5647E-135 | 1693.764135 | 1813.440132
lost time hours | -6.157394365 | 0.150935993 | -40.79473848 | 7.6586E-105 | -6.45485242 | -5.85993631

According to the fitted model, the two variables are related by

y = 1753.60 − 6.1574x

where y is the safety training expenditure and x is the lost time hours. Multiple R is 0.9396, which indicates a strong linear association. Multiple R is always reported as a non-negative value, and because the slope is negative, the association itself is negative: each additional lost time hour is associated with a decrease of about 6.16 in the predicted safety training expenditure. The R Square value of 0.8828 means that 88.28% of the variation in y is explained by x. The intercept of 1753.60 is the predicted safety training expenditure when no time is lost. The overall relevance of the model is given by the F test: Significance F (7.66E-105) is far below 0.05, so at the 5% significance level (95% confidence) the model is statistically significant for estimating the safety training expenditure.
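The regression figures above are internally linked, and a short check makes the links explicit. The sketch below is illustrative only: it copies the sums of squares, degrees of freedom and coefficients from the Excel output and recomputes R Square, Multiple R, F and the slope's t statistic. The variable names and the `predict_expenditure` helper are assumptions introduced for this example, not part of the original analysis.

```python
import math

# Values copied from the Excel regression output above.
ss_regression = 43300521.43
ss_residual = 5750122.451
df_regression = 1
df_residual = 221
intercept = 1753.602133
slope = -6.157394365
slope_se = 0.150935993

# R Square is the share of the total variation explained by the model.
ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total

# Multiple R is the square root of R Square, so it is never negative;
# the direction of the association comes from the sign of the slope.
multiple_r = math.sqrt(r_square)
signed_correlation = math.copysign(multiple_r, slope)

# F is the ratio of the regression mean square to the residual mean square.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# The slope's t statistic is the coefficient divided by its standard error.
t_slope = slope / slope_se


def predict_expenditure(lost_time_hours):
    """Predicted safety training expenditure from the fitted line (hypothetical helper)."""
    return intercept + slope * lost_time_hours


print("R Square:", round(r_square, 6))
print("Signed correlation:", round(signed_correlation, 6))
print("F:", round(f_stat, 3))
print("t (slope):", round(t_slope, 4))
print("Prediction at 100 lost time hours:", round(predict_expenditure(100), 2))
```

With a single predictor, the F statistic equals the square of the slope's t statistic, which is why both rows of the output share the same p-value.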
Multiple linear regression

The model summary is shown in the tables below.

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.583706496
R Square | 0.340713274
Adjusted R Square | 0.338511248
Standard Error | 2564.049485
Observations | 1503

ANOVA
 | df | SS | MS | F | Significance F
Regression | 5 | 5086151914 | 1017230383 | 154.7271471 | 1.1645E-132
Residual | 1497 | 9841801596 | 6574349.763 | |
Total | 1502 | 14927953510 | | |

 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 32243.94172 | 1307.240845 | 24.66564738 | 5.2672E-113 | 29679.72353 | 34808.1599
Angle in Degrees | -86.45962007 | 17.1989247 | -5.027036373 | 5.58105E-07 | -120.1961696 | -52.72307055
Chord Length | -741.5559191 | 1361.861656 | -0.544516336 | 0.586167308 | -3412.915554 | 1929.803715
Velocity (Meters per Second) | 42.06093751 | 4.299893899 | 9.781854739 | 6.02337E-22 | 33.62648094 | 50.49539408
Displacement | -65093.43245 | 8026.089764 | -8.11022981 | 1.0415E-15 | -80837.00825 | -49349.85665
Decibel | -241.1097192 | 10.26502865 | -23.48846042 | 4.0652E-104 | -261.2450855 | -220.974353

Looking at the model as a whole, R Square is 0.3407. This shows that the independent variables (angle in degrees, chord length, velocity, displacement and decibel) together explain 34.07% of the variation in frequency (Warne, 2011). The model is significant overall, as Significance F (1.16E-132) is far below both 0.05 and 0.01. Hence at either the 5% or the 1% significance level the model is statistically significant. Looking at the p-values for the individual independent variables, angle, velocity, displacement and decibel are significant for the estimation of frequency, while chord length is statistically insignificant (p = 0.586). Holding the other variables constant, an increase in angle, displacement or decibel is associated with a reduction in frequency, while an increase in velocity is associated with an increase in frequency.

Independent t test

A t test is carried out to test whether the means of two groups of data are equal. In this case an independent t test is used because the two sets of data come from two different groups of observations. The hypotheses tested are

H0: μ0 = μ1 versus H1: μ0 ≠ μ1
where μ0 is the mean of Group A and μ1 is the mean of Group B. The results are given below.

t-Test: Two-Sample Assuming Unequal Variances
 | Group A Prior Training Scores | Group B Revised Training Scores
Mean | 69.79032258 | 84.77419355
Variance | 122.004495 | 26.96456901
Observations | 62 | 62
Hypothesized Mean Difference | 0 |
df | 87 |
t Stat | -9.666557191 |
P(T<=t) one-tail | 9.69914E-16 |
t Critical one-tail | 1.662557349 |
P(T<=t) two-tail | 1.93983E-15 |
t Critical two-tail | 1.987608282 |

From the results of the t test, the two-tailed p-value (1.94E-15) is far lower than 0.05. We therefore reject the null hypothesis and conclude that the means of the two sets of data are not equal. The revised training scores have a higher mean than the prior training scores (Derrick, et al., 2017). The statistical findings therefore indicate that the training has an impact on the scores: the results after training have a higher mean than those observed before the training sessions.

Paired sample t test

Also referred to as the dependent sample t test, this is a parametric test used to evaluate whether two data sets have equal means. In this type of test the observations are paired; that is, each subject was measured twice.
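The independent and the paired test differ mainly in how the standard error of the mean difference is formed. The sketch below is a minimal illustration, assuming only the summary values that Excel reports for each test (means, variances, sample sizes and, for the paired test, the Pearson correlation). The function names `welch_t` and `paired_t` are introduced for this example and are not part of the original analysis.

```python
import math


def welch_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """t statistic and Welch-Satterthwaite df for two independent samples."""
    se2_a = var_a / n_a
    se2_b = var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


def paired_t(mean_a, var_a, mean_b, var_b, correlation, n):
    """t statistic and df for paired samples, from summary statistics."""
    # Variance of the pairwise differences: var_a + var_b - 2 * covariance.
    var_diff = var_a + var_b - 2 * correlation * math.sqrt(var_a * var_b)
    t_stat = (mean_a - mean_b) / math.sqrt(var_diff / n)
    return t_stat, n - 1


# Independent test: prior vs. revised training scores (values from the Excel output).
welch_stat, welch_df = welch_t(69.79032258, 122.004495, 62,
                               84.77419355, 26.96456901, 62)

# Paired test: pre- vs. post-exposure readings (values from the Excel output).
paired_stat, paired_df = paired_t(32.85714286, 150.4583333,
                                  33.28571429, 155.5,
                                  0.992236043, 49)

print("Welch t:", round(welch_stat, 4), "df (rounded):", round(welch_df))
print("Paired t:", round(paired_stat, 4), "df:", paired_df)
```

Because the paired readings are highly correlated, most of their variance cancels in the differences, so a small mean difference can still produce a sizeable t statistic. The p-values are left out because the Python standard library has no t distribution; they can be read from the Excel output.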
t-Test: Paired Two Sample for Means
 | Pre-Exposure μg/dL | Post-Exposure μg/dL
Mean | 32.85714286 | 33.28571429
Variance | 150.4583333 | 155.5
Observations | 49 | 49
Pearson Correlation | 0.992236043 |
Hypothesized Mean Difference | 0 |
df | 48 |
t Stat | -1.929802563 |
P(T<=t) one-tail | 0.029776357 |
t Critical one-tail | 1.677224196 |
P(T<=t) two-tail | 0.059552714 |
t Critical two-tail | 2.010634758 |

The table above summarizes the test results from the Excel Analysis ToolPak. The hypotheses tested are

H0: μ0 = μ1 versus H1: μ0 ≠ μ1

The two-tailed p-value (0.0596) is higher than 0.05; hence at the 5% significance level we fail to reject the null hypothesis and conclude that there is no statistically significant difference between the two means. The exposure in this case has no statistically significant effect, and the pre-exposure and post-exposure observations do not differ significantly from each other.

One-way ANOVA test

The ANOVA test is also a parametric test. It compares the means of two or more independent groups to examine whether there is statistical evidence that the associated population means differ significantly. The results from the Excel Analysis ToolPak are summarized in the tables below.

Anova: Single Factor
SUMMARY
Groups | Count | Sum | Average | Variance
A = Air | 20 | 178 | 8.9 | 9.357894737
B = Soil | 20 | 182 | 9.1 | 3.042105263
C = Water | 20 | 140 | 7 | 6.631578947
D = Training | 20 | 108 | 5.4 | 1.410526316

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 182.8 | 3 | 60.93333333 | 11.92310333 | 1.75888E-06 | 2.72494392
Within Groups | 388.4 | 76 | 5.110526316 | | |
Total | 571.2 | 79 | | | |

The hypotheses tested are

H0: μ1 = μ2 = μ3 = μ4 versus H1: there is at least one inequality
The p-value of the test (1.76E-06) is far below 0.05; hence we reject the null hypothesis and conclude that the means of the four groups are not all equal. By observation, water and training have lower average values than air and soil.
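The F statistic in the ANOVA table can be rebuilt from the group counts, averages and variances alone. The sketch below is illustrative, under the assumption that only the SUMMARY values from the Excel output are available. The `one_way_anova` function and the `groups` dictionary are names introduced for this example.

```python
# Group summaries (count, average, variance) copied from the Excel SUMMARY table.
groups = {
    "A = Air": (20, 8.9, 9.357894737),
    "B = Soil": (20, 9.1, 3.042105263),
    "C = Water": (20, 7.0, 6.631578947),
    "D = Training": (20, 5.4, 1.410526316),
}


def one_way_anova(summary):
    """Rebuild the one-way ANOVA table entries from per-group summaries."""
    total_n = sum(n for n, _, _ in summary.values())
    grand_mean = sum(n * mean for n, mean, _ in summary.values()) / total_n

    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in summary.values())
    # Within-group variation: each sample variance scaled back to a sum of squares.
    ss_within = sum((n - 1) * var for n, _, var in summary.values())

    df_between = len(summary) - 1
    df_within = total_n - len(summary)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print("SS between:", round(ss_between, 4), "df:", df_between)
print("SS within:", round(ss_within, 4), "df:", df_within)
print("F:", round(f_stat, 4))
```

The computed F can be compared with the critical value in the table (F crit = 2.72494392): an F above it leads to rejecting the null hypothesis at the 5% level.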
References

Derrick, B., Toher, D. & White, P., 2017. How to compare the means of two samples that include paired observations and independent observations: A companion to Derrick, Russ, Toher and White (2017). The Quantitative Methods for Psychology, 13(2), p. 120–126.
Mahdavi, D. B., 2013. The Non-Misleading Value of Inferred Correlation: An Introduction to the Cointelation Model. Wilmott Magazine, 2013(67), p. 50–61.
Murphy, K., 2012. Machine Learning: A Probabilistic Perspective. s.l.: MIT Press.
Rencher, A. C. & Christensen, W. F., 2012. Methods of Multivariate Analysis. 3rd ed. s.l.: John Wiley & Sons.
Warne, R. T., 2011. Beyond multiple regression: Using commonality analysis to better understand R2 results. Gifted Child Quarterly, 55(4), p. 313–318.