This document provides an overview of descriptive statistics and regression analysis. It includes tables and figures to illustrate the concepts and provides explanations of the results. The document also includes references for further reading.
STATISTICS

Question One

Table 1: Descriptive Statistics Table

Statistic          Score      Hotel Stars   Member Years   No. of Reviews
Mean               4.123016   4.142857      4.35119        48.13095
Median             4          4             4              23.5
Std. Dev           1.007302   0.774487      2.93225        74.99643
Minimum            1          3             0              1
Maximum            5          5             13             775
Q1                 4          3.5           2              12
Q3                 5          5             6              54.25
IQR                1          1.5           4              42.25
No. of Outliers    0          0             0              0

Question Two

According to the descriptive statistics in Table 1: Descriptive Statistics Table, none of the four variables (Score, Hotel Stars, Member Years and No. of Reviews) has outlier values among its observations. The mean values are 4.123016 for Score, 4.142857 for Hotel Stars, 4.35119 for Member Years and 48.13095 for No. of Reviews.
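The source does not say which rule produced the outlier counts in Table 1. As a hedged consistency check, the sketch below applies the common 1.5 × IQR (Tukey) fences to the quartiles, minima and maxima reported in the table. The multiplier `k=1.5` and the helper names `tukey_fences` and `extremes_outside_fences` are assumptions made for illustration, not part of the original analysis.

```python
# Quartiles and extremes copied from Table 1. The 1.5 * IQR multiplier is an
# assumed convention; the source does not state which outlier rule it used.
TABLE_1 = {
    "Score": {"q1": 4.0, "q3": 5.0, "min": 1.0, "max": 5.0},
    "Hotel Stars": {"q1": 3.5, "q3": 5.0, "min": 3.0, "max": 5.0},
    "Member Years": {"q1": 2.0, "q3": 6.0, "min": 0.0, "max": 13.0},
    "No. of Reviews": {"q1": 12.0, "q3": 54.25, "min": 1.0, "max": 775.0},
}


def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def extremes_outside_fences(row, k=1.5):
    """Return (min_below_lower_fence, max_above_upper_fence) for one variable."""
    lower, upper = tukey_fences(row["q1"], row["q3"], k)
    return row["min"] < lower, row["max"] > upper


for name, row in TABLE_1.items():
    lower, upper = tukey_fences(row["q1"], row["q3"])
    low_flag, high_flag = extremes_outside_fences(row)
    print(f"{name}: fences=({lower}, {upper}), "
          f"min outside={low_flag}, max outside={high_flag}")
```

Under this assumed rule, the reported minimum of Score (1) and the reported maxima of Member Years (13) and No. of Reviews (775) fall outside the fences, while Hotel Stars stays inside them. The zero-outlier counts in Table 1 therefore presumably rest on a different criterion, which the source does not describe.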
Question Three

[Figure 1: Scatterplot of No. Hotel Reviews against No. of Helpful Votes (axes: No. Hotel Reviews, No. of Helpful Votes)]

Table 2: Correlation Coefficient for No. Hotel Reviews and No. of Helpful Votes

Correlation Coefficient    0.76432223

In Figure 1: Scatterplot of No. Hotel Reviews against No. of Helpful Votes, the data points generally follow a positive diagonal trend. This implies that the relationship between No. Hotel Reviews and No. of Helpful Votes is positive. The same conclusion follows from the correlation coefficient of 0.76432223 in Table 2: Correlation Coefficient for No. Hotel Reviews and No. of Helpful Votes. If the correlation coefficient is positive, the relationship of interest is also positive in nature (Barbara & Susan, 2014; Everitt & Skrondal, 2010).
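For readers who want to see how a coefficient like the one in Table 2 is obtained, here is a minimal sketch of the Pearson correlation coefficient using only the Python standard library. The source does not state which coefficient or software was used, so treating it as Pearson's r is an assumption. The data values are hypothetical and are not the assignment's observations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only.
hotel_reviews = [2, 5, 9, 14, 30, 41]
helpful_votes = [1, 8, 6, 20, 25, 60]

r = pearson_r(hotel_reviews, helpful_votes)
print(f"r = {r:.4f}, positive relationship: {r > 0}")
```

The sign of `r` gives the direction of the relationship, and its magnitude (between 0 and 1) indicates how closely the points follow a straight line.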
However, the relationship cannot be described as linear, since the data points in the plot do not follow a straight-line trend. Even so, it can be described as relatively strong, given the high correlation coefficient of 0.76432223.

Question Four

Hypotheses:

H0: μ_c ≤ μ_nc
H1: μ_c > μ_nc

where μ_c is the mean score for hotels with casinos and μ_nc is the mean score for hotels without casinos. The independent two-sample t-test compares the means of two different categories in relation to another variable (Howitt & Cramer, 2010; Norman, 2010). The results of the t-test are in Table 3: T-test Two Sample Output below:

Table 3: T-test Two Sample Output
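As a rough illustration of the test set up above, the sketch below computes a two-sample t statistic and its degrees of freedom with the standard library. It assumes unequal variances (Welch's version), because the source does not state which variant was run. The scores are hypothetical, and `welch_t` is an illustrative helper name. The p-value would then be read from a t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for the difference mean(sample_a) - mean(sample_b).

    Uses Welch's unequal-variance form; this choice is an assumption.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least 2 observations")
    # Squared standard error contributed by each group.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical scores for illustration only.
casino_scores = [5, 4, 5, 4, 5]
no_casino_scores = [3, 4, 3, 4, 3]

t_stat, dof = welch_t(casino_scores, no_casino_scores)
print(f"t = {t_stat:.4f}, df = {dof:.1f}")
```

The sign of the statistic depends only on which group is passed first: swapping the two samples flips the sign without changing its magnitude. This matters when reading a one-tailed result against H1: μ_c > μ_nc.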
The test statistic from Table 3: T-test Two Sample Output is -4.21145 and the one-tailed p-value is 3.01E-05. At the 0.05 level of significance, the p-value of 3.01E-05 is less than 0.05. We therefore reject the null hypothesis H0 and conclude that the average score of hotels with casinos is significantly higher than that of hotels without casinos.

Question Five

Regression analysis provides statistical information on the relationship between the variables of interest (Cortes & Mohri, 2014; Tri & Jugal, 2015). The results of the regression analysis are below:

Table 4: Regression Summary Output
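To make the quantities in a regression summary more concrete, the sketch below fits a least-squares line with a single predictor and computes its R² using only the standard library. This is a deliberately simplified stand-in: the assignment's model has several predictors, and the data here are hypothetical. The names `fit_line` and `r_squared` are illustrative.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for the model y = intercept + slope*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data for illustration only.
hotel_stars = [3, 3, 4, 4, 5, 5]
score = [3, 4, 4, 5, 4, 5]

intercept, slope = fit_line(hotel_stars, score)
r2 = r_squared(hotel_stars, score, intercept, slope)
print(f"Score = {intercept:.2f} + {slope:.2f} * HotelStars, R^2 = {r2:.4f}")
```

The slope is the expected change in the response for a one-unit change in the predictor. R² is the fraction of the response's variation that the fitted values account for.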
The regression equation is (2dp):

Score = 1.53 − 7.70E−03 (No. Reviews) + 2.00E−02 (No. Hotel Reviews) + 5.20E−05 (Helpful Votes) + 0.36 HotelSt…

The p-value for the regression is 5.2E-08, which is below the 0.05 level of significance, so the regression is significant at that level. The p-values of the Hotel Stars, No. of Rooms, Travel Type and Casino variables are below 0.05, so these variables are significant in this model. Holding the other predictors constant, a unit change in the No. of Rooms variable results in a -0.13E-03 change in the Score variable, while a unit change in the Casino variable results in a 0.66 change in the Score variable. The value of R² is 0.1104 (4dp), which implies that the regression model explains about 11.04% of the variation in Score.

Question Six

Logistic regression is a form of regression in which the dependent variable (the main variable of interest to the researcher) is measured on an ordinal or nominal scale and is therefore categorical (Hosmer, 2013; Jorge, et al., 2013). The logistic regression yielded the following parameter estimates:
Table 5: Logistic Regression Parameters

The coefficients of No. of Rooms and Casino are b5 and b12 respectively. From the respective odds ratios (OR), a unit change in the Casino variable results in a 167% change in the Score_b variable, while a unit change in the No. of Rooms variable results in a 100% change in the Score_b variable.
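Table 5 is not reproduced here, so the following is only a sketch of how an odds ratio is conventionally read, not a restatement of the source's output. It assumes, hypothetically, that the 167% and 100% figures correspond to odds ratios of 1.67 and 1.00. An odds ratio is the exponential of a logistic coefficient, and the percentage change in the odds for a one-unit increase is (OR − 1) × 100.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


def percent_change_in_odds(or_value):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (or_value - 1.0) * 100.0


# Hypothetical odds ratios; the real values would come from Table 5.
assumed_or = {"Casino": 1.67, "No. of Rooms": 1.00}

for name, or_value in assumed_or.items():
    change = percent_change_in_odds(or_value)
    print(f"{name}: OR = {or_value:.2f}, change in odds = {change:.0f}%")
```

Under that assumed reading, an odds ratio of 1.67 multiplies the odds of the Score_b outcome by 1.67, which is a 67% increase. An odds ratio of 1.00 leaves the odds unchanged.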
References

Barbara, I & Susan, D 2014, Introductory Statistics, 1st edn, OpenStax CNX, New York.

Cortes, C & Mohri, M 2014, 'Domain Adaptation and Sample Bias Correction Theory and Algorithm for Regression', Theoretical Computer Science, vol. 5, no. 7, pp. 103-126.

Everitt, BS & Skrondal, A 2010, Cambridge Dictionary of Statistics, 4th edn, Cambridge University Press, London.

Hosmer, D 2013, Applied Logistic Regression, 1st edn, Wiley, Hoboken, New Jersey.

Howitt, D & Cramer, D 2010, Introduction to Descriptive Statistics in Psychology, 5th edn, Prentice Hall, New York.

Jorge, AA, Angela, A & Edson, ZM 2013, 'Robust Linear Regression Models: Use of Stable Distribution for the Response Data', Open Journal of Statistics, vol. 3, no. 1, pp. 3-5.

Norman, G 2010, 'Likert Scales, Levels of Measurement and the Laws of Statistics', Advances in Health Science Education, vol. 15, no. 5, pp. 625-632.

Tri, D & Jugal, K 2015, Select Machine Learning Algorithms Using Regression Models, s.l.: 2015 IEEE Conference.