Comprehensive Data Analysis: Social Media's Impact on Society


Added on  2021/06/17

AI Summary
This mini-research project investigates the influence of social media on society by analyzing data collected from 3000 individuals through questionnaires. The study examines the time spent on social media apps, the frequency of app visits, and the number of posts made. The data includes variables like age, gender, and preferred social media platforms (Facebook, Twitter, Snapchat, Pinterest, Instagram, and others). Descriptive statistics, including frequency distributions and histograms, are used to summarize the data. Inferential tests, such as Kolmogorov-Smirnov and Shapiro-Wilk tests, reveal that the numerical variables are not normally distributed. Chi-square tests are used to find a significant relationship between gender and the most used social media application. Regression analysis is performed to explore relationships between numerical variables, and non-parametric tests are used to compare median values among genders and social media app groups. The study finds interesting facts, suggesting the need for further analysis using non-parametric and non-homogeneous tests, and the potential for data transformation and outlier treatment.
Influence of Social Media on society
Introduction:
Whatever the information may be, these days it seems to travel faster than light, and social media is
where it happens. News, updates, pictures, entertainment and more: social media gives you everything. It
has swept through every other industry like a wave, and it has created billionaires. This is one side of
the coin.
On the other side, there are the users. Social media applications like
Facebook, Instagram, Snapchat and others are the most common ones found on any smartphone these
days. We spend hours on them, feeding on the prioritized information. In a way it has become a habit
rather than a pastime. From one angle this looks like an addiction, but it also reflects the fact that the
channels for passing on information have shifted heavily onto social media, which is now used as a
vehicle faster than any other. We cannot deny the argument of addiction, but a greater good is
happening in parallel.
At any time of the day, we log in, log out, and repeat whenever the app
notifies us. The user's information is exploited by the social media firms most of the time,
which makes the relationship between the user and the firm look parasitic, but it is a mutually
beneficial one. Statistical analysis of social media usage data can help us examine this question.
In this mini-research project we restrict ourselves to studying how much
time people spend on social media apps, the number of times they open the app and the number of
posts they make.
Data Collection:
A short questionnaire was prepared in which we ask each individual about the following:
i) Age
ii) Gender
iii) Most used social media application (Media app)
iv) How many minutes they spend on that particular social media app per day
v) The number of times they visit the app per day
vi) The number of posts they make per week on the app.
I collected the data in person in crowded areas. The individuals were chosen randomly. It
took roughly 2-4 minutes to collect one sample. A total of 3000 samples were collected, of which
2979 are considered in our research; the 21 removed samples had many missing values.
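
The screening step described above can be sketched in a few lines of Python. This is only an illustration: the field names are hypothetical (the questionnaire's actual column labels are not given), the three records are made up, and the rule used here (drop any record with at least one missing answer) is an assumption, since the report only says the removed samples had many missing values.

```python
# Hypothetical field names; the real questionnaire labels are not given in the report.
FIELDS = ("age", "gender", "media_app",
          "minutes_per_day", "visits_per_day", "posts_per_week")


def is_complete(record):
    """Return True when every questionnaire field has a non-empty answer.

    A value of 0 counts as a valid answer; only None and "" count as missing.
    """
    return all(record.get(field) not in (None, "") for field in FIELDS)


def drop_incomplete(records):
    """Split records into (kept, removed) according to completeness."""
    kept = [r for r in records if is_complete(r)]
    removed = [r for r in records if not is_complete(r)]
    return kept, removed


# Made-up records for illustration only (not the study's data).
sample = [
    {"age": 24, "gender": "Female", "media_app": "Facebook",
     "minutes_per_day": 90, "visits_per_day": 12, "posts_per_week": 3},
    {"age": 31, "gender": "Male", "media_app": "Twitter",
     "minutes_per_day": None, "visits_per_day": 5, "posts_per_week": ""},
    {"age": 19, "gender": "Other", "media_app": "Instagram",
     "minutes_per_day": 240, "visits_per_day": 30, "posts_per_week": 0},
]

kept, removed = drop_incomplete(sample)
print(len(kept), "kept;", len(removed), "removed")
```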
Data Variables:
i) Age is a Numerical variable.
ii) The variable of gender is a categorical variable (nominal).
iii) Social media application name is a categorical variable (nominal).
Here we categorize the social media applications as Facebook, Twitter, Snapchat,
Pinterest, Instagram and others.
iv) Time spent is a Numerical variable.
v) Number of visits is a Numerical variable.
vi) Number of posts is again a Numerical variable.
Descriptive Statistics:
1) Age:
In Table 1, we present a summary of the data corresponding to Age. We can see a lot of variation
in the data, but most of the sample falls between 18 and 40 years of age. The frequency
distribution is presented in Table 2 and the histogram in Figure 1. The data does not look
normal from the histogram.
2) Gender:
In Table 4, we present a summary of the data corresponding to Gender (categorical variable).
Only 1.2 percent of the data comes from the other category, and 59.3 percent of individuals are female.
Figure 3 shows the bar graph of the Gender distribution.
3) Social media application:
In Table 5, we present a summary of the most used social media application. 36.1 percent of
the individuals mostly use Facebook in a day, and only 2.6 percent mostly use Pinterest.
Figure 4 presents the bar graph of the social media application distribution in the collected
sample.
4) Time spent on the favorite application:
In Table 6, we present a summary of the time spent on the favorite application (numerical
variable). It is clear from the table that there is a great deal of variation in the time spent on the
favorite application. Some people report spending nearly 1000 minutes per day on the application, which
looks strange. Table 7 gives the frequency distribution and Figure 5 the histogram of
the time spent on the favorite application (in minutes). The data does not look normal;
in fact it looks positively skewed, with a skewness value greater than 3.
5) No of visits:
In Table 9, we present the summary of the number of times individuals visit their favorite
app (per day). Apart from the large variation in the data, the skewness value is again greater than 3,
indicating a deviation from normality. Table 10 shows the frequency distribution and Figure 7 presents the
histogram, which clearly shows that the data does not follow a normal distribution.
6) No of posts:
In Table 12, we present the summary of the number of posts made by individuals on their
favorite app (per week). Apart from large variation, it has a skewness value of 6.37, which means
the data deviates from normality. Table 13 and Figure 9 give the frequency distribution and
its pictorial representation (histogram).
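
The skewness values quoted above come from SPSS output. As a rough illustration of what such a value measures, the sketch below computes the bias-adjusted sample skewness, which is the form SPSS is generally understood to report. The function name and the numbers are made up for illustration and are not the study's data.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness: sqrt(n(n-1)) / (n-2) * m3 / m2**1.5.

    m2 and m3 are the second and third central moments (dividing by n).
    Values near 0 suggest symmetry; large positive values suggest a long
    right tail.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


# Made-up minutes per day; the single 1000-minute answer produces a long right tail.
minutes = [10, 12, 15, 20, 30, 45, 60, 90, 1000]
print(round(sample_skewness(minutes), 2))
```

A single extreme answer is enough to push the statistic well above zero, which is consistent with the positively skewed histograms described in this section.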
Inferential Tests:
The first thing I am interested in testing is whether the numerical variables are normally distributed.
For that I used the Kolmogorov-Smirnov and Shapiro-Wilk tests. For both tests the null hypothesis is that
the data is normally distributed and the alternative hypothesis is that the data is not normally
distributed. The variables Age, time spent on the favorite social media app, number of visits per day and
number of posts per week are tested for normality, and the SPSS output is presented in Tables 3, 8, 11
and 14. In the outputs the p-values are less than 0.05, which means we reject the hypothesis that the
variables are normally distributed. We also present the Q-Q plots in Figures 2, 6, 8, 10 and 20, which
agree with the test results.
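
To make the Kolmogorov-Smirnov idea concrete, here is a minimal sketch that computes the test statistic D, the largest gap between the empirical distribution and a normal distribution fitted to the sample. It is not the SPSS procedure: it returns no p-value, and the 5 percent cut-off 0.886/sqrt(n) is a commonly quoted large-sample approximation for the Lilliefors-corrected test, used here as an assumption. The two data sets are synthetic.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normal_statistic(values):
    """Largest gap D between the empirical CDF and a fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def approx_critical_value_5pct(n):
    """Approximate 5% critical value when mean and sd are estimated (assumed, n > 30)."""
    return 0.886 / math.sqrt(n)


# Synthetic data: evenly spaced normal quantiles, and their exponentials (right-skewed).
normal_like = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [math.exp(x) for x in normal_like]

for name, data in (("normal-like", normal_like), ("skewed", skewed)):
    d = ks_normal_statistic(data)
    print(name, round(d, 3), "reject" if d > approx_critical_value_5pct(len(data)) else "keep")
```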
The next thing I am interested in testing is whether there is a relationship between the most used social
media application and gender. For this we use the chi-square test of independence, where the null
hypothesis is that there is no relationship between gender and the most used social media app and the
alternative hypothesis is that there is a relationship between gender and the most used social media
app. The test result from SPSS is presented in Table 16, where the p-value is less than 0.05, which means
we have enough evidence to reject the null hypothesis and conclude that there is a significant relationship
between gender group and most used social media application group. The p-values of the Phi and
Cramér's V measures are also less than 0.05, which supports this result.
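
The chi-square statistic and Cramér's V behind such a result can be sketched as follows. The contingency table is made up (two groups by two apps) and is not the study's Table 16. No p-value is computed; instead the statistic is compared with the standard 5 percent critical value, which is 3.841 for 1 degree of freedom. For a table with 3 gender groups and 6 app groups the degrees of freedom would be 10 and the critical value 18.307.

```python
import math


def chi_square_independence(table):
    """Return (chi2, degrees_of_freedom, cramers_v) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    smaller_dim = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(chi2 / (n * (smaller_dim - 1)))
    return chi2, dof, cramers_v


# Made-up counts: rows are two gender groups, columns are two apps.
observed = [[10, 20],
            [20, 10]]
chi2, dof, v = chi_square_independence(observed)
print(round(chi2, 3), dof, round(v, 3), "reject H0" if chi2 > 3.841 else "keep H0")
```

Cramér's V rescales the statistic to the range 0 to 1, so it describes the strength of the association rather than only its significance.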
A homogeneity of variance test is conducted for the 3 numerical variables across the gender groups.
For example, I want to test whether the variability of the number of visits is the same in males and
females. A similar hypothesis test is conducted for the other 2 numerical variables, and the SPSS output
is presented in Table 19. Levene's test is used, and the p-values of all the tests are less than 0.05,
which means the variances are not homogeneous across the groups.
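
For readers who want to see what Levene's test measures, the sketch below computes the mean-centred Levene statistic W for any number of groups. It is a simplified illustration with made-up numbers: it returns the statistic and its degrees of freedom but no p-value, which would come from an F distribution with those degrees of freedom.

```python
def levene_w(groups):
    """Mean-centred Levene statistic W and its degrees of freedom (k-1, N-k).

    Each observation is replaced by its absolute deviation from its own
    group mean; W is then the one-way ANOVA F ratio of those deviations.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    deviations = []
    for g in groups:
        centre = sum(g) / len(g)
        deviations.append([abs(x - centre) for x in g])
    group_means = [sum(z) / len(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((value - m) ** 2
                 for z, m in zip(deviations, group_means) for value in z)
    w = (n_total - k) / (k - 1) * between / within
    return w, (k - 1, n_total - k)


# Made-up visits per day for two groups with clearly different spread.
group_a = [1, 2, 3]
group_b = [0, 10, 20]
w, df = levene_w([group_a, group_b])
print(round(w, 3), df)
```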
To see the relationship between the numerical variables, a scatter plot matrix is presented in Figure 11.
The relationships between the numerical variables are not clear from the picture. We then performed
a regression analysis of number of posts on number of visits and time spent, the results of which are
presented in Table 17. The ANOVA table shows that the regressors are jointly significant, and further
t-tests for the constant and the 2 regressors show that the constant term is not significant
(p-value = 0.082) while both regressors are significant at the 0.05 level of significance. The R square
value is only 0.162, which means only about 16 percent of the variability is explained by the regressors.
When age is included as a regressor and the regression analysis is repeated, we get a significant constant
along with significant regressors. There is not much change in the R square value (0.166).
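
The meaning of the R square value can be illustrated with a small least-squares sketch. It fits a constant plus two regressors by solving the normal equations and then computes R square as one minus the ratio of residual to total sum of squares. The data and function names are made up for illustration; this is not the SPSS output of Table 17.

```python
def fit_ols(x_rows, y):
    """Least-squares coefficients [constant, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in x_rows]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


def r_squared(x_rows, y, coefs):
    """Share of the variability in y explained by the fitted model."""
    predicted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in x_rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up rows of (visits per day, minutes per day) and posts per week.
x_rows = [(5, 30), (8, 45), (12, 60), (20, 200), (3, 15), (15, 90)]
posts = [1, 4, 2, 9, 0, 3]
coefs = fit_ols(x_rows, posts)
print([round(c, 3) for c in coefs], round(r_squared(x_rows, posts, coefs), 3))
```

An R square of 0.162 on this scale means the fitted line removes only about 16 percent of the spread in the number of posts, even though the individual regressors are statistically significant.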
Now each numerical variable is tested for a difference in the median values among genders using
non-parametric tests, and the output is presented in Table 18. The test result shows that there
is a significant difference in the median values of all 3 numeric variables among genders. The
difference in medians is also tested among the favorite social media app groups, and the output is
presented. Similar to the above result, there is a significant difference in the median values of all 3
numeric variables among the groups of favorite social media app.
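
The report does not name the non-parametric test that was used. One common choice for comparing more than two independent groups is the Kruskal-Wallis test, so the sketch below computes its H statistic as an assumed stand-in, with average ranks for ties and the usual tie correction. The numbers are made up, and no p-value is computed. With 3 groups the statistic has 2 degrees of freedom, for which the 5 percent chi-square critical value is 5.991.

```python
from collections import Counter


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Made-up minutes per day for three gender groups.
female = [40, 55, 60, 75]
male = [20, 25, 35, 50]
other = [90, 120, 150, 300]
h, dof = kruskal_wallis_h([female, male, other])
print(round(h, 3), dof, "reject H0" if h > 5.991 else "keep H0")
```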
The case summaries are presented in Table 21 for the 3 numerical variables with respect to gender group
and social media app group. We can see that individuals in the other gender category (neither male nor
female) spend a lot of time on social media apps, with high variability. The time spent on apps in the
others category is less than on the named apps. Looking more closely, the other gender category spends
more time, visits more often and posts more than males and females. Also, apps in the others category
have fewer visits and fewer posts. These results are drawn from the sample collected.
Conclusion:
Many interesting facts are obtained from the test results, and there is a need for further analysis
using non-parametric tests and tests that do not assume homogeneity of variance. We can further convert
the numerical variables into categorical variables based on the present analysis and then test for
independence and homogeneity using combinations of the 5 variables. There are some outliers in the
study; for example, some samples report 1000 minutes per day spent on a social media app. So for further
analysis I may try trimming.
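
A minimal sketch of what trimming could look like is given below, assuming a symmetric trim that drops the same share of observations from each end of the sorted data. The 20 percent proportion and the numbers are made up for illustration; choosing the proportion for the actual study is left open.

```python
def trim(values, proportion):
    """Drop the lowest and highest `proportion` of observations (0 <= proportion < 0.5)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)  # observations removed from each end
    return ordered[cut:len(ordered) - cut]


def trimmed_mean(values, proportion):
    """Mean of the observations that remain after trimming."""
    remaining = trim(values, proportion)
    return sum(remaining) / len(remaining)


# Made-up minutes per day, including one 1000-minute answer.
minutes = [30, 45, 60, 90, 1000]
print(sum(minutes) / len(minutes), trimmed_mean(minutes, 0.2))
```

Comparing the ordinary mean with the trimmed mean shows how strongly a single extreme answer can pull the summary upward.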