
Business Intelligence and Data Visualization: Crowd Funding and Great Eastern University Analysis


Added on  2023/06/12

AI Summary
This article analyses a crowd funding data set using statistical software and draws out insights for project success. It also discusses the benefits of statistical data analysis with different software packages and the advantages of using advanced data analytics tools at Great Eastern University. The article includes graphical and statistical analysis of the data set, with descriptive and inferential statistics. The subject is Business Intelligence and Data Visualization; the course code and college/university are not mentioned.

Business Intelligence and Data Visualization
Assignment 2

Table of Contents
Abstract
Part I. Crowd Funding
a. Introduction
b. Variables
c. Graphical Analysis
d. Statistical Analysis
e. Results and Conclusions
Part II. Great Eastern University
a. Benefits of Statistical Data Analysis Using Different Software
b. Advantages of Using Advanced Data Analytics Tools at Great Eastern University
c. Process of Using Analytic Tools at Great Eastern University
d. Meaning of Quote by Ken Rudin
e. Implementation of Ken's Suggestion at Great Eastern University
References
Appendix
Abstract
The crowd funding data set was analysed using several statistical software packages. Most
projects have a duration of 30 days. Low project update counts are the most frequent, and the
frequency falls as the update count rises. The most common major categories are music, film and
video, publishing, and art. Most projects have their own Facebook page. Of the 28447 projects,
14368 are categorized as failed and 14079 as successful. The average project duration is 32.75
days, with a standard deviation of 10.98 days. There is sufficient evidence to conclude that the
average goal amount differs significantly between the duration periods in days. There is also
sufficient evidence to conclude that the average count of pledge levels is not the same for
failed and successful projects.
Part I
Crowd Funding
a. Introduction
Business intelligence and data visualization are among the most important capabilities
in today's business world. Data visualization covers the techniques used to explore
data with the tools of statistical analysis. Here, we analyse a data set on fund
collection for different types of projects, using a range of statistical tools and
techniques. The aim is to find the facts that would help someone obtain money via
crowd funding to fund a creative project. The analysis yields advice for project
success and a general picture of crowd funding across different kinds of projects,
which will be useful for people who want to create similar crowd funding projects. In
terms of project success, we want to identify the most important and significant
attributes in the crowd funding data set. We use software such as BigML and SPSS for
the analysis. The statistical analysis consists of basic descriptive statistics,
graphical analysis, and inferential statistical analysis, carried out in SPSS and
other software.
b. Variables
For this research study, we analyse the crowd funding data set using different
statistical software packages. The data was downloaded from Blackboard. The data set
PleaseFundThis.xlsx has many variables, such as project name, date launched, duration
in days, goal in $, percent raised, project state, amount pledged in $, major
category and minor category. All variables and their measurement scales are listed
below:
No. Variable Scale
1 project_name Nominal
2 date_launched Nominal
3 duration_days Ratio
4 goal_$ Ratio
5 percent_raised Ratio
6 project_state Nominal
7 amt_pledged_$ Ratio
8 major_category Nominal
9 minor_category Nominal
10 project_updated_count Ratio
11 city Nominal
12 region Nominal
13 number_of_pledgers Ratio
14 comments_count Ratio
15 avg_amt$_per_pledger Ratio
16 project_has_video Nominal
17 project_has_facebook_page Nominal
18 facebook_friends_count Ratio
19 project_has_pledge_rewards Nominal
20 lowest_pledge_level_$ Ratio
21 highest_pledge_level_$ Ratio
22 total_count_of_pledge_levels Ratio
23 success Nominal
We use descriptive and inferential statistical techniques to analyse the variables
listed above.
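The choice of technique follows from the measurement scale: nominal variables call for frequency tables, while ratio variables call for numerical summaries. A minimal Python sketch of that mapping is given here, with the scales copied from the variable table. The names `VARIABLE_SCALES` and `summary_method` are illustrative only and are not part of the original analysis.

```python
# Measurement scales copied from the variable table in this section.
VARIABLE_SCALES = {
    "project_name": "Nominal",
    "date_launched": "Nominal",
    "duration_days": "Ratio",
    "goal_$": "Ratio",
    "percent_raised": "Ratio",
    "project_state": "Nominal",
    "amt_pledged_$": "Ratio",
    "major_category": "Nominal",
    "minor_category": "Nominal",
    "project_updated_count": "Ratio",
    "city": "Nominal",
    "region": "Nominal",
    "number_of_pledgers": "Ratio",
    "comments_count": "Ratio",
    "avg_amt$_per_pledger": "Ratio",
    "project_has_video": "Nominal",
    "project_has_facebook_page": "Nominal",
    "facebook_friends_count": "Ratio",
    "project_has_pledge_rewards": "Nominal",
    "lowest_pledge_level_$": "Ratio",
    "highest_pledge_level_$": "Ratio",
    "total_count_of_pledge_levels": "Ratio",
    "success": "Nominal",
}


def summary_method(scale: str) -> str:
    """Return the kind of summary suited to a scale (hypothetical helper)."""
    return "frequency table" if scale == "Nominal" else "descriptive statistics"


# Split the variable names by scale.
nominal = [name for name, scale in VARIABLE_SCALES.items() if scale == "Nominal"]
ratio = [name for name, scale in VARIABLE_SCALES.items() if scale == "Ratio"]

print(len(nominal), "nominal variables and", len(ratio), "ratio variables")
for name, scale in VARIABLE_SCALES.items():
    print(f"{name}: {summary_method(scale)}")
```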
c. Graphical Analysis
In this section, we present a graphical analysis of the variables in the crowd
funding data set, starting with histograms of selected variables.
[Figure: histogram of project duration in days]
The histogram of project duration in days shows that most projects run for 30 days,
which is also the median duration. We therefore recommend that a new project run for
30 days or close to it. A 30-day duration is the most popular choice, and most people
preferred it for short as well as long projects. The data covers projects from all
over the world, including short films, dramas, etc., so this finding should apply
worldwide. No project runs for more than 60 days, so creators are bound to complete
their fund collection within one or two months.
Next, we look at the histogram for the variable project update count.
[Figure: histogram of project update count]
This histogram shows that low project update counts are the most frequent, and that
the frequency decreases as the update count increases. The variable is right skewed.
In other words, the highest frequencies occur at the lowest update counts.
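The right skew can also be checked numerically. The sketch below is an addition to the original analysis: it applies Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation, to the mean, median and standard deviation reported for project_update_count in the descriptive statistics of the Statistical Analysis section. A positive value is consistent with a right-skewed variable.

```python
# Summary statistics reported for project_update_count (N = 28447).
mean = 3.219
median = 1.000
std_dev = 5.228

# Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation.
skewness = 3 * (mean - median) / std_dev

print(f"Pearson median skewness: {skewness:.2f}")
print("right skewed" if skewness > 0 else "not right skewed")
```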
d. Statistical Analysis
In this section, we present the statistical analysis of the crowd funding data set.
We start with frequency distributions for the categorical variables, which give a
general idea of how the projects are spread across the categories of each variable.
The frequency distribution for the major category of the project is given below:
Tally for Discrete Variables: major_category
major_category Count
Art 2577
Comics 886
Dance 378
Design 1475
Fashion 1265
Film & Video 5967
Food 1334
Games 2091
Music 6160
Photography 775
Publishing 3672
Technology 705
Theater 1162
N= 28447
The frequency distribution for the major category shows that the categories chosen
most often are music, film and video, publishing, and art. So, a new project may do
well to choose one of these popular categories, although the counts alone show
popularity rather than the success rate within each category.
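To make the ranking explicit, the counts from the major_category tally can be sorted and converted into shares of all projects. This is a small illustrative sketch; the variable names are hypothetical, and the counts are copied from the tally in this section.

```python
# Counts copied from the major_category tally.
major_category_counts = {
    "Art": 2577,
    "Comics": 886,
    "Dance": 378,
    "Design": 1475,
    "Fashion": 1265,
    "Film & Video": 5967,
    "Food": 1334,
    "Games": 2091,
    "Music": 6160,
    "Photography": 775,
    "Publishing": 3672,
    "Technology": 705,
    "Theater": 1162,
}

total = sum(major_category_counts.values())

# Rank categories from most to least frequent and keep the top four.
ranked = sorted(major_category_counts, key=major_category_counts.get, reverse=True)
top_four = ranked[:4]
top_share = sum(major_category_counts[name] for name in top_four) / total

for name in ranked:
    share = major_category_counts[name] / total
    print(f"{name:<14}{major_category_counts[name]:>6}{share:>8.1%}")
print(f"Top four categories cover {top_share:.1%} of {total} projects")
```

The four categories named in the text together account for roughly two thirds of all projects.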
Next, the frequency distribution for the variable minor category is given below:
Tally for Discrete Variables: minor_category
minor_category Count
Animation 268
Art 565
Art Book 272
Board & Card Games 294
Children's Book 651
Classical Music 305
Comics 886
Conceptual Art 103
Country & Folk 589
Crafts 242
Dance 378
Design 195
Digital Art 80
Documentary 1634
Electronic Music 180
Fashion 1265
Fiction 1022
Film & Video 1135
Food 1334
Games 266
Graphic Design 166
Hardware 229
Hip-Hop 353
Illustration 154
Indie Rock 813
Jazz 261
Journalism 149
Mixed Media 290
Music 1885
Narrative Film 754
Nonfiction 887
Open Hardware 44
Open Software 115
Painting 272
Performance Art 296
Periodical 169
Photography 775
Poetry 134
Pop 462
Product Design 1114
Public Art 381
Publishing 388
Rock 1075
Sculpture 194
Short Film 1465
Tabletop Games 555
Technology 317
Theater 1162
Video Games 976
Webseries 711
World Music 237
N= 28447
Frequency distributions for some more categorical variables in the data set are
summarised below:
project_has_video Count
FALSE 4440
TRUE 24007
N= 28447

project_has_facebook_page Count
No 7969
Yes 20478
N= 28447

project_has_pledge_rewards Count
Yes 28447
N= 28447

project_success Count
FALSE 14368
TRUE 14079
N= 28447
4440 projects do not have a video, while 24007 projects do. Similarly, 7969 projects
do not have their own Facebook page, while 20478 do. So, it is important to create a
Facebook page for a project, and we recommend creating profiles on different social
media sites to stay in contact with people. All projects have pledge rewards.
Finally, 14368 projects are categorized as failed, while 14079 are categorized as
successful.
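The counts are easier to compare as proportions of the 28447 projects. The sketch below is an illustrative addition that converts the counts reported in the frequency tables into shares; the dictionary labels are hypothetical.

```python
# Total number of projects and counts copied from the frequency tables.
N = 28447
counts = {
    "has video": 24007,
    "has Facebook page": 20478,
    "successful": 14079,
}

# Share of all projects for each attribute.
shares = {label: count / N for label, count in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

On these figures about 84% of projects have a video, about 72% have a Facebook page, and just under half are successful.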
Next, we look at descriptive statistics for the variables in the crowd funding data
set, starting with the variable duration in days. The descriptive statistics for this
variable are given below:
Descriptive Statistics: duration_days
Variable N Mean Median TrMean StDev SE Mean
duration 28447 32.750 30.000 32.383 10.980 0.065
Variable Minimum Maximum Q1 Q3
duration 1.000 60.000 30.000 35.000
The average project duration is 32.75 days, with a standard deviation of 10.98 days.
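The SE Mean column in the output follows from the standard deviation and the sample size as StDev / sqrt(N). The sketch below reproduces it for duration_days and, as an addition not present in the original output, derives an approximate 95% confidence interval for the mean, assuming the normal critical value 1.96 is adequate for a sample this large.

```python
import math

# Values copied from the descriptive statistics for duration_days.
n = 28447
mean = 32.750
std_dev = 10.980

# Standard error of the mean: standard deviation divided by sqrt(N).
standard_error = std_dev / math.sqrt(n)

# Approximate 95% confidence interval (assumption: z = 1.96 for large N).
margin = 1.96 * standard_error
ci_low, ci_high = mean - margin, mean + margin

print(f"SE Mean = {standard_error:.3f}")
print(f"95% CI for the mean = ({ci_low:.3f}, {ci_high:.3f})")
```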
Descriptive statistics for the variable goal amount in $ are given below:
Descriptive Statistics: goal_$
Variable N Mean Median TrMean StDev SE Mean
goal_$ 28447 20575 5000 9186 241016 1429
Variable Minimum Maximum Q1 Q3
goal_$ 1 21474836 2000 12000
Descriptive statistics for the remaining variables in the data set are summarised
below:
Descriptive Statistics: percent_raised
Variable N Mean Median TrMean StDev SE Mean
percent_ 28447 121 73 68 1758 10
Variable Minimum Maximum Q1 Q3
percent_ 0 240716 5 113
Descriptive Statistics: amt_pledged_$
Variable N Mean Median TrMean StDev SE Mean
amt_pled 28447 10196 1710 3999 91367 542
Variable Minimum Maximum Q1 Q3
amt_pled 0 8596475 290 5675
Descriptive Statistics: project_update_count
Variable N Mean Median TrMean StDev SE Mean
project_ 28447 3.219 1.000 2.467 5.228 0.031
Variable Minimum Maximum Q1 Q3
project_ 0.000 147.000 0.000 4.000
Descriptive Statistics: number_of_pledgers
Variable N Mean Median TrMean StDev SE Mean
number_o 28447 133.2 28.0 53.6 1124.9 6.7
Variable Minimum Maximum Q1 Q3
number_o 0.0 91584.0 6.0 80.0
Descriptive Statistics: comments_count
Variable N Mean Median TrMean StDev SE Mean
comments 28447 30.3 0.0 2.1 740.1 4.4
Variable Minimum Maximum Q1 Q3
comments 0.0 59463.0 0.0 3.0
Descriptive Statistics: facebook_friends_count
Variable N N* Mean Median TrMean StDev
facebook 17886 10561 479.22 221.00 354.24 777.86
Variable SE Mean Minimum Maximum Q1 Q3
facebook 5.82 0.00 5358.00 0.00 596.00
Descriptive Statistics: total_count_of_pledge_levels
Variable N Mean Median TrMean StDev SE Mean
total_co 28447 9.2036 8.0000 8.7629 5.2298 0.0310
Variable Minimum Maximum Q1 Q3
total_co 1.0000 31.0000 6.0000 11.0000
Now we use inferential statistics to check some claims about the variables in the
data set. First, we check whether the average goal amount in $ is the same for the
different duration periods in days. To test this claim we use a one-way analysis of
variance (one-way ANOVA F test). The null and alternative hypotheses for this test
are:
Null hypothesis: H0: There is no statistically significant difference between the
average goal amounts for the different duration periods in days.
Alternative hypothesis: Ha: There is a statistically significant difference between
the average goal amounts for the different duration periods in days.
We use a 5% level of significance for this test. The ANOVA table is given below:
One-way ANOVA: goal_$ versus duration_days
Analysis of Variance for goal_$
Source DF SS MS F P
duration 59 6.836E+12 1.159E+11 2.00 0.000
Error 28387 1.646E+15 5.797E+10
Total 28446 1.652E+15
The p-value for this ANOVA test is reported as 0.000, which is less than the
significance level of 0.05, so we reject the null hypothesis that there is no
statistically significant difference between the average goal amounts for the
different duration periods in days.
There is sufficient evidence to conclude that the average goal amount differs
significantly between the duration periods in days.
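The F statistic in the ANOVA table can be reproduced from the sums of squares and degrees of freedom: each mean square is SS / DF, and F is the ratio of the two mean squares. The sketch below does this with the reported values. The eta squared line is an extra effect-size measure that is not part of the original output and is included only as an illustration.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_between, df_between = 6.836e12, 59
ss_error, df_error = 1.646e15, 28387
ss_total = 1.652e15

# Mean squares: sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_error = ss_error / df_error

# F statistic: between-group mean square over error mean square.
f_statistic = ms_between / ms_error

# Eta squared (added here, not in the original output): the share of total
# variation in goal_$ that is associated with duration_days.
eta_squared = ss_between / ss_total

print(f"MS between = {ms_between:.3e}, MS error = {ms_error:.3e}")
print(f"F = {f_statistic:.2f}, eta squared = {eta_squared:.4f}")
```

Under these figures the F ratio agrees with the reported 2.00, while duration accounts for well under 1% of the variation in goal amounts, so the difference is statistically significant but small in practical terms.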
Next, we test whether the average count of pledge levels
(total_count_of_pledge_levels) is the same for failed and successful projects. For
this we use the two-sample t test for population means. The null and alternative
hypotheses for this test are:
Null hypothesis: H0: The average count of pledge levels is the same for failed and
successful projects.
Alternative hypothesis: Ha: The average count of pledge levels is not the same for
failed and successful projects.
We use a 5% level of significance for this test.
The output for this test is given below:
Two-Sample T-Test and CI: total_count_of_pledge_levels, project_success
Two-sample T for total_count_of_pledge_levels
project_ N Mean StDev SE Mean
FALSE 14368 8.37 4.76 0.040
TRUE 14079 10.06 5.54 0.047
Difference = mu (FALSE) - mu (TRUE )
Estimate for difference: -1.6859
95% CI for difference: (-1.8060, -1.5657)
T-Test of difference = 0 (vs not =): T-Value = -27.50 P-Value = 0.000 DF = 27652
The p-value for this test is reported as 0.000, which is less than the significance
level of 0.05, so we reject the null hypothesis that the average count of pledge
levels is the same for failed and successful projects.
There is sufficient evidence to conclude that the average count of pledge levels is
not the same for failed and successful projects.
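The reported T-Value, DF and confidence interval can be approximately reproduced from the summary statistics in the output. The reported DF of 27652 is below the pooled value of 28445, which suggests the unpooled (Welch) form of the test was used; the sketch below makes that assumption. It also assumes the normal critical value 1.96 for the interval, and small differences from the reported figures are expected because the standard deviations are rounded.

```python
import math

# Summary statistics from the two-sample output (FALSE = failed, TRUE = successful).
n_failed, sd_failed = 14368, 4.76
n_success, sd_success = 14079, 5.54
estimated_difference = -1.6859  # mu(FALSE) - mu(TRUE), as reported

# Squared standard error contributed by each group.
var_failed = sd_failed**2 / n_failed
var_success = sd_success**2 / n_success
standard_error = math.sqrt(var_failed + var_success)

# Welch t statistic for H0: difference = 0.
t_value = estimated_difference / standard_error

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (var_failed + var_success) ** 2 / (
    var_failed**2 / (n_failed - 1) + var_success**2 / (n_success - 1)
)

# Approximate 95% interval (assumption: with df this large, the t critical
# value is close enough to the normal value 1.96).
margin = 1.96 * standard_error
ci = (estimated_difference - margin, estimated_difference + margin)

print(f"t = {t_value:.2f}, df = {df:.0f}")
print(f"95% CI for difference = ({ci[0]:.4f}, {ci[1]:.4f})")
```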
e. Results and Conclusions
The analysis of the data set reveals a number of facts about the different variables.
The important results are summarised below:
1. Most projects have a duration of 30 days, and the median duration is also 30 days.
2. Low project update counts are the most frequent, and the frequency decreases as the
update count increases. This variable is right skewed.
3. The most common major categories are music, film and video, publishing, and art.
4. 4440 projects do not have a video, while 24007 projects do. Similarly, 7969 projects
do not have their own Facebook page, while 20478 do. So, it is important to create a
Facebook page for a project.
5. 14368 projects are categorized as failed, while 14079 are categorized as successful.
6. The average project duration is 32.75 days, with a standard deviation of 10.98 days.
7. There is sufficient evidence to conclude that the average goal amount differs
significantly between the duration periods in days.
8. There is sufficient evidence to conclude that the average count of pledge levels is
not the same for failed and successful projects.
Part II
Great Eastern University
Task A
Benefits of Statistical Data Analysis Using Different Software
People and organizations commonly use Excel spreadsheets to maintain their data. Great Eastern
University, a very large university located in Melbourne, Australia, also uses Excel
spreadsheets to maintain the data of its 20000 current students and millions of past students.
The university should use dedicated statistical and analytics tools such as Power Pivot, Power
BI, Tableau, BigML, SPSS, geospatial tools, Google Analytics, Minitab, Matlab, SAS, R, IBM
Watson, etc. These tools make statistical data analysis much more reliable. Excel does not
provide advanced analysis on its own and needs add-ons or extensions for advanced work. Excel
spreadsheets do not produce output tables suited to reporting, whereas statistical software
produces well-formatted output with proper tables. Excel cannot perform many advanced
statistical tests directly, and most of the time the analysis has to be completed with manual
commands. BigML, SPSS, etc. are premium software products used for a wide variety of
statistical analysis, including data compilation, preparation, graphics, modelling and
analysis. These products play an important role in market research, surveying, healthcare and
the social sciences. If your business or organization uses Microsoft Excel spreadsheets for
market research or any other type of business-related research, you should consider using SPSS
instead. Compared to Excel, other statistical software products give easier and quicker access
to basic functions such as descriptive statistics through pull-down menus. They offer a wide
range of charts and graphs to choose from and faster access to statistical tests. They also
make machine learning easy and comfortable.
Task B
Advantages of Using Advanced Data Analytics Tools at Great Eastern University
Using advanced statistical software products such as Power Pivot, Power BI, Tableau, BigML,
SPSS, geospatial tools, Google Analytics, Minitab, Matlab, SAS, R, IBM Watson, etc. at Great
Eastern University would bring many benefits. The university would save time and cost by using
these products. It could also present results in a clear and attractive way. Data analysis
work would become easier and more reliable. Tables for analytical studies would be readily
available. Data keeping and handling would be easier than with spreadsheets. With these
software products, the university could present all types of information in a click.
Task C
Process of Using Analytic Tools at Great Eastern University
By using the statistical software products discussed above, the university could increase its
number of students. The university can also analyse results from social media, analytic tools
and geospatial tools to improve the student experience for all students. After analysing the
data related to its students, the university will understand the underlying facts and can base
its decisions on the results. In this way, analytics work will help to increase student
retention at the university.
Task D
Meaning of Quote by Ken Rudin
Ken Rudin, the Director of Analytics at Facebook, stated that organizations must "focus on
impacts, not insights". This statement stresses the importance of focusing on impact rather
than on insights. In data analytics work, the priority is the impact that the factors or
variables under study have on decisions and outcomes. Insights about treatments, factors and
variables are not the goal in themselves: an insight that leads to no action has little value.
Task E
Implementation of Ken's Suggestion at Great Eastern University
According to Ken Rudin, organizations must focus on impacts and not insights. To implement
Ken's suggestion at Great Eastern University, the university needs to use advanced statistical
software products for its data analysis, and the management team or administration should
focus on the impact of this analysis. Discussion that goes beyond the results obtained from
the analysis without leading to action should be avoided.