Analysis of Data Handling and Business Intelligence: LCBB5000 Report

Verified

Added on  2023/01/10

|15
|3185
|2
Report
AI Summary
This report delves into data handling and business intelligence, exploring current trends in data warehousing, business intelligence, and data mining. It analyzes Superstore data using Microsoft Excel for preprocessing, analysis, and visualization, determining a decline in sales and identifying contributing factors like unit price changes. The report then investigates Audi dealership data using Weka, employing K-means clustering to understand customer behavior and preferences. It details the workings of Weka, interprets clustering results, and compares Weka's capabilities to Excel's, highlighting the advantages and disadvantages of each tool. The analysis reveals key insights into sales trends, data analysis methods, and the effectiveness of different analytical tools in extracting valuable information from datasets.
tabler-icon-diamond-filled.svg

Contribute Materials

Your contribution can guide someone’s learning journey. Share your documents today.
Document Page
DATA HANDLING AND
BUSINESS INTELLIGENCE
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
Contents
Contents...........................................................................................................................................2
INTRODUCTION...........................................................................................................................1
Part 1................................................................................................................................................1
Current trends in data warehousing, business intelligence and data mining...............................1
Determining the decline in sales over the years while evaluating the use of Excel for pre-
processing the data, analysing the data and visualising the data.................................................2
Part 2................................................................................................................................................7
Workings of Weka with explanation of results...........................................................................7
Common data mining methods....................................................................................................9
Advantages/disadvantages of Weka over Excel........................................................................10
CONCLUSION..............................................................................................................................11
REFERENCES..............................................................................................................................12
Document Page
Document Page
INTRODUCTION
Data handling and business intelligence are the branches of data analysis in which decisions
are made by using a technology driven process where the most important asset to an organisation
is their data (Ataman, Kulick and Sim, 2011). The present report is developed with the aim of
gain an understanding of using statistical analysis tools such as Excel and Weka. For this aim,
this report is divided into two sections. The first section of this report will include current trends
in the world of data mining and warehousing. Along with it, this section will include analysis of
Superstore data in which use of Microsoft Excel will be analysed including pre processing,
analysis and visualisation of data.
Second section of this report will include the analysis of Audi dealership data. This data will
be evaluated using the software application of Weka. This application and data will be used to
analyse certain patterns and behaviour of data. This section will also include analysis of data
mining methods that are used in real life. Lastly, this section will include advantages and
disadvantages of Weka over Microsoft Excel.
Part 1
Current trends in data warehousing, business intelligence and data mining
Different trends in data warehousing
Complex Data Marts Will Define the Future Business Models - With changing time period,
needs and requirement of information is also changing. So, Data Marts will be using business
models with new specification and focused area zones. Moreover, in in this modern time,
everyone is focused towards flow of speed so Data Marts will improve the speed of functionality
of their data higher scale which will also leads to higher and better efficiency.
Column-based Storage is on the Rise - Another trend which is rising is retrieve of data by
adopting column based storage. This is because row based storage tools time to get or processed
the credentials. But with add of column it will be easier to retrieve data analytics.
Mixed Workloads Are Becoming Common - Further comes the mixed workloads amongst
data warehousing. In relevance with Data warehousing they includes several types of workloads
like operational BI, data mining etc. Therefore, they as these workloads occurs new causes and
issues also rises. So, mixed workloads are in trends with new structure (El-Sappagh and et.al,
2011).
1
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
Trends in business intelligence
The rise of explainable IT - This helps in managing information and data related to
artificial intelligence.
Data management enhancement - This is involved in managing the data effectively.
There is need of evaluating the clustering of data for making good results.
Determining the decline in sales over the years while evaluating the use of Excel for pre-
processing the data, analysing the data and visualising the data
Microsoft Excel is a software application which is used to storing the data and analysing it
with the help of functions, formulas and tools. The current data set of Superstore data is analysed
using three steps of data analysis which are data pre-processing, data analysis and data
visualisation. All these steps are analysed below to determine the decrease of sales of Superstore
over years:
Data pre processing
This step is dedicated to process the raw data so that it can be used for further analysis.
This process includes few tasks such as data cleaning, data transforming and data reduction. The
data set of superstore which is provided has two types of data variables; one of such type has
numeric variables and other category has string variables that includes text information. It must
be noted that non numeric variables or string variables are hard to be analysed in Excel as they
are first required to be coded in numbers. Considering such limitation, all numeric data is first
given priority to be used for the analysis of sales of Superstore.
The superstore data is first cleaned by using the option of “SORT”. This option can be
used via Data >> Sort >> Sort by >> Order date. By using this option, entire data was sorted by
order date in which the orders which were dated first were shown first in data set (Held, Moriarty
and Richardson, 2019). When the entire data set was sorted and cleaned, the data was then
reduced in another worksheet of same Excel file. All numeric data variables were considered
which are Order Quantity, Discount, Unit Price, Shipping Cost and Sales. After the data cleaning
and data reduction process was done, the reduced data was then transformed using PIVOT table.
A PIVOT table is an advanced Excel function which was used to transformed the data into rows
and columns where all four years are presented in rows and all variables are presented in
columns with their sum calculated (Hertrich and Mayrhofer, 2016).
2
Document Page
By the above process of data pre processing which has done by using Excel, it can be
evaluated that Excel application can be effectively used to pre process a dataset.
Data analysis and data visualisation
The second and third step in the process of coming to a conclusion for superstore are data
analysis and data visualisation. In the step of data analysis, the Excel function of SUM() will be
used and for data visualisation, Excel tables and graphs will be used.
The aim of analysing the Superstore data is to determine the decrease in sales due to which
first variable analysed here is “Sales” (Walkenbach, 2019). The data gained from PIVOT table is
used to generate a graph attached below:
The above graph provides a clear overview that the sales of Superstore are decreasing in
year 2010 and 2011 but in year 2012, sales of this year starts to enhance again. The maximum
sales of Superstore were in the year of 2009 which decreased in 2010 and then decreased even
more in 2011 and then slightly increased in 2012; So, it can be said that Sales of Superstore are
in a decreasing trend. Once it is estimated that the sales are decreasing, it is now important to
know why the sales of this company are decreasing (Team, 2016). For determining the reasons
behind the decrease of sales of Superstore, all numeric variables of the data set will be analyzed
to understand the pattern or trend of such variables and any variable which is reflecting similar
trend as of sales will be considered as the factor due to which sales of this company are
decreasing. The evidence of such variables is given below in the form of tables and graphs.
Row Sum of Order Sum of Sum of Unit Sum of Sum of
3
Document Page
Labels Quantity Discount Price Shipping Cost Sales
2009 54508 105.39 232830.98 28481.76 4209896.85
2010 54379 105.81 162467.59 27354.26 3560087.04
2011 51413 101.67 159653.11 24939.85 3429944.98
2012 54480 104.32 195467.55 27055.17 3715671.95
The above graph is prepared for order quantity, the values of such variables is showing
similar pattern as sales as order quantity was at its maximum in 2009 which decreased till 2011
and then again increased in 2012. So, it can be said that order quantity is a viable factor or reason
due to which sales of superstore are decreasing.
4
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Another numeric variable is analyzed here which is discount, this variable is somewhere
showing a similar pattern as the discount in 2011 was at its minimum and such aspect is similar
in both discount and sales.
The above graph is for unit price variable and the trend of this variable is highly similar
to the pattern of sales. The unit price variable is the variable reflecting unit price of the products
which are marketed and sold by Superstore and such prices were maximum in year 2009 and
were minimum in 2011 just like sales of this company.
5
Document Page
The last numeric variable analyzed in this case is shipping cost. This variable is also
showing similar pattern as sales which is the decreasing pattern till 2011 and then slightly
increasing in 2012. By above analysis, it can be seen that somewhere all four variables are
showing similar trends to sales, so in order to determine most important factor impacting sales of
Superstore, the Excel tool of correlation is used. Correlation helps in determining that whether
two variables are related or not and if they are related then what is the level of their relationship.
Order Quantity Discount Unit Price Shipping Cost Sales
Order Quantity 1
Discount 0.933976426 1
Unit Price 0.572287359 0.444588395 1
Shipping Cost 0.916714438 0.916913187 0.764093544 1
Sales 0.607858343 0.554020074 0.976545347 0.837509053 1
The above correlation table is representing correlation coefficients for relationship among
each variable. The order quantity has a significant relationship with sales as it is showing
correlation coefficient of 0.6, discount is showing 0.55 correlation coefficient, unit price has 0.97
and shipping cost has 0.83. This analysis evidently proves that unit price has strongest
relationship with sales and unit price is the factor due to which sales of superstore are decreasing.
Superstore has decreased their unit prices of products in 2011 due to which, sales of this
company have also observed to decreased.
6
Document Page
The above analysis evidently concludes that Microsoft Excel can be effectively used for
pre processing, analysis and visualisation of data as this software application is successfully used
for determining the decrease of sales in superstore and the reasons assisting this decrease.
Part 2
Workings of Weka with explanation of results
Weka is a statistical software application which is used for objectives such as clustering
and data analysis (Zubi and Mahmmud, 2013). This software is used here for analysing the Audi
dealership data and method of clustering used is K means clustering. The Audi dealership data
has various variables and the data is in the form of binary codes in which 0 represents negative
and 1 represents positive. The results of clustering are attached below:
=== Run information ===
Scheme:weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I
500 -S 10
Relation: audidealership2
Instances: 100
Attributes: 8
Dealership
Showroom
InternetSearch
RS7
A4
TT
Financing
Purchase
Test mode:evaluate on training data
=== Model and evaluation on training set ===
7
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 160.2980769230769
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(100) (48) (52)
=================================================
Dealership 0.54 0.3333 0.7308
Showroom 0.64 0.6667 0.6154
Internet Search 0.39 0.4375 0.3462
RS7 0.53 0.2917 0.75
A4 0.55 0.8125 0.3077
TT 0.5 0.5833 0.4231
Financing 0.6 0.3333 0.8462
Purchase 0.38 0.0417 0.6923
Time taken to build model (full training data) : 0.02 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 48 ( 48%)
8
Document Page
1 52 ( 52%)
From the above screenshots, it can be seen that out of 100 people 54 has entered into
dealership store and 46 have returned from outside. There are two clusters in this analysis, the
first cluster is 0 with overall mean of 48 and second cluster is 1 with overall mean of 52. Each
variable screenshot is reflecting certain step; the last step is of purchase which is showing that
only 38 out of 100 people has made to the final step of purchase of a car (Kallner, 2016). To
analyze the reason that why only 38 people have purchase the car, the screenshot of car models is
considered. Most of the people has showed their interest in A4 version of car which implies Audi
dealership should market and advertise this version of car as it will help them to increase their
sales (Bernard and Senjayawati, 2019).
Common data mining methods
Data mining is defined as the practise which is used for evaluation and analysis of pre
existing databases used in generation of new data and information. It is very essential for
completing tasks and projects effectively. Data mining is known as the extraction of useful
information from raw data available within the company. Various companies use this for
9
Document Page
conversion of raw data into useful information. There are following types of data mining
methods -
Decision trees – This is defined as useful method in which there is development of root
nodes, leaf node and the branches. In this, every node is providing information about results of
previous node. There is a tree structure formed in this method and every raw data is converted
into finished information. There are different organisations using this method for conversion.
This is helpful for creating and managing financial data within company (Thusoo and et.al,
2010).
Clustering - It is another data mining method in which data is divided into smaller groups.
These are called as chunks and they are solved effectively for making desired results. There is
distribution of data and this is managed for developing the similarities and managing the data
effectively. This method is used for analysing new trends of clusters within business.
Correlation – It is known as the analysis of relationship in which two variables are selected.
There is analysis of correlation between these variables and the result is analysed. Clustering is
done within the area in which there are various frameworks used for analysing correlation among
two or more variables. This is managed for developing good results and determining
determinants of correlation properly. It is an important method used by companies for
evaluating the business activities and sales.
Tracking sequential patterns – This is known as the most important method or technique
that is used for managing the work and analysing importance of data mining. There is need to
develop and manage series of activities in order to recognise transactional data. It is an objective
of data mining that different activities are managed properly. This is an important method used
by large organisations. It is best suited for retail sector companies to analyse the situations of real
world. This method is useful for evaluating the scene of companies about the sales and develop
effective strategies (Kimball and et.al, 2008).
Advantages/disadvantages of Weka over Excel
Weka and Microsoft Excel are both statistical software and both are appropriate for certain
conditions of data. Weka is a software which is usually used for clustering and Excel is an
application which is used not only used for data analysis but is also used for pre-processing the
raw data and visualising the findings from the data. Advantages and disadvantages of Weka over
excel are analysed below:
10
tabler-icon-diamond-filled.svg

Paraphrase This Document

Need a fresh take? Get an instant paraphrase of this document with our AI Paraphraser
Document Page
Advantages
Weka saves both time and money at calculation
Viruses generally attach to Excel files but not in Weka
It provides different built-in 'visualization' panels which is not in case of Excel.
Disadvantages
Weka can only handle small datasets unlike Excel
WEKA is an open source data and does not contain high information due to this many big
organisations cannot use this system software and cannot make trust upon this.
CONCLUSION
From the above report, it has been concluded that business intelligence and data handling
is the future of business analytics as these help in analysing the data and further leads to effective
decision making. It has been also concluded from above report that Microsoft Excel software is
better than WEKA when the dataset is large and WEKA is appropriate for smaller and binary
data.
11
Document Page
REFERENCES
Books and Journals
Ataman, K., Kulick, G. and Sim, T., 2011. Teaching decision tree classification using Microsoft
Excel. INFORMS Transactions on Education. 11(3). pp.123-131.
Bernard, M. and Senjayawati, E., 2019. Developing the Students’ Ability in Understanding
Mathematics and Self-confidence with VBA for Excel. JRAMathEdu (Journal of
Research and Advances in Mathematics Education). 1(1). pp.45-56.
El-Sappagh, S.H.A., and et.al, 2011. A proposed model for data warehouse ETL processes.
Journal of King Saud University-Computer and Information Sciences, .23(2). pp.91-
104.
Held, B., Moriarty, B. and Richardson, T., 2019. Microsoft Excel Functions and Formulas.
Stylus Publishing, LLC.
Hertrich, S. and Mayrhofer, U., 2016. Audi and the Chinese Market: A Success Story. In Market
Entry in China (pp. 37-44). Springer, Cham.
Kallner, A., 2016. A study of simulated normal probability functions using Microsoft
Excel. Accreditation and Quality Assurance. 21(4). pp.271-276.
Kimball, R., and et.al, 2008. The data warehouse lifecycle toolkit. John Wiley & Sons.
Team, R.C., 2016. foreign: read data stored by Minitab, S, SAS, SPSS, Stata, Systat, Weka,
dBase. R package version 0.8-67.
Thusoo, A., and et.al, 2010, March. Hive-a petabyte scale data warehouse using hadoop. In 2010
IEEE 26th international conference on data engineering (ICDE 2010), .(pp. 996-1005).
IEEE.
Walkenbach, J., 2019. Microsoft? Excel? 2016 bible.
Zubi, Z.S. and Mahmmud, A.A., 2013. Using data mining techniques to analyze crime patterns
in the libyan national crime data. In Proceedings of the 1st WSEAS Interna-tional
Conference on Image Processing and Pattern Recognition (pp. 79-85).
12
chevron_up_icon
1 out of 15
circle_padding
hide_on_mobile
zoom_out_icon
logo.png

Your All-in-One AI-Powered Toolkit for Academic Success.

Available 24*7 on WhatsApp / Email

[object Object]