This report discusses data visualisation and analysis techniques, different types of data, software for statistical analysis, and problems with visualisation. It provides insights into data analysis and interpretation.
Data Analysis and Visualisation-2
Table of Contents
INTRODUCTION
MAIN BODY
CONCLUSION
REFERENCES
INTRODUCTION
Data visualisation is the process of presenting and interpreting data graphically through charts and tables. It makes data easier to understand and makes trends and patterns easier to find before further analysis is done. This report describes data taken from 50 covid 19 patients among BAME and white people (Ahmed and Lugovic, 2019). It also discusses the various types of data and the methods for showing them graphically. Moreover, problems related to visualisation are evaluated.

MAIN BODY
There are various types of data, and each supports different kinds of analysis and interpretation. Different procedures can be applied to them, such as frequency counts, descriptive statistics and the normal curve, which help in summarising data and analysing outcomes. There are also different types of graphs, and the type of graph developed depends on the data available.

Software for statistical analysis and data visualisation
Various software packages are available for statistical analysis. The choice of software depends on the scholar's preference and the type of data. The main packages are as below:
SPSS - a commonly used package for statistical analysis and data visualisation. Many tests can be applied to the data in it, which makes it easier to analyse data and write up findings. Graphs and tables are created automatically along with the outcomes.
Tableau - also software for data visualisation. It helps in understanding data and obtaining relevant outcomes, and it creates a wide range of visualisations to present data properly (Al-Saqaf, 2016).
Datawrapper - also software for creating graphs and charts. However, it is used only for developing charts. It is an open source tool that is used to generate results.

Different types of data
Data types are an important concept in statistics. It is necessary to understand data properly so that assumptions are made correctly and do not have to change later. Besides, data types need to be consistent in order to obtain results. Furthermore, a good understanding of data enables exploratory data analysis, and on that basis the right visualisation is chosen. The different data types are defined as below:
Categorical data - a type of data that represents characteristics, such as gender, age group or income level. The categories may be coded with numbers such as 0 and 1, but those codes have no numerical meaning.
Nominal data - represents discrete values and is used for labelling variables. It does not contain any quantitative value, and the meaning does not change when the order of the values is changed.
Ordinal data - represents discrete and ordered units. It is the same as nominal data except that the order of the values matters.
Numerical data
Discrete data - the values are distinct and separate. The data is counted rather than measured, and the information can be categorised into various classes (Bertoni et al., 2020).
Continuous data - represents measurements. The data is measured rather than counted.
Interval data - ordered numeric values where the difference between units is the same, so the exact difference between values can be identified. For example, temperature = -5, -10, -15, etc. The main issue with this data is that it has no true zero, so statistics that rely on ratios between values cannot be applied to it.
Ratio data - the units have the same difference between them. It is the same as interval data, except that it has an absolute zero. For example, 5, 10, 15, etc.

Display methods for differing datatype
There are many methods by which data is visualised, which makes it easy to find trends and patterns. However, it is essential to match the method to the data type so that correct results are obtained (Boddy et al., 2017). The various methods are defined as follows:
Frequency - the number of times something occurs within a dataset or in a particular time period.
Proportion - calculated by dividing the frequency of a value by the total number of events.
Percentage - the proportion multiplied by 100, which shows each data element's share of the total frequency. A pie chart or bar chart can be used to visualise it effectively.
Graphs - data is represented in graphs and charts, which can be of different types such as pie charts.
Tables - the data is presented in table form with rows and columns, which makes it easier to read and understand.
Histogram - a graphical display of data using bars of different heights. It is similar to a bar chart, but the values are grouped into ranges, and the height of each bar shows how many units fall into that range (Coudriau, Lahmadi and Francois, 2016).
Boxplots - a plot that summarises data measured on an interval scale. It is used to show the shape of the distribution, its central value and its variation.
Scatterplots - a mathematical diagram that uses Cartesian coordinates to display the values of two variables.
Bar chart - a chart of rectangular bars whose lengths vary with the values they show, so a longer bar shows a bigger number. The X axis shows the category whereas the Y axis shows the discrete value.
Parallel coordinate plots - compare the features of several series across a set of numeric variables. Each axis represents one variable and has its own scale.
Maps - data can be represented on a map in which different areas are highlighted with colours. This usually shows locations, states, regions, etc. Colour can also be used to show the value of a metric in each area.

It is important to extract data properly so that meaning can be identified from it. Arranging the data makes it easier to analyse and to obtain results. There are certain methods and techniques by which extraction of data is done, depending on the nature and type of the data. Distortion is a technique in which data is classified into various groups and then sorted. It can also distort the picture by giving false outcomes and statistics.
It is done to segregate complex data effectively (Hiriyannaiah et al., 2018).

Identifying problems with visualisation
Several problems are faced in visualisation. Each problem needs to be identified so that it can be solved with proper measures, because an unresolved issue may affect the outcomes. The problems are defined as below:
Oversimplification of data - a common problem, because simplifying data is a complex task. The data points need to be defined clearly, otherwise oversimplification leads to unfounded conclusions. This basic step is often not done (Katina, Vittert and Bowman, 2020).
Human limitation of algorithms - algorithms are made by humans, so a flawed algorithm reduces the quality of the data outcomes. Most algorithms are also built at a national scale, so they do not fit every situation and do not address the needs of individuals.
Overreliance on visuals - there is heavy reliance on data visuals to obtain outcomes, so a conclusion drawn from them may be false and may not hold for the underlying data. Conclusions should therefore be drawn in a practical way, which also makes the data easier to interpret.
Inevitability of visualisation - many data models are available for analysing data, so a company may develop a product before visualising the data. This can add to overreliance on visuals and carry human error in developing algorithms into the results (Leal et al., 2016).

A hypothesis needs to be developed in order to find the relationship between the ratio of covid 19 infection in white and BAME people. It enables relevant outcomes to be obtained and the significance value P to be tested. The hypothesis is formed as below:
Hypothesis - is there any difference in the ratio of covid 19 infection in white and BAME patients in UK?

Frequency table
Statistics
                  whitepatient   bame
N Valid           50             50
N Missing         0              0
Mean              36.1800        36.7400
Median            36.0000        34.5000
Mode              20.00a         50.00
Std. Deviation    9.41924        9.31711
Variance          88.722         86.809
a. Multiple modes exist. The smallest value is shown
Interpretation - From the above table it can be identified that the mean for white patients is 36.18 and the median is 36. The mode is 20, which is the smallest of several modes, and the SD is 9.42. For BAME patients the mean is 36.74 and the median is 34.5. Similarly, the mode is 50 and the SD is 9.32.

Frequency Table
whitepatient
Value    Frequency   Percent   Valid Percent   Cumulative Percent
20.00    4           8.0       8.0             8.0
22.00    2           4.0       4.0             12.0
25.00    2           4.0       4.0             16.0
26.00    1           2.0       2.0             18.0
27.00    2           4.0       4.0             22.0
28.00    1           2.0       2.0             24.0
29.00    1           2.0       2.0             26.0
30.00    2           4.0       4.0             30.0
32.00    1           2.0       2.0             32.0
33.00    4           8.0       8.0             40.0
34.00    2           4.0       4.0             44.0
35.00    3           6.0       6.0             50.0
37.00    2           4.0       4.0             54.0
38.00    2           4.0       4.0             58.0
40.00    4           8.0       8.0             66.0
42.00    1           2.0       2.0             68.0
43.00    4           8.0       8.0             76.0
44.00    3           6.0       6.0             82.0
45.00    1           2.0       2.0             84.0
48.00    2           4.0       4.0             88.0
49.00    1           2.0       2.0             90.0
50.00    4           8.0       8.0             98.0
55.00    1           2.0       2.0             100.0
Total    50          100.0     100.0
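
The whitepatient figures can be reproduced outside SPSS. The following is a minimal sketch in Python using only the standard library. It assumes the 50 observations are exactly the value and frequency pairs listed in the whitepatient frequency table; the names WHITE_FREQ and frequency_table are illustrative and do not come from the report.

```python
from collections import Counter
from statistics import mean, median, multimode, stdev, variance

# (value, frequency) pairs copied from the whitepatient frequency table.
WHITE_FREQ = [
    (20, 4), (22, 2), (25, 2), (26, 1), (27, 2), (28, 1), (29, 1), (30, 2),
    (32, 1), (33, 4), (34, 2), (35, 3), (37, 2), (38, 2), (40, 4), (42, 1),
    (43, 4), (44, 3), (45, 1), (48, 2), (49, 1), (50, 4), (55, 1),
]

# Expand the table back into the individual observations.
values = [value for value, count in WHITE_FREQ for _ in range(count)]


def frequency_table(data):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(data)
    n = len(data)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / n, 100 * running / n))
    return rows


table = frequency_table(values)
summary = {
    "n": len(values),
    "mean": mean(values),
    "median": median(values),
    "modes": multimode(values),    # every value tied for the highest count
    "sd": stdev(values),           # sample standard deviation (n - 1)
    "variance": variance(values),  # sample variance (n - 1)
}

for value, freq, pct, cum in table:
    print(f"{value:>5} {freq:>4} {pct:>6.1f} {cum:>6.1f}")
print(summary)
```

The sample (n - 1) forms of variance and standard deviation are used here because they match the values in the Statistics table. multimode returns all tied modes, and the smallest of them corresponds to the single mode reported in that table.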
                             whitepatient   bame
bame   Pearson Correlation   .072           1
       Sig. (2-tailed)       .621
       N                     50             50

Interpretation - By analysing the above table it can be found that the significance value obtained is P = .621, which is more than P = 0.05. Thus, the null hypothesis is not rejected. That means the test does not show a statistically significant relationship between covid 19 infection in white and BAME patients in UK, although the infection values vary within both patient groups. However, many other factors that may have affected the results need to be included in the dataset. This will help in finding the relationship between the two patient groups and how the infection ratio varies. No other feature appears in the dataset, because only infection rate data was included in it.

CONCLUSION
It has been concluded that different types of software are available for data visualisation, such as SPSS, Tableau and Datawrapper. Besides that, there are various types of data, such as nominal and ordinal, and methods for displaying them, such as histograms, maps and boxplots. The problems with visualisation include oversimplification of data and overreliance on visuals.