Analysis of Flights Dataset from Desklib

   

Added on  2023-04-25

Section 1: Introduction
a)
(Minton, 2018)
b) This dataset is primary data: no statistical method has been applied to modify it. It
records the flights that operate between Australia and various international cities in
each month of the year over a period of 15 years, from 2003 to 2018. The record tracks how
many flights leave the country and how many arrive, which indirectly gives an indication of
how many people travel between the countries.
The variables involved are:
Variable            Description                                                   Values
In-Out              Whether the airline comes in or goes out                      I for in and O for out
Australian City     Which Australian city the airline lands in or flies out of    Australian city names
International City  Which international city the airline lands in or flies out of International city names
Airlines            Name of the airline                                           Name of the airline
Route               Via which airports the airline flies                          Short forms of various airports
Port country        Which country the airline belongs to                          Name of the country
Port Region         Which region the airline belongs to                           Region name
Service country     Which country the service is for                              Country name
Stops               Number of stops the airline has                               0, 1, 2
All Flights         Number of flights in or out in the month                      Integer
Max seat            Maximum number of seats                                       Integer
Year                Which year                                                    Year as a number
Month Number        Which month                                                   Number of the month
From the list of variables we can identify a few that are likely to have an impact. We therefore
take the column All Flights as the response variable, with Max seat, Month Number, Stops, Airlines
and Route as some of the possible independent variables.
i) The most important question is which airlines have the highest All Flights count, or in
other words whether the All Flights count depends on the airline.
ii) We can also ask how the number of stops affects All Flights.
iii) Whether the maximum number of seats is also a factor in determining the type of
airline. As noted in case i), the type of airline affects All Flights, so we can say that
the maximum number of seats indirectly affects the variable All Flights.
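The questions above amount to comparing the response variable across the levels of a candidate independent variable. A minimal sketch of that comparison follows, assuming each record is a dictionary keyed by the column names in the table. The rows and the helper name `mean_response_by` are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only; these values are NOT from the dataset.
rows = [
    {"Airlines": "Airline A", "Stops": 0, "All Flights": 31},
    {"Airlines": "Airline A", "Stops": 1, "All Flights": 13},
    {"Airlines": "Airline B", "Stops": 0, "All Flights": 60},
    {"Airlines": "Airline B", "Stops": 0, "All Flights": 62},
    {"Airlines": "Airline C", "Stops": 2, "All Flights": 4},
]


def mean_response_by(rows, factor, response="All Flights"):
    """Return {factor level: mean of the response variable} for the given rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[response])
    return {level: mean(values) for level, values in groups.items()}


# Question i): does the All Flights count differ by airline?
by_airline = mean_response_by(rows, "Airlines")
# Question ii): does it differ by number of stops?
by_stops = mean_response_by(rows, "Stops")
print(by_airline)
print(by_stops)
```

Group means only describe the data; whether the differences are meaningful would still need a formal test.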
c) We picked a random sample of 1000 records from the initial Dataset 1. Yes, the sample is
secondary data, as we have processed the data.
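A simple random sample of this kind could be drawn as in the sketch below. It assumes sampling without replacement, which the report does not state, and uses a list of row indices as a stand-in for the real records. The population size, the seed and the function name `draw_simple_random_sample` are placeholders.

```python
import random


def draw_simple_random_sample(records, sample_size=1000, seed=2018):
    """Draw a simple random sample without replacement.

    A fixed seed makes the draw reproducible; the seed value is arbitrary.
    """
    rng = random.Random(seed)
    return rng.sample(records, sample_size)


# Stand-in for the full dataset: row indices only. The size 5000 is a
# placeholder, not the real number of records.
population = list(range(5000))
sample = draw_simple_random_sample(population)
print(len(sample), "records sampled")
```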
Section 2: Analysis of single variable in Dataset 1
a)
[Figure: Histogram of All_Flights, showing Frequency and Cumulative %.]

The histogram shows the frequency of the All Flights values. Values above 100 occur rarely, which
indicates that most of the observations have All Flights values within 32. The same pattern can be
seen in the frequency table in the Excel sheet.
[Figure: Bar chart of Frequency by All_Flights class, 0 - 9 through 160 - 169.]

All_Flights   Frequency
0 - 9         20601
10 - 19       24032
20 - 29       13950
30 - 39       23715
40 - 49       2977
50 - 59       2098
60 - 69       2912
70 - 79       385
80 - 89       482
90 - 99       913
100 - 109     148
110 - 119     193
120 - 129     645
130 - 139     221
140 - 149     210
150 - 159     183
160 - 169     6
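The Cumulative % series plotted with the histogram can be recomputed from the class frequencies in the table above. The sketch below uses only those frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies of All_Flights, copied from the table above
# (classes 0 - 9, 10 - 19, ..., 160 - 169).
frequencies = [20601, 24032, 13950, 23715, 2977, 2098, 2912, 385, 482,
               913, 148, 193, 645, 221, 210, 183, 6]
class_labels = [f"{lo} - {lo + 9}" for lo in range(0, 170, 10)]

total = sum(frequencies)
cumulative_counts = list(accumulate(frequencies))
cumulative_percent = [100 * count / total for count in cumulative_counts]

# Print each class with its frequency and running share of all observations.
for label, freq, cum in zip(class_labels, frequencies, cumulative_percent):
    print(f"{label:>9}  {freq:>6}  {cum:6.2f}%")
```

Reading the running share at the 30 - 39 class shows how much of the data lies below 40.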
If we draw the histogram using these ranges, we can see that it follows a right-skewed
distribution, with the tail on the right side of the plot.
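One rough numerical check of right skew is that the mean lies above the median. The sketch below estimates both from the grouped frequencies, assuming the values are spread evenly within each class, so the results are approximations rather than the exact statistics of the raw data. The function names are illustrative.

```python
# Class frequencies of All_Flights for the classes 0 - 9, 10 - 19, ..., 160 - 169.
frequencies = [20601, 24032, 13950, 23715, 2977, 2098, 2912, 385, 482,
               913, 148, 193, 645, 221, 210, 183, 6]
lower_limits = list(range(0, 170, 10))
CLASS_WIDTH = 10


def grouped_mean(lower_limits, frequencies, width):
    """Approximate the mean using class midpoints (e.g. 4.5 for the class 0 - 9)."""
    total = sum(frequencies)
    weighted = sum((lo - 0.5 + width / 2) * f
                   for lo, f in zip(lower_limits, frequencies))
    return weighted / total


def grouped_median(lower_limits, frequencies, width):
    """Approximate the median by linear interpolation inside the median class."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("frequencies must contain at least one observation")
    half = total / 2
    cumulative = 0
    for lo, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:
            boundary = lo - 0.5  # lower class boundary for integer-valued data
            return boundary + (half - cumulative) / f * width
        cumulative += f


approx_mean = grouped_mean(lower_limits, frequencies, CLASS_WIDTH)
approx_median = grouped_median(lower_limits, frequencies, CLASS_WIDTH)
print(f"approximate mean: {approx_mean:.2f}, approximate median: {approx_median:.2f}")
```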

[Figure: Bar chart of Frequency by All_Flights class, 0 - 19 through 160 - 179.]

All_Flights   Frequency
0 - 19        44633
20 - 39       37665
40 - 59       5075
60 - 79       3297
80 - 99       1395
100 - 119     341
120 - 139     866
140 - 159     393
160 - 179     6
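The wider classes are obtained by adding adjacent pairs of the 10-wide classes. The sketch below does that merge and compares the result with the 20-wide table. The helper name `merge_adjacent_classes` is illustrative.

```python
# Frequencies for the 10-wide classes 0 - 9, 10 - 19, ..., 160 - 169.
fine_frequencies = [20601, 24032, 13950, 23715, 2977, 2098, 2912, 385, 482,
                    913, 148, 193, 645, 221, 210, 183, 6]

# Frequencies reported for the 20-wide classes 0 - 19, 20 - 39, ..., 160 - 179.
reported_coarse = [44633, 37665, 5075, 3297, 1395, 341, 866, 393, 6]


def merge_adjacent_classes(frequencies, group_size=2):
    """Sum each run of `group_size` neighbouring classes into one wider class."""
    return [sum(frequencies[i:i + group_size])
            for i in range(0, len(frequencies), group_size)]


coarse_frequencies = merge_adjacent_classes(fine_frequencies)
coarse_labels = [f"{lo} - {lo + 19}" for lo in range(0, 180, 20)]

for label, freq in zip(coarse_labels, coarse_frequencies):
    print(f"{label:>9}  {freq:>6}")
print("matches reported table:", coarse_frequencies == reported_coarse)
```

The last wide class, 160 - 179, contains only the 160 - 169 class because the 10-wide table stops there.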
From this grouping we can see that the histogram appears to follow a Poisson distribution.
b)
H0 : The average number of flights arriving in and departing from Australia in a month between
September 2003 and September 2018 is at least 30
Ha : The average number of flights arriving in and departing from Australia in a month between
September 2003 and September 2018 is less than 30
So we conduct a one-sided z test comparing the mean to 30 at a 95% confidence level (5%
significance level).
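A one-sided z test against 30 can be sketched as follows, assuming a lower-tailed alternative (mean below 30) and a sample large enough for the sample standard deviation to stand in for the population value. The function name `lower_tailed_z_test` and the example values are hypothetical and are not drawn from the dataset.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def lower_tailed_z_test(sample, mu0=30.0, alpha=0.05):
    """One-sample z test of H0: mean >= mu0 against Ha: mean < mu0.

    Returns (z statistic, p-value, whether H0 is rejected at level alpha).
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    p_value = NormalDist().cdf(z)  # P(Z <= z) for a standard normal Z
    return z, p_value, p_value < alpha


# Hypothetical monthly flight counts, invented for illustration only.
example_sample = [10, 20, 30] * 100
z, p_value, reject_h0 = lower_tailed_z_test(example_sample)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

H0 is rejected when the p-value falls below 0.05, which is equivalent to the z statistic falling below the lower 5% critical value of the standard normal distribution.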
All_Flights
