logo

Join Strategies and Cost Estimation in SQL Query

5 Pages1467 Words162 Views
   

Added on  2020-06-03

Join Strategies and Cost Estimation in SQL Query

   Added on 2020-06-03

ShareRelated Documents
Question 1 (5 marks) Consider two relations A and B. A is of size 10,000 disk pages, and B is of size 1,000 pages. Consider the following SQL statement: SELECT * FROM A, B WHERE A.a = B.a; We wish to evaluate an equijoin between A and B, with an equality condition A.a = B.a. There are 502 buffer pages available for this operation. Both relations are stored as simple heap files. Neither relation has any indexes built on it. Consider alternative join strategies described below and calculate the cost of each alternative. Evaluate the algorithms using the number of disk I/O's as the cost. For each strategy, provide the formulae you use to calculate your cost estimates. a)Page-oriented Nested Loops Join. Consider A as the outer relation. (1 mark) There are 2 options1. A is outer, B is inner relation2. B is outer, A is inner relationBuffer utilizationOutput: 1 pageInner relation: 1 pageOuter relation: 50 pages1. A is outer, B is inner relationCost to read outer = 10,000 I/OsJoin = 1,000 I/OsTotal cost = (10,000 + 1,000) = 11,000 I/Osb)Block-oriented Nested Loops Join. Consider A as the outer relation. (1 mark) 2. B is outer, B is inner relationCost to read outer = 500 I/OsJoin = 1,000 I/OsTotal cost = (500 + 1,000) = 1500 I/OsBest case is the first optionTherefore cost is 1500 I/OsTo minimize the cost of the join, the buffer size should be 202 pages so that we can store entire smaller relation in the buffer.
Join Strategies and Cost Estimation in SQL Query_1
c)Sort-Merge Join (1 mark) Sorting PhaseSince we have a 502 page buffer, we have enough space to hold sorted sublists = 21 sublists < 502Cost to sort R = 2*( 1,000 + 1,000 ) = 4,000 I/OsCost to sort S = 2* ( 500 + 500 ) = 2000 I/OsMerging phaseSince b is the primary key for S, we do not have to worry about duplicates.Cost for merging = 1,000 + 500 = 1,500 I/OsTotal Cost = (4,000 + 2000 + 1,500) = 7,500 I/Osd)Hash Join (1 mark) Partitioning PhaseI/O cost = 2*1000(read and write) + 2*500(read and write) = 3,000 I/OsProbing PhaseNeed to check whether the buffer size is enough to hold all buckets. It should satisfy the followingconditionB > f M/(B − 1)Since 502 > 1.1 1, 000/(502 − 1) = 2.1, the buffer size is adequate.Therefore, I/O cost = 1000(read) + 500(read) = 1,500 I/OsTotal cost = 3,00 + 1,500 = 4,500 I/Ose)What would the lowest possible I/O cost be for joining A and B using any join algorithm and how much buffer space would be needed to achieve this cost? Explain briefly. (1 mark) Lowest possible I/O cost be 1,500 I/OsQuestion 2 (5 marks) Consider a relation with the following schema: Executives (id: integer, name:string, title:string, level: integer) The Executives relation consists of 100,000 tuples stored in disk pages. The relation is stored as simple heap file and each page stores 100 tuples. There are 10 distinct titles in the Executives hierarchy and 20 distinct levels ranging from 0-20. 2
Join Strategies and Cost Estimation in SQL Query_2

End of preview

Want to access all the pages? Upload your documents or become a member.