Database Systems: Query Optimization, Indexing, and Join Algorithms

Verified

Added on  2020/06/03

|5
|1467
|162
Homework Assignment
AI Summary
This assignment solution addresses database query optimization techniques, focusing on join algorithms and indexing. Question 1 analyzes the cost of different join strategies (Nested Loops, Block-oriented Nested Loops, Sort-Merge, and Hash Join) for joining two relations, calculating disk I/O costs for each and determining the optimal buffer size. Question 2 explores query plan costs with various index types (clustered and unclustered B+ tree and Hash indexes) for a given SQL query, calculating result sizes and costs based on different indexing scenarios. Question 3 extends this analysis to a more complex schema involving employees, departments, and finances, calculating result sizes and costs for different query plans, including Nested Loops Join, considering index usage and system statistics to evaluate the efficiency of various query execution strategies. The solution provides detailed calculations and explanations for each scenario, demonstrating a comprehensive understanding of database optimization principles.
Document Page
Question 1 (5 marks)
Consider two relations A and B. A is of size 10,000 disk pages, and B is of size 1,000 pages.
Consider the following SQL statement:
SELECT *
FROM A, B
WHERE A.a = B.a;
We wish to evaluate an equijoin between A and B, with an equality condition A.a = B.a. There are
502 buffer pages available for this operation. Both relations are stored as simple heap files. Neither
relation has any indexes built on it.
Consider alternative join strategies described below and calculate the cost of each alternative.
Evaluate the algorithms using the number of disk I/O's as the cost. For each strategy, provide the
formulae you use to calculate your cost estimates.
a) Page-oriented Nested Loops Join. Consider A as the outer relation. (1 mark)
There are 2 options
1. A is outer, B is inner relation
2. B is outer, A is inner relation
Buffer utilization
Output: 1 page
Inner relation: 1 page
Outer relation: 50 pages
1. A is outer, B is inner relation
Cost to read outer = 10,000 I/Os
Join = 1,000 I/Os
Total cost = (10,000 + 1,000) = 11,000 I/Os
b) Block-oriented Nested Loops Join. Consider A as the outer relation. (1 mark)
2. B is outer, B is inner relation
Cost to read outer = 500 I/Os
Join = 1,000 I/Os
Total cost = (500 + 1,000) = 1500 I/Os
Best case is the first option
Therefore cost is 1500 I/Os
To minimize the cost of the join, the buffer size should be 202 pages so that we can
store entire smaller relation in the buffer.
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
c) Sort-Merge Join (1 mark)
Sorting Phase
Since we have a 502 page buffer, we have enough space to hold
sorted sublists = 21 sublists < 502
Cost to sort R = 2*( 1,000 + 1,000 ) = 4,000 I/Os
Cost to sort S = 2* ( 500 + 500 ) = 2000 I/Os
Merging phase
Since b is the primary key for S, we do not have to worry about duplicates.
Cost for merging = 1,000 + 500 = 1,500 I/Os
Total Cost = (4,000 + 2000 + 1,500) = 7,500 I/Os
d) Hash Join (1 mark)
Partitioning Phase
I/O cost = 2*1000(read and write) + 2*500(read and write) = 3,000 I/Os
Probing Phase
Need to check whether the buffer size is enough to hold all buckets. It should satisfy
the following
condition
B > f M/(B − 1)
Since 502 > 1.1 1, 000/(502 − 1) = 2.1, the buffer size is adequate.
Therefore, I/O cost = 1000(read) + 500(read) = 1,500 I/Os
Total cost = 3,00 + 1,500 = 4,500 I/Os
e) What would the lowest possible I/O cost be for joining A and B using any join algorithm and
how much buffer space would be needed to achieve this cost? Explain briefly. (1 mark)
Lowest possible I/O cost be 1,500 I/Os
Question 2 (5 marks)
Consider a relation with the following schema:
Executives (id: integer, name:string, title:string, level: integer)
The Executives relation consists of 100,000 tuples stored in disk pages. The relation is stored as
simple heap file and each page stores 100 tuples. There are 10 distinct titles in the Executives
hierarchy and 20 distinct levels ranging from 0-20. 2
Document Page
Suppose that the following SQL query is executed frequently using the given relation:
SELECT E.ename
FROM Executives
WHERE E.title = “CEO” and E.level > 15;
Your job is to analyze the query plans given below and estimate the cost of the best plan utilizing the
information given about different indexes in each part.
a) Compute the estimated result size and the reduction factor (selectivity) of this query (1 mark)
Reduction factor (RF) = 1/20 = 0.05
Result size = product of cardinalities of involved relations (FROM) * product
of reduction factors (WHERE).
=100000*0.05 = 5000
b) Compute the estimated cost of the best plan assuming that a clustered B+ tree index on (title,
level) is (the only index) available. Suppose there are 200 index pages, and the index uses
Alternative 2. Discuss and calculate alternative plans. (1 mark)
Index only scan using the clustered B+ tree index on title.
Cost = 2 (finding the first title) + 100,000*0.1*0.2 (getting 10% of data pages) +
200 * 0.1 (scanning the index) = 2022.
Index only scan using the unclustered B+ tree index on level.
Cost = 2 (finding the level) + 100,000*0.1*0.1 (getting 10% of data pages) +
200 * 0.1 (scanning the index) = 1022.
c) Compute the estimated cost of the best plan assuming that an unclustered B+ tree index on
(level) is (the only index) available. Suppose there are 200 index pages, and the index uses
Alternative 2. Discuss and calculate alternative plans. (1 mark)
Index only scan using the unclustered B+ tree index on Level.
Cost = 2 (lookup) + 100,000*0.1*0.2 =2002.
d) Compute the estimated cost of the best plan assuming that an unclustered Hash index on
(title) is (the only index) available. The index uses Alternative 2. Discuss and calculate
alternative plans. (1 mark)
Unclustered Hash index on (title).
Cost = 1.2 (lookup) + 1 = 2 or 3.
e) Compute the estimated cost of the best plan assuming that an unclustered Hash index on
(level) is (the only index) available. The index uses Alternative 2. Discuss and calculate
alternative plans. (1 mark)
Unclustered Hash index on (level)
Cost = 1.1 (lookup) + 1 = 2 or 3.
Document Page
Question 3 (10 marks)
Consider the following relational schema and SQL query. The schema captures information about
employees, departments, and company finances (organized on a per department basis).
Emp(eid: integer, did: integer, sal: integer, hobby: char(20))
Dept(did: integer, dname: char(20), floor: integer, phone: char(10))
Finance(did: integer, budget: real, sales: real, expenses: real)
Consider the following query:
SELECT D.dname, F.budget
FROM Emp E, Dept D, Finance F
WHERE E.did=D.did AND D.did=F.did
AND E.sal ≥ 59000 AND E.hobby = ‘yodeling’; 3
The system’s statistics indicate that employee salaries range from 10,000 to 60,000, and employees
enjoy 200 different hobbies. There are a total of 50,000 employees and 5,000 departments (each
with corresponding financial record in the Finance relation) in the database. Each relation fits 100
tuples in a page. Suppose there exists a clustered B+ tree index on (Emp.did) of size 50 pages.
a) Compute the estimated result size and the reduction factors (selectivity) of this query (2
marks)
Reduction factor (RF) = 1/50 = 0.02
Result size = product of cardinalities of involved relations (FROM) * product of
reduction factors (WHERE).
=60000*0.02 = 1200
tabler-icon-diamond-filled.svg

Secure Best Marks with AI Grader

Need help grading? Try our AI Grader for instant feedback on your assignments.
Document Page
b) Compute the cost of the plans shown below. Assume that sorting of any relation (if required)
can be done in 2 passes: 1st pass to produce sorted runs and 2nd pass to merge runs.
Similarly hash join can be done in 2 passes: 1st pass to produce partitions, 2nd pass to join
corresponding partitions. NLJ is a Page-oriented Nested Loops Join. Assume that did is the
candidate key, and that 100 tuples of a resulting join between Emp and Dept fit in a page.
Similarly, 100 tuples of a resulting join between Finance and Dept fit in a page. (8 marks, 2
marks per plan)
Cost of the plans
chevron_up_icon
1 out of 5
circle_padding
hide_on_mobile
zoom_out_icon
logo.png

Your All-in-One AI-Powered Toolkit for Academic Success.

Available 24*7 on WhatsApp / Email

[object Object]