Joins are very common and very expensive in databases.
Equi joins are very common because of normalization and primary key foreign key relationship.
Assume,
R -> Smaller relation
S -> Bigger relation
Lets discuss different joining algorithms
Block Nested Join
Equi joins are very common because of normalization and primary key foreign key relationship.
Assume,
R -> Smaller relation
S -> Bigger relation
Lets discuss different joining algorithms
Block Nested Join
- Bring ‘b’ part of smaller relation into memory
- Bring each page of larger relation and scan ‘b’ and produce output
- Generally b = B – 2, 2 pages are left 1 for S and 1 for output tuple
- Once whole S is scanned then bring next ‘b’ pages of small relation.
- Ideal: if whole R can fit into memory then join can be done in 1pass. And IO will be |R| + |S|
- I/O cost
- |R / b| * S + R
- It has very good sequential behavior
Sort Merge Join
- Create sorted runs of R and S and then merge
- It works in 2 passes
- Pass1: If we use tournament sort then we can produce sorted runs of size 2B.
- Pass 2: Use N way merge to produce output
- # of runs after pass 1 = (R/2B) + (S/2B)
- as S > R so we can say at the max we will have (S/2B) + (S/2B) sorted runs after pass 1 i.e. S/B runs
- So if you want to finish joins in 2 pass then S/B <= B i.e. If B >= sqrt(S) = sqrt(large relation) then join can be done in 2 pass.
- I/O Cost
- 3|R + S|
- Output is sorted order so you can easily perform group by operation