Database Notes

Joins are very common and very expensive in databases.
Equi joins are very common because of normalization and primary key foreign key relationship.

Assume,
R -> Smaller relation
S -> Bigger relation

Lets discuss different joining algorithms
Block Nested Join

Bring ‘b’ part of smaller relation into memory
Bring each page of larger relation and scan ‘b’ and produce output
Generally b = B – 2, 2 pages are left 1 for S and 1 for output tuple
Once whole S is scanned then bring next ‘b’ pages of small relation.
Ideal: if whole R can fit into memory then join can be done in 1pass. And IO will be |R| + |S|
I/O cost

|R / b| * S + R

It has very good sequential behavior

Sort Merge Join

Create sorted runs of R and S and then merge
It works in 2 passes

Pass1: If we use tournament sort then we can produce sorted runs of size 2B.
Pass 2: Use N way merge to produce output

# of runs after pass 1 = (R/2B) + (S/2B)
as S > R so we can say at the max we will have (S/2B) + (S/2B) sorted runs after pass 1 i.e. S/B runs
So if you want to finish joins in 2 pass then S/B <= B i.e. If B >= sqrt(S) = sqrt(large relation) then join can be done in 2 pass.
I/O Cost

3|R + S|

Output is sorted order so you can easily perform group by operation

Database Notes

Friday, May 16, 2014

Join Algorithms (Join Processing in Database with Large Main Memories)