Disjoint Set : Worst case for union-by-rank and path compression

Tags:

From DATA STRUCTURE AND ALGORITHM ANALYSIS (ADDISON-WESLEY)

LEMMA 8.1.
When executing a sequence of Union instructions, a node of rank r must have at least 2^r decendants (including iteself).

pf)
(1) If r=0, obvious.
(2) Suppose the rank of T is r and T has the fewest decendants. Suppose last union was occurred between T1 and T2 and T1’s root is X and the X is root of T. If T1’s rank is r, then T1 should have less decendants than T because T2 will be merged to T1. This contradicts the presumption. So, T1’s rank should be less than or equal to r-1. T2’s rank <= T1'rank. Since T's rank is r, it follows that rank of T1 and T2 should be r-1 and each has decendants of 2^r-1 by induction hypothesis. So we can conclude that T has 2^r decendants. LEMMA 8.2.
The number of nodes of rank r is at most N/2^r

pf) obvious (because of the definition of rank)

LEMMA 8.3.
At any given point in the Union/Find algorithm, the ranks of the nodes on a path from the leaf to a root increase monotonically.

pf)
Obvious, if there is no path compression. If, after path compression, some node v is a decendant of w, then clearly v must have been a descendant of w when only Unions were considered. Hence the rank of v is less than the rank of w.

Amoritized analysis – Accounting gimmick.
1. If v is the root, or if the parent of v is the root, or if the parent of v is in a different rank group from v, then charge one unit under this rule. This deposits an American penny into the kitty.
2. Otherwise deposit a Canadian penny into the vertex.

LEMMA 8.4.
For any Find (v), the total numer of pennies deposited, either into the kitty or into a vertex, is exactly equal to the number of nodes on the path from v to the root.

pf)
obvious

LEMMA 8.5.
Over the entire algorithm, the total deposits of American pennies under rule 1 amount to M(G(N)+2).

pf)
Two penny is always deposited in root and its child. Only G(N) is charge by rule 1. Thus, during any one find, at most G(N)+2 is charged.

I think this is not true. Depending on rank group definition, rank 1 and 2 can be in another group. If this was the case, at most G(N)+1 is charged. Note that this lemma roughly calculate the cost of find operation. Since path-compressino is used, it is unfair to say that all find operations cost G(N).

LEMMA 8.6.
The number of vertices, V(g), in the rank group g>0 is at most N/2^F(g-1).

pf)
V(g) <= sigma(from F(g-1)+1 to F(g)) N/2^r <= sigma(from F(g-1)+1 to infinite) N/2^r <= N sigma(from F(g-1)+1 to infinite) 1/2^r <= N/(2^F(g-1)+1) sigma (from 0 to infinite) 1/2^s <= 2N/2^F(g-1)+1 <= N/2^F(g-1) LEMMA 8.7.
The maximum number of canadian pennies deposited to all vertices in rank group is at most NF(g)/2^F(g-1).

pf)
Each vertex in the rank group can receive at most F(g)-F(g-1) <= F(g) Canadian pennies while its parent stays in its rank group. (Image a situation that all other vertices are visited by find-set operation before the last vertex is visited. The last vertex can be remain until F(g)-F(g-1) vertices are visited.) This lemma considers the path compression. In LEMMA 8.5, however, we did not considered path compression to keep things simple. To supply with strict boundary of worst case, Candadian pennies are used.

LEMMA 8.8.
The total deposit under rule 2 is at most N sigma(from g=1 to G(N)) F(g)/2^F(g-1) Canadian pennies.

pf)
By summation of each term which can be elicited by LEMMA 8.7, we can get the result.

Thus, we have the deposits under rules 1 and 2. The total is M(G(N)+2) + N sigma(from g=1 to G(N)) F(g)/2^F(g-1).

We should minize this. Minimizing G(N) will be helpful. However, if G(N) is too small, then the F became too large. Obviously, F(0)=0, F(i)=2^F(i-1) is a good choice. (Since the F(g)/2^F(g-1) term). This gives G(N)=1+ceil(log*N).

THEOREM 8.1.
The running time of M Unions and Finds is O(Mlog*N). (i.e., approximately a constant.)

pf)
If we use the above definition of F, then American pennies paid is O(MG(N))=O(Mlog*N). The total number of Canadian pennies is NG(N) = O(Nlog*N). Since M=omega(N), the bound follows.

We can say this is almost a constant because log*N <= 4 for practical N. Note that each union operation costs constant time. So we can ignore the Union.

Definion of log*N can be found at the previous article.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *