Some applications involve grouping
distinct objects into a
collection of disjoint sets. Two important operations are then
finding which set a given object belongs to and uniting the two
sets.
A
disjoint set data structure maintains a collection
of disjoint
dynamic sets. Each set
is identified by a representative, which usually is a member in the
set.
1 Operations on Sets
Let
be an object. We wish to support the following operations.
- MAKE-SET(
) creates a new set whose only member is
pointed by
; Note that
is not in the other sets.
-
UNION(
) unites two dynamic sets containing objects
and
, say
and
, into a new set that
, assuming that
;
-
FIND-SET(
) returns a pointer to the representative of
the set containing
.
-
INSERT(
) inserts an object
to
, and returns
.
-
DELETE(
) deletes an object
from
, and returns
.
-
SPLIT(
) partitions the objects of
into two sets
and
such that
,
and
.
-
MINIMUM(
) returns the minimum object in
.
2 Applications of Disjoint-set Data Structures
Here we show two application examples.
- Connected components (CCs)
- Minimum Spanning Trees (MSTs)
2.1 Algorithm for Connected Components
CONNECTED-COMPONENTS
1 for each
2 do MAKE-SET
3 for every edge
4 do if FIND-SET
FIND-SET
5 then UNION
SAME-COMPONENTS
1 if FIND-SET
FIND-SET
2 then return TRUE
3 return FALSE
Figure 1: A graph with four connected components:
,
,
, and
.
| Edge processed | Collection of disjoint sets |
| initial sets
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
|
|
|
|
|
|
|
|
|
|
| |
|
| |
|
|
|
|
|
|
| | |
|
| |
|
|
|
|
|
|
| | |
|
| |
| |
|
|
|
| | | |
|
| |
| |
|
|
|
| | | |
| | |
| |
|
|
|
| | | |
| | |
| |
|
Table 1: This table shows the state of the collection of disjoint
sets as each edge is processed. The processed edge is listed on the
left and the rest of the columns show the state of the collection.
3 The Disjoint Set Representation
3.1 The Linked-list Representation
A set can be represented by a linked list. In this representation,
each node has a pointer to the next node and a pointer to the first
node.
Figure 2: Linked-list representation. Each disjoint set can be
represented by a linked-list. Each node has a pointer to the
head node and a pointer to the next node.
Consider the following operation sequence:
MAKE-SET
,
MAKE-SET
,
,
MAKE-SET
,
MAKE-SET
,
UNION
,
UNION
,
,
UNION
.
What's the total time complexity of the above operations?
A weighted-union heuristic: Assume that the representative of
each set maintains the number of objects in the set, and always
merge the smaller list to the larger list, then
Theorem: Using the linked list representation of disjoint sets
and the weighted-union heuristic, a sequence of
M
AKE-S
ET, Union, and Find_Set operations,
of which are
M
AKE-S
ET, takes
time.
Hint: observe that for any
, after
's
representative pointer has been updated
times, the resulting set containing
must have at least
members.
4 Disjoint Set Forests
A faster implementation of disjoint sets is through the rooted
trees, with each node containing one member and each tree
representing one set. Each member in the tree has only one parent.
Figure 3: Rooted tree representation of disjoint sets. Each tree has
a "representative"
for the left tree and
for the
right tree.
4.1 The Heuristics for Disjoint Set Operations
- union by rank.
- path compression
The idea of
union by rank is to make the root of the tree with
fewer nodes point to the root of the tree with more nodes. Rather
than explicitly keeping track of the size of the subtree rooted at
each node, for each node we maintain a
rank that approximates
the logarithm of the subtree size and is also an upper bound on the
height of the node.
Time complexity:
, assuming that there are
union
operations.
Path compression is quite simple and very effective. We use
this approach during F
IND-S
ET operations to make each node
on the path point directly to the root. Path compression does not
change any ranks.
Figure 4: Path compression takes place during the
FIND-SET operation. This works by recursing from the
given input node up to the root of the tree, forming a
path. Then the root is returned and assigned as the
parent for each node in path. The parent of
after
FIND-SET
is
.
Time complexity:
if
and
otherwise, assume that there are
M
AKE-S
ET operations and
F
IND-S
ET operations.
MAKE-SET
1
2
FIND-SET
1 if
2 then
FIND-SET
3 return
UNION
1 LINK( FIND-SET
, FIND-SET
)
LINK
1 if
2 then
3 else
4 if
5 then
where
is the height of
in the tree. If both of the
above methods are used together, the time complexity is
.
The Rank properties
-
- for any tree root
,
(Link operation)
- for any integer
, there are at most
nodes of rank
- each node has rank at most
, assuming
there are at
objects involved.