Section notes for Week 9: Heaps and Disjoint Sets

Section Notes for Week 9: Heaps and Disjoint sets

Story thus far

  1. Conversion of Java into C++
  2. C++ I/O
  3. C++ Preprocessor
  4. Pointers, local variables, and the stack
  5. new and delete
  6. Code Review
  7. More Example Code
  8. Reference Variables and Method invocation
  9. Const keyword
  10. Constructors, copy constructors, and destructors
  11. Operator Overloading and the string class
  12. Big 3
  13. Templates and the array class
  14. Discussion of the List ADT
  15. A look at an array list implementation
  16. A look at the Linked List implementation
  17. Brief Discussion of Doubly Linked Lists
  18. Iterators
  19. Function Objects
  20. Generic Programming
  21. Trees
  22. Binary Search Trees
  23. Tree Traversal
  24. Trees with Back Pointers
  25. Exam Review
  26. Motivation
  27. Cases
  28. Rotations
  29. The Big Picture
  30. Iterators in AVL

    FIXME: Broken

Today:

  1. Priority Queues
  2. Disjoint Sets

Priority Queues

Right. So we have a queue that is basically a line at the supermarket. But what if we wanted a queue that gave priority to more important items. Like print jobs, or processes on a computer. Then we need a priority queue, which works just like a regular queue, except it that instead of returning the longest waiting element on a "pop", we return the highest priority element.

The priority queue implements the following three interface functions:

  1. FindMin
    return min without removing
  2. DeleteMin
    remove and return min
  3. Insert
    insert an element

Jason went through the existing data structures that could optimize these ops. Which one seemed like the best choice before the actual solution was presented?

So as it turns out, we can do better using a slightly different data structure known as a partially-ordered complete tree, where every child is greater than it's parent. pg 300 of the book has pictures of such a data structure.

However, the fact that the tree is complete (every level is filled in before a new one is created) allows us to implement this as an array with the following rules

  1. LeftChildIndex(ParentIndex) = 2*ParentIndex
  2. RightChildIndex(ParentIndex) = 2*ParentIndex + 1
  3. ParentIndex = ChildIndex/2

Note that these rules are based on the assumption that your array starts at 1. This is fine, even for C++ arrays, because we can store a dummy value that is less than everything in the tree at index 0. This has the added benefit of saving us from providing a special case for the root node.

Ok, so lets have a look at the .h file, as per usual. Not much of interest, so on to the algorithms.

FindMin

Duh.

DeleteMin

So deleteMin in code is going to work a little different than on paper.

Recall that the intuitive understanding is that we place the last element in the first slot in the array, and then we swap it with the smaller of the two children, and continue down in this fashion until we hit the end, or it's greater than both children.

Well the code does the optimization Jason discussed in lecture. It's merely going to shift the smallest child of i into i's place until we hit the bottom of the tree, or the last element is no longer larger than the two children of i (this can happen before the end). At this point, we just place the last element into the hole or "bubble" that we have thus shifted down.

  • So lets trace through this code.

    Insert

    Recall that insert works the opposite way. We put the element at the end of the heap, and then compare it to its parent. If it's less than the parent, we swap, and move up. Otherwise we stop.

    Again, the actual code is going to avoid all these swaps by simply moving the parents of i down that are greater than the new value. Once we get to the root, or a position where the parent of i is not greater than the new value, we stop, and place the new value into this index i (the hole or "bubble" that we have been moving up).

  • Code.

    Build_Heap

    Ok, so this is brand new, so pay attention. What Build_Heap does is it creates a heap from an array. Here's a pseudocode sketch:

    Questions:

    1. Why can we start at size/2?
    2. Can anyone guess the running time of this dealy?
    3. Buh?? It runs in O(n)?? What is the intuition behind this?
  • Code.

    Disjoint Sets

    So disjoint sets are used to model equivalence relations. What the hell is an equivalence relation?

    The 3 properties of an equivalence relation are...

    So every member in a set is then related to every other member of the set, and if we add a relation between any two members of separate sets, these sets are then merged (by transitivity and symmetry).

    Thus, the main operations on disjoint sets are Union (which adds an equivalence relation between two elements, and thus joins two sets) and Find (which determines the equivalence class of an element).

    Ok, so as Jason discussed in class, the best way to implement these beasts is with something called an up-tree, where sets are named by the root node of the tree. (See pg 309). Well, as it turns out, we can also flatten out this up-tree into an up-tree array.

    So once we have this notion of an uptree, implementing find and union is pretty straight forward - just simply traversing up the tree until you get to the root, or adding a new parent to the root.

  • Quick overview of array implementation

    The interesting stuff comes in when you consider various optimizations to keep the average height of an up-tree low.

    Union by Height

    Again, very simple, upon performing the union, we just attach the set with the smaller height to the set with the larger height.

  • Trace through

    Union By Size

    This is very simple. When we perform the union, we simply attach the set with smaller size to the set with larger size.

  • Trace through code

    So given intelligent unions, what is the running of Find()?

    Path compressing Find

    Ok, so if we have a path compressing find, which union type do we want to use? What is the running time?

    Exam Mini-review

    This test is going to be picture/algorithm based for the most part. Less code to look at than the previous exam, though there will be a couple of coding questions, and a debugging question. Make sure you know your tree balancing algorithms.

    Also, I swear to you that it is easier than MT2 from the summer. Summer MT1 vs this MT1 may have been a close call, but there was less coding on ours, so I thought it would be easier. For sure, this exam is shorter and simpler than Summer MT2. Also, this semester, Sets are not considered part of the exam, so don't worry about problem 4 on summer MT2.

    Do think about how you can do various tree operations with the different recursive tree traversals.

    If you haven't been to lecture, be SURE to cover Sparse Arrays, as we have not gone over them in section!

    Be sure to keep writing the whole time, and remember the basic principle of rational thought. I want to see lots of pictures on your exams! If you don't have a picture, you better be damn well sure the code is perfect, or you will get plenty of scornful remarks all over your exam.