Section notes for Week 6 (AVL's!)

Section Notes for Week 6 (AVL's!)

Story thus far

  1. Conversion of Java into C++
  2. C++ I/O
  3. C++ Preprocessor
  4. Pointers, local variables, and the stack
  5. new and delete
  6. Code Review
  7. More Example Code
  8. Reference Variables and Method invocation
  9. Const keyword
  10. Constructors, copy constructors, and destructors
  11. Operator Overloading and the string class
  12. Big 3
  13. Templates and the array class
  14. Discussion of the List ADT
  15. A look at an array list implementation
  16. A look at the Linked List implementation
  17. Brief Discussion of Doubly Linked Lists
  18. Iterators
  19. Function Objects
  20. >Generic Programming
  21. Trees
  22. Binary Search Trees
  23. Tree Traversal
  24. Trees with Back Pointers
  25. Exam Review

Today:

  1. Motivation
  2. Cases
  3. Rotations
  4. The Big Picture
  5. Iterators in AVL

Motivation

Ok, so we've seen these BST critters which allow us to find, insert, and remove in O(lg n) time on average.. But if we do something stupid (like insert the elements in order), we end up with O(n) running time. How do we avoid this?

Solution One: Don't do that. We could always try to build our BST's using a known balanced post-order traversal. Clearly this is not going to work for the vast majority of cases.

Solution Two: What if after each insertion/deletion we did some work to try to balance out the tree to prevent the O(n) case from ever developing?

This seems like a plan.. But what do we need to keep track of?

Well, we're concerned with balancing here. So for starters how about we keep track of the heights of each node. We can update this properly on inserts and removes right after the recursive calls return. Then we have:

Blance = height(right) - height(left)

So this means that the balance will be negative if the left subtree is higher, and positive if the right subtree is higher.

So how do we use this information?

Cases

First, let's consider the different possible ways that a tree can become unbalanced during insertion and removal. First off, how bad must a tree get before we consider it "unbalanced"?

If we have a balance of +1 or -1, is there really anything we can do about it? (Short of removing a node)?

So we are only concerned with balances of +2 or -2. How many ways can we have these mis-balances? As it turns out, there are 2 ways each. Let's examine these on the ol' dry erase computer (2 of them are on pgs 224 and 225 of the green book for you bums who don't go to section).

So, how do we fix each of these?

Rotations

A rotation is an operation on a tree node where you make the current node and one of it's children essentially switch places while still preserving the order of the tree.

There are two classes of rotations: left rotations, and right rotations. Left rotations handle the case where you have too much height on the right side (they rotate the tree left), and right rotations handle the case where you have too much height on the left side.

  1. Example of a right rotation and a left rotation on the dry erase computer
  2. Same example again with parent pointers
  3. Example of a right and left rotation in the AVL tree code (Go through right, rewrite left)

So if you sit and think about this for a second while staring at the four cases on the dry erase computer, you can see that two of the misbalanced cases will be handled quite easily by these rotations, where as the two others don't work out so well.

The problem comes in when the inner child of the heaver branch has the extra node. This causes a single rotation to be useless, because it merely shifts the misbalance from one side to the other (observe). How do we fix this?

The solution is to get this inner child over to the outside first, and then perform the proper single rotation. How do we do this?

Another rotation! (In the opposite direction).

So if we have a +2 balance on a node (the right side is heavy), and the right side's LEFT child is heavy (ie the right side has a -1 balance), then we first do a single right rotation on the right side to get the right side's inner left child weight over to the right side's outer right child, and then we perform a single left rotation on the main node to correct the +2 balance by transfering it over to the left side.

  • Woah, that was a mouthfull.. How about an example?

    Alternatively, you can think in terms of the signs of the balances of the nodes instead of this inner, outer relationship. If the sign of the balance of the child is different than the sign of the balance of the parent, then a double rotation is needed, else single.

    1. Example with parent pointers
    2. Example of the double rotation code
    3. Example of the coded logic to check for which rotation is required

    Question: How do rotations affect the height of the tree?

    The Big Picture

    Ok, so this AVL tree is a big crazy mess. If I was a motivated individual, I'd really really want to clean this thing up a bit. But since this is clearly not the case, lets just peruse it and try to figure it out as is.

  • Overview of the AVL .h file

    Some things to notice:

    Find

    There's not much too special about find, other than the fact that it takes a keytype and returns an iterator into the tree. Otherwise it works the exact same.

  • Code.

    Insert

    So insert inserts the node into the tree as per normal, but we have to pass in the parent at each step along the way, and handle this in our base case.

    However, upon returning from the recursive call, we have to consider the balancing that needs to be done. Assuming the height of the children is properly updated already (which it is, as we'll see in a moment), we need to check the balance, and if imbalanced, we need to check which of the 2 subcases we hit, and perform the appropriate rotation.

    If we rotate, do we have to update the height of the current node? Why/Why not?

    So if NO rotation is done, we DO have to update the height at the current node, because the subtree may have gone from balance 0 to a balance +-1, which requires a height update at this node.

  • Code.

    Remove

    Ok, first off, the general pattern for Remove is going to be EXACTLY the same as in the BST case. Recall how that works. This is important to understand, because this code is going to get VERY scary :)

    The extra code in this implementation of remove exists for two reasons:

    1. To support AVL balancing operations
    2. To support iterators and not let the remove from an internal node case screw up what value an iterator refers to.

    The AVL balancing operations are actually straight forward compared to the iterator code. The AVL balancing code needs to check the balance after each recursive remove call and adjust the height just like we saw in the insert case.

    The iterator support code comes into play when we need to remove a node with two children and replace it with the inorder sucessor. Why will this break iteration?

    So the basic plan of the support code is to fetch the inorder successor as usual, but instead of copying its value into the soon-to-be deleted node, we want to actually move that node up into the deleted node's place. We ALSO want to recursively adjust the balance on each step. So in order to do this, Jason decided to simply put a new node into the IOS's place, and then call remove to remove that node. (He could have saved himself an allocation and deletion by using TN instead of allocating a new node, but oh well).

  • So brace yourselves, we're going in.

    Usage

  • A glance at the main.C file, or perhaps some interactive usage of avltree

    Iterators

    So our iterators have to perform an inorder traversal of the tree, but without the benefit of recursion. This is made possible through through the parent pointers that caused us so much trouble in Insert and Remove.

    begin()

    Begin is defined as returning an iterator to the leftmost node in the tree (headerNode->left). Makes sense.

    end()

    End is a bit more strange.. Again, we have to provide for this "one more than the last" idea, and this is done by providing a iterator to headerNode.

    operator++

    Operator++ is going to want to find the inorder sucessor of the current node, and return it. (Remember when we talked about IOS and IOP last week). There are basically two cases we need to consider here:

    1. The node has a right child
      If the node has a right child, the IOS is defined in the usual way: Go right, then all the way left.
    2. The node doesn't have a right child.
      If this is the case, then the IOS is the next node that would print in an in-order traversal. So if we are on someone's left child, we just go up one. If we are someone's right child, we go up until the current node is someone's left child, and then go up one more.

  • Code.

    operator--

    Operator-- will basically be the mirror image of the above, as we want to find the inorder predecessor of the current node. Again, the cases:

    1. The node has a left child.
      The inorder predecessor is defined in the usual way: Go left, then all the way right
    2. The node doesn't have a left child
      In this case, if we are someone's right child, we just go up one. Otherwise, we want to go up until we are the right child of some other node, and then go up one more.