Section notes for Week 5 (BST's!)

Section Notes for Week 5 (BST's!)

Story thus far

  1. Conversion of Java into C++
  2. C++ I/O
  3. C++ Preprocessor
  4. Pointers, local variables, and the stack
  5. new and delete
  6. Code Review
  7. More Example Code
  8. Reference Variables and Method invocation
  9. Const keyword
  10. Constructors, copy constructors, and destructors
  11. Operator Overloading and the string class
  12. Big 3
  13. Templates and the array class
  14. Discussion of the List ADT
  15. A look at an array list implementation
  16. A look at the Linked List implementation
  17. Brief Discussion of Doubly Linked Lists
  18. Iterators
  19. Function Objects
  20. Generic Programming

Today:

  1. Trees
  2. Binary Search Trees
  3. Tree Traversal
  4. Trees with Back Pointers
  5. Exam Review

Intro

So ok, we have these nice data structures to allow us to access data sequentially and insert in constant time (lists), and we have other nice data structures to allow us to access data randomly in constant time, but for them insertion is slow.. And even then, to find an element in these structures is slow. Wouldn't it be nice if we could get all this (constant find, constant access, constant insert, and sorted data) all in once nice data structure? And maybe with a cool name like The UltraStor??

Well as it turns out we can't. But we can come up with reasonable approximations of this ultimate data structure for the task at hand. And that's what this course is all about.

So first things first. Lets see what we can do about the fact that it takes O(n) time to find an element in an array. What did we learn in 125 about sorted arrays and finding stuff quickly?

That's right! Binary Search. Recall that binary search used the fact that our data was ordered to throw away half of it at a time, thus achieving a blazing O(lg n) search time (on average!). But insertion into this data structure was still ass slow, because it was an array, and if we try to use a list, we can't throw away half the elements in one "swoop" because we don't have constant element access...

What would be nice is if we could find a way to maintain this ability to throw away half the elements at each search step and still keep a relatively quick insertion time... What to do.. what to do..

So I'll let you guys think about that one while I introduce a seemingly unrelated topic, namely trees ;)

Trees

So trees are simple creatures. They are just like lists, in fact, very much like doubly linked lists internally.. Except instead of having a pointer for the head, they have a pointer for the root. And instead of each node having a pointer for the previous node and a pointer for the next node, they have a pointer to the "left" node, and the pointer to the "right" node.

Lets draw one on the board.

In fact, some trees can have more than two branches at each node. The amount of branching is called the tree width.

Ok, so that was a pointless diversion... Or maybe....

Binary Search Trees

DUN DUN DUN! DUN DUN! DUN!

So what if.. What if we inserted nodes less than the value of the current node to its left, and nodes greater than the current node to its right?

GOOD GOD MAN! We can then search in O(lg n)! (Is anyone really surprised?)

However, we don't ALWAYS search in O(lg n).. If the tree is misbalanced, searches can take as long as O(n).

  • Binary search tree definition header file.

    Find and Insert

    How is this done? Recursion! Yay! Lets look at some code.

  • Find() code in BST.

    Ok, so we can search in O(lg n) time.. Anyone want to guess on how long it takes to insert a node?

  • Example code for Insert.

    Question: Do these have to be recursive?

    Remove

    Remove is complicated. Sure it's easy if the node we want to delete is a leaf, but what if it's not? We have to have something there.. But what?

    The easy case is if the node only has one child. In that case, we just move the child into the current node's place, and we're set.

    But what if the current node has two children? Well, we still must maintain order in the tree. So we can't just pick the left or right node and move it up. We need to preserve the property that all nodes to the left will still be less than the new node and all to the right will be greater.

    This leaves us with two options. We either need to find the next greatest element in the tree (the In Order Successor of the current node), or the next least element in the tree (the In Order Predecessor of the current node).

    The IOS is obtained by going right (greater than the current node), and then going left until we hit NULL. This finds the node that is the smallest out of all nodes greater than the current node.

    The IOP is obtained similarly, by going left (less than the current node), and then going right, thus finding the node that is the greatest out of all nodes less than the current node.

    So.. Does it matter which we pick?

  • Code for the remove.

    What is the running time for this code? Best Case? Worst Case? Average Case?

    Quick look at the big three

    So our main concern with the big three is how do we delete a tree, and how do we copy a tree. These are actually quite simple if we define them recursively on the treeNodes as follows:

    Delete:

    1. If current is null, stop.
    2. Delete the left subtree
    3. Delete the right subtree
    4. Delete the current node.

    Copy:

    1. If the current node is null, stop
    2. Allocate a new node, copy the current node's value over
    3. Copy the left node and point our left to this copy
    4. Copy the right node and point our right to this copy
    5. Return the new node.

  • Code.

    What is the running time for this code?

    Tree Traversals

    Ok, so we have this whole friggin tree that we can insert, remove and search from fast. But how do we do something simple like print out all the elements in the tree?

    The answer: Recursion!

    Those of you unfortunate enough to have had me for CS125 will recall my recursion template: All recursive methods in that class had the following structure:

    1. Base case
    2. Work done forwards
    3. Recursive call
    4. Work done backwards

    And when we have two (or more) recursive calls:

    1. Base case
    2. Work done "pre-order"
    3. Recursive call
    4. Work done "in-order"
    5. Recursive call
    6. Work done "post-order"

    The question is, how does the work get done here?

    Unfortunately, I can't think of a general cookie cutter approach here.. But luckily for trees things are easy. When we are doing tree traversals, you pretty much always want to do work in no more than one of these three positions.

    Work done in the pre-order case works with the node BEFORE recursing left and right off of it. So if we were printing out a list of the nodes in a pre-order traversal, we are basically printing out the parent, and then it's left sub tree, and then its right sub tree.

    Work done in the in-order case for a BST works with a node AFTER hitting all of the nodes less than it, but BEFORE hitting all of the nodes greater than it. Hence a print statement here would print out the values of a tree "in order".

    The post order case is handy for deletion. We saw it in the tree destructor above. It also prints out a node only after traversing down both its left and right subtrees. This is a little bit harder to conceptualize.

    So which of these will enable us to easily reproduce a BST? How? Can we reproduce a regular binary tree in this same way? (Those of you playing at home will have to STFW..)

    Back Pointers & Iterators

    As it turns out, binary trees can actually have THREE pointers in each node. The third one can work just like a "prev" pointer in the doubly linked list.

    This is required to make iterators possible. (Well not really, you could include a stack in your iterator to keep track of things, but that gets clumsy). Why is this?

    In order to have these prev pointers behave properly for our iterators, we need to modify a couple methods of the BST.

    1. Copy constructor
    2. Insert
    3. Delete

    I'll draw the steps for these on the board, and then leave them as an exercise to you guys to implement them and an iterator for the BST.

    Review

    So everything covered in discussion sections prior to todays will be on the exam. This includes stacks and queues and generic programming, but not trees or binary search trees.

    What should you expect from the test? Well, something very similar to the fall and spring midterm 1's. The problems themselves will be different, but their general thrusts will all be the same.

    Things you should know.. Well of the stuff we haven't touched on in section, DEFINITELY read over Jason's notes on Stacks and Queues. Luckily those are pretty nicely done. Know those ADTs well. You are expected to have the queue and stack operations memorized, but the list and array we will give to you. Stacks in general are useful for the same types of things as recursion is useful for (reversing things, place holding, storing intermediate results to subproblems, etc).

    Also, make sure you have a firm grasp of pointers, and templates of pointers. If you didn't understand MP3 and 4 100%, you damn well better before the exam. Know the Big 3, and think about how they have to work on templates with pointers.

    Know the basics behind iterators and generic programming, function objects, etc. If you understand everything I showed you in last discussion section, you're in very good shape in that regard.

    Know how to use lists, and how to move their pointers and nodes around. Don't panic, remember this just boils down into carefully drawing out the diagrams and numbering each step.

    The material we've covered so far is pretty straight forward. The hard part is going to be keeping a level head. If a particular approach to a problem isn't working, leave the problem and come back to it later, and try a whole different approach. You should at least look at the hard problems first (ie the last couple) and if you can't come up with a solution or a sketch right away, go back to the beginning and let the problem "fester" in the back of your mind. Note that you should only have one problem in the back of your mind at a time while working on the easy ones. Your brain can probably handle two things at once, but any more than that and you will just start to panic.

    And in the case of emergency, always remember:

    DON'T PANIC!