So ok, we have these nice data structures to allow us to access data sequentially and insert in constant time (lists), and we have other nice data structures to allow us to access data randomly in constant time, but for them insertion is slow.. And even then, to find an element in these structures is slow. Wouldn't it be nice if we could get all this (constant find, constant access, constant insert, and sorted data) all in once nice data structure? And maybe with a cool name like The UltraStor??
Well as it turns out we can't. But we can come up with reasonable approximations of this ultimate data structure for the task at hand. And that's what this course is all about.
So first things first. Lets see what we can do about the fact that it takes O(n) time to find an element in an array. What did we learn in 125 about sorted arrays and finding stuff quickly?
That's right! Binary Search. Recall that binary search used the fact that our data was ordered to throw away half of it at a time, thus achieving a blazing O(lg n) search time (on average!). But insertion into this data structure was still ass slow, because it was an array, and if we try to use a list, we can't throw away half the elements in one "swoop" because we don't have constant element access...
What would be nice is if we could find a way to maintain this ability to throw away half the elements at each search step and still keep a relatively quick insertion time... What to do.. what to do..
So I'll let you guys think about that one while I introduce a seemingly
unrelated topic, namely trees ;)
So trees are simple creatures. They are just like lists, in fact, very much
like doubly linked lists internally.. Except instead of having a pointer for
the head, they have a pointer for the root. And instead of each node having a
pointer for the previous node and a pointer for the next node, they have a pointer to the "left" node, and the pointer to the "right" node.
Lets draw one on the board.
In fact, some trees can have more than two branches at each node. The
amount of branching is called the tree width.
Ok, so that was a pointless diversion... Or maybe....
DUN DUN DUN! DUN DUN! DUN!
So what if.. What if we inserted nodes less than the value of the current
node to its left, and nodes greater than the current node to its right?
GOOD GOD MAN! We can then search in O(lg n)! (Is anyone really surprised?)
However, we don't ALWAYS search in O(lg n).. If the tree is misbalanced,
searches can take as long as O(n).
How is this done? Recursion! Yay! Lets look at some code.
Ok, so we can search in O(lg n) time.. Anyone want to guess on how long it
takes to insert a node?
Question: Do these have to be recursive?
Remove is complicated. Sure it's easy if the node we want to delete is a
leaf, but what if it's not? We have to have something there.. But what?
The easy case is if the node only has one child. In that case, we just move
the child into the current node's place, and we're set.
But what if the current node has two children? Well, we still must maintain
order in the tree. So we can't just pick the left or right node and move it
up. We need to preserve the property that all nodes to the left will still be
less than the new node and all to the right will be greater.
This leaves us with two options. We either need to find the next greatest
element in the tree (the In Order Successor of the current node), or the
next least element in the tree (the In Order Predecessor of the current
node).
The IOS is obtained by going right (greater than the current node), and
then going left until we hit NULL. This finds the node that is the smallest
out of all nodes greater than the current node.
The IOP is obtained similarly, by going left (less than the current node),
and then going right, thus finding the node that is the greatest out of all
nodes less than the current node.
So.. Does it matter which we pick?
What is the running time for this code? Best Case? Worst Case? Average
Case?
So our main concern with the big three is how do we delete a tree, and how
do we copy a tree. These are actually quite simple if we define them
recursively on the treeNodes as follows:
Delete:
Copy:
What is the running time for this code?
Ok, so we have this whole friggin tree that we can insert, remove and
search from fast. But how do we do something simple like print out all the
elements in the tree?
The answer: Recursion!
Those of you unfortunate enough to have had me for CS125 will recall my
recursion template: All recursive methods in that class had the following
structure:
And when we have two (or more) recursive calls:
The question is, how does the work get done here?
Unfortunately, I can't think of a general cookie cutter approach here.. But
luckily for trees things are easy. When we are doing tree traversals, you
pretty much always want to do work in no more than one of these three
positions.
Work done in the pre-order case works with the node BEFORE recursing left
and right off of it. So if we were printing out a list of the nodes in a
pre-order traversal, we are basically printing out the parent, and then it's
left sub tree, and then its right sub tree.
Work done in the in-order case for a BST works with a node AFTER hitting
all of the nodes less than it, but BEFORE hitting all of the nodes greater
than it. Hence a print statement here would print out the values of a tree "in
order".
The post order case is handy for deletion. We saw it in the tree destructor
above. It also prints out a node only after traversing down both its left and
right subtrees. This is a little bit harder to conceptualize.
So which of these will enable us to easily reproduce a BST? How? Can we
reproduce a regular binary tree in this same way? (Those of you playing at
home will have to STFW..)
As it turns out, binary trees can actually have THREE pointers in each
node. The third one can work just like a "prev" pointer in the doubly linked
list.
This is required to make iterators possible. (Well not really, you could
include a stack in your iterator to keep track of things, but that gets
clumsy). Why is this?
In order to have these prev pointers behave properly for our iterators, we
need to modify a couple methods of the BST.
I'll draw the steps for these on the board, and then leave them as an
exercise to you guys to implement them and an iterator for the BST.
So everything covered in discussion sections prior to todays will be on the
exam. This includes stacks and queues and generic programming, but not trees
or binary search trees.
What should you expect from the test? Well, something very similar to the
fall and spring
midterm 1's. The problems themselves will be different, but their general
thrusts will all be the same.
Things you should know.. Well of the stuff we haven't touched on in
section, DEFINITELY read over Jason's
notes
on Stacks and Queues. Luckily those are pretty nicely done. Know those
ADTs well. You are expected to have the queue and stack operations memorized,
but the list and array we will give to you. Stacks in general are useful for
the same types of things as recursion is useful for (reversing things, place
holding, storing intermediate results to subproblems, etc).
Also, make sure you have a firm grasp of pointers, and templates of
pointers. If you didn't understand MP3 and 4 100%, you damn well better before
the exam. Know the Big 3, and think about how they have to work on templates
with pointers.
Know the basics behind iterators and generic programming, function objects,
etc. If you understand everything I showed you in last discussion section,
you're in very good shape in that regard.
Know how to use lists, and how to move their pointers and nodes around.
Don't panic, remember this just boils down into carefully drawing out the
diagrams and numbering each step.
The material we've covered so far is pretty straight forward. The hard
part is going to be keeping a level head. If a particular approach to a
problem isn't working, leave the problem and come back to it later, and try a
whole different approach. You should at least look at the hard problems first
(ie the last couple) and if you can't come up with a solution or a sketch
right away, go back to the beginning and let the problem "fester" in the back of
your mind. Note that you should only have one problem in the back of your mind
at a time while working on the easy ones. Your brain can probably handle two
things at once, but any more than that and you will just start to panic.
And in the case of emergency, always remember:
Trees
Binary Search Trees
Find and Insert
Remove
Quick look at the big three
Tree Traversals
Back Pointers & Iterators
Review
DON'T PANIC!