Section notes for Week 4 (Iterators & Function Objects)

Section Notes for Week 4 (Iterators & Function Objects)

Story thus far

  1. Conversion of Java into C++
  2. C++ I/O
  3. C++ Preprocessor
  4. Pointers, local variables, and the stack
  5. new and delete
  6. Code Review
  7. More Example Code
  8. Reference Variables and Method invocation
  9. Const keyword
  10. Constructors, copy constructors, and destructors
  11. Operator Overloading and the string class
  12. Big 3
  13. Templates and the array class
  14. Discussion of the List ADT
  15. A look at an array list implementation
  16. A look at the Linked List implementation
  17. Brief Discussion of Doubly Linked Lists

Today:

  1. Iterators
  2. Function Objects
  3. Generic Programming

Iterators

Ok, iterators are a simple idea, but they have a a lot of code behind them. It's not complicated code conceptually, but full understanding requires a mastery of C++.

Luckily, Jason has some simplified iterators for a watered down implementation of <vector> and <list> from the Standard Template Library. If you are really interested in learning C++ in its entirety - and you should be - I highly suggest supplementing this material with either a perusal of <iterator.h> and/or (preferably and) a sllooow read of Chapters 13, 18.4-18.5, and 19.1-19.4 of Bjarne Stroustrup's The C++ Programming Language. A quick perusal of the STL index couldn't hurt either. This material is optional, but wading through it will give you a good solid understanding of C++, as well as a good deal of respect for its power, speed, and flexibility.

So anyways, backing up.. First, let's discuss the basic motivation behind iterators. Remember when we talked about encapsulation and abstraction in the CS125 days? Well iterators are just C++ taking this idea and applying it to pointers. If you think of an iterator as a kind a "pointer-object" that simply overloads all the operators that are valid on pointers and implements them in terms of the member data of the class, you wouldn't be too far off.

Ok what does it mean to "iterate"? Basically in high-level pie in the sky abstract terms, iterators define an iteration order over an abstract data type. They allow you to hit each element of a data type in a well defined order, and provide certain basic pointer-like operations on the data held by the data type.

What iterators require

So to do this, iterators require access to the internal member variables of a class that they iterate over. This is accomplished in a somewhat convoluted way. If you take a glance at the iterator code for his vector class for example, you should notice the following:

  1. All the old operators and insertion are gone!
    This functionality is now provided by iterators.
  2. A new keyword! typedef
    typedef allows you to declare a new type as an alias for another more complicated one. It works exactly like a variable declaration. All overloaded operators make use of these types instead of using the template type of the class. There are reasons for this, which we will mention briefly later.
  3. Another new keyword! (sort of) friend
    "Friends can touch your private parts!"
    A friend of a class is allowed access to the private member data of that class. So in this case, vector has access to the private member data of its iterator as well as, and more importantly, access to its constructor.

This last one is especially important. It is how the iterator has access to the private member data of vector. Instead of the iterator being listed as a friend of vector, vector is listed as a friend of iterator and when someone requests a iterator for vector, vector sends its member data to its iterator via the iterators private constructor. Pretty slick, eh?

This is done for two reasons: safety and performace. If the iterator contained a pointer to an object, its easier for the iterator to accidentally change the state of the object it is iterating over. This also saves us on a pointer dereference each time we want to use the iterator to access member data.

Types of iterators

So the idea behind iterators is to provide abstraction for a pointer. However, abstraction is a bad idea if a user is tricked by it into thinking that a very slow operation on a class is fast. For example, it would be a bad idea for a user to believe he can use an iterator to do listIter--; on a singly linked list, and have it take the same amount of time as listIter++;

Because of this, iterators are divided into types according to what operations they can do in constant time.

Types of iterators and their operations, according to the STL:
Category: output input forward bidirectional random-access
Read: something = *iter; something = *iter; something = *iter; something = *iter;
Access: iter->member; iter->member; iter->member; iter->member, iter[index];
Write: *iter = something; *iter = seomthing; *iter = seomthing; *iter = seomthing;
Iteration: ++ ++ ++ ++, -- ++, +, -, +=, -=
Comparison: ==, != ==, != ==, != ==, !=, >, <, >=, <=

A proper representation of this table would actually be three dimensional. There is another one just like it for const iterators. Those do not allow you to modify the member data they iterate over (so the whole "Write" line is effectively gone for them, and the -> operator will return a const reference). On top of this, there are reverse iterators which are NOT the opposite of forward iterators, but instead simply switch the meaning of ++ and -- wherever they occur.

Jason simplifies things down from this table into just an iterator, a reverse iterator, and a const version of those two. His iterator is in fact a bidirectional iterator by Bjarne and the STL's classification.

Iterator Implementation

So iterators iterate over stuff. That's cool. But when do they stop? Well, as it turns out, every class that implements an iterator has some form of "end marker" in its data. This can be NULL, but more often classes define a special end marker of some sort to stop on. For example in the vector class our end marker is the index one past the last. In the list class (which is double linked and circular) we have an empty node at the beginning of the list, and the first iterator gives you the element one past it, and will cycle around back to the beginning to stop.

Classes that have iterators must implement the following functions:

  1. begin()/begin() const
    Gives you an iterator starting at the first element, const or not depending on object type.
  2. end()/end() const
    Gives you an iterator for one element past the last. This is how you stop iteration, by comparing your iterator to the object.end() method to see if you have iterated over every element in the list.
  3. rbegin()/rbegin() const and rend()/rend() const
    Returns reverse iterators that iterate backwards over your elements, and the end marker for those iterators, respectively.
Note about iterator implementation: By now it should become clear to you that there is nothing in base C++ that defines an "iterator". Iterators are pure abstractions invented by people who knew C++ too well. This means that you can implement an iterator completely wrong, name all the class support functions wrong, make slow sloppy implementations of the overloaded operators and C++ will not give a damn. To try to prevent this, the STL defines some conventions in <iterator>, namely a base class and some typedefs all iterators should create. Notice again that Jason uses little to none of this support and still defines several iterators. This is fine, but it makes his code slightly unintuitive for people who make full use of the STL. For more information, see sections 19.2.2 and 19.2.3 in Bjarne's book (those two will require you to have read and understood chapter 13, which is long and can get convoluted).

So now it's time for some code that makes use of these mofos:

Function Objects

Function objects are really really neat toys. Basically, they are classes that have operator() overloaded so that you can use them just as functions, but with state and member data!

Why use function objects? Templates, my good man, templates! Function objects allow us to specify fully templated algorithms. We can use templates to have an abstract comparison function.

Moreover, these beasts can be deceptively efficient. Oftentimes more efficient than a regular function used in a similar way. "BUH? How can an OBJECT be more efficient than a FUNCTION?" you say? The answer is inlining! (Recall inlining is where you take the code of a function/class and place it "in line" wherever you have a call or use of that function/class).

A function has to set up the stack and arguments, especially if it is defined in another .cc file (which precludes inlining). A function object can be a class defined in a .h file, and the compiler can inline the operator(), as well as any constructors you have, and thus avoid all instantiation and even the function call overhead.

Generic Programming

Ok, so these are some pretty powerful ideas that have just been presented.. We have an abstract pointer, that if done right, behaves the same for ALL containers of data that we use, and we have these function objects that can be used with templates to provide an abstract implementation of any procedural idea.

This enables a very powerful notion known as generic programming. That is, you can write support functions that work for ALL data structures you could ever dream of, present AND future, and you only have to write these functions ONCE!

P.S. If you didn't just say "Holy shit!" to yourself, you don't have a pulse. Call a doctor immediately.

  • All the random code used on this page.