Programming with GNU Software

Noeud:Vector, Noeud Ť Next ť:Deque, Noeud Ť Previous ť:A Crash Course in Iterators, Noeud Ť Up ť:Containers and Iterators

Vector

A vector is like a dynamic array. Thus, you can add elements to the vector, and access them in constant time - being able to access each element as you need. Insertion and deletion of elements at the start or middle of the vector takes linear time; elements at the end take constant time. Think of a vector as a dynamic array of elements; you can change the size as you see fit and insert and retrieve elements according to an index. Inserting and deleting elements anywhere other than the end is costly in terms of time because the elements need to be reallocated - because container elements are stored contiguously in memory, inserting and deleting elements means that succeeding elements need to be reallocated. To use a vector in your program, you must include <vector>.

Let's take a look at a simple source file that uses a vector:

     /* vector1.cc
        Compiled using g++ vector1.cc -o vector1 */
     #include <vector>
     
     int main()
     {
       std::vector<int> v;
       for (int i=0; i<10; i++)
         v.push_back(i);
       cout << "v now contains " << v.size() << " elements:" << endl;
       for (int i=0; i<v.size(); i++)
         cout << "'" << v[i] << "' ";
       cout << endl;
       exit(0);
     }

     Example 3.4: vector1.cc

which produces the following output:

     $ ./vector1
     v now contains 10 elements:
     '0' '1' '2' '3' '4' '5' '6' '7' '8' '9'
     $

First, we declare that we'll be using a vector with the declaration #include <vector>. The declaration std::vector<int> v creates an empty vector named v with no elements whatsoever, which will contain elements of type int. We then iterate through a loop ten times, using the push_back(elem) method.

push_back is an efficient means of insertion for vector - each element is pushed onto the back of the vector without any need of reallocating previous elements, which as we've already mentioned can be costly.

We then utilize the size() method to print the amount of elements the vector contains, and then loop through the elements from 0 to v.size()-1, using the array subscript operator. Note that we can use array subscript operators to access elements of a vector, and thus avoid the use of iterators.

That's OK, but we can rewrite the above example to produce the same result but using essentailly different code and making use of iterators:

     /* vector2.cc
        Comiled using g++ vector2.cc -o vector2 */
     #include <vector>
     
     int main()
     {
       std::vector<int> v(10);
       /* Declare an iterator to work with: */
       std::vector<int>::iterator pos;
       cout << "v now contains " << v.size() << " elements:" << endl;
       int counter = 10;
       for (pos = v.begin(); pos != v.end(); ++pos)
         {
           --counter;
           *pos = counter;
         }
       pos = v.begin();
       while(pos != v.end())
         {
           cout << *pos << ' ';
           ++pos;
         }
       exit(0);
     }

     Example 3.5: vector2.cc

The output to this is obvious: we inform the user that the vector contains 10 elements, and then print them out one by one which involves printing out the numbers 9 down to 0.

This time we opt to declare a vector named v, reserving 10 elements of type int with the declaration std::vector<int> v(10). In addition, we also declare an iterator to work by declaring std::vector<int>::iterator pos. We couldn't have just declared an iterator on it's own, like iterator pos, for example; the reason being is that each container has it's own iterator (and as a consequence we didn't need to declare #include <iterator> because vector already includes it). Let's look at how the iterator works.

Recall from A Crash Course in Iterators that there are many kinds of iterators, suited to different containers. Vector uses a random access iterator; this contains all of the properties of a bidirectional iterator, as well as being able to use random access. Thus, we can use all of the comparison operators with pos, as well as pre- and post-increment operators.

An iterator enables us to maintain track of where we are in a container much the same way we use pointer arithmetic. Therefore, if we set pos at the beginning of a container of objects, we'd expect ++pos to point to the next element (if it exists...), and pos += 3 to point to the 3rd element after the current object. Therefore, the for loop

     for (pos = v.begin(); pos != v.end(); ++pos)
     {
             --counter;
             *pos = counter;
     }

takes advantage of the begin() and end() methods of vector. The begin() method allows us to access the first element of the vector; the end() method points past the last element of the vector. Each of these methods return an iterator. This is represented in figure (TODO: figure no.): figures/libstdcpp/stl-vector-iterator.png

The ++pos statement in the for loop sets pos to the next element in the container using the pre-increment operator. Using pre-increment generally offers better performance - using pos++ returns the old position of the iterator, whereas ++pos returns the new position of the iterator. The decision to use a while loop and a for loop in the example was arbitrary; it was merely to demonstrate the different ways in which iterators could be used with a loop. Declaring a vector, adding elements and then reading them back are very simple, so let's look at some other vector operations we can perform. The following example illustrates how we can use different ways to access elements as well as some other methods of inserting and removing elements. We'll concentrate on using objects as elements rather than primitive data types:

     /* vector3.cc
      * Compiled using g++ vector3.cc Address.cc -o vector3 */
     #include <vector>
     #include "AddressRepository.hh"
     int main()
     {
       vector<Address> v;
       /* Add all of the address objects to the vector: */
       v.push_back(addr1);
       v.push_back(addr2);
       v.push_back(addr3);
       v.push_back(addr4);
       v.push_back(addr5);
       /* Declare an iterator to work with: */
       std::vector<Address>::iterator pos;
       /* Loop through the vector printing out elements: */
       cout << "First iteration" << endl;
       for (pos=v.begin(); pos<v.end(); ++pos)
         {
           pos->print();
         }
     
       /* Remove the last element: */
       v.pop_back();
       /* Create and insert a new Address object: */
       Address addr6("Reggie", "1 Card Rd.", "Hamps", 892286);
       v.insert(v.begin(), addr6);
       cout << "Second iteration" << endl;
       for (pos=v.begin(); pos<v.end(); ++pos)
         {
           pos->print();
         }
       exit(0);
     }

     Example 3.6: vector3.cc

and when compiled and run produces the following output:

     $ ./vector3
     First iteration
     Name: Jane, Street: 12 Small St., City: Worcs, Phone: 225343
     Name: Edith, Street: 91 Glib Terrace, City: Shrops, Phone: 858976
     Name: Adam, Street: 23 Big St., City: Worcs, Phone: 443098
     Name: Jane, Street: 55 Almond Terrace, City: Worcs, Phone: 242783
     Name: Bob, Street: 2 St. Annes Walk, City: Oxford, Phone: 303022
     Second iteration
     Name: Reggie, Street: 1 Card Rd., City: Hamps, Phone: 892286
     Name: Jane, Street: 12 Small St., City: Worcs, Phone: 225343
     Name: Edith, Street: 91 Glib Terrace, City: Shrops, Phone: 858976
     Name: Adam, Street: 23 Big St., City: Worcs, Phone: 443098
     Name: Jane, Street: 55 Almond Terrace, City: Worcs, Phone: 242783

This example isn't too different from the last, although there are a few important points to make.

The push_back method inserts elements at the end of the vector; this method of insertion is extremely fast and should be preferred if you are looking for fast insertion - none of the elements need to be reallocated. Notice that we declare the iterator to have type Address, rather than int as in the previous examples. Since pos is an iterator, and is a smart pointer to the containers element under scrutiny, we can simply access the Address objecct directly with a call to pos->print(), because pos points to each element in the vector.

After adding 5 Address objects, v.pop_back() removes the last element of the vector; so at this point in the execution of the binary, there will only be four elements in the vector, since Edith - the last element in the vector - has been removed. Again, removing the last element of a vector is also fast and achieved in constant time.

We also insert an element into the vector using the line v.insert(v.begin(), addr6). Inserting addr6 at v.begin() results in addr6 being inserted at the start of the vector. This is costly in terms of time; all of the remaining elements need to be reallocated after inserting an element at the start of the vector. Finally, the second loop prints out the vector of Address objects.

As you can see the interface to using vector is very simple, and the difference between using primitive datatypes and objects as vector elements is trivial.

There are a few points to make before we finish and move on to deque. We haven't dealt with deletion yet, and there are a number of consequences when deleting elements from a vector. We'll also focus briefly on reallocation and capacity of vector.

Since vector stores elements contiguously, if we delete an element, an important consequence follows: that all previously assigned references, iterators and pointers to any succeeding elements in the vector are invalidated. By invalidated, we're really saying that, "this (pointer/reference/iterator) is no longer reliable". Let's look at a simple example.

     /* vector 4.cc
        Compiled using g++ vector4.cc -o vector4 */
     #include <vector>
     #include "AddressRepository.hh"
     
     int main()
     {
       int *ill_ptr;
     
       std::vector<int> v;
       for (int i=0; i<10; i++)
         v.push_back(i*100);
     
       std::vector<int>::iterator pos = v.begin();
       pos += 5;
       ill_ptr = pos;
     
       cout << "Element 5: " << *ill_ptr << endl;
       /* Erase the first element: */
       v.erase(v.begin());
       /* Now print out the old 'pos': */
       cout << "... after reallocation: " << *ill_ptr << endl;
       exit(0);
     }

     Example 3.7: vector4.cc

The output of the program is fairly predictable:

     Element 5: 500
     ... after reallocation: 600

The above program pushes ten integers onto the vector, so that it stores the values 0, 100, 200 and so on up to 900. At the point where we find the element 500, we assign ill_ptr to the iterator already pointing to 500. The result is that we perform v.erase(v.begin()) to delete the first element, all of the elements of the vector are reallocated. Consequently, instead of ill_ptr pointing to the value 500, it instead contains 600. Although the address of the pointer points to the same location, the contents of where the pointer points to has changed due to the reallocation, and has been invalidated.

Another point to be aware of when using vector is capacity and reallocation. So far vector seems to lack because of the issues of inserting and deleting elements, which can be time consuming if you insert or delete any elements other than the last element. If speed is an important factor, then we need to avoid reallocation where necessary because reallocation takes time. OK; so is there a way around this? Luckily there is, and it is down to the capacity and reserve methods.

capacity tells us how many elements we could place in a vector. This is pretty useful; it means that we can create a vector and tell it how much space to reserve for us, using the reserve method (see Container Summary for details). Because space has been stored for the elements (providing a constructor is provided for the elements we're inserting), when we come to insert elements, no reallocation is necessary unless the capacity is exceeded. The reserve(n) method ensures that we can create a vector with at least n elements. Providing your objects have a default constructor, you could just call std::vector<Type> v(n), where Type is the data type, and n is the number of elements you wish to create using the default constructor of data type Type. Extra time will have to be taken to instantiate the objects however, so reserve() will probably a better option.

There are too many methods to illustrate using just examples, and the preceeding examples are simple enough for you to be able to use vector on a basic level. All of the available member functions of vector are detailed in Container Summary. More complex examples will be seen when we explore Generic Algorithms and Function Objects.