This is part 4 of a fast-paced C++ tutorial for programmers familiar with high-level languages like Perl and Python.

Copying files with Standard I/O Streams

After having familiarized ourselves with std::map in the previous tutorial, it’s time to take a closer look at the I/O Streams Library. So, in this tutorial, we’ll be copying text and binary files “the C++ way.”

Copying files isn’t exactly an interesting task, especially since we could run external utilities like cp(1) with the system(3) library call. To avoid the overhead of spawning an external process, we could also copy files the plain old C way, e.g. chunkwise using fread(3) and fwrite(3) from <cstdio>. However, the purpose of this tutorial is to learn C++, so let’s look at how to copy files using the I/O streams of the C++ standard library.

Copying line-oriented text files

If the file is a collection of lines (i.e. not a binary file), we could copy the file line-wise:

// copy1.cpp -- copying of files, line structure.
 
#include <string>
#include <fstream>
#include <cstdlib>
 
int
main (int argc, char *argv[])
{
  std::ifstream ifs(argv[1]);
  std::ofstream ofs(argv[2]);
 
  std::string aLine;
 
  while (std::getline(ifs, aLine))
    ofs << aLine << std::endl;
 
  ifs.close();
  ofs.close();
 
  return EXIT_SUCCESS;
}

An input file is represented by the input stream ifs of type std::ifstream, and an output file is represented by the output stream ofs of type std::ofstream:

std::ifstream ifs(argv[1]);
std::ofstream ofs(argv[2]);

Those input and output streams can be used just like std::cin and std::cout.
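Like std::cin, a file stream can be tested in a boolean context to find out whether it is usable. The programs in this tutorial skip such checks for brevity, so the following is only a minimal sketch of what they could look like for copy1.cpp. The function name copy_main and the wording of the messages are assumptions made for illustration and are not part of the original program.

```cpp
// Sketch: copy1.cpp with argument and stream checks added.
// copy_main is a hypothetical name; call it from main(argc, argv).

#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>

int
copy_main (int argc, char *argv[])
{
  if (argc != 3) {
    std::cerr << "usage: copy1 SOURCE DESTINATION" << std::endl;
    return EXIT_FAILURE;
  }
  // In main(), argv[0] .. argv[argc-1] are guaranteed to be valid strings.
  assert(argv[1] != 0 && argv[2] != 0);

  std::ifstream ifs(argv[1]);
  if (!ifs) {                       // same kind of test as "if (!std::cin)"
    std::cerr << "cannot open " << argv[1] << " for reading" << std::endl;
    return EXIT_FAILURE;
  }

  std::ofstream ofs(argv[2]);
  if (!ofs) {
    std::cerr << "cannot open " << argv[2] << " for writing" << std::endl;
    return EXIT_FAILURE;
  }

  std::string aLine;
  while (std::getline(ifs, aLine))
    ofs << aLine << std::endl;

  ofs.close();                      // close() sets failbit if it fails
  return ofs.fail() ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

A real main would then simply return copy_main(argc, argv). Checking the output stream once more after close() catches write errors that only show up when the buffer is flushed.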

Lines are read in from the input stream with the std::getline function. The destination of the read is aLine, a std::string:

std::string aLine;
 
while (std::getline(ifs, aLine))
  ofs << aLine << std::endl;

We use a std::string instead of an old style buffer, because std::string automatically adapts its length to the size of the input, so we don’t have to worry about buffer overflows.

Since std::getline strips the end-of-line delimiter from its input, we need to add it again in the output. We use the std::endl manipulator for that, though it would usually be more efficient to write "\n" instead, because std::endl also flushes the output stream after every line.
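To make the "\n" variant concrete, here is a minimal sketch of the same loop without the per-line flush. The names copy_lines and demo_copy_lines are hypothetical, and the function takes generic stream references (an assumption made so that it can be tried out on in-memory string streams as well as on files).

```cpp
// Sketch: the copy loop with "\n" instead of std::endl.
// copy_lines and demo_copy_lines are hypothetical names.

#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies src to dst line by line and returns the number of lines written.
std::size_t
copy_lines (std::istream &src, std::ostream &dst)
{
  std::string aLine;
  std::size_t count = 0;

  while (std::getline(src, aLine)) {
    dst << aLine << "\n";   // newline only, no flush after every line
    ++count;
  }
  dst.flush();              // flush once, when everything is written
  return count;
}

// In-memory demonstration: string streams stand in for the files.
void
demo_copy_lines ()
{
  std::istringstream in("first\nsecond\n");
  std::ostringstream out;

  std::size_t n = copy_lines(in, out);

  assert(n == 2);
  assert(out.str() == "first\nsecond\n");
}
```

Because std::ifstream and std::ofstream derive from std::istream and std::ostream, a call like copy_lines(ifs, ofs) works unchanged with the file streams of copy1.cpp.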

After we’re done with the files, we can close them explicitly (otherwise, the destructors of the stream objects would close them at the end of main):

ifs.close();
ofs.close();

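Needless to say, this program is only suitable for line-oriented files. Note also that it always terminates the last line with a newline, even if the input file didn’t end with one.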

Copying text files using a buffer

The previous program had an important property: it used a streamlined data flow. As soon as a line (or a chunk) was read in, it was written to the output, so that program’s memory footprint was very small.

Alternatively, we could have slurped the whole file into memory (e.g. into a std::vector of std::strings), and then written the output:

// copy2.cpp -- copying of files, lines structures, via container
 
#include <string>
#include <vector>
#include <fstream>
#include <cstdlib>
 
int
main (int argc, char *argv[])
{
  typedef std::vector<std::string> vec_t;
  vec_t theLines;
  std::string aLine;
 
  std::ifstream ifs(argv[1]);
  while (std::getline(ifs, aLine))
    theLines.push_back(aLine);
  ifs.close();
 
  std::ofstream ofs(argv[2]);
  typedef vec_t::const_iterator iter_t;
  for (iter_t i = theLines.begin(); i != theLines.end(); ++i)
    ofs << *i << std::endl;
  ofs.close();
 
  return EXIT_SUCCESS;
}

As before, we used a std::ifstream and std::ofstream to represent input and output files; and of course, we’re again reading in the data line-wise with the std::getline function.

What’s new here is the data structure theLines. This is our vector of strings. Note that we defined the data type vec_t like this:

typedef std::vector<std::string> vec_t;

so that we can later on define a constant iterator out of it:

typedef vec_t::const_iterator iter_t;

We used the push_back method of theLines to append the (stripped) lines to the end of the vector in the while loop. In the output for loop, we let an iterator i traverse the vector from begin to end. Of course, we don’t want to output the iterator i but what i points to, i.e. we dereference i as in *i.

This program isn’t as good as the previous one, because it needs to store the whole file in memory (i.e. in theLines). This is okay for small files, but copying very large files (e.g. many GBs large) may exhaust the memory available to the process.

The lesson to remember here: always use a streamlined data flow if you can!

Lather, rinse, repeat… but with algorithms

The code of the previous program wasn’t very elegant. Some idioms could have been written in a more concise way. Look at this variation:

// copy3.cpp -- copying of files, lines structures, via containers and algs.
 
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <fstream>
#include <cstdlib>
 
int
main (int argc, char *argv[])
{
  typedef std::vector<std::string> vec_t;
  vec_t theLines;
 
  std::ifstream ifs(argv[1]);
  std::copy(std::istream_iterator<std::string>(ifs),
            std::istream_iterator<std::string>(),
            std::back_inserter(theLines));
  ifs.close();
   
  std::ofstream ofs(argv[2]);
  std::copy(theLines.begin(),
            theLines.end(),
            std::ostream_iterator<std::string>(ofs, "\n"));
  ofs.close();
 
  return EXIT_SUCCESS;
}

The while and for loops are gone and have been replaced by calls to the generic std::copy algorithm (from <algorithm>). A call to this algorithm takes the following form:

std::copy(input_iterator_begin,
          input_iterator_end,
          output_iterator);

and can be applied to nearly every structure that provides the necessary iterator semantics. For example, to replace the while loop with an idiomatic std::copy call, we need:

  • an input iterator pointing to the beginning of the input sequence. Since we want to read from an input stream, we need a std::istream_iterator adaptor, parameterized for std::string and using ifs as input stream: std::istream_iterator<std::string>(ifs),
  • an input iterator pointing one past the end of the input sequence. Here we need a special convention: a default-constructed std::istream_iterator<std::string>() represents such an end-of-stream iterator.
  • an output iterator that, when assigned to, automatically calls the function push_back on the container passed to it (so it will fill the vector). We could write such an iterator manually, but why bother if we can use a prefabricated one from <iterator>? std::back_inserter, called with the target container, returns exactly that: std::back_inserter(theLines)

This results in the following idiomatic code:

std::copy(std::istream_iterator<std::string>(ifs),
          std::istream_iterator<std::string>(),
          std::back_inserter(theLines));

To output the vector theLines with std::copy we need:

  • an input iterator pointing to the beginning of the vector: theLines.begin()
  • an input iterator pointing one past the end of the vector: theLines.end()
  • an output iterator that, when assigned to, sends the data it gets to an output stream. Again, we could write such an output iterator by hand, but it’s much more convenient to use an iterator adaptor from <iterator>. More precisely, we get such an output iterator with std::ostream_iterator, passing the data type std::string as template parameter, and the desired output stream and separator string as parameters: std::ostream_iterator<std::string>(ofs, "\n"). The separator is written after every element, including the last one.

This results in the following very idiomatic code:

std::copy(theLines.begin(),
          theLines.end(),
          std::ostream_iterator<std::string>(ofs, "\n"));

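Even though it is more concise than the previous example, this program isn’t streamlined, because it buffers its whole input. It also isn’t an exact equivalent of copy2.cpp: std::istream_iterator<std::string> extracts its elements with operator>>, which splits the input at whitespace, so theLines actually holds words, and the output contains one word per line instead of the original lines.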
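One way to keep the std::copy idiom while reading whole lines is to give std::istream_iterator an element type whose operator>> calls std::getline. The following is a sketch of that idea, not part of the original programs: the wrapper type Line and the function names are hypothetical. Since input and output iterators are connected directly, no intermediate vector is needed, so the data flow stays streamlined as well.

```cpp
// Sketch: line-preserving, unbuffered copy with std::copy.
// Line, count_tokens and copy_lines_streamed are hypothetical names.

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Wrapper type whose extraction operator reads one whole line.
struct Line
{
  std::string text;
};

std::istream &
operator>> (std::istream &is, Line &line)
{
  return std::getline(is, line.text);
}

std::ostream &
operator<< (std::ostream &os, const Line &line)
{
  return os << line.text;
}

// What the reading step of copy3.cpp stores: whitespace-separated tokens.
std::size_t
count_tokens (std::istream &is)
{
  std::vector<std::string> tokens;
  std::copy(std::istream_iterator<std::string>(is),
            std::istream_iterator<std::string>(),
            std::back_inserter(tokens));
  return tokens.size();
}

// Streams whole lines from src to dst without an intermediate vector.
void
copy_lines_streamed (std::istream &src, std::ostream &dst)
{
  std::copy(std::istream_iterator<Line>(src),
            std::istream_iterator<Line>(),
            std::ostream_iterator<Line>(dst, "\n"));
}

void
demo_line_iterators ()
{
  const std::string text = "one two\n\nthree\n";

  std::istringstream words(text);
  assert(count_tokens(words) == 3);          // one, two, three

  std::istringstream in(text);
  std::ostringstream out;
  copy_lines_streamed(in, out);
  assert(out.str() == text);                 // blank line is preserved
}
```

With file streams, the call would read copy_lines_streamed(ifs, ofs). Like the getline loop, this version always ends the last line with a newline.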

Copying binary files with streambuf iterators

All previous examples were about copying lines. To copy a binary file, we need to read and write bytes or chunks (buffers of bytes) directly. One way to do this is to call istream::get or istream::read to fetch data, and ostream::put or ostream::write to save it. The files should also be opened with the std::ios::binary flag, so that no end-of-line translation takes place on platforms that distinguish between text and binary mode. You may want to try it. Have a look at the headers <istream> and <ostream> for the signatures.
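For readers who want to compare notes, here is a minimal sketch of the chunkwise approach with istream::read, istream::gcount and ostream::write. The function name copy_chunks and the default chunk size of 4096 bytes are assumptions chosen for illustration.

```cpp
// Sketch: chunkwise copy with istream::read and ostream::write.
// copy_chunks and the default chunk size of 4096 bytes are assumptions.

#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Copies src to dst in chunks and returns the number of bytes copied.
std::size_t
copy_chunks (std::istream &src, std::ostream &dst,
             std::size_t chunk_size = 4096)
{
  assert(chunk_size > 0);
  std::vector<char> buffer(chunk_size);
  std::size_t total = 0;

  for (;;) {
    src.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));
    const std::streamsize got = src.gcount();   // bytes actually read
    if (got <= 0)
      break;                                    // end of input (or error)
    if (!dst.write(&buffer[0], got))
      break;                                    // output error
    total += static_cast<std::size_t>(got);
  }
  return total;
}

void
demo_copy_chunks ()
{
  // 10 bytes, including a NUL byte and a CR LF pair.
  const std::string data("ab\0cd\r\nxyz", 10);

  std::istringstream in(data);
  std::ostringstream out;

  assert(copy_chunks(in, out, 4) == data.size());
  assert(out.str() == data);
}
```

The last read usually returns fewer bytes than requested and puts the stream into a failed state, which is why the loop looks at gcount() rather than at the stream state. For real files, both streams would be opened with std::ios::binary before being passed to copy_chunks.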

A different, much more idiomatic approach is to use std::copy again, like this:

// copy4.cpp -- copying of files, via streambufs, iters and algs.
 
#include <algorithm>
#include <iterator>
#include <fstream>
#include <cstdlib>
 
int
main (int argc, char *argv[])
{
  std::ifstream ifs(argv[1], std::ios::binary);   // binary: no newline translation
  std::ofstream ofs(argv[2], std::ios::binary);
 
  std::copy(std::istreambuf_iterator<char>(ifs),
            std::istreambuf_iterator<char>(),
            std::ostreambuf_iterator<char>(ofs));
 
  ifs.close();
  ofs.close();
 
  return EXIT_SUCCESS;
}

istreambuf_iterator and ostreambuf_iterator are iterators that operate directly on the underlying streambuf of the respective streams, bypassing the formatted operators >> and <<. You may find details about them in <iterator> or in a header that it #includes (e.g. with gcc-4.2, it is /usr/include/c++/4.2/bits/streambuf_iterator.h on my system).
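The difference between the formatted stream iterators and the streambuf iterators is easiest to see side by side. The sketch below uses in-memory string streams instead of files; the three function names are hypothetical. The third function shows a related idiom, inserting the source streambuf directly into the output stream, which is offered here as an alternative and not as something the original program uses.

```cpp
// Sketch: formatted stream iterators versus streambuf iterators.
// The function names are hypothetical.

#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>

// Formatted: operator>> skips whitespace, so blanks and newlines get lost.
void
copy_formatted (std::istream &src, std::ostream &dst)
{
  std::copy(std::istream_iterator<char>(src),
            std::istream_iterator<char>(),
            std::ostream_iterator<char>(dst));
}

// Unformatted: every character in the streambuf is copied as is.
void
copy_unformatted (std::istream &src, std::ostream &dst)
{
  std::copy(std::istreambuf_iterator<char>(src),
            std::istreambuf_iterator<char>(),
            std::ostreambuf_iterator<char>(dst));
}

// Alternative: let operator<< drain the source streambuf directly.
// Note: this sets failbit on dst if src yields no characters at all.
void
copy_rdbuf (std::istream &src, std::ostream &dst)
{
  dst << src.rdbuf();
}

void
demo_stream_iterators ()
{
  const std::string data = "a b\nc";

  std::istringstream in1(data), in2(data), in3(data);
  std::ostringstream out1, out2, out3;

  copy_formatted(in1, out1);
  copy_unformatted(in2, out2);
  copy_rdbuf(in3, out3);

  assert(out1.str() == "abc");    // whitespace was skipped
  assert(out2.str() == data);     // exact copy
  assert(out3.str() == data);     // exact copy
}
```

Only the two unformatted variants reproduce the input exactly, which is what a file copy needs.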

Summary

There are many ways to copy files using the C++ I/O Streams library. Text files can be copied line-by-line, while binary files need to be copied byte- or chunkwise.

When copying files, we should strive to streamline the data flow (i.e. not buffer the whole input file in memory), because large files can easily exceed the available amount of virtual memory.

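Instead of using loops, you can often use the more idiomatic std::copy algorithm with appropriate iterators. Iterators that operate on streams can be obtained with std::istream_iterator and std::ostream_iterator. Keep in mind that they use the formatted operators >> and <<, so std::istream_iterator<std::string> reads whitespace-separated words, not lines.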

Bypassing the formatting that the stream imposes is possible too with the std::istreambuf_iterator and std::ostreambuf_iterator iterators, which operate directly on the underlying streambuf and are thus usually more efficient.

In the next tutorial, we’ll use an external library (POCO) to Base64-encode and -decode files and strings.