This is part 4 of a fast-paced C++ tutorial for programmers familiar with high-level languages like Perl and Python.
Copying files with Standard I/O Streams
After having familiarized ourselves with std::map in the previous tutorial, it’s time to take a closer look at the I/O Streams Library. So, in this tutorial, we’ll be copying text and binary files “the C++ way.”
Copying files isn’t exactly an interesting task, especially since we could run external utilities like cp(1) with the system(3) library call. To avoid the overhead of spawning an external process, we could also copy files the plain old C way, e.g. chunkwise using fread(3) and fwrite(3) from <cstdio>. However, the purpose of this tutorial is to learn C++, so let’s look at how to copy files using I/O streams from the C++ standard library.
Copying line-oriented text files
If the file is a collection of lines (i.e. not a binary file), we could copy the file line-wise.
An input file is represented by the input stream ifs of type std::ifstream, and an output file is represented by the output stream ofs of type std::ofstream. Those input and output streams can be used just like std::cin and std::cout.
Lines are read in from the input stream with the std::getline function. The destination of the read is aLine, a std::string. We use a std::string instead of an old-style buffer, because std::string automatically adapts its length to the size of the input, so we don’t have to worry about buffer overflows.
Because std::getline strips the end-of-line character(s) from its input, we need to add it again in the output (we use the std::endl manipulator for that, though it would have been more efficient to simply append "\n" and not flush the output stream on every line).
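After we’re done with the files, we can close them explicitly by calling close() on each stream. If we don’t, the stream destructors close the files when the streams go out of scope.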
Needless to say: this program is only for line-oriented files.
Copying text files using a buffer
The previous program had an important property: it used a streamlined data flow. As soon as a line (or a chunk) was read in, it was written to the output, so the program’s memory footprint was very small.
Alternatively, we could have slurped the whole file into memory (e.g. into a std::vector of std::strings), and then written the output.
As before, we used a std::ifstream and a std::ofstream to represent input and output files; and of course, we’re again reading in the data line-wise with the std::getline function.
What’s new here is the data structure theLines. This is our std::vector of std::strings. Note that we defined the data type vec_t as a typedef for std::vector<std::string>, so that we can later on define a constant iterator out of it: vec_t::const_iterator.
We used the vec_t::push_back method of theLines to append the (stripped) lines to the end of the vector in the while loop. In the output for loop, we let an iterator i traverse the vector from begin to end. Of course, we don’t want to output the iterator i but what i points to, i.e. we dereference i as in *i.
This program isn’t as good as the previous one, because it needs to store the whole file in memory (i.e. in theLines). This is okay for small files, but copying very large files (e.g. many GBs large) can exhaust the virtual memory of the process.
The lesson to remember here: always use a streamlined data flow if you can!
Lather, rinse, repeat… but with algorithms
The code of the previous program wasn’t very elegant. Some idioms could have been written in a more concise way. In the following variation, the while and for loops are gone and have been replaced by calls to the generic std::copy algorithm (from <algorithm>). This algorithm has the following signature:
template <class InputIterator, class OutputIterator> OutputIterator copy(InputIterator first, InputIterator last, OutputIterator result);
and can be applied to nearly every structure that provides the necessary iterator semantics. For example, to replace the while loop with an idiomatic std::copy call, we need:
- an input iterator pointing to the beginning of the input sequence. Since we want to read from an input stream, we need a std::istream_iterator adaptor, parameterized for std::string and using ifs as input stream: std::istream_iterator<std::string>(ifs). Note that this iterator extracts with operator>>, so it yields whitespace-separated words rather than whole lines.
- an input iterator pointing one past the end of the input sequence. Here we need a special convention: the iterator adaptor std::istream_iterator<std::string>() without parameters represents such an end iterator.
- an output iterator that, when written to, will automatically call the function push_back on the data structure passed to it (so it will fill the vector). We could write such an iterator manually, but why bother, if we can use a prefabricated iterator from <iterator>: std::back_inserter, parameterized with the name of the needed target data structure?
This results in the following idiomatic code:
To output the vector theLines with std::copy we need:
- an input iterator pointing to the beginning of the vector: theLines.begin()
- an input iterator pointing one past the end of the vector: theLines.end()
- an output iterator that, when written to, will send the data it gets to an output stream. Again, we could write such an output iterator by hand, but it’s much more convenient to use an iterator adaptor from <iterator>. More precisely, we get such an output iterator with std::ostream_iterator, passing the data type std::string as template parameter, and the desired output stream and separator string as parameters: std::ostream_iterator<std::string>(ofs, "\n")
This results in the following very idiomatic code:
Obviously, even though it is more readable than the previous example, this program isn’t streamlined, because it buffers its whole input.
Copying binary files with streambuf iterators
All previous examples were about copying lines. To copy a binary file, we need to read and write bytes or chunks (buffers of bytes) directly. One way to do this is to call istream::read to fetch data and ostream::write to save it. You may want to try it. Have a look at the headers <istream> and <ostream> for the signatures.
A different, much more idiomatic approach is to use std::copy again, this time with streambuf iterators. std::istreambuf_iterator and std::ostreambuf_iterator are iterator adaptors that operate directly on the underlying streambuf of the respective streams. You may find details about them in <iterator> or in a header that is #included by that (e.g. with gcc-4.2, it is in /usr/include/c++/4.2/bits/streambuf_iterator.h on my system).
There are many ways to copy files using the C++ I/O Streams library. Text files can be copied line-by-line, while binary files need to be copied byte- or chunkwise.
When copying files, we should strive to streamline the data flow (i.e. not buffer the whole input file in memory), because large files can easily exhaust the available virtual memory.
Instead of using loops, you should use the more idiomatic std::copy algorithm with appropriate iterators. Iterators that operate on streams can be obtained with std::istream_iterator and std::ostream_iterator. Bypassing the formatting that the stream imposes is possible too with the std::istreambuf_iterator and std::ostreambuf_iterator iterator adaptors, which operate directly on the underlying streambuf, and are thus more efficient.
In the next tutorial, we’ll use an external library (POCO) to Base64-encode and -decode files and strings.