How to iterate over the words of a string in C++? [Answered]

I’m trying to iterate over the words of a string.

The string can be assumed to be composed of words separated by whitespace.

Note that I’m not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.

The best solution I have right now is:

#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    string s = "Somewhere down the road";
    istringstream iss(s);

    do
    {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}

Is there a more elegant way to do this?

How to iterate over the words of a string? Answer #1:

I use this to split string by a delimiter. The first puts the results in a pre-constructed vector, the second returns a new vector.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>

template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}

Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:

std::vector<std::string> x = split("one:two::three", ':');

Answer #2:

For what it’s worth, here’s another way to extract tokens from an input string, relying only on standard library facilities. It’s an example of the power and elegance behind the design of the STL.

#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>

int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}

Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.

vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));

… or create the vector directly:

vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};

Answer #3:

A possible solution using Boost might be:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));

This approach might be even faster than the stringstream approach. And since this is a generic template function it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.

Answer #4:

This should work:

#include <vector>
#include <string>
#include <sstream>

int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                 // Have a buffer string
    std::stringstream ss(str);       // Insert the string into a stream

    std::vector<std::string> tokens; // Create vector to hold our words

    while (ss >> buf)
        tokens.push_back(buf);

    return 0;
}

Answer #5:

For those with whom it does not sit well to sacrifice all efficiency for code size and see “efficient” as a type of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition.):

template < class ContainerT >
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
   std::string::size_type pos, lastPos = 0, length = str.length();

   using value_type = typename ContainerT::value_type;
   using size_type  = typename ContainerT::size_type;

   while(lastPos < length + 1)
   {
      pos = str.find_first_of(delimiters, lastPos);
      if(pos == std::string::npos)
      {
         pos = length;
      }

      if(pos != lastPos || !trimEmpty)
         tokens.push_back(value_type(str.data()+lastPos,
               (size_type)pos-lastPos ));

      lastPos = pos + 1;
   }
}

I usually choose to use std::vector<std::string> types as my second parameter (ContainerT)… but list<> is way faster than vector<> for when direct access is not needed, and you can even create your own string class and use something like std::list<subString> where subString does not do any copies for incredible speed increases.

It’s more than double as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also with the perfect parameter types you can eliminate all string and list copies for additional speed increases.

Additionally it does not do the (extremely inefficient) return of result, but rather it passes the tokens as a reference, thus also allowing you to build up tokens using multiple calls if you so wished.

Lastly it allows you to specify whether to trim empty tokens from the results via a last optional parameter.

All it needs is std::string… the rest are optional. It does not use streams or the boost library, but is flexible enough to be able to accept some of these foreign types naturally.

Answer #6:

Here’s another solution. It’s compact and reasonably efficient:

std::vector<std::string> split(const std::string &text, char sep) {
  std::vector<std::string> tokens;
  std::size_t start = 0, end = 0;
  while ((end = text.find(sep, start)) != std::string::npos) {
    tokens.push_back(text.substr(start, end - start));
    start = end + 1;
  }
  tokens.push_back(text.substr(start));
  return tokens;
}

It can easily be templatised to handle string separators, wide strings, etc.

Note that splitting "" results in a single empty string and splitting "," (ie. sep) results in two empty strings.

It can also be easily expanded to skip empty tokens:

std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
          tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    if (end != start) {
       tokens.push_back(text.substr(start));
    }
    return tokens;
}

If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:

std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;

    while((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if(start != std::string::npos)
        tokens.push_back(text.substr(start));

    return tokens;
}

Answer #7:

This is my favorite way to iterate through a string. You can do whatever you want per word.

string line = "a line of text to iterate through";
string word;

istringstream iss(line, istringstream::in);

while( iss >> word )     
{
    // Do something on `word` here...
}

Hope you learned something from this post.

Follow Programming Articles for more!

About ᴾᴿᴼᵍʳᵃᵐᵐᵉʳ

Linux and Python enthusiast, in love with open source since 2014, Writer at programming-articles.com, India.

View all posts by ᴾᴿᴼᵍʳᵃᵐᵐᵉʳ →