Sans Pareil Technologies, Inc.

Key To Your Business

Lesson 9 - String


In C, strings are represented as null-terminated arrays of characters. The null terminator is used by the various string manipulation functions (strcmp, strtok, etc.) and by regular code to determine where the character array ends. Character arrays are extremely fast, but they have serious drawbacks: they are the cause of many bugs, and parsing them by hand is very time-consuming.

The std::string class is part of the C++ standard library, and can be used with all the general STL algorithms. In most respects a std::string behaves like a std::vector<char>: a container of characters, or an advanced array of characters.

Examples
#include <string>
#include <iostream>

int main()
{
  std::string str{"Hello World!"}; // Or std::string str = "Hello World!";
  std::cout << str << std::endl;
  std::string start = str.substr(0, 5); // First five characters: "Hello"
  std::string end = str.substr(5);      // From index 5 to the end: " World!"
  std::cout << "Length of string is: " << str.size() << std::endl;

  // Searching within a string
  std::string str1{"Hello, can you find me?"};
  std::string::size_type position = str1.find("me");
  std::cout << "First occurrence of me was found at: " << position << std::endl;

  // Print positions of all specified characters in a string
  std::string s{"C++ is an impressive language."};
  position = s.find_first_of(" .");

  while (position != std::string::npos)
  {
    std::cout << "Found space or dot at: " << position << std::endl;
    position = s.find_first_of(" .", position + 1);
  }
}

String tokeniser


A very common operation on strings is to tokenise them using a delimiter of your own choice. This way you can easily split a string into smaller pieces without fiddling with the find() methods too much. In C, you could use strtok() for character arrays, but no equivalent function exists for std::string. Here are a couple of ways to tokenise std::string instances.
#include <algorithm>
#include <iostream>
#include <iterator> // std::ostream_iterator
#include <sstream>
#include <string>
#include <vector>

using Vector = std::vector<std::string>;

Vector tokenise( const std::string& str,
    const std::string& delimiters = " ,.:;" )
{
  using std::string;
  Vector tokens;

  // Skip delimiters at beginning.
  auto lastPos = str.find_first_not_of( delimiters, 0 );
  // Find the first delimiter after that, which marks the end of the token.
  auto pos = str.find_first_of( delimiters, lastPos );

  while ( string::npos != pos || string::npos != lastPos )
  {
    // Found a token, add it to the vector.
    tokens.emplace_back( str.substr( lastPos, pos - lastPos ) );
    // Skip delimiters.  Note the "not_of"
    lastPos = str.find_first_not_of( delimiters, pos );
    // Find the next delimiter, which marks the end of the next token.
    pos = str.find_first_of( delimiters, lastPos );
  }

  return tokens;
}

Vector split( const std::string& str )
{
  std::string buf; // Have a buffer string
  std::stringstream ss{str}; // Insert the string into a stream

  Vector tokens; // Create vector to hold our words
  while ( ss >> buf ) tokens.push_back(buf);

  return tokens;
}

void print( const Vector& tokens )
{
  std::copy( tokens.cbegin(), tokens.cend(),
      std::ostream_iterator<std::string>( std::cout, ", ") );
  std::cout << std::endl;
}

int main()
{
  std::string str{ "Split me up! Word1 Word2 Word3." };
  auto tokens = tokenise( str );
  print( tokens );

  tokens = split( str );
  print( tokens ); // split() breaks on whitespace only: note the '!' and trailing '.' kept in the output
  return 0;
}

References


https://isocpp.org/blog/2018/10/word-counting-in-cpp-implementing-a-simple-word-counter-jonathan-boccara