Lesson 9 - String
Strings in C are represented as null-terminated arrays of characters. The null terminator is used by the various string manipulation functions (strcmp, strtok, etc.) and by regular code to determine where the character array ends. Character arrays are extremely fast, but they have many drawbacks: they are the cause of many bugs, and parsing them is very time-consuming.
The std::string class is part of the C++ standard library and can be used with all the general STL algorithms. A std::string is equivalent in most cases to a std::vector<char>: a container of characters, or an advanced array of characters.
Examples
#include <string>
#include <iostream>
int main()
{
    std::string str{"Hello World!"}; // Or std::string str = "Hello World!";
    std::cout << str << std::endl;
    // Extracting substrings
    std::string start = str.substr(0, 5); // "Hello"
    std::string end = str.substr(5);      // " World!" (from index 5 to the end)
    std::cout << start << "|" << end << std::endl;
    std::cout << "Length of string is: " << str.size() << std::endl;
    // Searching within a string
    std::string str1{"Hello, can you find me?"};
    std::string::size_type position = str1.find("me");
    std::cout << "First occurrence of me was found at: " << position << std::endl;
    // Print positions of all specified characters in a string
    std::string s{"C++ is an impressive language."};
    position = s.find_first_of(" .");
    while (position != std::string::npos)
    {
        std::cout << "Found space or dot at: " << position << std::endl;
        position = s.find_first_of(" .", position + 1);
    }
}
String tokeniser
A very common operation with strings is to tokenise them with delimiters of your own choice. This way you can easily split a string into smaller pieces without fiddling with the find() methods too much. In C, you could use strtok() for character arrays, but no equivalent function exists for std::string. Here are a couple of ways to tokenise std::string instances.
#include <iostream>
#include <algorithm>
#include <iterator> // std::ostream_iterator
#include <sstream>
#include <string>
#include <vector>
using Vector = std::vector<std::string>;
Vector tokenise( const std::string& str,
                 const std::string& delimiters = " ,.:;" )
{
    using std::string;
    Vector tokens;
    // Skip delimiters at the beginning: lastPos is the start of the first token.
    auto lastPos = str.find_first_not_of( delimiters, 0 );
    // Find the first delimiter after that: pos is the end of the first token.
    auto pos = str.find_first_of( delimiters, lastPos );
    while ( string::npos != pos || string::npos != lastPos )
    {
        // Found a token, add it to the vector.
        tokens.emplace_back( str.substr( lastPos, pos - lastPos ) );
        // Skip delimiters. Note the "not_of"
        lastPos = str.find_first_not_of( delimiters, pos );
        // Find the next delimiter, which ends the next token.
        pos = str.find_first_of( delimiters, lastPos );
    }
    return tokens;
}
Vector split( const std::string& str )
{
    std::string buf;            // Have a buffer string
    std::stringstream ss{str};  // Insert the string into a stream
    Vector tokens;              // Create vector to hold our words
    while ( ss >> buf ) tokens.push_back(buf);
    return tokens;
}
void print( const Vector& tokens )
{
    std::copy( tokens.cbegin(), tokens.cend(),
               std::ostream_iterator<std::string>( std::cout, ", ") );
    std::cout << std::endl;
}
int main()
{
    std::string str{ "Split me up! Word1 Word2 Word3." };
    auto tokens = tokenise( str );
    print( tokens );
    tokens = split( str );
    print( tokens ); // Note the trailing dot '.' in output
    return 0;
}