Split String with String Delimiter in C++

How can I split a string in C++ using a string delimiter (standard C++)?

I am currently parsing a string in C++ with the following code:

using namespace std;

string parsed, input = "text to be parsed";
stringstream input_stringstream(input);

if (getline(input_stringstream, parsed, ' ')) {
    // do some processing.
}

Parsing with a single character delimiter works fine. However, I need to use a string as the delimiter.

For example, I want to split the string:

scott>=tiger

using ">=" as the delimiter, so that I can obtain "scott" and "tiger". How can I achieve this?

I’ve been working with C++ for over a decade. One effective way to split a string on a string delimiter is to use std::string::find to locate the delimiter and std::string::substr to extract the tokens.

Here’s a simple example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"

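The find function returns the position of the first occurrence of the delimiter (or std::string::npos if it is absent, in which case substr(0, npos) simply returns the whole string), and substr extracts the substring. After extracting one token, you can remove it along with the delimiter to proceed with the next extraction. Only do this when the delimiter was actually found, because npos + delimiter.length() wraps around and would erase the wrong characters:
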
s.erase(0, s.find(delimiter) + delimiter.length());
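For the exact two-part case in the question, the same find/substr calls can return both halves without erasing anything from s. This is a sketch assuming you want to split on the first occurrence only; split_once is a hypothetical helper name, not a standard function.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of the standard library): splits `s` at the
// FIRST occurrence of `delimiter` without modifying `s`.
// If the delimiter is absent, .first holds the whole input and .second is empty.
std::pair<std::string, std::string> split_once(const std::string& s,
                                               const std::string& delimiter) {
    const std::string::size_type pos = s.find(delimiter);
    if (pos == std::string::npos) {
        return {s, std::string()};
    }
    // Text before the delimiter, then text after it.
    return {s.substr(0, pos), s.substr(pos + delimiter.length())};
}
```

With C++17 structured bindings, usage looks like auto [user, password] = split_once("scott>=tiger", ">="); which gives "scott" and "tiger". Anything after a second ">=" stays in the second half.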

You can loop through the string to get all tokens. Here’s a complete example:

#include <iostream>
#include <string>

int main() {
    std::string s = "scott>=tiger>=mushroom";
    std::string delimiter = ">=";

    size_t pos = 0;
    std::string token;
    while ((pos = s.find(delimiter)) != std::string::npos) {
        token = s.substr(0, pos);              // text before the delimiter
        std::cout << token << std::endl;
        s.erase(0, pos + delimiter.length());  // drop the token and the delimiter
    }
    std::cout << s << std::endl;               // what remains is the last token
}

Output:

scott
tiger
mushroom

This method is straightforward and works well for simple string splitting tasks. Note that it consumes s as it goes, so work on a copy if you still need the original string.
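Since std::string::find also accepts a starting position, you can avoid erasing by advancing an offset instead. The following is a sketch, not a standard API: the function name split is my own, it keeps empty tokens (for example between two adjacent delimiters), and it assumes an empty delimiter should mean "no split".

```cpp
#include <string>
#include <vector>

// Hypothetical non-destructive splitter: searches from a moving `start`
// offset instead of erasing consumed text from the input.
std::vector<std::string> split(const std::string& s, const std::string& delimiter) {
    if (delimiter.empty()) {
        return {s};  // find("") would match at every offset, so do not split
    }
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));  // may be empty
        start = pos + delimiter.length();
    }
    tokens.push_back(s.substr(start));  // text after the last delimiter
    return tokens;
}
```

Because s is taken by const reference, the caller's string is left untouched, and split("scott>=tiger>=mushroom", ">=") yields the same three tokens as the loop above.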

With over eight years of experience in C++, I often find that using regular expressions can simplify string splitting, especially for complex delimiters. Here’s how you can do it:

#include <iostream>
#include <string>
#include <regex>
#include <vector>

std::vector<std::string> split(const std::string& str, const std::string& regex_str) {
    std::regex regexz(regex_str);
    std::vector<std::string> list(
        std::sregex_token_iterator(str.begin(), str.end(), regexz, -1),
        std::sregex_token_iterator()
    );
    return list;
}

int main() {
    std::string input_str = "lets split this";
    std::string regex_str = " ";
    auto tokens = split(input_str, regex_str);
    for (const auto& item : tokens) {
        std::cout << item << std::endl;
    }
}

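This approach allows splitting on substrings, single characters, or full regular expressions, and it is concise thanks to the C++11 <regex> facilities. Keep in mind that the delimiter is interpreted as a regular expression, so metacharacters such as ".", "|" or "*" must be escaped if you mean them literally; ">=" contains none, so it works as-is.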

Alternatively, you can write it more verbosely:

std::vector<std::string> split(const std::string& str, const std::string& regex_str) {
    std::regex regexz(regex_str);
    std::sregex_token_iterator token_iter(str.begin(), str.end(), regexz, -1);
    std::sregex_token_iterator end;
    std::vector<std::string> list;
    while (token_iter != end) {
        list.emplace_back(*token_iter++);
    }
    return list;
}

int main() {
    std::string input_str = "lets split this";
    std::string regex_str = " ";
    auto tokens = split(input_str, regex_str);
    for (const auto& item : tokens) {
        std::cout << item << std::endl;
    }
}

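Using regular expressions is powerful and can handle more complex splitting needs, although std::regex is generally slower than a plain find-based loop, so the simpler approach is usually preferable for a fixed delimiter in performance-sensitive code.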
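If the delimiter comes from user input or contains characters like "." or "|", one option is to escape it before building the std::regex. This sketch assumes the default ECMAScript grammar; escape_regex and split_literal are hypothetical helper names, and split_literal uses the same sregex_token_iterator technique (submatch index -1) as the answer above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: backslash-escapes ECMAScript regex metacharacters so a
// literal delimiter can be used safely as a std::regex pattern.
std::string escape_regex(const std::string& literal) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : literal) {
        if (special.find(c) != std::string::npos) {
            out += '\\';  // prefix the metacharacter with a backslash
        }
        out += c;
    }
    return out;
}

// Splits `str` on the LITERAL text `delimiter`; index -1 makes the iterator
// yield the pieces between matches rather than the matches themselves.
std::vector<std::string> split_literal(const std::string& str, const std::string& delimiter) {
    const std::regex re(escape_regex(delimiter));
    std::sregex_token_iterator first(str.begin(), str.end(), re, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

Without the escaping step, a pattern of "." would match every character and the resulting tokens would not be the ones you expect.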

In my 15 years of experience, I’ve found that C++17’s std::string_view offers an efficient way to split strings without unnecessary copying. Here’s a method using std::string_view:

#include <algorithm>
#include <iostream>
#include <string_view>
#include <vector>

std::vector<std::string_view> split(std::string_view buffer, const std::string_view delimiter = " ") {
    std::vector<std::string_view> result;

    // An empty delimiter would make find() return 0 forever; treat it as "no split".
    if (delimiter.empty()) {
        if (!buffer.empty()) {
            result.push_back(buffer);
        }
        return result;
    }

    std::string_view::size_type pos;

    while ((pos = buffer.find(delimiter)) != std::string_view::npos) {
        auto match = buffer.substr(0, pos);
        if (!match.empty()) {
            result.push_back(match);
        }
        buffer.remove_prefix(pos + delimiter.size());
    }

    if (!buffer.empty()) {
        result.push_back(buffer);
    }

    return result;
}

int main() {
    auto split_values = split("1 2 3 4 5 6 7 8 9     10 ");
    std::for_each(split_values.begin(), split_values.end(),
                  [](const auto& str) { std::cout << str << '\n'; });

    return 0;
}

This example splits the input string based on the provided delimiter, avoiding unnecessary copying and making the process more efficient. The split function iterates over the input string, extracts tokens, and skips any empty matches. The main function demonstrates its usage and prints each token.
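If you need empty tokens preserved (so that "a>=>=b" yields "a", "", "b", as the find/substr loop does), a small variation drops the empty check. This is a sketch under that assumption; split_keep_empty is a hypothetical name, and the returned views still point into the caller's buffer.

```cpp
#include <string_view>
#include <vector>

// Hypothetical string_view splitter that KEEPS empty tokens.
// The views refer to `buffer`'s characters, which must outlive the result.
std::vector<std::string_view> split_keep_empty(std::string_view buffer,
                                               std::string_view delimiter) {
    std::vector<std::string_view> result;
    if (delimiter.empty()) {  // avoid an endless loop on an empty delimiter
        result.push_back(buffer);
        return result;
    }
    std::string_view::size_type pos;
    while ((pos = buffer.find(delimiter)) != std::string_view::npos) {
        result.push_back(buffer.substr(0, pos));      // may be empty
        buffer.remove_prefix(pos + delimiter.size());  // skip token + delimiter
    }
    result.push_back(buffer);  // final token, possibly empty
    return result;
}
```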

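Using std::string_view is efficient for large strings because it avoids copying and only creates views into the original string. The flip side is that the views do not own their data: the original string must outlive the returned vector, so don't split a temporary std::string this way and keep the views around afterwards.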