Following on my earlier post about the many little C++ papercuts that we all deal with every day, along comes std::string_view and says “hold my beer.”

Before C++17, passing a string around without copying it was an exercise in frustration. You had std::string const&, which was fine if you actually had a std::string, but if you didn’t, this required a conversion (and, potentially, a heap allocation).

And, of course, you had old-faithful: char const*. It worked, but if you had a std::string you needed to tack on an awkward .c_str() since std::string (sanely) doesn’t implicitly convert to a char const*. And, of course, you had to carefully check whether the pointer was ~~NULL~~ nullptr and, if you needed the length, you had to do an O(n) strlen call.

There simply was no efficient, universal, non-owning reference to a contiguous sequence of characters. And then along came std::string_view: a simple, lightweight, non-owning view. No ownership semantics, no allocations, no fuss. You could construct one from a std::string, from a const char*, from a pointer-and-length pair, or just default-construct an empty one.
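To make the four construction paths concrete, here is a minimal sketch. The helper count_spaces and the demo function are hypothetical names invented for this illustration; the point is only that one signature accepts all of these sources without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: counts spaces without caring who owns the characters.
std::size_t count_spaces(std::string_view text)
{
    std::size_t n = 0;
    for (char c : text)
        if (c == ' ')
            ++n;
    return n;
}

// Exercises each way of constructing a view that the text mentions.
void construction_demo()
{
    std::string owned = "a b c";
    const char* literal = "hello world";
    const char buffer[] = {'x', ' ', 'y', ' ', 'z'};  // not null-terminated

    assert(count_spaces(owned) == 2);    // from std::string, no copy
    assert(count_spaces(literal) == 1);  // from const char*, length computed for us
    assert(count_spaces(std::string_view(buffer, sizeof buffer)) == 2);  // pointer and length
    assert(count_spaces(std::string_view()) == 0);  // default-constructed, empty
}
```

Note that every pointer in this sketch is known to be non-null.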

It was a simple and elegant abstraction that we needed to write better, cleaner code.

And then the committee decided that constructing a string_view from a const char* that is nullptr should be undefined behavior.

The state of affairs

Let’s be precise about what we’re dealing with. A default-constructed std::string_view has data() == nullptr and size() == 0. This is a perfectly valid, well-defined state. The standard guarantees it:

    std::string_view sv;
    assert(sv.data() == nullptr);  // guaranteed
    assert(sv.size() == 0);        // guaranteed
    assert(sv.empty());            // guaranteed

So far, so good. But now try this:

    const char* p = nullptr;
    std::string_view sv(p);  // undefined behavior!

This is UB because the const char* constructor is specified to call Traits::length(p) (which is effectively strlen(nullptr)). Game over.

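And the two-argument constructor offers less comfort than you might hope:

    const char* p = nullptr;
    std::string_view sv(p, 0);  // accepted in practice, but never explicitly blessed

The pointer-and-length constructor requires [s, s + count) to be a valid range. C++ defines nullptr + 0 as a null pointer, so the usual reading is that a null pointer with a count of zero forms a valid empty range, and the major standard library implementations accept it. But the specification of string_view never says so outright, and the obvious spelling, the one that takes only a pointer, remains undefined.

So, a default-constructed std::string_view has data() == nullptr and size() == 0. But attempting to reach that same state from a null const char* yields undefined behavior. The destination state is fine, but the journey there kills you.

It’s worth noting that std::span, which arrived in C++20, carries the same valid-range precondition on its pointer-and-count constructor, and a span built from (nullptr, 0) is routinely treated as well-defined. The single-pointer string_view constructor is the one that got left behind.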

A footgun with poor aim

Then along came C++23, and we got a new constructor:

    basic_string_view( std::nullptr_t ) = delete;

I don’t object to this, but it doesn’t help much. It catches exactly one case: the literal nullptr. It does nothing for the far more common case of a char const* variable that happens to be null at runtime. Even a programmer who tries to be careful gets burned:

    void process_string(std::string_view s)
    {
        if (s.size() == 0)
            return;
        // ... other code here
    }
    
    // UB has already happened *at the call site*, before process_string
    // even gets a chance to perform any kind of check. The defensive code
    // above is too late.
    process_string(some_function_that_might_return_null());

The fix

There is a fix, and it is a trivial change. It makes the standard library safer by default, removing a footgun that exists only to blow toes off.

    // Before:
    constexpr basic_string_view(const CharT* s)
        : data_(s)
        , size_(Traits::length(s))
    {
    }
    // After:
    constexpr basic_string_view(const CharT* s)
        : data_(s)
        , size_(s ? Traits::length(s) : 0)
    {
    }

That’s it: a single branch. One comparison against zero that should be overwhelmingly well-predicted: in the common case, s is not nullptr. And in the rare case where it is, you get a well-defined empty string_view instead of a crash.

And, of course, since the constructor is a template defined in the header, the compiler can see straight through it. If it can prove that the pointer is non-null (say, because you passed a string literal, or the result of std::string::c_str(), or any pointer that was already dereferenced), it can elide the check entirely. This is exactly the kind of optimization that modern compilers excel at.

So the cost is zero wherever the compiler can prove the pointer is non-null, and a single well-predicted branch everywhere else. In exchange, the cases that are UB today, and may well crash, become well-defined. I don’t know about you, but I’ll take a branch.
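The proposed behavior can be tried out today on a stand-in type. The class below, tolerant_view, is a hypothetical toy made up for this sketch (it is not std::string_view and models only what the argument needs), but its constructor is the “After” version shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for basic_string_view<char>; only the parts needed
// to show the null-tolerant constructor are modeled.
class tolerant_view
{
public:
    constexpr tolerant_view() noexcept = default;

    // The "After" constructor: a null pointer yields an empty view.
    constexpr tolerant_view(const char* s)
        : data_(s)
        , size_(s ? std::char_traits<char>::length(s) : 0)
    {
    }

    constexpr const char* data() const noexcept { return data_; }
    constexpr std::size_t size() const noexcept { return size_; }
    constexpr bool empty() const noexcept { return size_ == 0; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// The literal case still works in a constant expression (C++17 or later).
static_assert(tolerant_view("abc").size() == 3);

// A null pointer lands in the same state as default construction.
constexpr const char* null_text = nullptr;
static_assert(tolerant_view(null_text).empty());
static_assert(tolerant_view(null_text).data() == nullptr);

// The same checks at run time, through a pointer variable.
void tolerant_view_demo()
{
    const char* maybe_null = nullptr;
    assert(tolerant_view(maybe_null).empty());

    maybe_null = "hello";
    assert(tolerant_view(maybe_null).size() == 5);
}
```

The static_assert lines are the interesting part: the null case is an ordinary constant expression here, where the real constructor would be rejected at compile time because evaluating UB is not allowed in constant expressions.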

The “nullable string_view” objection

A likely counterargument is that string_view is not intended to be nullable, and that tolerating nullptr in the constructor encourages using it as a poor man’s optional<string_view>.

But that particular ship has already sailed. A default-constructed string_view has data() == nullptr. The state already exists; the question is only whether arriving at that state via const char* should be well-defined.
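A short sketch of why the “nullable” state is already observable today. It uses only the standard type; the function name empty_states_demo is made up for the example.

```cpp
#include <cassert>
#include <string_view>

void empty_states_demo()
{
    std::string_view defaulted;         // data() == nullptr, size() == 0
    std::string_view from_literal("");  // data() points at the literal's terminator

    // The two empty views can be told apart through data()...
    assert(defaulted.data() == nullptr);
    assert(from_literal.data() != nullptr);

    // ...but both are empty, and comparison looks only at the characters,
    // so they compare equal.
    assert(defaulted.empty() && from_literal.empty());
    assert(defaulted == from_literal);
}
```

Anyone who wants to abuse data() == nullptr as a sentinel can already do so; making the const char* constructor tolerant would not add a new state, only a safe way to reach an existing one.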

The cost of inaction

Every codebase that uses string_view with any const char* of uncertain provenance has to wrap the construction manually:

    std::string_view safe_sv(const char* p)
    {
        return p ? std::string_view(p) : std::string_view();
    }

This is the same branch that the constructor should contain, except now it’s uglier and replicated across every call site in every codebase.
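For a sense of where the wrapper ends up in real code, consider std::getenv, which returns a null pointer when the variable is not set. The function env_or_empty and the variable name used in the demo are hypothetical; safe_sv is repeated from above so the sketch is self-contained.

```cpp
#include <cassert>
#include <cstdlib>
#include <string_view>

// The wrapper from the text, repeated so the example stands on its own.
std::string_view safe_sv(const char* p)
{
    return p ? std::string_view(p) : std::string_view();
}

// Hypothetical caller: std::getenv returns a null pointer for an unset
// variable, so its result must not reach string_view unchecked.
std::string_view env_or_empty(const char* name)
{
    return safe_sv(std::getenv(name));
}

void safe_sv_demo()
{
    assert(safe_sv(nullptr).empty());
    assert(safe_sv("abc") == "abc");

    // Assumes this made-up variable is not set in the environment.
    assert(env_or_empty("SURELY_NOT_SET_VARIABLE_4F1C9A").empty());
}
```

Writing std::string_view(std::getenv(name)) directly compiles without a warning and is undefined whenever the variable is missing, which is precisely the kind of call site that gets forgotten.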

Not fixing this turns a “zero overhead abstraction” into a negative overhead abstraction in terms of the aggregate cost in code size, the maintenance burden, and the inevitable bugs when someone, somewhere, forgets to do the check. The cost of using string_view correctly exceeds the cost of a single branch in the constructor.

And in practice, the check is already happening. libc++’s hardened mode traps on nullptr construction, which means the null check is already being emitted but has a worse outcome: an abort instead of a graceful empty view.

A precedent

C++ has already solved this exact problem in another context. Consider the humble delete:

    int* p = nullptr;
    delete p;  // perfectly fine! defined as a no-op

Deleting a null pointer has been well-defined and harmless since C++98. Nobody argues that the null check behind that guarantee is unacceptable overhead. The cost of a single null check is negligible compared to the cost of the operation that follows (deallocation), and the alternative (UB on a common edge case, or even a hard crash) is unacceptable.

The string_view situation is analogous: the operation that follows (a strlen scan through memory) is more expensive than a null check, and the alternative is equally unacceptable.

A modest proposal

We have to fix basic_string_view(const CharT*) to handle null pointers. Doing so involves no ABI break, no complexity, and no performance regression. A single branch removes the need for every user of std::string_view to implement the check manually.

A more ambitious proposal, especially in the context of contracts and hardened implementations, would also address basic_string_view(const CharT*, size_type): if the pointer is nullptr then the size should be zero, and a non-zero size with a null pointer would result in a contract violation or an exception.
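The pointer-and-length rule can be sketched as a free function. The name checked_view is hypothetical, and throwing std::invalid_argument is just one of the options mentioned above; a contract violation or a hardened-mode trap would be equally valid choices.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical factory modeling the proposed pointer-and-length rule.
std::string_view checked_view(const char* p, std::size_t n)
{
    if (p == nullptr)
    {
        if (n != 0)
            throw std::invalid_argument("null pointer with non-zero length");
        return std::string_view();  // null with zero length: empty view
    }
    return std::string_view(p, n);
}

// Returns true if the null-with-length case is rejected as described.
bool rejects_null_with_length()
{
    try
    {
        (void)checked_view(nullptr, 3);
    }
    catch (const std::invalid_argument&)
    {
        return true;
    }
    return false;
}

void checked_view_demo()
{
    assert(checked_view(nullptr, 0).empty());
    assert(checked_view("hello", 2) == "he");
    assert(rejects_null_with_length());
}
```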

CRYPTOGRAPHER - SOFTWARE ENGINEER

NIK BOUGALIS

A view to nowhere: Why the UB in std::string_view constructors makes our life worse