Why RAII rocks

January 2014

Memory management is the process of managing dynamic memory, mainly by allocating and deallocating it. Memory management is tightly coupled to object ownership: pointers or instances are responsible for deallocating the memory areas assigned to them. Pointers that do not own their memory, but rather act as passive indirections to other objects (like references), do not fall under this category and are thus not covered by this article.

In C, memory is managed mainly using malloc() and free(). It's a well-known fact that these two functions do not interact nicely with C++, because they only operate on raw memory, ignoring constructors and destructors. C++ in turn offers the operators new and delete for the allocation and deallocation of single objects, as well as new[] and delete[] for dynamic arrays. In this article, I will refer to the four memory-related C++ operators using the term manual memory management. What a lot of C++ developers are not aware of: manual memory management is complex, error-prone and verbose. And worse, it is employed very widely even though it can easily be avoided. Interestingly, to this day, there are still many developers thinking that all the tasks performed by garbage collectors in other languages have to be hand-crafted by C++ programmers. This is untrue; C++ is not C.

The problems of manual memory management

As the name already implies, the main disadvantage of manual memory management results from manual interaction: the user explicitly has to allocate and deallocate memory for the objects. What may sound like a simple task can be surprisingly difficult as soon as we move away from the most trivial code. Even advanced C++ developers often make mistakes related to memory management -- not because they are bad programmers, but because they use inherently error-prone techniques. So, why concretely is manual memory management considered bad practice?

A simple example

Manual memory management is especially difficult to handle when a function contains multiple return paths or when it may throw exceptions. Keep in mind that every possible control path of a function must be protected by one or more delete statements if local variables are allocated using new. Let's look at a simple code example that contains memory leaks:

int unsafe()
{
    A* a = new A;

    if (a->f())
    {
        B* b = new B;
        if (b->f())
            return b->g();
        else
            return a->g();
    }
    
    return 0;
}

Our task is to fix it. Simple, isn't it? First, assume there are no exceptions. Let's write the corresponding deallocation code:

int unsafe2()
{
    A* a = new A;

    if (a->f())
    {
        B* b = new B;

        if (b->f())
        {
            int r = b->g();
            delete b;
            delete a;
            return r;
        }
        else
        {
            int r = a->g();
            delete b;
            delete a;
            return r;
        }
    }

    delete a;
    return 0;
}

Not only do we need to insert delete statements; the manual deallocation also forces us to break the code flow and store return values in variables instead of returning them directly. The code already looks quite beautiful, yet we're far from safety.

Next, we assume that constructors may throw, and attempt to make the code exception-safe (no memory leaks occur in the presence of exceptions) and exception-neutral (exceptions thrown inside the function propagate to the caller unchanged):

int unsafe3()
{
    A* a = new A;

    if (a->f())
    {
        B* b;
        try
        {
            b = new B;
        }
        catch (...)
        {
            delete a;
            throw;
        }

        if (b->f())
        {
            int r = b->g();
            delete b;
            delete a;
            return r;
        }
        else
        {
            int r = a->g();
            delete b;
            delete a;
            return r;
        }
    }

    delete a;
    return 0;
}

We have doubled the original amount of code, only for deallocation and error handling; we haven't introduced any new functionality. And the code is still not safe, since the functions f() and g() might also throw exceptions. I think you can imagine how complexity further explodes if you implement that. The code becomes totally unreadable, and the actual function logic is lost in boilerplate technicalities.

Note also that only two dynamic allocations were enough to require such complicated logic. Imagine what it might look like for 5, or 10. Furthermore, I have only talked about memory leaks. Problems like dangling pointers or double deletes exist, too, just in case life becomes boring.

The solution: RAII

Now the good news: C++ offers us RAII, Resource Acquisition Is Initialization. RAII is a simple yet extremely powerful idiom to manage resources automatically. It is not limited to memory; it can handle arbitrary resources (files, network connections, mutexes, ...). It is based on the fact that a destructor can run any cleanup code as soon as a variable goes out of scope. This code is guaranteed to be executed, whether the function is left by a return statement or by an exception, and the execution takes place automatically.
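To make the guarantee concrete, here is a minimal sketch of a RAII guard. The names Guard, work(), countReleases() and the global events log are hypothetical and exist only for illustration; the guard merely records when it acquires and releases its "resource", so that both exit paths can be observed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event log, used only to make acquisition and release visible.
std::vector<std::string> events;

// Hypothetical RAII guard: acquires in the constructor, releases in the destructor.
class Guard
{
public:
    explicit Guard(const std::string& name) : name_(name)
    {
        events.push_back("acquire " + name_);
    }

    ~Guard()
    {
        events.push_back("release " + name_);
    }

    // The guard owns its resource exclusively, so copying is disabled.
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;

private:
    std::string name_;
};

// Leaves its scope either by return or by exception, depending on the argument.
int work(bool fail)
{
    Guard guard("resource");
    if (fail)
        throw std::runtime_error("something went wrong");
    return 42;
}

// Runs both exit paths and counts how many releases were recorded.
int countReleases()
{
    events.clear();
    work(false);
    try { work(true); } catch (const std::runtime_error&) {}

    // Two acquisitions and two releases: one pair per call.
    assert(events.size() == 4);

    int releases = 0;
    for (const std::string& e : events)
        if (e == "release resource")
            ++releases;
    return releases;
}
```

In this sketch, countReleases() reports one release per call of work(), regardless of whether the call returned normally or threw.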

In C++, RAII on pointer level is implemented through smart pointers. These are class templates that acquire a resource (a pointer to dynamically managed memory) in the constructor and release it in the destructor, by calling the delete operator automatically. On array level, RAII is implemented through containers, i.e. collections of multiple objects of the same type. STL containers such as std::vector are the most famous examples in the standard library. This article focuses on smart pointers to demonstrate RAII, but most of the points apply to RAII in general.

In C++11, the standard smart pointer std::unique_ptr was introduced. It is a simple wrapper around a pointer, and essentially does nothing more than deleting the object in its destructor. This has the effect that smart pointers falling out of scope automatically destroy their contained objects. This is important; the whole point of RAII is to move away from manual memory management and the troubles coupled to it.

What our previous code looks like with the use of smart pointers:

int safe()
{
    std::unique_ptr<A> a(new A);

    if (a->f())
    {
        std::unique_ptr<B> b(new B);
        if (b->f())
            return b->g();
        else
            return a->g();
    }
    
    return 0;
}

Our code has the same length as the very first snippet and is completely safe. We can introduce further return or throw statements without touching memory management. Also, we do not get into trouble if one of the called constructors or functions begins to use exceptions. The code remains very clean, and we have zero performance overhead, in neither speed nor memory.

As you see, the advantages of RAII are far from minor. This idiom is extremely powerful; in fact it is so effective that even languages with garbage collectors have started to implement a weak form of it (e.g. using in C#, or try-with-resources in Java).
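The claim about exceptions can be checked with a small experiment. The following sketch assumes hypothetical stand-ins for A and B (the article does not define them) that count live instances, and a B::f() that always throws; leakedAfterException() is likewise an illustrative helper. The function safe() is the same as above.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical instance counter, only for observing leaks.
int liveObjects = 0;

// Hypothetical stand-in for A.
struct A
{
    A()  { ++liveObjects; }
    ~A() { --liveObjects; }
    bool f() { return true; }
    int  g() { return 1; }
};

// Hypothetical stand-in for B whose f() always fails.
struct B
{
    B()  { ++liveObjects; }
    ~B() { --liveObjects; }
    bool f() { throw std::runtime_error("B::f() failed"); }
    int  g() { return 2; }
};

int safe()
{
    std::unique_ptr<A> a(new A);

    if (a->f())
    {
        std::unique_ptr<B> b(new B);
        if (b->f())
            return b->g();
        else
            return a->g();
    }

    return 0;
}

// Returns the number of objects still alive after safe() exited by exception.
int leakedAfterException()
{
    try
    {
        safe();
    }
    catch (const std::runtime_error&)
    {
        // Both unique_ptr destructors have already run during stack unwinding.
        assert(liveObjects == 0);
    }
    return liveObjects;
}
```

Although the exception leaves safe() in the middle of the inner block, no object survives: leakedAfterException() returns 0.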

Legitimate use cases for manual memory management

This is not a sort of a compromise paragraph that tells you "RAII is not the holy grail, it also has many disadvantages where alternatives are appropriate". No. Whenever you can use RAII, do it. In almost all situations, there are no rational reasons against it (the esoteric ones are covered in the next section). Actually, RAII is one of the few techniques where one can really state that it should be the default approach, and you must have strong, justified reasons to deviate from it.

So what are those reasons? In short, low-level tasks. In order to implement automatic memory management, you need manual memory management. Smart pointers invoke the delete operator internally. Container allocators may allocate memory using new[]. However, and this is crucial, this usage must be encapsulated at all costs. The API user is not supposed to interfere with memory management directly; everything should happen behind the scenes. For this very reason, APIs that require the user to use new/delete (or other forms of manually managed resources, such as create()/destroy() function pairs) can be improved using RAII and C++11 move semantics.

The trend is increasingly toward code where the user does not even know about the memory management. C++14 introduces the make_unique() function template, which makes it possible to get rid of every single new and delete in end-user code. Even the allocation is encapsulated, which has the nice side effect of mentioning the type only once (and performing internal optimizations in the case of make_shared()):
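As a sketch of how such a create()/destroy() pair can be hidden behind RAII, consider the following. The Widget type, createWidget(), destroyWidget() and the openWidgets counter are hypothetical and stand for any C-style API; the wrapping itself uses only std::unique_ptr with a custom deleter.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical C-style API with a create()/destroy() pair.
struct Widget { int value; };
int openWidgets = 0;

Widget* createWidget(int value) { ++openWidgets; return new Widget{value}; }
void destroyWidget(Widget* w)   { --openWidgets; delete w; }

// Deleter that routes destruction through the API's own function.
struct WidgetDeleter
{
    void operator()(Widget* w) const { destroyWidget(w); }
};

using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

// Factory returning an owning handle; ownership moves to the caller.
WidgetPtr makeWidget(int value)
{
    return WidgetPtr(createWidget(value));
}

int useWidget()
{
    WidgetPtr first = makeWidget(7);
    WidgetPtr second = std::move(first); // transfers ownership, no copy
    assert(first == nullptr);
    assert(openWidgets == 1);
    return second->value;
}   // second goes out of scope here and calls destroyWidget()
```

The caller of makeWidget() never sees destroyWidget(); the manual part is confined to the deleter and the factory.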

std::unique_ptr<T> ptr(new T(...));  // is the same as:
auto ptr = std::make_unique<T>(...); // no new here!

Common myths

Since there are, to this day, people who constantly construct reasons to avoid RAII and who stick to ancient coding techniques against their better knowledge, I will investigate several myths and logical fallacies related to RAII that I have encountered again and again over the years.

1. "I am using new/delete because I need more control over my code. I want to determine exactly when my objects are created and destroyed."

RAII does not take away control. It does not dictate the lifetime of objects, it only limits it to a scope. You are still able to perform initialization after the definition and destruction before the end of the scope (e.g. using std::unique_ptr::reset()). If you need an object to outlive the scope, then choose a wider scope for it, or transfer ownership to a different object that lives long enough.
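A minimal sketch of these options follows. Tracked, storage and lifetimeControl() are hypothetical names for illustration; the member functions used are the standard std::unique_ptr::reset() and std::move().

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical type that counts its live instances.
struct Tracked
{
    static int alive;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Hypothetical long-lived owner to which ownership can be handed over.
std::vector<std::unique_ptr<Tracked>> storage;

void lifetimeControl()
{
    std::unique_ptr<Tracked> p;          // defined, but nothing created yet
    assert(Tracked::alive == 0);

    p.reset(new Tracked);                // initialization after the definition
    assert(Tracked::alive == 1);

    p.reset();                           // destruction before the end of the scope
    assert(Tracked::alive == 0);

    p.reset(new Tracked);
    storage.push_back(std::move(p));     // the object now outlives this scope
    assert(p == nullptr);
}
```

After lifetimeControl() returns, exactly one Tracked object is still alive, owned by storage; it is destroyed when storage releases it.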

2. "RAII implies overhead"

This is as wrong as saying that OOP has overhead, because RAII specifies a concept and not a specific implementation thereof. As we know, C++ widely follows the zero-overhead abstraction principle, meaning that encapsulation within classes and functions does not affect the performance per se. In fact, std::unique_ptr has been designed with exactly this principle in mind: it does not use additional memory or processing cycles. With optimizations enabled, compilers typically treat a class object containing only a pointer like the pointer itself and inline its function calls, leaving binary code that is equivalent to the hand-written version.
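The memory part of this claim is easy to check. Note that the C++ standard does not formally guarantee the size equality; it is an assumption that holds on the mainstream standard library implementations for the default deleter. Payload and uniquePtrIsPointerSized() are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical payload type; its size is irrelevant to the pointer size.
struct Payload { int data[16]; };

// Compares the size of a unique_ptr (default deleter) with a raw pointer.
// Assumption: equal on common implementations, not mandated by the standard.
bool uniquePtrIsPointerSized()
{
    return sizeof(std::unique_ptr<Payload>) == sizeof(Payload*);
}

// The smart pointer stores exactly the address it was given.
bool storesSameAddress()
{
    Payload* raw = new Payload();
    std::unique_ptr<Payload> owner(raw);
    return owner.get() == raw;
}
```

A stateful custom deleter can increase the size, which is why the sketch is restricted to the default deleter.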

There are many situations where the usage of RAII can even improve performance. Since code that manages memory manually is typically more complex than the automatic equivalent, it is more difficult for the compiler to optimize it. Concerning STL containers, their destructors are usually implemented as a loop iterating over each element and calling its destructor. If this destructor call leads to a delete call on the object (by a smart pointer), then the whole deallocation can be performed in one pass. On the other hand, hand-written code iterating over each element and calling delete effectively duplicates the effort of iteration, and with it the possibility of cache misses.
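The container case can be sketched as follows, with a hypothetical Element type that counts its instances and an illustrative fillAndDestroy() function. There is no hand-written loop calling delete; the vector's destructor does the whole job.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical element type that counts its live instances.
struct Element
{
    static int alive;
    Element()  { ++alive; }
    ~Element() { --alive; }
};
int Element::alive = 0;

// Creates `count` elements, lets the container go out of scope,
// and returns the number of elements that are still alive afterwards.
int fillAndDestroy(int count)
{
    {
        std::vector<std::unique_ptr<Element>> elements;
        for (int i = 0; i < count; ++i)
            elements.push_back(std::unique_ptr<Element>(new Element));
        assert(Element::alive == count);
    }   // the vector destroys each unique_ptr, which deletes its Element

    return Element::alive;
}
```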

Note that std::shared_ptr is a different story. This smart pointer does have overhead due to reference counting, thread safety and dynamic deleters. What people often forget, though, is that handcrafting those features is not free, either. Nevertheless, use shared pointers with care. A mistake that many people new to smart pointers make is overusing them. Do not use shared pointers as a replacement for garbage collection; use them only when you have shared ownership -- which occurs rather rarely.
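For illustration, a minimal sketch of what shared ownership means in practice. Texture and sharedOwnership() are hypothetical names; use_count() is the standard member function that reports the number of owners.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource shared by several owners.
struct Texture { int id; };

// Returns the owner count observed while two shared_ptr objects coexist.
long sharedOwnership()
{
    std::shared_ptr<Texture> first = std::make_shared<Texture>();
    first->id = 1;

    long countWhileShared = 0;
    {
        std::shared_ptr<Texture> second = first;   // copy adds an owner
        countWhileShared = first.use_count();
        assert(second->id == 1);                   // both refer to the same object
    }   // second is destroyed; the Texture stays alive

    assert(first.use_count() == 1);
    return countWhileShared;
}   // the last owner is destroyed here, and with it the Texture
```

The reference count maintained here is exactly the bookkeeping that std::unique_ptr does not need.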

3. "Memory management is only unsafe when you use it incorrectly/in an unsafe way"

This is a typical logical fallacy based on circular logic. The assumption is that new/delete operators are safe as long as you use them safely. What is left out in this reasoning are all the cases where the operators are not used safely. As the example code above shows, these cases are not rare at all; even seemingly simple code can introduce loads of mistakes. Smart pointers, on the other hand, are much harder to use incorrectly (see also the next point).

The other reason leading to this statement is a misguided estimate of the actual complexity. Programmers, and especially C++ programmers, tend to consider themselves experienced as soon as they have worked with a language feature successfully a few times. The basic concept of new and delete is deceptively simple: one operator allocates, the other one deallocates. It is not obvious that this simplicity does not scale to more complex scenarios, where object inter-dependencies, order of destruction or exception safety have to be considered. Unless something goes wrong, one will not notice that there is a problem; with respect to memory leaks, things can even go wrong, yet nobody will notice.

4. "Many people use smart pointers incorrectly"

While this may or may not be true, the statement has no effect on the advantages of smart pointers whatsoever. There are always people who don't use a feature correctly; this on its own is not an argument against the feature. Furthermore, the number of people who use smart pointers incorrectly is significantly smaller than the number of people who use new and delete incorrectly -- simply because smart pointers are easier to use and much more forgiving.

5. "Unique pointers are overkill in this situation"

This statement is based on the assumption that unique pointers add certain weight to the code; something you wouldn't have with a hand-crafted solution. However, as explained above, RAII per se has no overhead, thus the weight cannot relate to memory or CPU costs. What then?

Talking about "client code" (as opposed to the legitimate use cases mentioned above), it is inconsistent to use smart pointers sometimes and sometimes not, because the decision on when to use them is arbitrary. There is no rational criterion that determines a level of complexity up to which it is reasonable to forgo RAII -- especially considering the fact that the code may expand in the future. Since RAII requires no additional effort (the exact opposite is true), there is no reason not to use it from the beginning.

6. "Using RAII is a matter of taste"

Safety is never a matter of taste. Claiming that it is not only ignores all the rational arguments for RAII, but also has a negative effect on co-developers and other people who come in contact with one's code. This argument is often brought up by people who have programmed their way for years and who are too narrow-minded to consider alternatives, even if those are superior from an objective perspective.

7. "Smart pointers have an ugly syntax"

The differences in syntax between raw and smart pointers are tiny, by design. This concerns mainly the declaration (std::unique_ptr<T> vs. T*) and some method calls (p.reset(x); vs. p = x;). What people coming up with this argument completely dismiss is the fact that manual memory management makes the code considerably more verbose, if used correctly. See also the code example above.

...not to mention the high price you are going to pay just to avoid a slightly different syntax.

8. "I cannot use smart pointers because my compiler doesn't support C++11"

While it is true that you cannot use std::unique_ptr and std::shared_ptr under these circumstances, this doesn't imply you have to abandon RAII completely. The RAII idiom has been available since the first C++ standard in 1998, and possibly much longer. The standard library contains std::auto_ptr (which has since been deprecated, but is still better than raw new/delete), the TR1 (an extension to the standard implemented by many compilers) provides std::tr1::shared_ptr, and the Boost libraries have contained boost::scoped_ptr and boost::shared_ptr for ages. Even in case you really cannot use any of these solutions, the implementation of a minimalistic scoped_ptr-like smart pointer is a matter of minutes, which will pay off for the rest of your life. Hunting down a single memory leak takes much longer than writing such a class template.
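To give an idea of how little is needed, here is a rough sketch of such a class template that avoids C++11 features. ScopedPtr, Counter and scopedPtrDemo() are hypothetical names; this is not the Boost implementation, merely a minimal illustration of the same idea.

```cpp
#include <cassert>

// Hypothetical minimal scoped pointer, written in C++98 style.
template <typename T>
class ScopedPtr
{
public:
    explicit ScopedPtr(T* pointer = 0) : pointer_(pointer) {}
    ~ScopedPtr() { delete pointer_; }

    T& operator*() const  { return *pointer_; }
    T* operator->() const { return pointer_; }
    T* get() const        { return pointer_; }

    // Destroys the current object (if any) and takes ownership of a new one.
    void reset(T* pointer = 0)
    {
        if (pointer != pointer_)
        {
            delete pointer_;
            pointer_ = pointer;
        }
    }

private:
    // Non-copyable: declared but never defined (the pre-C++11 idiom).
    ScopedPtr(const ScopedPtr&);
    ScopedPtr& operator=(const ScopedPtr&);

    T* pointer_;
};

// Hypothetical test type that counts its live instances.
struct Counter
{
    static int alive;
    Counter()  { ++alive; }
    ~Counter() { --alive; }
    int value() const { return 5; }
};
int Counter::alive = 0;

int scopedPtrDemo()
{
    ScopedPtr<Counter> p(new Counter);
    assert(Counter::alive == 1);
    return p->value();
}   // p is destroyed here and deletes the Counter
```

Without move semantics, such a pointer cannot transfer ownership; it only covers the case where an object lives and dies within one scope.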

9. "A lot of successful and popular software projects don't strictly use RAII, they must have a good reason why"

Unfortunately, no. The spread of a habit doesn't allow direct conclusions about its quality. C++ has always been a language that imposes virtually no restrictions on the way you can write code. The particular problem with C++ is that since its first standardization in 1998, the way it is used has changed massively, even during the time when the language itself was left unchanged. A lot of idioms and best practices have only emerged gradually over the years, and not all developers have witnessed them. One and a half decades later, there is still much legacy code around, and sadly sometimes even new code is written as if there had been no progress. One problem is the sheer complexity of C++ and the difficulty of keeping up to date; another is the tremendous amount of questionable literature and existing code, which inspires newcomers to adopt obsolete techniques.

Conclusion

This article explained why manual memory management is problematic, and why even trivial programs built on it can cause endless issues. RAII, on the other hand, is a simple and powerful idiom that mitigates those problems at no cost, making code more robust and readable. The only case where you should use new, new[], delete or delete[] directly is inside the implementation of low-level primitives such as smart pointers or containers. Encapsulate them, so that their users can benefit from the easy-to-use API. This article is not about avoiding raw pointers, but rather about avoiding raw pointers that own memory and thus require manual memory management.

Use RAII, and memory leaks are a problem of the past.