Garbage Collection (.NET) vs. ARC (Swift)

a.k.a. Escape from C# / Into the Swift Trenches – Part 2

Introduction

In Part 1 of this series, we dug deep into many language differences, both good and bad, between C# and Swift. As you begin to use Swift on a regular basis, one major consideration that will eventually come up in your development is how each environment handles memory deallocation.

C# uses what is known as Garbage Collection (GC). This is a very safe and managed system of deallocating unused objects. The developer, generally, does not need to be concerned with cleaning up objects. The GC subsystem in .NET is even intelligent enough to deal with reference cycles (circular references) without getting confused.

Swift, on the other hand, uses ARC (Automatic Reference Counting). While there are some obvious performance gains to be had here, it is not without its weaknesses. In particular, ARC does not know how to handle reference cycles (circular references) without developer intervention.

The Basics

Garbage Collection

Garbage Collection is a process used for cleaning up objects after they are no longer needed by your application. It runs at indeterminate times during the lifetime of your application (in .NET, typically when allocations cross an internal threshold or the system runs low on memory). There are typically two parts to a garbage collection process. The first part walks the object graph, starting from the application's roots (static fields, local variables and so on), and marks every object that can still be reached. Anything left unmarked is considered “safe to collect”. The second part goes through and reclaims the memory of those unmarked objects.

This technique is far more complex than how I am describing it, but in a nutshell, this is what is meant by garbage collection.
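To make the marking idea concrete, here is a toy sketch in Swift. It only illustrates tracing from roots and is not how the .NET collector is implemented. The function name reachable(from:edges:) and the integer object IDs are made up for this example.

```swift
// Toy model of the mark phase: objects are plain integer IDs and
// "edges" lists which objects each one references. Hypothetical names,
// for illustration only.
func reachable(from roots: [Int], edges: [Int: [Int]]) -> Set<Int> {
    var marked = Set<Int>()
    var stack = roots
    while let node = stack.popLast() {
        // Skip anything already marked, which also stops us looping on cycles
        if !marked.insert(node).inserted { continue }
        stack.append(contentsOf: edges[node] ?? [])
    }
    return marked
}

// Objects 1 and 2 are reachable from the root. Objects 3 and 4 reference
// each other (a cycle), but nothing reachable points at them.
let edges: [Int: [Int]] = [1: [2], 2: [1], 3: [4], 4: [3]]
let allObjects: Set<Int> = [1, 2, 3, 4]

let live = reachable(from: [1], edges: edges)
let garbage = allObjects.subtracting(live)

print("live:", live.sorted())       // live: [1, 2]
print("garbage:", garbage.sorted()) // garbage: [3, 4]
```

Notice that the 3/4 cycle ends up in the garbage set. A tracing collector only cares whether an object can be reached from a root, so cycles need no special handling.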

Advantages

  • Everything can be detected for garbage collection, including reference cycles
  • Processing is done in the background (mostly)

Disadvantages

  • Non-deterministic finalization (finalizer execution is not predictable)
  • Under low memory conditions, garbage collection may halt thread execution

ARC (Swift)

Automatic Reference Counting is technically a form of garbage collection. However, typically when one refers to a garbage collector, they mean a separate process or subsystem that runs in the background independent of your application. ARC, on the other hand, is fully part of your application code. There is no background process, and memory deallocations are deterministic, meaning they are predictable as to when they will happen.

ARC is essentially the technique of keeping a count of the number of references to a given object. In the case of Swift, these count increments/decrements are inserted into your code at compile time for you, so there is no need to maintain this count yourself as a developer. Once the number of references to an object hits zero, the object is immediately deallocated.
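A minimal sketch of that behavior, assuming a couple of throwaway types of my own (EventLog and Resource are hypothetical names, not part of any library). It records when the deinitializer runs relative to the assignments around it.

```swift
// Collects events so we can see the exact order in which things happen.
// EventLog and Resource are hypothetical types for this illustration.
final class EventLog {
    var events: [String] = []
}

final class Resource {
    let name: String
    let log: EventLog
    init(name: String, log: EventLog) {
        self.name = name
        self.log = log
        log.events.append("init \(name)")
    }
    deinit {
        log.events.append("deinit \(name)")
    }
}

let log = EventLog()

var first: Resource? = Resource(name: "A", log: log)  // one strong reference
var second: Resource? = first                          // two strong references

first = nil                        // one reference left, object still alive
log.events.append("first = nil")

second = nil                       // count hits zero, deinit runs right here
log.events.append("second = nil")

print(log.events)
// ["init A", "first = nil", "deinit A", "second = nil"]
```

The "deinit A" entry lands between the two assignments, which is the deterministic part: the object goes away at the exact statement that drops the last strong reference.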

Advantages

  • Deterministic finalization (the deinitializer runs at a predictable time)
  • Objects are deallocated immediately when they are no longer needed.

Disadvantages

  • Cannot deal with reference cycles without developer intervention.
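Here is a small sketch of what that disadvantage looks like in practice. Counter, Node and makeCycle are hypothetical names chosen for the example. The function builds two objects that point at each other and optionally breaks the cycle by hand before returning.

```swift
// Counts how many deinitializers have run. Hypothetical helper type.
final class Counter {
    var deinits = 0
}

final class Node {
    var other: Node?          // strong reference by default
    let counter: Counter
    init(counter: Counter) { self.counter = counter }
    deinit { counter.deinits += 1 }
}

func makeCycle(counter: Counter, breakIt: Bool) {
    let a = Node(counter: counter)
    let b = Node(counter: counter)
    a.other = b
    b.other = a
    if breakIt {
        a.other = nil         // developer intervention: break the cycle by hand
    }
    // When a and b go out of scope, an unbroken cycle keeps both reference
    // counts above zero, so neither object is ever deallocated.
}

let leaked = Counter()
makeCycle(counter: leaked, breakIt: false)
print(leaked.deinits)   // 0 (both objects leaked)

let cleaned = Counter()
makeCycle(counter: cleaned, breakIt: true)
print(cleaned.deinits)  // 2 (both objects deallocated)
```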

A Case Study

The Scenario

All this talk of GC vs. ARC is all well and good, but it would be really nice if we could see both of these memory management techniques in action. Below is some identical code written in both C# and Swift that should do the trick of exercising memory management in both GC and ARC systems. Both are ready to run as-is if you wish to try them out yourself.

This code allocates a lot of small objects (Item) very quickly, and keeps a doubly-linked list of objects (Items) that each hold an array of them. Once the list reaches a threshold (50 nodes in this example), the Items objects are removed from the list, oldest first, so that nothing references them any longer. I felt that this example would be a reasonable representation of a real world application (perhaps a large e-commerce system that caches a certain amount of results from a database, and that gets thousands of hits per second).

C# Example

// C#
using System;
using System.Threading;

namespace GCTest
{
    // A small object with a few small fields
    class Item
    {
        public int a = 1;
        public string b = "12345";
        public Item()
        {
        }
    }
    
    // Object that contains an array of several small objects inside
    class Items
    {
        private const int ITEM_COUNT = 1000000;
        private Item[] itemList = new Item[ITEM_COUNT];
        
        public int id;

        public Items(int id)
        {
            this.id = id;

            int i;
            for(i = 0; i < ITEM_COUNT; i++)
            {
                itemList[i] = new Item();
            }
        }

        ~Items()
        {
            Console.WriteLine("Finalized Items");
        }

        public Items next;
        public Items prev;
    }

    class Program
    {
        const int LINKED_LIST_SIZE = 50;
        // Create linked list with just one item in it to start
        static int count = 0;
        static Items head = new Items(count);
        static Items tail = head;

        static void Main(string[] args)
        {
            count = 1;
            // Loop until user hits a key
            while(!Console.KeyAvailable)
            {
                Console.WriteLine("Adding new item to tail. - " + head.id.ToString() + "," + tail.id.ToString());
                // Append a new item to the end of the list
                tail.next = new Items(count);
                // Link the new node back to its predecessor (doubly-linked list).
                // NOTE: Every node now keeps the node before it reachable, so a
                // node dropped from the head stays alive until the back link
                // pointing at it is cleared (see below).
                tail.next.prev = tail;
                tail = tail.next;
                count++;
                // If our count has hit the max size we've set
                if(count > LINKED_LIST_SIZE)
                {
                    Console.WriteLine("Removing oldest item from head.");
                    // Clear the back link from the current head.  It still points
                    // at the node removed on the previous pass, and as long as it
                    // does, that node (and every node before it) is reachable
                    // from "head", so the GC correctly leaves it alone.  Without
                    // this line the whole chain stays reachable and we run out
                    // of memory.  This is about reachability, not about the GC
                    // mishandling reference cycles.
                    head.prev = null;

                    // Remove reference to oldest items object in list
                    head = head.next;
                }

                // Sleep for one millisecond so we work consistently on
                // different systems.
                Thread.Sleep(1);
            }
        }
    }
}

Swift Example

// Swift
//
//  main.swift
//  TestARC
//
//  Created by Gregory Read on 3/29/16.
//  Copyright © 2016 Gregory Read. All rights reserved.
//

import Foundation

// A small object with a few small fields
class Item {
    var a: Int = 1
    var b: String = "12345"
    init() {
    }
}

// Object that contains an array of several small objects inside
class Items {
    let ITEM_COUNT: Int = 1000000
    var itemList: [Item]
    
    var id: Int
    
    init(id: Int) {
        self.id = id;
        itemList = []
        
        // An empty array has no slots to assign into, so we append instead.
        // Reserving capacity up front avoids repeated reallocation and
        // roughly mirrors the fixed-size array in the C# version.
        itemList.reserveCapacity(ITEM_COUNT);
        
        for _ in 0 ..< ITEM_COUNT {
            // Append each new element (the array starts out empty).
            itemList.append(Item())
        }
    }
    
    deinit {
        print("Finalized Items")
    }
    
    var next: Items!
    var prev: Items!
}

func main() {
    let LINKED_LIST_SIZE: Int = 50

    // Create linked list with just one item in it to start
    var count: Int = 0
    var head: Items = Items(id: count)
    var tail: Items = head

    count = 1;
    // Loop forever (stop the program with Ctrl-C)
    while true {
        print("Adding new item to tail. - " + String(head.id) + "," + String(tail.id));
        // Append a new item to the end of the list
        tail.next = Items(id: count);
        tail.next.prev = tail;
        tail = tail.next;
        count += 1;
        // If our count has hit the max size we've set
        if count > LINKED_LIST_SIZE {
            print("Removing oldest item from head.")
            // With ARC, the node we dropped on the previous pass and the current
            // head still hold strong references to each other (a reference cycle).
            // Unless we set one of those references to nil, the dropped node is
            // never deallocated.  An alternative to this would be to have the
            // "prev" reference be declared with the "weak" keyword.
            head.prev = nil
            // Remove reference to oldest items object in list
            head = head.next
        }
    }
}

main()

Results

Due to the hardware/software differences between my Mac OS and Windows setups, doing any sort of timed tests would not have been appropriate. Also, .NET in particular performs some additional optimizations when allocating memory, which, while beneficial, are outside of the scope of this post. So here we will just be discussing observable results, without any scientific measurements.

In our C# example, objects are allocated very quickly at first, but as memory begins to be constrained, garbage collection kicks in. You will observe this when you see the “Finalized Items” messages starting to show up in the console in groups. As memory consumption for the application reaches its limit, the application will appear to be halted (sometimes for several seconds at a time).

Meanwhile, in the Swift code, objects are allocated as normal. In contrast to the C# version, we immediately see the deinitializers being executed as objects are unreferenced in the linked list (once we reach > 50 objects in the linked list). Because of the size of each object, we do indeed see a slight visible delay during each deallocation. The primary difference, however, is that the performance impact is minimal and predictable.

Getting Into Trouble

In our example above, it is easy to run into surprising memory behavior with either type of memory management. Let’s take a look at a few examples below…

Garbage Collection

Garbage collectors generally take an approach of “it’s always safe NOT to collect”. For the most part, this is what you want, and it almost always benefits the developer when it comes to performance of your application. However, there are a few instances in which this can lead to undesirable results.

For instance, even though .NET’s garbage collector handles reference cycles, if you remove the “head.prev = null” statement in the C# code, you will actually get an OutOfMemoryException. The collector is not failing here. The new head’s “prev” field still points at the node we just dropped, which points at the one before it, and so on, so every node ever created is still reachable from “head”. A tracing collector only frees what it cannot reach, so a forgotten back link holds on to memory under GC just as it does under ARC.

Also, if you make the “head” and “tail” variables local to the Main() method, you will also get an OutOfMemoryException. I am less sure of the cause here. The collector decides what to free based on reachability rather than lexical scope, but the runtime can keep locals alive longer than you might expect (unoptimized Debug builds, for example, treat locals as live until the method returns). In the vast majority of cases, this is perfectly fine, but depending on your codebase, this could cause some pain.

In addition, if your application is using memory very aggressively, you could potentially spend a lot of time in a “suspended” state while garbage collection is happening. For large scale web applications, this can easily become very problematic, very quickly.

Automatic Reference Counting

With ARC, there is really only one “gotcha”, and that is reference cycles. Once you have objects that reference each other, the developer must intervene either by setting one of the references to nil, or by declaring one as “weak”. Parent/child object relationships are the most common scenarios you will run into.

This is a pretty big deal if you do not know about it in advance. A leak caused by a reference cycle is often hard to pinpoint after the fact. The best way to handle this scenario is to educate yourself about reference cycles up front, and design your object relationships with them in mind.
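As a sketch of what coding for cycles up front can look like, here is a cut-down version of the linked list from the case study with the “prev” link declared weak. The names FreedLog and ListNode, and the tiny list size, are my own choices for the illustration and not part of the original test program.

```swift
// Records the ids of deallocated nodes. Hypothetical helper type.
final class FreedLog {
    var ids: [Int] = []
}

final class ListNode {
    let id: Int
    let freed: FreedLog
    var next: ListNode?        // strong: the list owns its nodes front to back
    weak var prev: ListNode?   // weak: a back link must not keep a node alive
    init(id: Int, freed: FreedLog) {
        self.id = id
        self.freed = freed
    }
    deinit { freed.ids.append(id) }
}

let freed = FreedLog()
let maxSize = 3                // small threshold so the output is easy to read
var head = ListNode(id: 0, freed: freed)
var tail = head

for id in 1...5 {
    // Append a new node to the end of the list
    let node = ListNode(id: id, freed: freed)
    node.prev = tail
    tail.next = node
    tail = node

    if id >= maxSize, let newHead = head.next {
        // No "head.prev = nil" needed: once head moves on, nothing strong
        // points at the old head, so it is deallocated right here.
        head = newHead
    }
    print("after adding \(id), freed so far: \(freed.ids)")
}
// Final line printed: after adding 5, freed so far: [0, 1, 2]
```

Because the back link is weak, each dropped node is released on the same pass that removes it, and the new head’s “prev” automatically becomes nil.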

Conclusion

While there are pros and cons to both Garbage Collection and Automatic Reference Counting, in my opinion, ARC wins out as the superior option. That statement, however, comes with a big disclaimer…

You need to understand what weak references are and how they work!

Once you understand this, ARC is almost always the clear winner for performance, predictability and avoiding memory-related surprises.
