From 1.2 GB to 469 MB

I had 5.2 million FileNode objects. I didn’t need them.

I asked a friend to test Renala. He works in graphic design, has a good eye for UI, and I knew he would be thorough and would not pull any punches. His opinion on whether something looks and feels right is worth more to me than most. He also owns an older Intel Mac with 16 GB of RAM, and one of his external disks had something like 13 million files on it.

He ran a full disk scan. The first thing he reported was not the memory. It was this: “I don’t understand what the black blocks are.” Dark rectangles scattered across the treemap with no visible text. I spent an embarrassing amount of time chasing a rendering bug: font fallback issues, color space differences between Intel and Apple Silicon, dark mode edge cases. None of it was the problem. The problem was that he had unchecked the “Directory headers” display option, which hid the label text but kept the header background visible. Black blocks with no text. I had not made the connection because I had never turned that option off myself.

Then he mentioned, politely, that Activity Monitor showed Renala using 1.2 GB of RAM during the scan and that his machine had started swapping.

I had run the same scan dozens of times on an M3 Pro with 18 GB. I had never once looked at Activity Monitor. Every developer thinks their machine is representative.


The original data model was straightforward. FileNode was a class. Each instance held a UUID, a stored path string, an array of children, size, modification date, and a few flags. For 5.2 million files, it was a problem.
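Reconstructed from that description (field names are illustrative, not the actual Renala source), the class looked roughly like this:

```swift
import Foundation

// Sketch of the original data model as described in the text.
// Every instance is a separate heap allocation with its own
// reference count and type metadata.
final class FileNode {
    let id: UUID                  // 16 bytes of identifier per node
    let path: String              // full path, stored redundantly
    let name: String
    var children: [FileNode]      // array of pointers to heap objects
    var size: Int64
    var modificationDate: Date
    var isDirectory: Bool

    init(path: String, name: String, size: Int64 = 0,
         modificationDate: Date = Date(), isDirectory: Bool = false) {
        self.id = UUID()
        self.path = path
        self.name = name
        self.children = []
        self.size = size
        self.modificationDate = modificationDate
        self.isDirectory = isDirectory
    }
}
```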

Five million class instances means five million separate heap allocations. Each one carries a reference count, type metadata, and alignment padding that you never asked for. Heap fragmentation means the actual memory consumed is larger than the sum of the object sizes. On a machine with 16 GB of RAM, five million objects scattered across virtual memory is the difference between “runs fine” and “the OS starts swapping.”

The cost per FileNode broke down roughly like this:

Before (per FileNode class instance):
  UUID:        16 bytes
  path String: ~40 bytes average (heap)
  children []: 24 bytes header + 8 per child
  metadata:    ~24 bytes
  Swift object overhead: ~16 bytes
  Total per node: ~120 bytes
  × 5.2M nodes = ~624 MB (plus heap fragmentation)

After (SlimNode struct in ContiguousArray):
  parentIndex:  4 bytes (UInt32)
  nameIndex:    4 bytes (UInt32)
  size:         8 bytes (Int64)
  nodeType:     1 byte
  flags:        1 byte
  pad:          2 bytes
  Total per node: 20 bytes
  × 5.2M nodes = ~104 MB
  + name pool: ~156 MB (UTF-8, ~30 bytes avg name)
  Total: ~260 MB

The UUID alone was 16 bytes per node. For 5.2 million nodes: 83 MB just for identifiers. The fix for that part is arithmetic: switch to a UInt32 NodeID, 4 bytes each, total 21 MB. A UInt32 can address 4.2 billion nodes, which is more than any disk scan will ever produce.
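The same arithmetic, checked against Swift's own layout figures (node count from the scan above):

```swift
import Foundation

let nodeCount = 5_200_000
// Foundation's UUID wraps a 16-byte uuid_t.
let uuidCost = MemoryLayout<UUID>.size * nodeCount    // 16 B × 5.2M ≈ 83 MB
let idCost   = MemoryLayout<UInt32>.size * nodeCount  //  4 B × 5.2M ≈ 21 MB
```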

The stored path string was the larger problem.

Every node stored its full path. /Users/dinh/Library/Application Support/SomeApp/SomeSubfolder/somefile.dat. That string is reconstructible at any time from two pieces of data you already have: the parent’s path and the node’s own name. Storing the full path is redundant. But redundancy at scale stops being a design choice and starts being a memory leak: an average path of 40-50 bytes, multiplied by 5.2 million nodes, is 200 MB of strings that do not need to exist.

Remove stored paths. Reconstruct on demand by walking up the parent chain. The name stays, because you need it for display and for reconstruction. The path goes.

  Structure                   Per-node cost   5.2M nodes
  FileNode (class, UUID)      ~120 bytes      ~624 MB
  SlimNode (struct, UInt32)   ~50 bytes       ~260 MB

Approximate figures. SlimNode cost includes the proportional name pool share (~30 bytes/node average). Actual working-set memory adds scan metadata and treemap rectangle cache on top.


Removing paths meant rethinking children too.

The original FileNode held a [FileNode] array for its children. That array header is 24 bytes (pointer, length, capacity). Each element is an 8-byte pointer to a heap-allocated child object. For a directory with 50 children: 24 + 400 bytes, plus 50 separate heap allocations for the children themselves.

The NodeStore replaces all of this with a ContiguousArray<SlimNode>. Every node in the entire scan lives in one contiguous block of memory. Parent-child relationships are expressed as index pairs: each SlimNode stores a parentIndex (the position of its parent in the array) and a nameIndex (the offset of its name in a separate UTF-8 name pool). To find a node’s children, you scan the array looking for nodes whose parentIndex matches the parent’s position.

The array is populated in depth-first order during the scan, so children sit adjacent to their parents in memory. Finding all children of node N means scanning a contiguous segment of the array. No pointer chasing, no random memory access.
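A minimal sketch of that layout, assuming the names from the text (SlimNode, NodeStore, parentIndex) and a preorder depth-first fill; the early exit in the child scan relies on that ordering:

```swift
// Sketch of the flat layout described above. Field names follow the
// text; everything else is illustrative, not the actual Renala source.
struct SlimNode {
    var parentIndex: UInt32   // position of the parent in the array
    var nameIndex: UInt32     // byte offset into the UTF-8 name pool
    var size: Int64
    var nodeType: UInt8
    var flags: UInt8
}

struct NodeStore {
    // One contiguous block for every node in the scan.
    var nodes = ContiguousArray<SlimNode>()

    // In preorder DFS, every node in a parent's subtree has a
    // parentIndex >= the parent's own index; the first node past the
    // subtree points at an earlier ancestor, so we can stop scanning.
    func children(of parent: Int) -> [Int] {
        var result: [Int] = []
        var i = parent + 1
        while i < nodes.count {
            if nodes[i].parentIndex == UInt32(parent) {
                result.append(i)
            } else if nodes[i].parentIndex < UInt32(parent) {
                break  // left the parent's subtree
            }
            i += 1
        }
        return result
    }
}
```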

ContiguousArray matters here. Swift’s standard Array does not guarantee contiguous storage. ContiguousArray does: elements are packed in memory, the CPU prefetcher can predict sequential access, and cache lines load ahead of the read. For 5.2 million nodes, this is the difference between cache hits and cache misses on every traversal.


The name pool is a single Data blob holding all names concatenated as raw UTF-8. The nameIndex in SlimNode is a byte offset into that pool. A name's length is implicit: it runs to the start of the next name, tracked by a parallel offsets array.

One blob, one allocation, no per-name heap overhead. The average filename is around 30 bytes. 5.2 million names: roughly 156 MB in the pool, stored in one contiguous allocation. Compare that to 5.2 million separate String heap allocations, each with its own reference counting and metadata.
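The pool itself is only a few lines. This is a sketch under the assumptions above (one blob, a parallel offsets array); the method names are mine, not Renala's:

```swift
import Foundation

// One Data blob of concatenated UTF-8 names, plus a parallel offsets
// array marking where each name starts.
struct NamePool {
    private(set) var blob = Data()
    private(set) var offsets: [UInt32] = []

    // Appends a name and returns its byte offset (the nameIndex).
    mutating func intern(_ name: String) -> UInt32 {
        let offset = UInt32(blob.count)
        offsets.append(offset)
        blob.append(contentsOf: Array(name.utf8))
        return offset
    }

    // A name ends where the next one begins (or at the blob's end).
    func name(at index: Int) -> String {
        let start = Int(offsets[index])
        let end = index + 1 < offsets.count ? Int(offsets[index + 1]) : blob.count
        return String(decoding: blob[start..<end], as: UTF8.self)
    }
}
```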

Path reconstruction, when you actually need it, looks like this: start from the node's name, walk up the parentIndex chain until you hit the root, and join the names with slashes. You only do this when the UI needs to display a path or open a file. It never runs on any hot traversal path.
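As a standalone sketch, with plain arrays standing in for the NodeStore and name pool (index 0 is the root in this illustration):

```swift
// Walk the parentIndex chain to the root, collecting names, then join
// with "/". `parents` and `names` are illustrative stand-ins for the
// flat store described above.
func path(of node: Int, parents: [UInt32], names: [String]) -> String {
    var parts: [String] = []
    var i = node
    while i != 0 {             // index 0 is the root in this sketch
        parts.append(names[i])
        i = Int(parents[i])
    }
    return "/" + parts.reversed().joined(separator: "/")
}
```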


The result: 1.2 GB down to 469 MB at idle for a 5.2M node scan on an M3 Pro. On the Intel Mac, no more swapping. The scan completes, memory stabilizes, and the rest of the operating system can breathe.

The 469 MB is higher than the theoretical minimum from the layout math above because the FileNode tree is kept alive for a period during the transition, scan metadata adds overhead, and the treemap rectangle cache consumes additional memory during rendering. The 260 MB structural estimate was for the store alone. In practice, the working set is larger. But it is 60% smaller than what shipped initially, and the reduction is felt immediately on constrained hardware.


The parent-child relationships are the same. The names, sizes, and types are the same. What changed is how memory is laid out: one allocation containing everything, with integer indices where pointers used to be.

Game developers and systems programmers call this the “struct of arrays” pattern. It is less common in application code because the scale rarely justifies it. At five million nodes, the scale does not ask for permission.

The friend with the Intel Mac sent a message after the fix: “It seems much faster than yesterday, even with thermal throttling. Good.” And then: “I get the impression there are no more memory leaks.” There never were memory leaks. There was just too much memory being used on purpose. Somehow that felt worse.
