The Scanner Bottleneck: One Syscall per Directory
Every abstraction has a per-unit tax. At 50 files, you never notice. At 5.2 million, the tax is the entire bill.
FileManager asks the kernel about each file the way a tourist asks for directions one street at a time. getattrlistbulk asks for the whole map.
This article is about when to punch through an abstraction layer, and how to contain the damage when you do.
The Abstraction Tax
The first working scanner used `FileManager.enumerator`. You probably would have written the same thing: idiomatic Swift, clean API, handles symlinks and errors, does exactly what the documentation says. A few dozen lines and the whole filesystem is yours.
It worked. It returned correct results. And it was slow enough to make me question my life choices.
Here is the non-obvious part: the kernel was not the bottleneck. `contentsOfDirectory(at:includingPropertiesForKeys:)` can batch attribute prefetch per directory.[^1] The kernel had everything ready. The slowdown was the wrapping.
Each file gets its own URL allocation, ObjC-to-Swift bridging, and resourceValues dictionary construction. At 50 files, that ceremony is invisible. At 5.2 million files, it dominates the profile. The kernel caches directory metadata in memory; the question is how much overhead your code adds on top of each read. Comfortable for dozens, punishing for millions.
Measure, Then Cut
Don’t guess where the bottleneck is. Profile.
Before dropping to a lower-level API, it helps to map which layer actually owns the cost. Each POSIX alternative eliminates some overhead but leaves the per-item core intact. Think of it as a decision tree:
- `getdirentries64` returns directory entries in bulk, including names and file types, but not size or modification date. You still need a per-item `stat` call for those. This eliminates the Swift wrapping but not the syscall-per-file problem.
- `fts_open` / `fts_read` is a higher-level POSIX tree-walking API, but it processes one entry at a time and calls `stat` per entry by default.[^2] Higher-level than raw syscalls, still per-entry.
- Parallelism (via `TaskGroup`) spreads the per-item cost across cores. It does not remove it. Four threads doing redundant work is not faster than one thread doing the right work.
Each option peels away one layer of overhead but leaves the fundamental cost structure unchanged: one kernel interaction per file, one result per call. The goal is not to speed up the per-item path. The goal is to eliminate it.
The Escape Hatch
`getattrlistbulk` is the API Apple has had since OS X 10.10. One system call per directory. The kernel packs all file attributes for every entry in that directory into a single buffer. Name, type, size, modification date: all of it, in one read. Finder and Spotlight almost certainly use it or something equivalent.[^3] It is documented in the BSD layer of the macOS SDK. It is just not what you reach for when you are writing Swift and feeling civilized about it.
```mermaid
graph LR
accTitle: FileManager vs getattrlistbulk system call comparison
accDescr: FileManager calls resourceValues() per item with per-item overhead, producing one FileNode at a time. getattrlistbulk makes one syscall per directory and returns all entries in a packed buffer, achieving 1.7x faster throughput for 5.2 million nodes.
subgraph FM["FileManager approach"]
FM1[Directory entry] -->|"resourceValues()<br/>per-item overhead"| FM2[Single attribute dict]
FM2 -->|repeat per file| FM3[FileNode]
end
subgraph PX["getattrlistbulk approach"]
PX1[Directory] -->|"getattrlistbulk()<br/>1 syscall per directory"| PX2["Packed buffer<br/>(all entries, all attrs)"]
PX2 -->|walk buffer| PX3[FileNode per entry]
end
FM3 -.->|"5.2M nodes"| OUT[ScanResult]
PX3 -.->|"5.2M nodes: 1.7x faster"| OUT
```
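The call pattern, stripped to essentials: open the directory, then loop until the kernel reports it exhausted. This is a minimal macOS-only sketch, not the scanner's actual code; error handling and full attribute parsing are elided, and the attribute set and 256 KB buffer follow the article's description.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/attr.h>
#include <unistd.h>

static void scan_dir(const char *path) {
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0) return;

    struct attrlist req = {
        .bitmapcount = ATTR_BIT_MAP_COUNT,
        .commonattr  = ATTR_CMN_RETURNED_ATTRS | ATTR_CMN_NAME |
                       ATTR_CMN_OBJTYPE | ATTR_CMN_MODTIME,
        .fileattr    = ATTR_FILE_ALLOCSIZE,
    };

    char buf[256 * 1024];                            /* 256 KB buffer */
    for (;;) {
        /* One syscall returns as many packed records as fit. */
        int count = getattrlistbulk(fd, &req, buf, sizeof(buf), 0);
        if (count <= 0) break;                       /* 0: directory exhausted */
        char *entry = buf;
        for (int i = 0; i < count; i++) {
            uint32_t reclen;
            memcpy(&reclen, entry, sizeof(reclen));  /* 4-byte record length */
            /* ... walk attribute_set_t + attrreference_t fields here ... */
            entry += reclen;
        }
    }
    close(fd);
}
```

Note that `ATTR_CMN_RETURNED_ATTRS` is mandatory: the kernel tells you per record which of the requested attributes it actually returned, and the parsing offsets depend on it.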
One implementation detail worth keeping
The buffer parsing is pointer arithmetic over packed binary data: read a 4-byte record length, advance, extract fields.[^4] One detail matters enough to call out.
When reading fields from a raw buffer, the safe approach is `memcpy` into a local variable rather than casting the pointer directly. The reason is not hardware faults (Apple Silicon and Intel both handle unaligned loads) but C strict-aliasing rules: the compiler assumes pointers of different types never point to the same memory. A cast like `*(uint32_t *)ptr` on a `char *` is undefined behavior, and the compiler is free to miscompile it.[^5]
```c
// Correct: memcpy before reading a 4-byte field
uint32_t size;
memcpy(&size, ptr, sizeof(size));
ptr += sizeof(size);

// Wrong: strict-aliasing violation, undefined behavior in C
uint32_t size = *((uint32_t *)ptr);
```
In practice, the direct cast works on every Apple architecture shipped in the last 15 years. You use memcpy anyway, because “works on my machine” is not a correctness argument when the C standard says otherwise.
Edge cases: the price of the escape hatch
When you punch through an abstraction, you inherit the edge cases that abstraction was handling for you.
FAT32 volumes. External drives formatted as FAT32 do not support all attribute combinations. The call returns `ENOTSUP`, and the scanner falls back to `FileManager` for the entire volume.[^6] An escape hatch has a price: you own the fallbacks.
iCloud drives. An evicted iCloud file reports zero for `ATTR_FILE_ALLOCSIZE` (bytes on disk), because the data is not here. Use that number naively, and your entire iCloud library vanishes from the disk map. The fix is to detect cloud-backed paths and use `ATTR_FILE_DATALENGTH` (logical size) instead.[^7][^8] This is the kind of subtlety that high-level APIs hide from you, and that low-level APIs expose.
Containment: punch through, then stop
The implementation lives in `POSIXDirectoryScanner`, isolated behind a `DirectoryScannerProtocol`. The rest of the app calls the protocol. It does not know or care whether it is talking to `FileManager` or to raw syscalls.
This is the key design lesson. Punch through exactly one layer, behind a clean boundary, then stop. The escape hatch exists inside a box. The rest of the codebase stays idiomatic. If Apple ever ships a better API, you replace one conformance and nothing else changes.
```mermaid
graph TD
accTitle: Recursive directory scanning with attribute buffering
accDescr: Starting from a volume root, each directory is scanned with getattrlistbulk in one syscall. The packed attribute buffer is parsed into FileNode structs. Child directories recurse. Files accumulate into the tree, producing a final ScanResult.
V[Volume root] --> D[Directory]
D -->|getattrlistbulk<br/>one syscall| B[Attribute buffer]
B -->|parse packed struct| N[FileNode]
N -->|child dir?| D
N -->|file| T[Tree accumulation]
T --> SR[ScanResult]
```
The Bottleneck Moves
All timings from `PerfLogWriter` instrumentation, Debug build (`-Onone`) on an M3 Pro (18 GB). Protocol: 3 warm-up scans to fill the filesystem cache, measure on the 4th. Four runs per method, reporting the median.[^9]
| Metric | FileManager | POSIX | Improvement |
|---|---|---|---|
| Scan (5.2M nodes) | 43.6s | 25.7s | 1.70× |
| Cache save | 7.1s | 3.6s | 1.97× |
| End-to-end | 50.7s | 29.3s | 1.73× |
Warm-cache methodology was a deliberate choice: cold-cache timings depend on too much uncontrolled system state to give reproducible baselines. The cold-cache delta would likely be larger, but I never measured it systematically.[^10]
The 1.7x was real but humbling. I had targeted 2.5-3x. The remaining scan time tells you why: 5.2 million FileNode allocations and String constructions that no syscall change can fix. The work shifted from “asking the kernel” to “building the tree.”
Sometimes you optimize the right thing and the bottleneck just moves. That is not failure. That is information. You follow it. That tree construction cost becomes the subject of the next article: replacing 5.2 million heap-allocated class instances with a flat struct array.
References
- getattrlistbulk(2) man page: macOS man page mirror
- File System Programming Guide: Apple Developer Library
- POSIX: Wikipedia
Footnotes
[^1]: `FileManager.contentsOfDirectory(at:includingPropertiesForKeys:)` can prefetch specified resource keys in bulk per directory via `getattrlist()`. The per-item overhead comes from constructing a `URL` for each entry, bridging from Objective-C to Swift, and building the `resourceValues` dictionary. Whether this involves one kernel crossing per file or one per directory depends on which `FileManager` API is used and which keys are requested.

[^2]: The `FTS_NOSTAT` flag suppresses the per-entry `stat` call when only the file type (available from `d_type` in the directory entry) is needed. The article describes the default behavior.

[^3]: Apple has not publicly documented which syscalls Finder or Spotlight use internally. Community DTrace investigations suggest bulk attribute APIs, but "suggest" and "confirmed" are different things.

[^4]: You allocate a fixed-size buffer (the scanner uses 256 KB) and call `getattrlistbulk` in a loop. Each call fills the buffer with as many records as fit. When the kernel returns 0, the directory is exhausted. For a directory with 50,000 entries, this might take a handful of calls rather than one, but each call still returns hundreds of entries at once.

[^5]: The C standard (C11 §6.5/7) defines accessing an object through a pointer of incompatible type as undefined behavior. `memcpy` avoids this because it operates on `char *`, which is allowed to alias any type. In practice, Apple's Clang with `-fno-strict-aliasing` (the default for many projects) would not miscompile the direct cast, but relying on compiler flags for correctness is fragile.

[^6]: The detection is reactive, not proactive: the scanner attempts `getattrlistbulk` on the first directory and checks the return code. If it gets `ENOTSUP` or an attribute count of zero, it switches to the `FileManager` code path for the entire volume. There is no upfront probe.

[^7]: These are user-space paths within the home directory, not volume mount points. iCloud Drive does not appear as a separate volume; it is a directory tree on the APFS system volume managed by File Provider.

[^8]: String-matching on path prefixes is a heuristic, not an API contract. Apple could relocate these directories in a future macOS version. There is no public API to ask "is this path cloud-backed?" The File Provider framework exposes domain information, but querying it per file during a scan would reintroduce per-item overhead. The path check is a pragmatic tradeoff: fragile in theory, stable across every macOS release tested so far.

[^9]: With only four runs, the median is the average of the 2nd and 3rd values when sorted. This is not a robust estimator: a single outlier shifts it meaningfully. The numbers should be read as indicative, not precise. A more rigorous protocol would use n ≥ 7 runs and report range or standard deviation.

[^10]: This is a Debug build (`-Onone`), which inflates Swift runtime overhead (ARC, dynamic dispatch, bounds checking) disproportionately on the `FileManager` side. The 1.7× ratio in a Release build could differ, since compiler optimizations would reduce the per-item abstraction cost. Without profiling data that isolates syscall time from runtime overhead, the exact contribution of each factor is unknown.
