Storage Format
One sealed segment file per (table, day-partition, channel): the channel's (entity, ts)-sorted samples, split into blocks, with a sparse index in the footer.
Layout
"XCSG" | version u8 | table uvarint | channel uvarint | kind u8
block* rowCount | entities col | timestamps col | values col
footer per block: entity span, ts min/max, offset, length, count
footerLen u32 | "XCSG"
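Because the file ends with footerLen and a second magic, a reader can find the footer by working backwards from the last eight bytes. The sketch below illustrates that step only. It assumes footerLen is little-endian and counts just the footer bytes (not the trailer); the layout above does not state either, and `footerBytes` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

var magic = []byte("XCSG")

// trailerLen covers the last two fields of the file: footerLen u32 + "XCSG".
const trailerLen = 4 + 4

// footerBytes returns the raw (still encoded) footer region of a segment image.
// Assumptions: footerLen is little-endian and excludes the 8-byte trailer.
func footerBytes(seg []byte) ([]byte, error) {
	if len(seg) < len(magic)+trailerLen {
		return nil, errors.New("segment too short")
	}
	// The magic appears at both ends, so a truncated file fails here.
	if !bytes.HasPrefix(seg, magic) || !bytes.HasSuffix(seg, magic) {
		return nil, errors.New("bad magic")
	}
	end := len(seg) - trailerLen
	n := binary.LittleEndian.Uint32(seg[end:])
	if uint64(n) > uint64(end-len(magic)) {
		return nil, errors.New("footerLen out of range")
	}
	return seg[end-int(n) : end], nil
}

func main() {
	// Toy image: magic, three bytes standing in for header and blocks,
	// a two-byte footer, then the trailer.
	seg := append([]byte("XCSG"), 0x01, 0x02, 0x03)
	seg = append(seg, 0xAA, 0xBB)
	seg = binary.LittleEndian.AppendUint32(seg, 2)
	seg = append(seg, magic...)

	footer, err := footerBytes(seg)
	fmt.Println(footer, err)
}
```

With an mmap'd file the same function would run over the mapped byte slice, so locating the footer touches only the final pages.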
Blocks default to 8192 rows. Writers enforce (entity, ts) order and write to path.tmp, renaming it to path on seal, so a crashed merge never leaves a half-written segment at the final path.
Readers mmap the file. The sparse index (one entry per block) is kept decoded in memory; blocks are decoded on demand. There is no buffer pool: the OS page cache is the cache.
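The write-to-tmp-then-rename pattern can be sketched as follows. This is a minimal illustration, not the engine's writer: `sealSegment` and `roundTrip` are hypothetical names, the file name is made up, and the fsync before the rename is an assumption (the text only mentions the rename).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// sealSegment writes data to path+".tmp", flushes it, then renames it to path.
// Readers opening path see either the old file or the complete new one.
func sealSegment(path string, data []byte) (err error) {
	tmp := path + ".tmp"
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	defer func() {
		if err != nil { // on any failure, drop the partial tmp file
			f.Close()
			os.Remove(tmp)
		}
	}()
	if _, err = f.Write(data); err != nil {
		return err
	}
	if err = f.Sync(); err != nil { // assumption: fsync before publishing
		return err
	}
	if err = f.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// roundTrip seals payload in a scratch directory, reads it back, and reports
// whether the .tmp file is gone after sealing.
func roundTrip(payload []byte) ([]byte, bool, error) {
	dir, err := os.MkdirTemp("", "xcsg-demo")
	if err != nil {
		return nil, false, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "segment.xcsg") // hypothetical file name
	if err := sealSegment(path, payload); err != nil {
		return nil, false, err
	}
	_, statErr := os.Stat(path + ".tmp")
	got, err := os.ReadFile(path)
	return got, os.IsNotExist(statErr), err
}

func main() {
	got, tmpGone, err := roundTrip([]byte("XCSG-demo-payload"))
	fmt.Println(string(got), tmpGone, err)
}
```

A crash before the rename can leave a stray .tmp file, which a writer would presumably clean up or overwrite on the next merge; the final path is never observed half-written.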
Encodings
| Column | Encoding | Why it wins on IoT data |
|---|---|---|
| timestamps | delta-of-delta, zigzag varints | steady cadence → ~1 byte/point |
| f64 values | Gorilla/XOR bit-packing | slow drift → ~1 bit/point when constant |
| strings | adaptive: dict+RLE or block-zstd | engine picks by cardinality; state channels collapse to a handful of runs |
| entities | same adaptive string encoder | few devices per block → dictionary |
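To make the timestamps row concrete, here is a small sketch of delta-of-delta with zigzag varints. It uses Go's `encoding/binary` signed varints, which zigzag-encode internally. The exact stream layout (first timestamp, then first delta, then delta-of-deltas) is an assumption for illustration, as are the names `encodeTS` and `decodeTS`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeTS writes ts[0], then ts[1]-ts[0], then each change in delta,
// all as zigzag varints. On a steady cadence every later value is 0,
// which costs one byte.
func encodeTS(ts []int64) []byte {
	var out []byte
	var prev, prevDelta int64
	for i, t := range ts {
		delta := t - prev
		if i == 0 {
			out = binary.AppendVarint(out, t)
			delta = 0 // so the second point is stored as a plain delta
		} else {
			out = binary.AppendVarint(out, delta-prevDelta)
		}
		prev, prevDelta = t, delta
	}
	return out
}

// decodeTS reverses encodeTS for n timestamps.
func decodeTS(buf []byte, n int) ([]int64, error) {
	out := make([]int64, 0, n)
	var prev, prevDelta int64
	for i := 0; i < n; i++ {
		v, size := binary.Varint(buf)
		if size <= 0 {
			return nil, errors.New("truncated or malformed varint")
		}
		buf = buf[size:]
		if i == 0 {
			prev = v
		} else {
			prevDelta += v // v is the change in delta
			prev += prevDelta
		}
		out = append(out, prev)
	}
	return out, nil
}

func main() {
	// 1000 samples at a steady 10 s cadence, millisecond timestamps.
	ts := make([]int64, 1000)
	for i := range ts {
		ts[i] = 1_700_000_000_000 + int64(i)*10_000
	}
	enc := encodeTS(ts)
	dec, err := decodeTS(enc, len(ts))
	fmt.Println(len(enc), "bytes for", len(ts), "points; round trip ok:",
		err == nil && dec[len(dec)-1] == ts[len(ts)-1])
}
```

Jitter costs more than one byte only when a delta-of-delta falls outside the range -64..63, since that is what fits in a single zigzag varint byte.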
The user only ever declares f64 or string; encoding choice is the engine's, made per block by looking at the data (a heuristic, not a model).
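As an illustration of the dict+RLE path and of picking by cardinality, the sketch below dictionary-encodes a block of strings and collapses repeats into runs. It is not the engine's encoder: `dictRLE`, `expand`, `preferDict`, and the `maxDistinct` threshold are hypothetical, and the real heuristic may weigh more than the distinct count.

```go
package main

import "fmt"

// run is one (dictionary code, repeat count) pair.
type run struct {
	Code  uint32
	Count uint32
}

// dictRLE assigns codes in first-seen order and merges consecutive repeats.
func dictRLE(vals []string) (dict []string, runs []run) {
	codes := make(map[string]uint32)
	for _, v := range vals {
		c, ok := codes[v]
		if !ok {
			c = uint32(len(dict))
			codes[v] = c
			dict = append(dict, v)
		}
		if n := len(runs); n > 0 && runs[n-1].Code == c {
			runs[n-1].Count++
		} else {
			runs = append(runs, run{Code: c, Count: 1})
		}
	}
	return dict, runs
}

// expand reverses dictRLE.
func expand(dict []string, runs []run) []string {
	var out []string
	for _, r := range runs {
		for i := uint32(0); i < r.Count; i++ {
			out = append(out, dict[r.Code])
		}
	}
	return out
}

// preferDict is a hypothetical cardinality check: dictionary encoding is
// chosen only when the block has at most maxDistinct distinct values.
func preferDict(vals []string, maxDistinct int) bool {
	seen := make(map[string]struct{})
	for _, v := range vals {
		seen[v] = struct{}{}
		if len(seen) > maxDistinct {
			return false
		}
	}
	return true
}

func main() {
	state := []string{"idle", "idle", "idle", "running", "running", "running", "running", "idle"}
	dict, runs := dictRLE(state)
	fmt.Println(preferDict(state, 16), dict, runs)
}
```

A state channel with a handful of values becomes a tiny dictionary plus a few runs, whereas a block of mostly unique strings would fail the check and fall through to block-zstd.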
Block skipping
Scan(entity, from, to) binary-searches the sparse index by entity span, then skips every candidate block whose [minTS, maxTS] window does not overlap the requested time range. Only the surviving blocks are decoded.
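The pruning step can be sketched over an in-memory index like this. The struct fields mirror the footer entry described earlier, but the names (`blockMeta`, `candidates`), string entity keys, and inclusive `from`/`to` bounds are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// blockMeta is one sparse-index entry (hypothetical field names).
type blockMeta struct {
	MinEntity, MaxEntity string // entity span
	MinTS, MaxTS         int64
	Offset, Length       int64
	Count                int
}

// candidates returns the positions of blocks that may hold rows for entity
// within [from, to]. The index must be in file order, i.e. sorted by
// (entity, ts), so MaxEntity is non-decreasing across entries.
func candidates(index []blockMeta, entity string, from, to int64) []int {
	// Binary search: first block whose entity span can reach entity.
	start := sort.Search(len(index), func(i int) bool {
		return index[i].MaxEntity >= entity
	})
	var hits []int
	for i := start; i < len(index) && index[i].MinEntity <= entity; i++ {
		b := index[i]
		if b.MaxTS < from || b.MinTS > to {
			continue // time window misses the range: never decoded
		}
		hits = append(hits, i)
	}
	return hits
}

func main() {
	index := []blockMeta{
		{MinEntity: "dev-a", MaxEntity: "dev-a", MinTS: 0, MaxTS: 99},
		{MinEntity: "dev-a", MaxEntity: "dev-a", MinTS: 100, MaxTS: 199},
		{MinEntity: "dev-b", MaxEntity: "dev-b", MinTS: 0, MaxTS: 99},
		{MinEntity: "dev-b", MaxEntity: "dev-b", MinTS: 100, MaxTS: 199},
		{MinEntity: "dev-b", MaxEntity: "dev-b", MinTS: 200, MaxTS: 299},
		{MinEntity: "dev-c", MaxEntity: "dev-c", MinTS: 0, MaxTS: 99},
	}
	fmt.Println(candidates(index, "dev-b", 120, 180))
}
```

When a block spans several entities, its time window covers all of them, so the check is conservative: it may keep a block that turns out to hold no matching rows, but it never drops one that does.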