# Memory That Collaborates

When two teams need to combine data, the usual answer is infrastructure: an ETL pipeline, an API, a message bus. Each adds latency, maintenance burden, and a new failure mode. The data moves because the systems can’t share it in place.
If your database is an immutable value in storage, then anyone who can read the storage can query it. No server to run, no API to negotiate, no data to copy. And if your query language supports multiple inputs, you can join databases from different teams in a single expression.
This is how Datahike works. It isn’t a feature we bolted on; it falls out of two properties fundamental to the architecture.
## Databases are values

In a traditional database, you query through a connection to a running server. The data may change between queries. The database is a service, not something you hold.
Dereference a connection (`@conn`) and you get an immutable database value – a snapshot frozen at a specific transaction. Pass it to a function, hold it in a variable, hand it to another thread. Two concurrent readers holding the same snapshot always agree, without locks or coordination.
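A minimal sketch of that value semantics, assuming Datahike’s standard API (`d/create-database`, `d/connect`, `d/transact`, `d/q`); the config keys follow recent Datahike releases and may differ in older versions:

```clojure
(require '[datahike.api :as d])

;; In-memory store; :schema-flexibility :read lets us skip schema setup.
(def cfg {:store {:backend :mem :id "demo"}
          :schema-flexibility :read})

(d/create-database cfg)
(def conn (d/connect cfg))

(d/transact conn [{:user/name "alice"}])
(def before @conn)                 ; an immutable snapshot
(d/transact conn [{:user/name "bob"}])
(def after @conn)                  ; a later, different value

;; `before` is frozen at its transaction: it still sees one user,
;; `after` sees two. Both can be queried concurrently, no locks.
(d/q '[:find (count ?e) . :where [?e :user/name]] before) ; => 1
(d/q '[:find (count ?e) . :where [?e :user/name]] after)  ; => 2
```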
This is an idea Rich Hickey introduced with Datomic in 2012: separate process (writes, managed by a single writer) from perception (reads, which are just values). The insight was that a correct implementation of perception does not require coordination.
Datomic’s indices live in storage, but its transactor holds an in-memory overlay of recent index segments that haven’t been flushed yet. Readers typically need to coordinate with the transactor to get a complete, current view. The storage alone isn’t enough.
Datahike removes that dependency. The writer flushes to storage on every transaction, so storage is always authoritative. Any process that can read the store sees the full, current database – no overlay, no transactor connection needed. To understand why this works, you need to see how the data is structured.
## Trees in storage

Datahike keeps its indices in a persistent sorted set – a B-tree variant where nodes are immutable. Every node is stored as a key-value pair in konserve, which abstracts over storage backends: S3, filesystem, JDBC, IndexedDB.
When a transaction adds data, Datahike doesn’t modify existing nodes. It creates new nodes for the changed path from leaf to root, while the unchanged subtrees are shared with the previous version. This is structural sharing – the same technique behind Clojure’s persistent vectors and Git’s object store.
A concrete example: a database with a million datoms might have a B-tree with thousands of nodes. A transaction that adds ten datoms rewrites perhaps a dozen nodes along the affected paths. The new tree root points to these new nodes and to the thousands of unchanged nodes from before. Both the old and new snapshots are valid, complete trees. They just share most of their structure.
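The same mechanism is easy to observe in plain Clojure, whose persistent vectors work this way: updating one element returns a new vector that copies only the path to that element and shares every untouched node with the old version.

```clojure
(def v1 (vec (range 1000000)))   ; a million-element persistent vector
(def v2 (assoc v1 0 :changed))   ; rewrites one path from leaf to root

;; v1 is untouched -- both versions are complete, valid values.
(nth v1 0)      ; => 0
(nth v2 0)      ; => :changed
(nth v2 999999) ; => 999999

;; Only a handful of 32-wide tree nodes were copied for v2; the rest
;; of the structure is physically shared with v1.
```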
The crucial property: every node is written once and never modified. Its key can be content-addressed. This means nodes can be cached aggressively, replicated independently, and read by any process that has access to the storage – without coordinating with the process that wrote them.
(For more on how structural sharing, branching, and the tradeoffs work, see The Git Model for Databases.)

## The distributed index space

This is where it comes together.
When you call `@conn`, Datahike fetches one key from the konserve store: the branch head. This returns a small map containing root pointers for each index, schema metadata, and the current transaction ID. Nothing else is loaded – the database value you receive is a lazy handle into the tree.
When a query traverses the index, each node is fetched on demand from storage and cached in a local LRU. Subsequent queries hitting the same nodes pay no I/O. That’s the entire read path. No server process mediating access, no connection protocol, no port to expose. The indices live in storage, and any process that can read the storage can load the branch head, traverse the tree, and run queries.
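A toy model of that read path in plain Clojure – the `store` map, node keys, and `fetch-node` helper are illustrative, not Datahike’s API. Storage is just a key-value map, and a cache in front of it means a second traversal pays no I/O:

```clojure
;; Illustrative only: models konserve-style storage as a plain map.
(def store {:branch-head {:eavt-root :node-1 :tx 42}
            :node-1 {:children [:node-2 :node-3]}
            :node-2 {:datoms [[1 :user/name "alice"]]}
            :node-3 {:datoms [[2 :user/name "bob"]]}})

(def io-count (atom 0))   ; counts simulated storage reads
(def cache (atom {}))     ; stands in for the local LRU

(defn fetch-node [k]
  (or (@cache k)
      (let [node (store k)]     ; simulated storage read
        (swap! io-count inc)
        (swap! cache assoc k node)
        node)))

;; First traversal: every node is a storage read.
(doseq [k [:branch-head :node-1 :node-2 :node-3]] (fetch-node k))
@io-count ; => 4

;; Second traversal: served entirely from cache, no I/O.
(doseq [k [:branch-head :node-1 :node-2 :node-3]] (fetch-node k))
@io-count ; => 4
```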
We call this the distributed index space. Two processes reading the same database fetch the same immutable nodes independently. They don’t know about each other.

A writer publishes new snapshots by writing new tree nodes, then atomically updating the branch head. Readers that dereference afterward see the new snapshot. Readers holding an earlier snapshot continue undisturbed – their nodes are immutable and won’t be garbage collected while reachable.
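The publish step can be modeled with a Clojure atom standing in for the branch-head key (an analogy, not Datahike’s internals): new nodes land in storage first, then a single atomic pointer swap publishes them, so a reader sees either the old complete tree or the new one, never a mix.

```clojure
;; Analogy only: an atom plays the role of the branch-head key.
(def branch-head (atom {:root :node-v1 :tx 1}))

;; A reader dereferences once and keeps working on that snapshot,
;; even if the writer publishes afterward.
(def reader-snapshot @branch-head)

;; Writer: the new tree nodes would be written to storage *before*
;; this single atomic swap makes them visible.
(reset! branch-head {:root :node-v2 :tx 2})

reader-snapshot ; => {:root :node-v1, :tx 1} -- undisturbed
@branch-head    ; => {:root :node-v2, :tx 2} -- new readers see v2
```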
## Joining across databases

Because databases are values and Datalog natively supports multiple input sources, the next step is natural: join databases from different teams, different storage backends, or different points in time – in a single query.
Team A maintains a product catalog on S3. Team B maintains inventory in a separate bucket. A third team joins them without either team doing anything. Each `@` dereference fetches a branch head from its respective S3 bucket and returns an immutable database value.
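A sketch of what that join might look like, assuming two already-populated S3-backed databases; the bucket names, attribute names, and S3 backend config keys are illustrative and depend on the konserve S3 backend in use:

```clojure
(require '[datahike.api :as d])

;; Illustrative configs -- exact S3 keys vary by backend version.
(def catalog-conn   (d/connect {:store {:backend :s3 :bucket "team-a-catalog"}}))
(def inventory-conn (d/connect {:store {:backend :s3 :bucket "team-b-inventory"}}))

;; Datalog takes multiple database inputs via :in.
(d/q '[:find ?sku ?name ?qty
       :in $catalog $inventory
       :where
       [$catalog   ?p :product/sku  ?sku]
       [$catalog   ?p :product/name ?name]
       [$inventory ?i :stock/sku    ?sku]
       [$inventory ?i :stock/qty    ?qty]]
     @catalog-conn
     @inventory-conn)
```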
The query engine joins them locally. There is no server coordinating between the two, no data copied.
And because both are values, you can mix snapshots from different points in time. The old snapshot and the current one are both just values; the query engine doesn’t care when they’re from.
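Assuming Datahike’s `d/as-of` for time-travel reads, mixing a historical snapshot with the current one might look like this – `conn`, `last-quarter`, and the `:invoice/amount` attribute are illustrative:

```clojure
(require '[datahike.api :as d])

;; `conn` is an existing connection; `last-quarter` is a transaction
;; id or java.util.Date marking the historical point (illustrative).
(def now-db  @conn)
(def then-db (d/as-of @conn last-quarter))

;; The same report query runs against either point in time.
(d/q '[:find ?e ?amount
       :where [?e :invoice/amount ?amount]]
     then-db) ; what the report would have shown last quarter
```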
This is useful for audits, regulatory reproducibility, and debugging: “what would this report have shown against last quarter’s data?”

## From storage to browsers

So far, “storage” has meant S3 or a filesystem.
But konserve also has an IndexedDB backend, which means the same model works in a browser. Using Kabel WebSocket sync and konserve-sync, a browser client replicates a database locally into IndexedDB. Queries run against the local replica with zero network round-trips. Updates sync differentially – only changed tree nodes are transmitted; the same structural sharing that makes snapshots cheap on the server makes sync cheap over the wire.
The same pattern gives you a complete cross-database join, runnable in a Clojure REPL. Replace `:memory` with `:s3`, `:file`, or `:jdbc` and the same code works across storage backends. The databases don’t need to share a backend – join an S3 database against a local file store in the same query.
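A sketch of such a REPL session with two in-memory databases; attribute names are illustrative, and the in-memory backend keyword is shown as `:mem` per recent Datahike releases (it may differ by version):

```clojure
(require '[datahike.api :as d])

(def catalog-cfg   {:store {:backend :mem :id "catalog"}
                    :schema-flexibility :read})
(def inventory-cfg {:store {:backend :mem :id "inventory"}
                    :schema-flexibility :read})

(d/create-database catalog-cfg)
(d/create-database inventory-cfg)

(def catalog   (d/connect catalog-cfg))
(def inventory (d/connect inventory-cfg))

;; Two independent databases, populated by "different teams".
(d/transact catalog   [{:product/sku "A-1" :product/name "Widget"}])
(d/transact inventory [{:stock/sku "A-1" :stock/qty 7}])

;; One query joins both values on the shared SKU.
(d/q '[:find ?name ?qty
       :in $c $i
       :where
       [$c ?p :product/name ?name]
       [$c ?p :product/sku  ?sku]
       [$i ?s :stock/sku    ?sku]
       [$i ?s :stock/qty    ?qty]]
     @catalog @inventory)
;; => #{["Widget" 7]}
```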