# Memory That Collaborates

When two teams need to combine data, the usual answer is infrastructure: an ETL pipeline, an API, a message bus. Each adds latency, maintenance burden, and a new failure mode. The data moves because the systems can’t share it in place.
If your database is an immutable value in storage, then anyone who can read the storage can query it. No server to run, no API to negotiate, no data to copy. And if your query language supports multiple inputs, you can join databases from different teams in a single expression.
This is how Datahike works. It isn’t a feature we bolted on; it falls out of two properties fundamental to the architecture.
## Databases are values

In a traditional database, you query through a connection to a running server. The data may change between queries. The database is a service, not something you hold.
Dereference a connection (`@conn`) and you get an immutable database value – a snapshot frozen at a specific transaction. Pass it to a function, hold it in a variable, hand it to another thread. Two concurrent readers holding the same snapshot always agree, without locks or coordination.
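A minimal sketch of that value semantics, assuming Datahike’s standard API (`d/create-database`, `d/connect`, `d/transact`, `d/q`); the config keys follow recent Datahike releases and may differ in older versions:

```clojure
(require '[datahike.api :as d])

;; In-memory store; :schema-flexibility :read lets us skip schema setup.
(def cfg {:store {:backend :mem :id "demo"}
          :schema-flexibility :read})

(d/create-database cfg)
(def conn (d/connect cfg))

(d/transact conn [{:user/name "alice"}])
(def before @conn)                 ; an immutable snapshot
(d/transact conn [{:user/name "bob"}])
(def after @conn)                  ; a later, different value

;; `before` is frozen at its transaction: it still sees one user,
;; `after` sees two. Both can be queried concurrently, no locks.
(d/q '[:find (count ?e) . :where [?e :user/name]] before) ; => 1
(d/q '[:find (count ?e) . :where [?e :user/name]] after)  ; => 2
```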
This is an idea Rich Hickey introduced with Datomic in 2012: separate process (writes, managed by a single writer) from perception (reads, which are just values). The insight was that a correct implementation of perception does not require coordination.
Datomic’s indices live in storage, but its transactor holds an in-memory overlay of recent index segments that haven’t been flushed yet. Readers typically need to coordinate with the transactor to get a complete, current view. The storage alone isn’t enough.
Datahike removes that dependency. The writer flushes to storage on every transaction, so storage is always authoritative. Any process that can read the store sees the full, current database – no overlay, no transactor connection needed. To understand why this works, you need to see how the data is structured.
## Trees in storage

Datahike keeps its indices in a persistent sorted set – a B-tree variant where nodes are immutable. Every node is stored as a key-value pair in konserve, which abstracts over storage backends: S3, filesystem, JDBC, IndexedDB.
When a transaction adds data, Datahike doesn’t modify existing nodes. It creates new nodes for the changed path from leaf to root, while the unchanged subtrees are shared with the previous version. This is structural sharing – the same technique behind Clojure’s persistent vectors and Git’s object store.
A concrete example: a database with a million datoms might have a B-tree with thousands of nodes. A transaction that adds ten datoms rewrites perhaps a dozen nodes along the affected paths. The new tree root points to these new nodes and to the thousands of unchanged nodes from before. Both the old and new snapshots are valid, complete trees. They just share most of their structure.
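The same mechanism is easy to observe in plain Clojure, whose persistent vectors work this way: updating one element returns a new vector that copies only the path to that element and shares every untouched node with the old version.

```clojure
(def v1 (vec (range 1000000)))   ; a million-element persistent vector
(def v2 (assoc v1 0 :changed))   ; rewrites one path from leaf to root

;; v1 is untouched -- both versions are complete, valid values.
(nth v1 0)      ; => 0
(nth v2 0)      ; => :changed
(nth v2 999999) ; => 999999

;; Only a handful of 32-wide tree nodes were copied for v2; the rest
;; of the structure is physically shared with v1.
```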
The crucial property: every node is written once and never modified. Its key can be content-addressed. This means nodes can be cached aggressively, replicated independently, and read by any process that has access to the storage – without coordinating with the process that wrote them.
(For more on how structural sharing, branching, and the tradeoffs work, see The Git Model for Databases.)

## The distributed index space

This is where it comes together.
When you call `@conn`, Datahike fetches one key from the konserve store: the branch head. This returns a small map containing root pointers for each index, schema metadata, and the current transaction ID. Nothing else is loaded – the database value you receive is a lazy handle into the tree.
When a query traverses the index, each node is fetched on demand from storage and cached in a local LRU. Subsequent queries hitting the same nodes pay no I/O. That’s the entire read path. No server process mediating access, no connection protocol, no port to expose. The indices live in storage, and any process that can read the storage can load the branch head, traverse the tree, and run queries.
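A toy model of that read path in plain Clojure – the `store` map, node keys, and `fetch-node` helper are illustrative, not Datahike’s API. Storage is just a key-value map, and a cache in front of it means a second traversal pays no I/O:

```clojure
;; Illustrative only: models konserve-style storage as a plain map.
(def store {:branch-head {:eavt-root :node-1 :tx 42}
            :node-1 {:children [:node-2 :node-3]}
            :node-2 {:datoms [[1 :user/name "alice"]]}
            :node-3 {:datoms [[2 :user/name "bob"]]}})

(def io-count (atom 0))   ; counts simulated storage reads
(def cache (atom {}))     ; stands in for the local LRU

(defn fetch-node [k]
  (or (@cache k)
      (let [node (store k)]     ; simulated storage read
        (swap! io-count inc)
        (swap! cache assoc k node)
        node)))

;; First traversal: every node is a storage read.
(doseq [k [:branch-head :node-1 :node-2 :node-3]] (fetch-node k))
@io-count ; => 4

;; Second traversal: served entirely from cache, no I/O.
(doseq [k [:branch-head :node-1 :node-2 :node-3]] (fetch-node k))
@io-count ; => 4
```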
We call this the distributed index space. Two processes reading the same database fetch the same immutable nodes independently. They don’t know about each other.

A writer publishes new snapshots by writing new tree nodes, then atomically updating the branch head. Readers that dereference afterward see the new snapshot. Readers holding an earlier snapshot continue undisturbed – their nodes are immutable and won’t be garbage collected while reachable.
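The publish step can be modeled with a Clojure atom standing in for the branch-head key (an analogy, not Datahike’s internals): new nodes land in storage first, then a single atomic pointer swap publishes them, so a reader sees either the old complete tree or the new one, never a mix.

```clojure
;; Analogy only: an atom plays the role of the branch-head key.
(def branch-head (atom {:root :node-v1 :tx 1}))

;; A reader dereferences once and keeps working on that snapshot,
;; even if the writer publishes afterward.
(def reader-snapshot @branch-head)

;; Writer: the new tree nodes would be written to storage *before*
;; this single atomic swap makes them visible.
(reset! branch-head {:root :node-v2 :tx 2})

reader-snapshot ; => {:root :node-v1, :tx 1} -- undisturbed
@branch-head    ; => {:root :node-v2, :tx 2} -- new readers see v2
```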
## Joining across databases

Because databases are values and Datalog natively supports multiple input sources, the next step is natural: join databases from different teams, different storage backends, or different points in time – in a single query.
Team A maintains a product catalog on S3. Team B maintains inventory in a separate bucket. A third team joins them without either team doing anything. Each `@` dereference fetches a branch head from its respective S3 bucket and returns an immutable database value.
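A sketch of what that join might look like, assuming two already-populated S3-backed databases; the bucket names, attribute names, and S3 backend config keys are illustrative and depend on the konserve S3 backend in use:

```clojure
(require '[datahike.api :as d])

;; Illustrative configs -- exact S3 keys vary by backend version.
(def catalog-conn   (d/connect {:store {:backend :s3 :bucket "team-a-catalog"}}))
(def inventory-conn (d/connect {:store {:backend :s3 :bucket "team-b-inventory"}}))

;; Datalog takes multiple database inputs via :in.
(d/q '[:find ?sku ?name ?qty
       :in $catalog $inventory
       :where
       [$catalog   ?p :product/sku  ?sku]
       [$catalog   ?p :product/name ?name]
       [$inventory ?i :stock/sku    ?sku]
       [$inventory ?i :stock/qty    ?qty]]
     @catalog-conn
     @inventory-conn)
```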
The query engine joins them locally. There is no server coordinating between the two, no data copied.
And because both are values, you can mix snapshots from different points in time. The old snapshot and the current one are both just values; the query engine doesn’t care when they’re from.
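Assuming Datahike’s `d/as-of` for time-travel reads, mixing a historical snapshot with the current one might look like this – `conn`, `last-quarter`, and the `:invoice/amount` attribute are illustrative:

```clojure
(require '[datahike.api :as d])

;; `conn` is an existing connection; `last-quarter` is a transaction
;; id or java.util.Date marking the historical point (illustrative).
(def now-db  @conn)
(def then-db (d/as-of @conn last-quarter))

;; The same report query runs against either point in time.
(d/q '[:find ?e ?amount
       :where [?e :invoice/amount ?amount]]
     then-db) ; what the report would have shown last quarter
```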
This is useful for audits, regulatory reproducibility, and debugging: “what would this report have shown against last quarter’s data?”

## From storage to browsers

So far, “storage” has meant S3 or a filesystem.
But konserve also has an IndexedDB backend, which means the same model works in a browser. Using Kabel WebSocket sync and konserve-sync, a browser client replicates a database locally into IndexedDB. Queries run against the local replica with zero network round-trips. Updates sync differentially – only changed tree nodes are transmitted; the same structural sharing that makes snapshots cheap on the server makes sync cheap over the wire.
The same pattern gives you a complete cross-database join, runnable in a Clojure REPL. Replace `:memory` with `:s3`, `:file`, or `:jdbc` and the same code works across storage backends. The databases don’t need to share a backend – join an S3 database against a local file store in the same query.
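A sketch of such a REPL session with two in-memory databases; attribute names are illustrative, and the in-memory backend keyword is shown as `:mem` per recent Datahike releases (it may differ by version):

```clojure
(require '[datahike.api :as d])

(def catalog-cfg   {:store {:backend :mem :id "catalog"}
                    :schema-flexibility :read})
(def inventory-cfg {:store {:backend :mem :id "inventory"}
                    :schema-flexibility :read})

(d/create-database catalog-cfg)
(d/create-database inventory-cfg)

(def catalog   (d/connect catalog-cfg))
(def inventory (d/connect inventory-cfg))

;; Two independent databases, populated by "different teams".
(d/transact catalog   [{:product/sku "A-1" :product/name "Widget"}])
(d/transact inventory [{:stock/sku "A-1" :stock/qty 7}])

;; One query joins both values on the shared SKU.
(d/q '[:find ?name ?qty
       :in $c $i
       :where
       [$c ?p :product/name ?name]
       [$c ?p :product/sku  ?sku]
       [$i ?s :stock/sku    ?sku]
       [$i ?s :stock/qty    ?qty]]
     @catalog @inventory)
;; => #{["Widget" 7]}
```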