Project Obelix

GitHub: obelix

Project Obelix applies Java 8's built-in parallel streams to huge persistent data structures. Persistent is meant here in the usual sense, data that survives a Java VM restart, and not in the sense of Clojure's persistent collections, which are immutable collections where each change creates a new collection.
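As a rough illustration of the idea, the sketch below runs a parallel stream over values stored as raw bytes instead of in a Java collection. The class ParallelScanSketch and its methods are hypothetical and not part of Obelix, and a small heap buffer stands in for a persistent store.

```java
import java.nio.ByteBuffer;
import java.util.stream.LongStream;

// Hypothetical sketch, not Obelix code: stream over longs stored back to back in a buffer.
public class ParallelScanSketch {

    /** Sums the first `count` longs in the buffer using a parallel stream. */
    public static long sum(ByteBuffer buffer, int count) {
        return LongStream.range(0, count)
                .parallel()
                // Absolute reads do not move the buffer position, so workers do not interfere.
                .map(index -> buffer.getLong((int) index * Long.BYTES))
                .sum();
    }

    /** Builds a heap buffer holding the values 0, 1, ..., count - 1. */
    public static ByteBuffer fill(int count) {
        ByteBuffer buffer = ByteBuffer.allocate(count * Long.BYTES);
        for (int i = 0; i < count; i++) {
            buffer.putLong(i * Long.BYTES, i);
        }
        return buffer;
    }
}
```

The same stream pipeline would work unchanged over a memory-mapped buffer, which is what makes the approach attractive for data that outlives the VM.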
It takes a layered approach.
Package com.amplifino.obelix.space provides the bytespace abstraction: a 64-bit address space that can be mapped to a variety of backing stores. Small spaces can be mapped to regular files, large spaces to sparse files, and huge spaces to a directory of sparse files. Files are typically opened memory mapped, but regular file I/O is also possible.
For testing, the bytespace can be backed by Java heap memory.
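To make the backing-store idea concrete, here is a minimal sketch with one addressing interface and two interchangeable backings, heap memory and a memory-mapped sparse file. The names ByteSpaceSketch, LongAddressable, onHeap and onFile are assumptions for illustration and are not the actual com.amplifino.obelix.space API. A single mapping is limited to 2 GB, so a real 64-bit space would have to combine many mappings.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not the Obelix API.
public class ByteSpaceSketch {

    /** Minimal address space: read and write longs at a byte position. */
    public interface LongAddressable {
        long getLong(long position);
        void putLong(long position, long value);
    }

    /** Heap-backed space, the kind of backing suited to tests. */
    public static LongAddressable onHeap(int capacity) {
        return wrap(ByteBuffer.allocate(capacity));
    }

    /** File-backed space: maps the first `size` bytes of a newly created file. */
    public static LongAddressable onFile(Path path, int size) throws IOException {
        // SPARSE is only a hint and only applies together with CREATE_NEW;
        // file systems without sparse file support ignore it.
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.SPARSE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            return wrap(buffer);
        }
    }

    private static LongAddressable wrap(ByteBuffer buffer) {
        return new LongAddressable() {
            @Override
            public long getLong(long position) {
                return buffer.getLong(Math.toIntExact(position));
            }

            @Override
            public void putLong(long position, long value) {
                buffer.putLong(Math.toIntExact(position), value);
            }
        };
    }
}
```

Code written against the interface does not need to know which backing is in use, which is what allows tests to run on heap memory while production data lives in files.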
At the other end, package com.amplifino.obelix.sets provides the abstract interface to model large collections. It takes its terminology from mathematical set theory, and more specifically from binary relations.
Package com.amplifino.obelix.injections takes care of converting domain objects to byte arrays and back.
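A conversion of this kind could look roughly like the sketch below, which encodes a small domain object into a byte array and decodes it again. The Codec interface, the Reading class and READING_CODEC are hypothetical names chosen for illustration. They are not the types defined in com.amplifino.obelix.injections.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the Obelix API.
public class InjectionSketch {

    /** A reversible mapping between values of type T and byte arrays. */
    public interface Codec<T> {
        byte[] encode(T value);
        T decode(byte[] bytes);
    }

    /** Example domain object. */
    public static final class Reading {
        public final long timestamp;
        public final String sensor;

        public Reading(long timestamp, String sensor) {
            this.timestamp = timestamp;
            this.sensor = sensor;
        }
    }

    /** Layout: 8 bytes of timestamp followed by the sensor name in UTF-8. */
    public static final Codec<Reading> READING_CODEC = new Codec<Reading>() {
        @Override
        public byte[] encode(Reading reading) {
            byte[] name = reading.sensor.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(Long.BYTES + name.length)
                    .putLong(reading.timestamp)
                    .put(name)
                    .array();
        }

        @Override
        public Reading decode(byte[] bytes) {
            ByteBuffer buffer = ByteBuffer.wrap(bytes);
            long timestamp = buffer.getLong();
            byte[] name = new byte[buffer.remaining()];
            buffer.get(name);
            return new Reading(timestamp, new String(name, StandardCharsets.UTF_8));
        }
    };
}
```

The name injection fits because distinct objects must map to distinct byte arrays, otherwise decoding could not recover the original.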
The remaining packages implement some basic data structures and algorithms:

A variable record length segment with space management
Several hashing algorithms exploring the tradeoff between space and access time
A concurrent BTree implementation
A timeseries module using a direct conversion from timestamp to logical address
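The direct conversion from timestamp to logical address in the last item can be pictured as plain arithmetic, provided samples arrive at a fixed interval and have a fixed size. The sketch below makes exactly those assumptions. The class TimeSeriesAddressing and its parameters are hypothetical and do not describe the actual Obelix timeseries module.

```java
// Hypothetical sketch, not Obelix code: fixed interval, fixed record size.
public class TimeSeriesAddressing {

    private final long epochMillis;     // timestamp stored at address 0
    private final long intervalMillis;  // time between consecutive slots
    private final int recordSize;       // bytes per slot

    public TimeSeriesAddressing(long epochMillis, long intervalMillis, int recordSize) {
        if (intervalMillis <= 0 || recordSize <= 0) {
            throw new IllegalArgumentException("interval and record size must be positive");
        }
        this.epochMillis = epochMillis;
        this.intervalMillis = intervalMillis;
        this.recordSize = recordSize;
    }

    /** Byte address of the slot that contains the given timestamp. */
    public long addressOf(long timestampMillis) {
        long elapsed = timestampMillis - epochMillis;
        if (elapsed < 0) {
            throw new IllegalArgumentException("timestamp lies before the epoch");
        }
        return Math.multiplyExact(elapsed / intervalMillis, (long) recordSize);
    }

    /** Start timestamp of the slot that contains the given byte address. */
    public long timestampOf(long address) {
        return epochMillis + (address / recordSize) * intervalMillis;
    }
}
```

With this scheme no index is needed, because the address is computed rather than looked up. The price is that gaps in the series still occupy address space, which is cheap when the backing file is sparse.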

Note: If you are on macOS, take care when running the test cases. HFS+, long the default file system on macOS, does not support sparse files, and you may quickly run out of disk space after allocating a few TB-sized files.