# Simple embedded database for Java
This project is a practice in implementing a minimal database system. The practice covers different indexing mechanisms (B+Tree and Bitmaps, for example), data storage on disk (Pages and Page Buffers), caching (LRU), locking (Reader-Writer Lock), parsing, and other topics.

Notice: the idea of the project is still evolving! Anything you read here is open to change in the future.
The library implemented in this repository empowers a Java application to store data in a collection-field format, and it provides the sensation - and minimal features - of a database system.

The project is meant to be a practice, so that the developer gets involved with various software engineering aspects and gains knowledge; it may not solve any real-world problem that other database implementations (including embedded ones) don't already solve.
I started studying a tutorial called "Let's Build a Simple Database", which is about "writing a sqlite clone from scratch in C", until I reached the B-Tree algorithm section. Later, I learned more about B-Trees and B+Trees from "B Trees and B+ Trees. How they are useful in Databases" on YouTube, and eventually found myself working on this mini project.

The thought process, progress, and further details of this project are explained in a YouTube playlist called "Write a database from scratch". If you are visiting this repository from YouTube, welcome! If not, I suggest you take a look at the playlist.
- Basic implementation of Tree-Based Unique Index Management using B+Tree
- Cluster implementation of B+Tree
  - All "collections" should have a default cluster B+Tree implementation regardless of their Primary Key. Perhaps the Primary Key should point to the cluster id of a collection object (traditionally known as a row of a table).
  - Implement an `UnsignedLong` serializer, which would be the cluster key type
  - Update `SchemeManager` logic
  - Update `FieldIndexManagerProvider` logic (the name should get updated as well, right?)
- Storage management of B+Tree
  - Storage management using `CompactFileIndexStorageManager` and `OrganizedFileIndexStorageManager`
  - Additional storage management implementation using `DatabaseStorageManager` (the `DiskDatabaseStorageManager` uses `PageBuffer`s)
- Generic decorator pattern for `UniqueTreeIndexManager`
- LRU Cache implementation for the `UniqueTreeIndexManager` decorator. (More tests required)
- Provide a solution for differentiating `zero` (the number) and `null` in a byte array. (A flag byte is used, which is not the best solution; bitmaps can help here - see the sketch after this list.)
- Non-unique Index Manager
  - B+Tree and `ArrayList` (binary) implementation
  - Bitmap implementation
- Database Storage Manager
  - Disk Database Storage Manager basics (Page Buffer)
  - `RemovedObjectTracer` implementations (default: `InMemoryRemovedObjectTracer`) should support splitting the traced objects if the chosen traced position plus the requested size is larger than a threshold.
  - Write Queue to make disk writes (page committing) async (is it even safe/possible?)
- Query
  - Implement `QueryableInterface` to support quicker query operations on IndexManagers (done for `LT`, `LTE`, `GT`, `GTE`, `EQ`)
  - Implement Query Class
- Serialization
  - Defining basic serializers for some types, including `int`, `long`, `bool`, `char`
  - Model to Scheme Collection conversion
  - Model Serialization
  - Model Deserialization
  - Add support for nullable fields and update the model serializer logic
  - Possibly improve the `CharArray` serializer to use less space
  - Support primitive data types (ex: models should be able to have an `int` field instead of `Integer`)
- Insertion (and update) verification:
  - The `primary` index should either have a value or get set as autoincrement
  - Unique indexes should be verified before any update or insertion is performed (after locking)
- Database Operations
  - Reader-Writer Lock support at collection level
  - Select Operation
  - Insert Operation
  - Update Operation
  - Delete Operation
  - Either remove IndexIOSession, or improve it to use for `Transaction` support.
  - Cache support for cluster index managers
  - Cache support for other indexes. Also find a proper usage for `CachedIndexStorageManagerDecorator` or remove it!
- Exception Throwing and Handling
  - Note: lambdas are going crazy at this point. Use this strategy: https://stackoverflow.com/questions/18198176
- Graceful shutdown mechanism
- Logging
  - Error logging for ignored or automatically handled exceptions
  - Info and Debug logging
- Performance and Overall Improvement Ideas
  - The update process can be distributed across multiple threads for the `databaseStorage.update()` part
  - There are a bunch of places that could benefit from Binary Search; they have TODOs on them.
  - If index ids had a better way of being generated, we could use them in index storage managers such as `DiskPageFileIndexStorageManager` to determine the scheme an object belongs to! This also works for storing bitmaps and array lists in the db file for `DuplicateIndexManagers`.
  - Bitmap space: use compression or sparse bitmaps
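To make the zero-vs-null item in the list above concrete, here is a minimal sketch of the flag-byte idea. It is an illustration only and does not reflect this project's actual serializer classes; the extra leading byte per value is exactly the overhead that a per-record null bitmap (one bit per nullable field) would avoid.

```java
import java.nio.ByteBuffer;

// Illustrative only: a nullable-int serializer using one flag byte,
// so that a stored 0 can be told apart from "no value" (null).
final class NullableIntSerializer {
    private static final byte IS_NULL = 0;
    private static final byte HAS_VALUE = 1;

    static byte[] serialize(Integer value) {
        ByteBuffer buffer = ByteBuffer.allocate(1 + Integer.BYTES);
        if (value == null) {
            buffer.put(IS_NULL).putInt(0);        // payload bytes are ignored on read
        } else {
            buffer.put(HAS_VALUE).putInt(value);  // 0 is now unambiguous
        }
        return buffer.array();
    }

    static Integer deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte flag = buffer.get();
        int payload = buffer.getInt();
        return flag == IS_NULL ? null : payload;
    }
}
```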
Note on queries: running query operations right now will make us use the cluster index, load objects into memory to perform comparisons, and the result would be an `Iterator<V>` where V is the cluster id. This means that these objects may later get loaded into memory again. We need a solution to avoid loading objects twice (once for the query, once for the higher-level operation such as read/update/delete).

Additional note: from a performance point of view, things are not as awful as they seem:

- We have a pool of DBObjects per `page` loaded in the Page Buffer. Pages may already be in the LRU cache, and the DBObject pool prevents the recreation of objects in memory (more of a way to reduce memory consumption than a performance improvement).
- We shall use an LRU cache for the Cluster Index Manager, which means re-reading objects from the cluster index should perform quicker than hitting the disk multiple times (a minimal LRU sketch follows below).
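For reference, this is the textbook way to get LRU eviction in Java with `LinkedHashMap`. It is only a sketch of the idea, not the project's cache decorator, and it would still need wrapping (for example with `Collections.synchronizedMap`) before being shared between threads.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cache: access-ordered LinkedHashMap that evicts the
// least recently used entry once the configured capacity is exceeded.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(capacity, 0.75f, true);  // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after put()/putAll(); returning true drops the eldest entry.
        return size() > capacity;
    }
}
```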
While cluster IDs are set to support `Unsigned Long`, the current implementation and usage of Bitmap in indexes can't support any value larger than an integer. This is because a `byte[]` is used, and an array index in Java can only be an `int`.
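A minimal sketch (assuming nothing about the project's actual Bitmap class) of where that limit comes from: the byte index into the backing array has to be an `int`, so any bit whose byte index exceeds `Integer.MAX_VALUE` simply cannot be addressed. Chunking the bitmap into several arrays, or using sparse/compressed bitmaps as listed above, works around this.

```java
// Illustrative bitmap backed by a single byte[]; not the project's implementation.
final class SimpleBitmap {
    private final byte[] bits;

    SimpleBitmap(int capacityInBits) {
        this.bits = new byte[(capacityInBits + 7) / 8];
    }

    void set(long clusterId) {
        // Java array indices are ints, so this conversion is the hard limit:
        // Math.toIntExact throws once clusterId / 8 no longer fits in an int.
        int byteIndex = Math.toIntExact(clusterId / 8);
        int bitOffset = (int) (clusterId % 8);
        bits[byteIndex] |= (byte) (1 << bitOffset);
    }

    boolean get(long clusterId) {
        int byteIndex = Math.toIntExact(clusterId / 8);
        int bitOffset = (int) (clusterId % 8);
        return (bits[byteIndex] & (1 << bitOffset)) != 0;
    }
}
```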
There is a problem in the current implementation: two instances of `DiskPageDatabaseStorage` will be created, and the `synchronized` blocks wouldn't behave correctly. Even though the current tests pass, things will break once we work with multiple collections. The reason `CollectionSelectInsertOperationMultiThreadedTestCase` can work with the Page Buffer is that we lock the collection in `DefaultCollectionInsertOperation`, so even though we have multiple instances of `DiskPageDatabaseStorage` (one for the DB and one for the index), their `.store()` method won't be called from multiple threads.

Note: just using the same instance is not enough. The type of the pointer returned by the `store` method would then be different. Maybe we should not let the Database Storage Manager handle pointer types? It seems totally unnecessary!
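On the two-instances problem above, here is a hedged sketch of one possible direction: keep a single storage-manager instance per file path, so that `synchronized` blocks (which lock on `this`) actually share a monitor. The class and method names are illustrative assumptions, not the project's API, and as noted above this alone would not settle the pointer-type question.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative registry: hands out at most one storage manager per path.
final class StorageManagerRegistry<T> {
    private final Map<Path, T> instances = new ConcurrentHashMap<>();
    private final Function<Path, T> factory;

    StorageManagerRegistry(Function<Path, T> factory) {
        this.factory = factory;
    }

    T getOrCreate(Path path) {
        // computeIfAbsent is atomic per key, so two threads asking for the
        // same path can never end up with two different instances.
        return instances.computeIfAbsent(path, factory);
    }
}
```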