The database is stored as a folder where each file represents a single collection. Collection files are used in append-only mode, which ensures safe access to the data but can waste space as you update records: every update appends a new version instead of rewriting the old one. As a workaround, the database automatically compacts a collection file when its overuse ratio grows above a configured limit.
MongoDB uses memory mapping, which leverages the Linux kernel page cache and is a near-perfect solution. TingoDB also needs a cache, because without one things would be too slow: every query pulls data twice, once during the actual search and again when the results are fetched. So TingoDB has a size-limited in-memory cache that uses the fastest possible replacement strategy. We tried more sophisticated options such as LRU, but they were not effective.
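To illustrate why a dumb replacement strategy can beat LRU here, consider this sketch of a size-limited cache that simply drops everything when full. The strategy itself is an assumption for illustration (the text does not specify it); the point is that hits and evictions cost O(1) with no bookkeeping, whereas LRU must update access order on every hit.

```javascript
// Illustrative size-limited cache with the cheapest possible replacement:
// when full, drop everything. Not TingoDB's actual implementation.
class DumbCache {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.map = new Map();
  }
  get(key) {
    return this.map.get(key); // no access-order bookkeeping on hits
  }
  set(key, value) {
    if (this.map.size >= this.maxSize && !this.map.has(key)) {
      this.map.clear();       // wholesale eviction: fast and simple
    }
    this.map.set(key, value);
  }
}
```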
Indexes are represented by in-memory B-tree lists. They are wrapped in high-level objects that implement MongoDB-specific behaviour (sparse indexes, unique constraints, and so on). Indexes are not serialized; they are recreated every time the database is loaded. This is not the most efficient approach, but for the initial design goals it is more than enough.
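A rebuild-on-load index with MongoDB-style `sparse` and `unique` options can be sketched like this. A sorted array stands in for the B-tree, and the function name and shape are illustrative assumptions, not TingoDB's actual code.

```javascript
// Sketch: rebuild an in-memory index for one field by scanning all docs,
// as would happen each time the database is loaded.
function buildIndex(docs, field, { unique = false, sparse = false } = {}) {
  const entries = []; // [key, _id] pairs; sorted array in place of a B-tree
  for (const doc of docs) {
    const key = doc[field];
    if (key === undefined && sparse) continue; // sparse skips missing fields
    entries.push([key, doc._id]);
  }
  entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  if (unique) {
    for (let i = 1; i < entries.length; i++) {
      if (entries[i][0] === entries[i - 1][0]) {
        throw new Error('duplicate key: ' + entries[i][0]);
      }
    }
  }
  return entries;
}
```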
Search & Sorting
Search uses indexes when possible. We did not make any particularly smart optimizations, so this code is relatively simple. Still, indexes are used by almost all operators, and a single query can use multiple indexes. The cursor limit option will always speed up queries. The skip option works as well, but it is most effective when the query uses only indexes.
Sorting will use existing indexes when possible, even if they are not part of the query itself. When no suitable index exists, sorting still works by building a dynamic index.
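The fallback can be pictured like this: if a prebuilt index on the sort field exists it is walked directly; otherwise a throwaway ("dynamic") index is built for just this query. The function and the `[key, position]` index shape are illustrative assumptions, not TingoDB internals.

```javascript
// Sketch: index-backed sorting with a dynamic-index fallback.
function sortDocs(docs, field, indexes = {}) {
  let index = indexes[field]; // prebuilt index: sorted [key, position] pairs
  if (!index) {
    // Dynamic index: built on demand for this query, then discarded.
    index = docs.map((d, i) => [d[field], i])
                .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  }
  return index.map(([, i]) => docs[i]); // walk the index in order
}
```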
Every collection has its own work queue. Read-only operations are non-blocking and executed in parallel. Write operations are blocking and executed in sequence: a write waits for all in-flight read-only operations to complete and blocks the collection until it finishes. A search operation returns a cursor that is consistent with the data as it was in the database at the moment of the query; updates that happen after the query was executed will not be visible to cursor consumers.
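The scheduling rules above (parallel reads, exclusive sequential writes) can be sketched with promises. The class name and structure are illustrative assumptions, not TingoDB's actual work-queue code.

```javascript
// Sketch of a per-collection work queue: reads run concurrently, a write
// waits for all in-flight reads plus the previous write, and reads
// scheduled after a write wait for that write to finish.
class WorkQueue {
  constructor() {
    this.tail = Promise.resolve(); // barrier left by the last write
    this.reads = new Set();        // in-flight read operations
  }
  read(fn) {
    // A read waits only for pending writes, then runs in parallel.
    const p = this.tail.then(fn);
    this.reads.add(p);
    p.finally(() => this.reads.delete(p));
    return p;
  }
  write(fn) {
    // A write waits for the previous write AND every in-flight read.
    const barrier = Promise.allSettled([this.tail, ...this.reads]);
    this.tail = barrier.then(fn);
    return this.tail;
  }
}
```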
Based on our benchmarks, integer keys speed things up a lot, and for an in-process database integers have almost no drawbacks compared to GUIDs. So by default TingoDB uses its own ObjectID implementation, which generates integer keys that are unique within a collection's scope. The ObjectID API and behaviour are designed to be closely compatible with BSON.ObjectID, and with some small hacks it is possible to write code that works transparently with both. If you prefer BSON.ObjectID, you can enable it via a configuration option.
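The idea of an integer key behind a BSON.ObjectID-like surface can be sketched as below. This is an assumption-laden illustration: the class name is hypothetical, the counter is global here for brevity (the real keys are unique per collection), and only the `toString`/`equals` surface is shown.

```javascript
// Sketch: collection-scoped integer ObjectID with a BSON-like surface.
let counter = 0; // per-collection in the real database; global here

class IntObjectID {
  constructor(id) {
    this.id = id === undefined ? ++counter : id; // allocate next integer
  }
  toString() { return String(this.id); }
  // Compare against another IntObjectID or a raw value, by string form,
  // which is what lets code treat both ObjectID flavours transparently.
  equals(other) { return String(this.id) === String(other.id ?? other); }
}
```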
So far we have nearly three hundred tests that give us 95% code coverage. A significant part of the tests is taken as-is from the MongoDB Node.js driver project. The tests are designed to run against both MongoDB and TingoDB to ensure exactly the same behaviour, and the same tests are run twice, with BSON.ObjectID and with TingoDB.ObjectID, to ensure the two work identically.