Creating a binary storage format from scratch

Sem Postma, January 30, 2024

Binary formats are often the most efficient and performant, the drawback being that they are not human-readable.

In this article, I’ll be using

Key/Column order

The key order is extremely important. The order of the keys determines how to index is structure. An easy way to think about it, is that every entry in a normal index points to a row. Like the following example where the author_id is mapped to the pages table using it’s primary key.

# normal index (pages table):
# author_id:    1   2   2   3   1   1  
#               |   |   |   |   |   |
# page_id:      1   2   3   4   5   6
CREATE INDEX normal_index ON pages (author_id);
SELECT * FROM pages WHERE author_id = 1;

In a composite index, the first key of the index points to the next key and the last key points to the table itself like this:

# composite index (pages table):
# author_id:          1             2       3
#                /    |    \      /   \     |
# view_count:  323   97    123  432   14  15421
#               |     |     |    |     |    |
# page_id:      1     5     6    2     3    4
CREATE INDEX normal_index ON pages (author_id, view_count);
SELECT * FROM pages WHERE author_id = 1 AND view_count > 100;

The most basic rule to composite indexes is to put columns in the `WHERE` statement, in the first part of the composite index, and the `ORDER BY` column as the last key in the index:

SELECT * WHERE category_id = 30 AND author_id = 40 ORDER BY created_at;
CREATE INDEX my_composite_index ON pages (category_id, author_id, created_at);

If you leave out any columns in the WHERE query, the database cant use the last column for ORDER BY. This is a called a “covering index”, which is when the index covers all of the columns in a query (in this case a “covered query”). If you also `SELECT` only columns from the index, the database won’t even look up the table and will just use the INDEX itself.

The following composite indexes will all work for the previous query:

CREATE INDEX my_composite_index_2 ON pages (category_id, author_id); # no ORDER BY column
CREATE INDEX my_composite_index_3 ON pages (author_id, category_id, created_at) # WHERE columns in a different order

In some edge cases, the database will still use your composite index, even if column in the `WHERE` statement is missing from the index. It will skip that column, lookup the index and then filter the records in the index based on the missing column. This only works of the query optimizer determines that the missing column is not very important to the query. If you included the `ORDER BY` column  in the composite index, the database won’t be able to use it in this case.

Most likely your query will include other operations than just checking for equality. Like range conditions. If you are using a range condition, it should be the last of the `WHERE` columns, but before the `ORDER BY` columns. Like this:

Putting view_count first will not work because the database will do a range scan of “view_count”, after that it will return ALL records with more than a 1000 views regardless of the category_id (remember that the composite index works from left to right).

If you are using multiple range conditions or you have a range condition with an `ORDER BY` column, the database will only use the first condition. In short, a range condition or `ORDER BY` column should be the last column in your composite index. Any subsequent columns are ignored. A software engineer at Mollie wrote a blog post about this  here. There are ways to work around this issue if you really need to. For example, you could rewrite this:

CREATE INDEX my_slower_index ON pages (author_id, created_at);
# pages which are less than 3 days old, ordered by view_count
SELECT * FROM pages 
WHERE author_id = 1 
AND DATEDIFF(NOW(), created_at) < 3 
ORDER BY view_count 
LIMIT 10 OFFSET 10; 

Into this (using a functional index):

CREATE INDEX my_fast_index ON pages (author_id, (TO_DAYS(created_at)), view_count);
# instead of using a range condition, we fetch rows for each day separately and use an equality operation for each day
(
SELECT * FROM pages WHERE author_id = 1 AND TO_DAYS(NOW) - TO_DAYS(created_at) = 0 ORDER BY view_count DESC LIMIT 10 OFFSET 10
) UNION ALL (
SELECT * FROM pages WHERE author_id = 1 AND TO_DAYS(NOW) - TO_DAYS(created_at) = 1 ORDER BY view_count DESC LIMIT 10 OFFSET 10
) UNION ALL (
SELECT * FROM pages WHERE author_id = 1 AND TO_DAYS(NOW) - TO_DAYS(created_at) = 2 ORDER BY view_count DESC LIMIT 10 OFFSET 10
) UNION ALL (
SELECT * FROM pages WHERE author_id = 1 AND TO_DAYS(NOW) - TO_DAYS(created_at) = 3 ORDER BY view_count DESC LIMIT 10 OFFSET 10
) WHERE DATEDIFF(NOW(), created_at) < 3 ORDER BY view_count DESC LIMIT 10 OFFSET 10;
# because the `UNION ALL` queries will fetch records from 4 full days, 
# and we strictly want the last 72 hours, we add the original `WHERE` query at the end. 

You’ll notice that it will retrieve 40 records instead of 10 and filter it back down to 10 in the end. You should first test the performance differences and set an upper limit to the maximum amount of `UNION ALL` statements that you are allowing to prevent denial of service attacks.

A good practise overall, is to order your columns from least specific to most specific. Assuming that’s possible while keeping all the rules above in place. This will not improve performance at all but it does improve index key compression. You can read more about index key compression in this blog post from Oracle.

Index types

By default, most databases use B-Tree indexes. B-Tree’s or Binary trees are a type of data structure that stores data in a sorted format. They facilitate fast searching, insertion and deletion operations. Because B-Tree’s are inherently sorted, they work well for `ORDER BY index` statements for example as well as equality or range conditions. I have created at an article where I code a binary tree from scratch which you can find here.

Another index type is the Hash Index. Hash indexes do not work for `ORDER BY` operations but they are exceptionally fast for equality operations.

There are also R-Tree indexes, Full text indexes but those are outside the scope of this guide.

If your index is not being used it could be due to the type of index you have chosen so check the type of index you are using and if it’s compatible with the query you are trying to optimize.

The Query Optimizer

Even the smallest change (in big databases) can make a 20ms query turn into 60s just because the query optimizer decides that due to this seemingly unrelated change, your beautiful composite index is no longer useful. The query optimizer very often may not use your indexes for reasons which sometimes seem incomprehensible. Databases work in mysterious ways but remember that more often than not, these seemingly “slow” execution plans are actually faster: When a table scan is actually faster than an index scan.

That being said, query optimizers sometimes do make bad decisions. The query optimizer will also always choose the least “risky” execution plan. You may have a very good index which is actually faster for 90% of your queries, but very slow for the other 10%. In this case the query optimizer will always go for the slower but more predictable execution plan. Another example where the query optimizer may come up with slower execution plans is when you are using a functional index in your composite index and you have a database with news articles for example. Only 1% of the news articles are requested most often and the other 99% are almost never accessed. Even if your composite index is actually faster for the 1% of news articles, which are accessed most often, the query optimizer will still want to go for consistency and use the slower execution  plan. There are some exceptions and databases do use some statistics about your data and take those into account, but generally the decisions are made based on the database structure.

When you suspect that the query optimizer is making a bad decision, you have the option to suggest or force the query to use a specific index. For MySQL you can use “USE INDEX” or “FORCE INDEX”. For Postgres you can use “enable_indexscan” and “enable seqscan” (https://www.postgresql.org/docs/current/runtime-config-query.html).  I suggest that you do use those statements in production, only for testing. If you actually find out that your queries run faster you should find out why the query optimizer is not using them. This means testing and tweaking with the database config variables, which is not something for everyone and you should probably consider hiring a consultant if you are unable to find the issue.

Thank you for reading!