B-Tree & B+ Tree

April 19, 2022

B-Tree – A method used by application storage engines for writing, reading and indexing data optimised for reading. One way of writing data is just to append it to existing data. Then you search for it by reading each piece of data, checking if it’s the right one, and reading the next item if it is not. This can, however, take a very long time if your database or file has a million entries. The longer the file, the worse the average search and read time. We need to index the file or database so we can find entries faster.

A binary search tree and a B-Tree both store index or key:values for data items. A binary tree has a starting point or root, with a few layers of nodes. Each layer is laid out from left to right and a node has a key:value that is greater than nodes to the left and smaller than nodes to the right. To find a particular key, you progress down through the layers in the tree util you arrive at the node with the desired value. The deeper the tree and the more nodes, the longer this takes.

A B-Tree fixes this problem. It has a starting point or root with a small number of node layers called leaves and children. We’ve drawn a simplified diagram to show this. Note that there are different definitions of what is a leaf node and what is not. We have kept things simple and may be oversimplifying.

A node can hold one or more keys, up to a limit, which are sorted in ascending order. The numbers in each node in our diagram are range limits. The left-most child node, for example, has a range of keys between 5 and 15. If we have a leaf node with a range of 20 to 150 then we know all key:values below 20 will be in its left-most child, key:values between 20 and 150 will be in the middle child, and keys valued at more than 150, but less than 500, will be in in its rightmost child.

As you add more data, a B-tree gets more index values which have to be added. This can lead to more leaf nodes and child nodes, with nodes being split in two, and reorganization going on in the background.

B+ Tree

A B+ tree stores data pointers only at the leaf nodes of the tree. The B-Tree has far fewer layers than a binary tree which reduces the number of disk access needed to search it. This makes a B-Tree scheme good for read-intensive workloads. A Log-Structured Merge Tree is better for write-intensive workloads.

A B+ Tree differs from a B-tree in that both are self-balancing tree data structures that maintain sorted data and allow searches, sequential access, insertions, and deletions in logarithmic time. However, there are several key differences between them:

Structure:

B-tree:
- Nodes: Both internal and leaf nodes can store key-value pairs or pointers to records.
- Data Storage: Data can be stored in both leaf nodes and internal nodes. Each node contains a number of keys and pointers, where the keys act as separators for the pointers. For example, if a node has keys K1, K2, …, Kn, then pointers P1, P2, …, P(n+1) direct to subtrees where keys less than K1 go to P1, keys between K1 and K2 go to P2, and so on.
- Internal Nodes: Have both keys and pointers to child nodes.

B+ tree:
- Nodes: Only leaf nodes store actual data or key-value pairs. Internal nodes only contain keys and pointers to child nodes.
- Leaf Nodes:
  - All data is stored in the leaf nodes, which are linked together in a linked list (or sometimes a doubly linked list) for sequential access. This means you can traverse from one leaf to another without going back up the tree.
  - Each leaf node contains all the keys within its range in sorted order.
- Internal Nodes: Only contain keys for routing, not the data itself. Each key in an internal node acts as a guide to where further search should go.

Key Differences:

Data Storage:
- B-tree: Data can be stored in any node.
- B+ tree: Data is only stored in leaf nodes.
Sequential Access:

B-tree: Requires recursive or iterative traversal back up the tree to move to the next or previous key, making sequential access less efficient for operations like range queries.
B+ tree: Provides efficient sequential access through the linked list of leaf nodes, which is advantageous for range queries or full scans.
Redundancy and Space Efficiency:

B-tree: Keys might be repeated in internal nodes leading to higher space usage.
B+ tree: Keys in internal nodes are not duplicated in leaf nodes, potentially saving space, although this depends on the size of the keys and the data they reference.
Search Path:

B-tree: A search might end at an internal node if the key is found there.
B+ tree: All searches must traverse down to a leaf node since only leaves hold data.
Performance for Different Operations:

Insertions and Deletions: Both can be complex but handled similarly in terms of maintaining balance. B+ trees might involve more pointer updates due to the linked list in leaf nodes.
Range Queries: B+ trees are generally more efficient due to the direct connection between leaf nodes.

Use Cases:

B-trees are often used in database systems where random access to data is needed, and where the data size might not justify the overhead of B+ trees.
B+ trees are preferred in file systems, databases for range queries, and any scenario where sequential access is common or beneficial due to their efficiency in scanning large portions of data.

In summary, while both B-trees and B+ trees are used for similar purposes in managing sorted data, B+ trees offer advantages in terms of sequential access and potentially better space utilization for certain types of data, at the cost of slightly more complex insertion and deletion operations due to the maintenance of the linked list in leaf nodes.

B-Tree & B+ Tree

ABOUT US

FOLLOW US

Rubrik touts new cyber-resilience features

PEAK:AIO AI Data Server peaks at 120 GBps

Supermicro says Fibre Channel is on the way out, but SAS lives on and...