LSM Trees Unveiled - Part 2 - Efficient Rust Implementation

Welcome to Part 2 of our series on Log-Structured Merge (LSM) trees. In Part 1, we explored the core concepts and components of LSM trees. Now, we'll dive into an efficient Rust implementation, demonstrating how to bring these concepts to life with a focus on concurrency and performance.

Implementing an LSM Tree in Rust

Our Rust implementation will focus on efficiency and concurrency, using techniques like Copy-on-Write, lock-free data structures, and message passing. Let's break down the implementation, starting with our dependencies:

[dependencies]
crossbeam-channel = "0.5"
crossbeam-skiplist = "0.1"
parking_lot = "0.12"
im = "15.1.0"

Note the inclusion of the im crate. This crate provides immutable data structures with structural sharing, which we'll use for our SSTable index to implement efficient Copy-on-Write semantics.
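
To see what that buys us, here is a tiny sketch of structural sharing in action: updating an im::OrdMap returns a new version that shares almost all of its nodes with the old one, so old snapshots remain valid and updates avoid full copies.

use im::OrdMap;

fn main() {
    // Version 1 of a toy index.
    let v1: OrdMap<String, u64> = OrdMap::new().update("a".to_string(), 0);

    // `update` returns a new map; unchanged nodes are shared with v1
    // rather than deep-copied, so this stays cheap even for large maps.
    let v2 = v1.update("b".to_string(), 42);

    // v1 is an untouched snapshot; v2 sees the new entry.
    assert_eq!(v1.get("b"), None);
    assert_eq!(v2.get("b"), Some(&42));
}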

Now, let's dive into each component of our LSM tree implementation.

MemTable Implementation

For our MemTable, we'll use a lock-free SkipList from the crossbeam-skiplist crate. But first, let's discuss why SkipLists are well-suited for lock-free implementations in concurrent scenarios.

Why SkipLists Can Be Lock-Free

SkipLists are probabilistic data structures that allow for efficient search, insertion, and deletion operations. They can be implemented in a lock-free manner due to their unique properties:

  1. Node-based structure: SkipLists consist of nodes linked at multiple levels, allowing for localized updates.
  2. Probabilistic balancing: Unlike strict balanced trees, SkipLists use randomization for balancing, which simplifies concurrent modifications.
  3. Bottom-up updates: Insertions and deletions can be performed bottom-up, allowing concurrent operations to proceed without blocking each other.
  4. Atomic operations: Modern CPUs support atomic Compare-and-Swap (CAS) operations, which can be used to update pointers without locks (see the sketch after this list).

These properties allow for implementing SkipLists where threads can traverse and modify the structure concurrently without the need for global locks, leading to higher scalability in multi-threaded environments.
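
To make the CAS point concrete, here is a minimal toy sketch (not crossbeam's actual implementation) of the primitive these structures build on: publishing a new node into a lock-free linked list, which is what a SkipList does at each of its levels, retrying when another thread wins the race. A real implementation also needs safe memory reclamation (crossbeam uses epoch-based reclamation), which this sketch omits.

use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

// One node of a lock-free singly linked list: the building block a
// SkipList uses at every level.
struct Node {
    value: u64,
    next: AtomicPtr<Node>,
}

// Publish a new node at the head of the list using Compare-and-Swap.
// If another thread updates `head` between our load and our CAS, the
// CAS fails and we simply retry against the new head.
fn push(head: &AtomicPtr<Node>, value: u64) {
    let node = Box::into_raw(Box::new(Node {
        value,
        next: AtomicPtr::new(ptr::null_mut()),
    }));
    loop {
        let current = head.load(Ordering::Acquire);
        // Link our node in front of the current head.
        unsafe { (*node).next.store(current, Ordering::Relaxed) };
        if head
            .compare_exchange(current, node, Ordering::Release, Ordering::Acquire)
            .is_ok()
        {
            return;
        }
    }
}

fn main() {
    let head = AtomicPtr::new(ptr::null_mut());
    push(&head, 1);
    push(&head, 2);
    // Traverse to show both nodes were linked (the nodes are leaked here;
    // real code must reclaim them safely).
    let mut cur = head.load(Ordering::Acquire);
    while !cur.is_null() {
        unsafe {
            println!("{}", (*cur).value);
            cur = (*cur).next.load(Ordering::Acquire);
        }
    }
}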

Now, let's implement our MemTable using a lock-free SkipList:

use std::sync::atomic::{AtomicUsize, Ordering};
use crossbeam_skiplist::SkipMap;

struct MemTable {
    data: SkipMap<String, String>,
    size: AtomicUsize,
}

impl MemTable {
    fn new() -> Self {
        MemTable {
            data: SkipMap::new(),
            size: AtomicUsize::new(0),
        }
    }

    fn insert(&self, key: String, value: String) {
        let size_delta = key.len() + value.len();
        // If the key already exists it is overwritten; we do not subtract
        // the old entry's size, so `size` is an upper-bound estimate used
        // only to decide when to flush.
        self.data.insert(key, value);
        self.size.fetch_add(size_delta, Ordering::Relaxed);
    }

    fn get(&self, key: &str) -> Option<String> {
        self.data.get(key).map(|entry| entry.value().clone())
    }

    // Snapshot and reset the table. Popping entries one at a time (rather
    // than iterating and then clearing) avoids silently dropping entries
    // inserted mid-drain; the caller should still swap in a fresh MemTable
    // first so that new writes go elsewhere during a flush.
    fn drain(&self) -> Vec<(String, String)> {
        let mut result = Vec::new();
        while let Some(entry) = self.data.pop_front() {
            result.push((entry.key().clone(), entry.value().clone()));
        }
        self.size.store(0, Ordering::Relaxed);
        result
    }

    fn size(&self) -> usize {
        self.size.load(Ordering::Relaxed)
    }
}

This implementation allows concurrent reads and writes without locking, leveraging the lock-free design of the underlying SkipMap.
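
For example, several writer threads can share the MemTable through an Arc and insert concurrently, with no mutex involved. A small illustrative sketch, assuming the MemTable above is in scope:

use std::sync::Arc;
use std::thread;

fn main() {
    let memtable = Arc::new(MemTable::new());

    // Four writers insert disjoint keys concurrently; the SkipMap handles
    // all synchronization internally via atomic operations.
    let handles: Vec<_> = (0..4)
        .map(|t| {
            let mt = Arc::clone(&memtable);
            thread::spawn(move || {
                for i in 0..1000 {
                    mt.insert(format!("key-{}-{}", t, i), format!("value-{}", i));
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // All entries are visible, and reads never blocked on the writers.
    assert_eq!(memtable.get("key-3-999"), Some("value-999".to_string()));
    println!("approximate size in bytes: {}", memtable.size());
}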

SSTable Implementation with Copy-on-Write

For our SSTable, we'll use a Copy-on-Write (CoW) strategy for the index: lookups read an immutable snapshot and never block, while each write publishes a new index version. We'll use OrdMap from the im crate to make those index updates cheap:

use std::sync::Arc;
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Seek, SeekFrom, Write};
use std::path::Path;
use parking_lot::Mutex;
use im::OrdMap;

struct SSTable {
    id: usize,
    file: Mutex<File>,
    index: Arc<OrdMap<String, u64>>,
}

impl SSTable {
    fn new(id: usize, data_dir: &Path) -> io::Result<Self> {
        let file_path = data_dir.join(format!("sstable_{}.dat", id));
        // Check for pre-existing data *before* opening: OpenOptions with
        // create(true) guarantees the file exists afterwards, so testing
        // at that point would always succeed.
        let had_data = file_path.exists();
        let file = std::fs::OpenOptions::new()
            .create(true)
            .read(true)
            .write(true)
            .open(&file_path)?;

        let mut sstable = SSTable {
            id,
            file: Mutex::new(file),
            index: Arc::new(OrdMap::new()),
        };

        if had_data {
            sstable.rebuild_index()?;
        }

        Ok(sstable)
    }

    fn rebuild_index(&mut self) -> io::Result<()> {
        let mut file = self.file.lock();
        file.seek(SeekFrom::Start(0))?;
        let mut reader = BufReader::new(&mut *file);
        let mut new_index = OrdMap::new();
        let mut offset = 0;
        let mut line = String::new();

        // read_line keeps the trailing '\n', so adding line.len() moves the
        // offset to the start of the next record.
        while reader.read_line(&mut line)? > 0 {
            if let Some((key, _)) = line.split_once(':') {
                new_index.insert(key.to_string(), offset);
            }
            offset += line.len() as u64;
            line.clear();
        }

        self.index = Arc::new(new_index);
        Ok(())
    }

    fn write(&mut self, key: &str, value: &str) -> io::Result<()> {
        let mut file = self.file.lock();
        let mut writer = BufWriter::new(&mut *file);
        let offset = writer.seek(SeekFrom::End(0))?;
        writeln!(writer, "{}:{}", key, value)?;
        writer.flush()?;

        // Copy-on-Write update: cloning the OrdMap is cheap thanks to
        // structural sharing, and swapping the Arc publishes the new index
        // version without disturbing readers holding the old one.
        let mut new_index = (*self.index).clone();
        new_index.insert(key.to_string(), offset);
        self.index = Arc::new(new_index);

        Ok(())
    }

    fn get(&self, key: &str) -> io::Result<Option<String>> {
        // The index lookup is a read of an immutable snapshot; only the
        // file seek and read below briefly take the file mutex.
        if let Some(&offset) = self.index.get(key) {
            let mut file = self.file.lock();
            file.seek(SeekFrom::Start(offset))?;
            let mut reader = BufReader::new(&mut *file);
            let mut line = String::new();
            reader.read_line(&mut line)?;
            if let Some((_, value)) = line.split_once(':') {
                return Ok(Some(value.trim().to_string()));
            }
        }
        Ok(None)
    }
}

The use of OrdMap from the im crate allows for efficient partial updates to the index. Thanks to structural sharing, each update reuses most of the previous version and newly allocates only the path to the changed node, so publishing a new index version costs far less than copying the entire map. Readers that hold an older Arc keep a consistent snapshot for as long as they need it.
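
Here is a quick usage sketch, assuming the SSTable above is in scope (same module, so its fields are accessible) and a scratch directory is acceptable. It shows the CoW index at work: a reader's snapshot is unaffected by writes that happen after it was taken.

use std::sync::Arc;

fn main() -> std::io::Result<()> {
    // Use a scratch directory and start from a clean data file so the
    // assertions below are deterministic across runs.
    let dir = std::env::temp_dir().join("lsm_demo");
    std::fs::create_dir_all(&dir)?;
    let _ = std::fs::remove_file(dir.join("sstable_1.dat"));

    let mut table = SSTable::new(1, &dir)?;
    table.write("apple", "red")?;
    table.write("banana", "yellow")?;

    // Reads consult the immutable index, then seek straight to the record.
    assert_eq!(table.get("apple")?, Some("red".to_string()));

    // A snapshot of the index taken now is unaffected by later writes.
    let snapshot = Arc::clone(&table.index);
    table.write("cherry", "dark red")?;
    assert_eq!(snapshot.get("cherry"), None);
    assert_eq!(table.get("cherry")?, Some("dark red".to_string()));

    Ok(())
}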

Conclusion

In this part of our series, we've implemented two core components of our LSM tree: the MemTable and the SSTable. We've leveraged lock-free data structures and Copy-on-Write techniques to ensure high concurrency and efficiency.

In the next part, we'll put these components together to create our complete LSM tree implementation, including the main tree structure, compaction process, and usage examples.

Stay tuned for Part 3, where we'll complete our LSM tree implementation and see how all these pieces work together!

Haili Zhang