# README
LSM Implementation built with AI
This is a Log-Structured Merge-tree (LSM) implementation similar to the Golang port of SlateDB, an embedded storage engine built on top of object storage. The purpose of this project is for me to get familiar with LSM trees and to use Aider with the most advanced AI models, like Claude 3.5, to help build something like SlateDB, but designed with DDD (domain-driven design) principles in mind.
Overview
The goal is to:
- Organize the code into small problem domains (`wal`, `sstable`, `bloom`, etc.)
- Define an interface for each of those problem domains by hand
- Instruct the AI to implement the interfaces
- Test the implemented interfaces for correctness
- Have the AI fix the implementation when it gets it wrong
- Or have the AI suggest ways to fix the implementation
- If that fails, ask the AI to borrow the SlateDB implementation and adapt it to the new interface
Results
Block Package
As this is the first thing I asked the AI to build, I didn't completely grok that I should write detailed documentation about how the blocks should be laid out on disk in order for the AI to properly implement this portion. So, instead I had the AI adapt the code from slatedb-go.
Bloom Package
The AI implemented a complete bloom filter package; however, it ignored the request to implement enhanced double hashing, which resulted in a significantly different implementation than what SlateDB has. This is fine, but when I ran the tests, all the `HasKey()` tests failed, as the AI wasn't able to figure out how to calculate the filter bits without a reference. I fixed this and the code worked, but I ended up using the SlateDB implementation because I'm not a bloom filter expert. =/
Also, the AI did a pretty good job of describing how I should go about diagnosing the bug. In all, I spent about an hour on this package, which is pretty fast compared to how long it would have taken me to write it from scratch, only to throw it away and use the SlateDB implementation.
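For anyone else who isn't a bloom filter expert, here is a rough sketch of what enhanced double hashing looks like in Go. This is not the SlateDB code: the `Filter` type, the FNV hash, and the way the 64-bit hash is split into two halves are all assumptions made for illustration. The idea is that the i-th probe is `h1 + i*h2 + (i^3 - i)/6` modulo the number of bits, computed incrementally inside the loop.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Filter is a toy bloom filter; its layout is illustrative only and
// does not match the on-disk format used by SlateDB.
type Filter struct {
	bits []byte
	k    int // number of probes per key
}

// NewFilter allocates a filter with at least numBits bits (numBits must be > 0).
func NewFilter(numBits uint32, k int) *Filter {
	return &Filter{bits: make([]byte, (numBits+7)/8), k: k}
}

// probes returns f.k bit positions for key using enhanced double hashing.
func (f *Filter) probes(key []byte) []uint64 {
	h := fnv.New64a()
	h.Write(key)
	sum := h.Sum64()

	m := uint64(len(f.bits)) * 8
	x := (sum & 0xffffffff) % m // h1: low 32 bits of the hash
	y := (sum >> 32) % m        // h2: high 32 bits of the hash

	out := make([]uint64, f.k)
	for i := 0; i < f.k; i++ {
		out[i] = x
		x = (x + y) % m             // advance the probe by the current step
		y = (y + uint64(i) + 1) % m // grow the step; this is the "enhanced" part
	}
	return out
}

// Add sets every probe bit for key.
func (f *Filter) Add(key []byte) {
	for _, p := range f.probes(key) {
		f.bits[p/8] |= byte(1) << (p % 8)
	}
}

// HasKey reports whether key may be present. False means definitely absent.
func (f *Filter) HasKey(key []byte) bool {
	for _, p := range f.probes(key) {
		if f.bits[p/8]&(byte(1)<<(p%8)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	f := NewFilter(1024, 7)
	f.Add([]byte("apple"))
	fmt.Println(f.HasKey([]byte("apple"))) // an added key is always reported as present
}
```

Because `Add` and `HasKey` share the same `probes` function, a key that was added can never be reported as missing, which is the property the `HasKey()` tests exercise.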
SSTable Package
The AI implemented all the FlatBuffers encoding and decoding after I finally understood that the methods I was asking it to implement needed to use `flatbuf.SsTableIndexT` instead of `flatbuf.SsTableIndex`. The AI had some trouble properly implementing `sstable.Builder.Build()`. I had to add TODO comments to the code with specific instructions to use `encodeIndex`, which the AI wrote a few prompts earlier. I also had to come up with a way of storing the block offsets before the AI understood that calling `block.Encode()` every time it needed an offset wasn't an efficient way to solve the problem.
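The offset bookkeeping can be sketched roughly like this. The `builder` and `blockMeta` types below are hypothetical stand-ins, not the real `sstable.Builder`; the point is only that each block is encoded once and its starting offset is recorded at the moment it is appended, so nothing has to be re-encoded later to find out where a block lives.

```go
package main

import "fmt"

// blockMeta is a hypothetical per-block record kept by the builder.
type blockMeta struct {
	Offset   uint64 // where the encoded block starts in the table
	FirstKey []byte
}

// builder is a simplified stand-in for an SSTable builder.
type builder struct {
	buf    []byte      // encoded blocks, appended back to back
	blocks []blockMeta // one entry per appended block
}

// addEncodedBlock appends an already encoded block and records its offset.
// The offset is simply the length of the buffer before the append.
func (b *builder) addEncodedBlock(firstKey, encoded []byte) {
	b.blocks = append(b.blocks, blockMeta{
		Offset:   uint64(len(b.buf)),
		FirstKey: firstKey,
	})
	b.buf = append(b.buf, encoded...)
}

func main() {
	var b builder
	b.addEncodedBlock([]byte("a"), []byte{1, 2, 3})
	b.addEncodedBlock([]byte("m"), []byte{4, 5, 6, 7, 8})
	for _, m := range b.blocks {
		fmt.Printf("block starting at key %q begins at offset %d\n", m.FirstKey, m.Offset)
	}
}
```

With the offsets stored this way, building the index is a simple pass over `blocks` rather than another round of encoding.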
ReadInfo()
The AI implemented the method incorrectly. After a few attempts and some diagnosis, I realized the AI wrote the final SSTable offset as a `uint64` instead of a `uint32`, which caused an out-of-bounds error. Once I fixed this, the method passed the provided test.
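To make the width issue concrete, here is a minimal sketch of a trailing offset written and read as a `uint32`. The function names `appendInfo` and `readInfo` and the choice of big-endian byte order are assumptions for illustration, not the project's actual code. What matters is that the writer and reader agree on exactly 4 bytes, and that the reader checks bounds before slicing.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendInfo appends the info payload to table, followed by the offset at
// which the payload starts, written as a 4-byte big-endian uint32.
func appendInfo(table, info []byte) []byte {
	offset := uint32(len(table))
	table = append(table, info...)
	var trailer [4]byte
	binary.BigEndian.PutUint32(trailer[:], offset)
	return append(table, trailer[:]...)
}

// readInfo reads the trailing uint32 offset and returns the info payload.
func readInfo(table []byte) ([]byte, error) {
	if len(table) < 4 {
		return nil, errors.New("table too short to hold an info offset")
	}
	end := uint64(len(table) - 4)
	offset := uint64(binary.BigEndian.Uint32(table[end:]))
	if offset > end {
		return nil, errors.New("info offset is out of bounds")
	}
	return table[offset:end], nil
}

func main() {
	table := appendInfo([]byte("blocks"), []byte("info"))
	info, err := readInfo(table)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", info)
}
```

If the writer had emitted 8 bytes while the reader still assumed 4, the payload boundaries would no longer line up, which is the kind of mismatch that is easy to miss without a bounds check.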
ReadIndex()
The AI implemented both `ReadIndex()` and `ReadIndexFromBytes()` perfectly, even including negative and positive tests.
ReadBlocks()
I had to `/ask` the AI to describe how it would implement the method several times. Each time I had to add more `//TODO` comments to the method and update the method comment before it finally gave me what I wanted. The AI wanted to make multiple calls to the read-only blob in order to fetch each block individually. This was suboptimal, as we have no idea whether the `ReadOnlyBlob` implementation is making remote calls to fetch the offsets, so I had to spell out exactly what I wanted in the TODOs before it would do it.
Once that was done, the AI wrote the code perfectly. The only change I needed to make was in the tests, as the AI didn't realize that the builder would not produce multiple blocks unless the keys exceeded the block size.
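The single-read approach I was after looks roughly like the sketch below. The `ReadOnlyBlob` interface here is a guess at the shape (the real `ReadRange()` signature may differ), and `memBlob` and `readBlocks` are hypothetical helpers. The blob is asked for one contiguous range covering every requested block, and the blocks are then sliced out locally.

```go
package main

import (
	"errors"
	"fmt"
)

// ReadOnlyBlob is an assumed interface; ReadRange returns bytes in [start, end).
type ReadOnlyBlob interface {
	ReadRange(start, end uint64) ([]byte, error)
}

// memBlob is an in-memory blob that counts how many reads were issued.
type memBlob struct {
	data  []byte
	calls int
}

func (m *memBlob) ReadRange(start, end uint64) ([]byte, error) {
	m.calls++
	if start > end || end > uint64(len(m.data)) {
		return nil, errors.New("range out of bounds")
	}
	return m.data[start:end], nil
}

// readBlocks fetches consecutive blocks with a single ReadRange call.
// offsets must be ascending; end is the offset just past the last block.
func readBlocks(b ReadOnlyBlob, offsets []uint64, end uint64) ([][]byte, error) {
	if len(offsets) == 0 {
		return nil, nil
	}
	base := offsets[0]
	raw, err := b.ReadRange(base, end)
	if err != nil {
		return nil, err
	}
	if uint64(len(raw)) != end-base {
		return nil, errors.New("short read")
	}
	blocks := make([][]byte, 0, len(offsets))
	for i, off := range offsets {
		stop := end
		if i+1 < len(offsets) {
			stop = offsets[i+1]
		}
		if off < base || stop < off || stop > end {
			return nil, errors.New("offsets are not ascending")
		}
		// Slice each block out of the single buffer that was fetched.
		blocks = append(blocks, raw[off-base:stop-base])
	}
	return blocks, nil
}

func main() {
	blob := &memBlob{data: []byte("aaabbbbccIDX")}
	blocks, err := readBlocks(blob, []uint64{0, 3, 7}, 9)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d blocks fetched with %d read call(s)\n", len(blocks), blob.calls)
}
```

If `ReadRange` turns out to be a remote call, this costs one round trip no matter how many blocks are requested.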
What I've learned
In all, using the AI made me think hard about how I would explain what I wanted. The advantage is that the code gets better documentation than it would if I were writing the implementation myself, since in that case I know what I want and don't have to describe it to anyone. This way, the code gets documented and I end up moving faster than I normally would.
The other thing I considered is that I might be able to ask a newer AI model to rewrite a method to improve it or make it more efficient. The excellent documentation that results from this process assists future developers and AI, as the documentation clearly spells out what the code "should" be doing and why.
I don't think I could have gotten this far with as little time as I've invested in this without the AI.
Provide exact instructions in the method comments
When designing the methods and functions you want the AI to implement, explicitly state what the method should do and how it should do it. This gives the AI hints as to what you expect. Then, when prompting, include any additional instructions.
Example:
// ReadBloom reads the bloom.Filter from the provided store using blob.ReadRange()
// using the offsets provided by Info.
func (d *Decoder) ReadBloom(info *Info, b ReadOnlyBlob) (*bloom.Filter, error) {
	return nil, nil // TODO
}
Prompt:
Provide an implementation of sstable.Decoder.ReadBloom() using
the same error verbiage as ReadInfo().
Add TODOs to complex methods for the AI to follow
When asking the AI to implement methods that require multiple steps or use different parts of the code base, I got better results when I added `//TODO` comments in the method. For example:
// Build returns the SSTable in its encoded form
func (bu *Builder) Build() *Table {
	// TODO: Finalize the last block if it's not empty
	// TODO: Encode blocks using block.Encode()
	// TODO: Build the bloom filter using bu.bloomBuilder.Build() if the number of keys
	//       is greater than bu.conf.MinFilterKeys
	// TODO: Build and encode the flatbuf.SsTableIndexT using bu.blocks[].Meta
	// TODO: Build and encode sstable.Info using encodeInfo() from flatbuf.go
	// TODO: Append the info offset as a uint32
	return nil // TODO
}
For complex methods, ask the AI to explain first
Use the `/ask` command in aider to ask how the AI would implement a method. This avoids having the AI make changes and then fix the ones it got wrong because it didn't understand your requirements. Oftentimes I will make several `/ask` follow-ups after adding `//TODO` comments or improving the method comment until it understands what the method is supposed to do.
Git Add before asking the AI to make changes
Run aider with `auto-commits: false` and commit your changes before asking the AI to make code changes. This allows you to quickly roll back if the AI gets it really wrong.