Version: 0.0.0-20200513105008-eecbd4b1ad93
Repository: https://github.com/jmpotato/index-kv.git
Documentation: pkg.go.dev
# README
index-kv
📒 A simple index model for a special k-v storage
Spec
- CPU 8 cores
- Memory 4G
- Disk HDD 4T
Key Point
- 1TB of unordered data in a single file on disk.
- Data struct: (key_size uint64, key bytes, value_size uint64, value bytes)
  - key_size: uint64
  - key: bytes, 1B <= size < 1KB
  - value_size: uint64
  - value: bytes, 1B <= size < 1MB
- Get multiple values by keys concurrently.
- Preprocessing time is included in the total cost.
Solution
- Bloom Filter: checks whether a key might exist (it never reports an existing key as missing)
- LRU Cache: speeds up querying
- Hash & Sharding -> Index/Offset
  - Chunk struct: (key_hash uint64, offset uint64)
    - key_hash uint64: the hash value of the key
    - offset uint64: the offset of the key in the original data file
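The README does not give the Bloom filter's size, number of hash functions, or hash algorithm. A minimal sketch, assuming a fixed-size bit array whose probe positions come from one 64-bit FNV-1a hash (`hash/fnv`) split into two halves (double hashing); `BloomFilter`, `NewBloomFilter`, `Add` and `MayContain` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is an m-bit array probed k times per key. "Absent" answers are
// always correct; "present" answers can be false positives, so a hit must
// still be confirmed through the index.
type BloomFilter struct {
	bits []uint64 // m bits packed into 64-bit words
	m    uint64   // number of bits (must be > 0)
	k    uint64   // probes per key
}

func NewBloomFilter(m, k uint64) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// probes derives k bit positions from one FNV-1a hash by double hashing:
// position_i = (h1 + i*h2) mod m.
func (b *BloomFilter) probes(key []byte) []uint64 {
	h := fnv.New64a()
	h.Write(key)
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32|1 // |1 keeps h2 non-zero
	pos := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		pos[i] = (h1 + i*h2) % b.m
	}
	return pos
}

// Add sets all k bits for key.
func (b *BloomFilter) Add(key []byte) {
	for _, p := range b.probes(key) {
		b.bits[p/64] |= uint64(1) << (p % 64)
	}
}

// MayContain reports false only if key was definitely never added.
func (b *BloomFilter) MayContain(key []byte) bool {
	for _, p := range b.probes(key) {
		if b.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	bf := NewBloomFilter(1<<20, 7)
	bf.Add([]byte("hello"))
	fmt.Println(bf.MayContain([]byte("hello"))) // true: added keys are never reported absent
	fmt.Println(bf.MayContain([]byte("world"))) // false unless a false positive occurs
}
```

For n keys and a target false-positive rate p, the usual sizing is m ≈ -n·ln(p)/(ln 2)² bits and k ≈ (m/n)·ln 2. For illustration only, n = 100,000 and p = 1% gives roughly 958,000 bits (about 120KB) and k ≈ 7.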
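The README names an LRU cache but not its capacity or eviction details. A minimal sketch of a classic LRU built on `container/list` plus a map, with a mutex so that concurrent lookups can share it; `LRU`, `NewLRU`, `Get` and `Put` are hypothetical names, and keying the cache by string is an assumption.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is stored in each list element so eviction can find the map key.
type entry struct {
	key   string
	value []byte
}

// LRU is a fixed-capacity cache that evicts the least recently used key.
type LRU struct {
	mu       sync.Mutex
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks key as most recently used.
func (c *LRU) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).value, true
	}
	return nil, false
}

// Put inserts or updates key, evicting the oldest entry when over capacity.
func (c *LRU) Put(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Put("a", []byte("1"))
	c.Put("b", []byte("2"))
	c.Get("a")              // "a" becomes most recently used
	c.Put("c", []byte("3")) // over capacity: evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```

Both `Get` and `Put` are O(1): the map finds the element and the list keeps recency order.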
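The chunk entry (key_hash, offset) is given above, but not how keys are assigned to chunks or how a chunk is searched. A minimal in-memory sketch, assuming FNV-1a 64-bit as the key hash, chunk = key_hash mod number of chunks, and chunks sorted by key_hash so a lookup is a binary search; `Entry`, `Index`, `Append`, `Seal` and `Candidates` are hypothetical names, and the real project keeps its chunks on disk rather than in memory.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Entry mirrors the README's chunk struct: (key_hash uint64, offset uint64).
type Entry struct {
	KeyHash uint64 // hash value of the key
	Offset  uint64 // offset of the record in the original data file
}

// Index shards entries into chunks by key hash.
type Index struct {
	chunks [][]Entry
}

func hashKey(key []byte) uint64 {
	h := fnv.New64a()
	h.Write(key)
	return h.Sum64()
}

func NewIndex(numChunks int) *Index {
	return &Index{chunks: make([][]Entry, numChunks)}
}

// shard picks the chunk for a hash (assumption: simple modulo).
func (ix *Index) shard(h uint64) int {
	return int(h % uint64(len(ix.chunks)))
}

// Append records that key's record starts at offset in the data file.
func (ix *Index) Append(key []byte, offset uint64) {
	h := hashKey(key)
	s := ix.shard(h)
	ix.chunks[s] = append(ix.chunks[s], Entry{KeyHash: h, Offset: offset})
}

// Seal sorts every chunk by key hash so lookups can binary-search it.
func (ix *Index) Seal() {
	for _, c := range ix.chunks {
		sort.Slice(c, func(i, j int) bool { return c[i].KeyHash < c[j].KeyHash })
	}
}

// Candidates returns every offset whose entry has key's hash. Different keys
// can share a hash, so the caller must read each record and compare keys.
func (ix *Index) Candidates(key []byte) []uint64 {
	h := hashKey(key)
	c := ix.chunks[ix.shard(h)]
	i := sort.Search(len(c), func(i int) bool { return c[i].KeyHash >= h })
	var offs []uint64
	for ; i < len(c) && c[i].KeyHash == h; i++ {
		offs = append(offs, c[i].Offset)
	}
	return offs
}

func main() {
	ix := NewIndex(4)
	ix.Append([]byte("alpha"), 0)
	ix.Append([]byte("beta"), 128)
	ix.Seal()
	fmt.Println(ix.Candidates([]byte("beta")))
}
```

Storing only a hash and an offset keeps each entry at 16 bytes regardless of key size, at the cost of one extra read of the data file to confirm the key on every hit.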
Unit test
- data_test.go: unit tests for the data generator.
- chunk_test.go: unit tests for chunk Create/Append/Get.
- cache_test.go: unit tests for the LRU cache.
- index_test.go: unit tests for the Index model, including Create/Get/MGet.
Benchmark
I'm using an old 13-inch MacBook Pro (Early 2015) with less than 100GB of disk storage and a slow CPU, so the best I can do is generate around 100,000 random k-v pairs with the random k-v data generator I wrote. The benchmarks below may therefore not be very accurate. Sorry :-(
- Create the index for 100,000 random, unordered k-v pairs (52.51GB)
  - Time cost: 233.050s (about 3.88 min)
  - Storage cost: 29,320 chunks, 1.8MB in total
```
goos: darwin
goarch: amd64
pkg: github.com/JmPotato/index-kv/test
BenchmarkIndexCreate
2020/05/13 18:03:07 Chunk file not found. Create index first...
BenchmarkIndexCreate-4 1 231705919706 ns/op 52982909896 B/op 1593649 allocs/op
PASS
ok github.com/JmPotato/index-kv/test 233.050s
```
- Get 10,000 random keys with index and cache
  - Time cost: 4500ms total, 0.45ms per key
- Concurrently get 10,000 random keys with index and cache
  - Time cost: 3602ms total, 0.36ms per key
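The concurrent numbers above come from fetching many keys in parallel, but the README does not show how that concurrency is structured. A minimal sketch of a bounded concurrent multi-get, assuming a hypothetical single-key `get` function (for example Bloom filter, then cache, then index, then data file); `MGet` and `Result` are hypothetical names, and the worker limit is an assumed design choice.

```go
package main

import (
	"fmt"
	"sync"
)

// Result holds the outcome of one lookup.
type Result struct {
	Value []byte
	Found bool
}

// MGet looks up keys concurrently with at most `workers` lookups in flight.
// Each goroutine writes only its own slot of results, so no lock is needed.
func MGet(keys [][]byte, workers int, get func([]byte) ([]byte, bool)) []Result {
	if workers < 1 {
		workers = 1 // an unbuffered semaphore would block forever
	}
	results := make([]Result, len(keys))
	sem := make(chan struct{}, workers)
	var wg sync.WaitGroup
	for i, k := range keys {
		wg.Add(1)
		sem <- struct{}{} // wait while `workers` lookups are running
		go func(i int, k []byte) {
			defer wg.Done()
			defer func() { <-sem }()
			v, ok := get(k)
			results[i] = Result{Value: v, Found: ok}
		}(i, k)
	}
	wg.Wait()
	return results
}

func main() {
	store := map[string]string{"a": "1", "b": "2"} // read-only, so safe to share
	get := func(k []byte) ([]byte, bool) {
		v, ok := store[string(k)]
		return []byte(v), ok
	}
	res := MGet([][]byte{[]byte("a"), []byte("x"), []byte("b")}, 2, get)
	for _, r := range res {
		fmt.Println(r.Found, string(r.Value))
	}
}
```

Bounding the number of in-flight lookups is one way to keep parallel random reads from overwhelming a single HDD; results keep the same order as the input keys.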