Details
Improvement
Resolution: Unresolved
Major
Description
It looks like we can shrink sizeof(Item) from 120 bytes (jemalloc bin 128) to 80 bytes (jemalloc bin 80), which cuts the allocated size per Item by 37.5% (or by a third in terms of sizeof), with moderate tradeoffs.
This can be worthwhile, as these Item objects are stored in the CheckpointManager and the readyQ, and when the value is resident, the main memory usage in checkpoints and readyQ comes from these Item objects (as the Blob is shared with the HT).
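The bin arithmetic can be checked with a small sketch. It assumes the usual small size classes of a 64-bit jemalloc build with a 16-byte quantum; the names `kSmallBins`, `binFor` and `savingPercent` are hypothetical helpers for illustration, not part of jemalloc or kv_engine.

```cpp
#include <cstddef>

// Small size classes assumed for a 64-bit jemalloc build with a 16-byte
// quantum. The exact set depends on how jemalloc was configured.
constexpr std::size_t kSmallBins[] = {
        8, 16, 32, 48, 64, 80, 96, 112, 128, 160, 192, 224, 256};

// Returns the smallest listed bin that can hold `size` bytes, or 0 if
// `size` is larger than every bin in the table above.
constexpr std::size_t binFor(std::size_t size) {
    for (std::size_t bin : kSmallBins) {
        if (size <= bin) {
            return bin;
        }
    }
    return 0;
}

// Percentage of allocated memory saved per object when its size shrinks
// from `oldSize` to `newSize`, measured in bins rather than in sizeof.
constexpr double savingPercent(std::size_t oldSize, std::size_t newSize) {
    return 100.0 *
           (static_cast<double>(binFor(oldSize)) -
            static_cast<double>(binFor(newSize))) /
           static_cast<double>(binFor(oldSize));
}

static_assert(binFor(120) == 128, "current Item lands in the 128-byte bin");
static_assert(binFor(80) == 80, "proposed Item fits the 80-byte bin exactly");
static_assert(savingPercent(120, 80) == 37.5, "128 -> 80 bytes");
static_assert(savingPercent(120, 96) == 25.0, "128 -> 96 bytes");
```

Measuring in bins rather than in sizeof matters here because any size between 113 and 128 bytes costs the same 128 bytes per allocation.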
Current layout
struct ItemMetaData {
    uint64_t cas;                 // offset 0, size 8
    cb::uint48_t revSeqno;        // offset 8, size 6
    // 2-byte padding
    uint32_t flags;               // offset 16, size 4
    // 4-byte padding
    time_t exptime;               // offset 24, size 8
    // final size: 32
};

class Item {
    void* ItemIface_vptr;         // offset 0, size 8
    int32_t RCValue_rc_refcount;  // offset 8, size 4
    // 4-byte padding
    ItemMetaData metaData;        // offset 16, size 32
    value_t value;                // offset 48, size 8
    StoredDocKey key;             // offset 56, size 32
    int64_t bySeqno;              // offset 88, size 8
    cb::uint48_t prepareSeqno;    // offset 96, size 6
    Vbid vbucketId;               // offset 102, size 2
    queue_op op;                  // offset 104, size 1
    uint8_t flags_5bits;          // offset 105, size 1
    uint8_t datatype_3bits;       // offset 106, size 1
    Requirements durabilityReqs;  // offset 108, size 4
    time_point queuedTime;        // offset 112, size 8
    // final size: 120
};
Proposed layout
struct ItemMetaData_V2 {
    uint64_t cas;                         // offset 0, size 8
    cb::uint48_t revSeqno;                // offset 8, size 6
    cb::uint48_t exptime;                 // offset 14, size 6 (note 1)
    uint32_t flags;                       // offset 20, size 4
    // final size: 24
};

class Item {
    void* ItemIface_vptr;                 // offset 0, size 8
    int32_t RCValue_rc_refcount;          // offset 8, size 4
    Requirements durabilityReqs;          // offset 12, size 4 (reuses the 4-byte padding)
    ItemMetaData_V2 metaData;             // offset 16, size 24
    value_t value;                        // offset 40, size 8
    char* key;                            // offset 48, size 8 (note 2)
    int64_t bySeqno;                      // offset 56, size 8
    time_point<cb::uint48_t> queuedTime;  // offset 64, size 6 (note 3)
    cb::uint48_t prepareSeqno;            // offset 70, size 6
    Vbid vbucketId;                       // offset 76, size 2
    queue_op op;                          // offset 78, size 1
    uint8_t flags_and_datatype_8bits;     // offset 79, size 1 (note 4)
    // final size: 80
};
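The offsets in the two layouts can be verified at compile time with stand-in types. This is a sketch under stated assumptions: an LP64 platform (8-byte pointers, 8-byte time_t), and mock types `Uint48`, `Requirements` and `QueueOp` that only mimic the size and alignment given in the listings. They are hypothetical and are not the real kv_engine definitions.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for cb::uint48_t: 6 bytes, 1-byte alignment (assumed).
struct Uint48 {
    std::uint8_t bytes[6];
};

// Stand-in for the durability requirements: 4 bytes (assumed layout).
struct Requirements {
    std::uint16_t level;
    std::uint16_t timeout;
};

// Stand-in for queue_op: a 1-byte enum.
enum class QueueOp : std::uint8_t { Mutation };

// Current metadata layout; the compiler inserts 2 + 4 bytes of padding.
struct ItemMetaData {
    std::uint64_t cas;
    Uint48 revSeqno;
    std::uint32_t flags;
    std::int64_t exptime; // time_t on LP64
};

// Proposed metadata layout; exptime fills the former padding.
struct ItemMetaDataV2 {
    std::uint64_t cas;
    Uint48 revSeqno;
    Uint48 exptime;
    std::uint32_t flags;
};

// Proposed Item layout. The vptr is modelled as a plain pointer member so
// that the struct stays standard-layout and offsetof is well defined.
struct ItemV2 {
    void* vptr;
    std::int32_t refcount;
    Requirements durabilityReqs;
    ItemMetaDataV2 metaData;
    void* value; // stand-in for value_t
    char* key;
    std::int64_t bySeqno;
    Uint48 queuedTime;
    Uint48 prepareSeqno;
    std::uint16_t vbucketId;
    QueueOp op;
    std::uint8_t flagsAndDatatype;
};

static_assert(sizeof(ItemMetaData) == 32, "current metadata size");
static_assert(sizeof(ItemMetaDataV2) == 24, "proposed metadata size");
static_assert(offsetof(ItemMetaDataV2, exptime) == 14, "no padding before exptime");
static_assert(offsetof(ItemMetaDataV2, flags) == 20, "flags stays 4-byte aligned");
static_assert(offsetof(ItemV2, durabilityReqs) == 12, "reuses former padding");
static_assert(offsetof(ItemV2, queuedTime) == 64, "queuedTime offset");
static_assert(offsetof(ItemV2, flagsAndDatatype) == 79, "last byte of the object");
static_assert(sizeof(ItemV2) == 80, "fits the 80-byte bin with no tail padding");
```

If the real cb::uint48_t or Requirements have a different alignment than assumed here, the offsets would shift, so assertions of this kind would be worth keeping next to the real class.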
Notes:
1. This requires us to use a 48-bit integer to represent the expiry time, which is a Unix timestamp in seconds. Today's timestamps take 31 bits to store. This saves us 8 bytes by removing padding. The MCBP protocol seems to use 4 bytes for this field.
2. The key is currently stored in a std::string, which can use SSO to avoid an allocation for strings of up to 15 bytes on GCC (see https://github.com/elliotgoodrich/SSO-23). Including the CollectionID, this only applies to document keys shorter than 13-14 bytes in the best case, which seems perhaps too short to matter in practice, although we should estimate sizes from existing data. We can store the key as a classic C-style string, or inline in the object, as for StoredValue (which I'm not sure how to display here), and then access it via DocKeyView.
3. The queuedTime has microsecond granularity on the steady_clock. With 48 bits, we get a maximum duration of 2^48 microseconds, which is roughly 8.9 years, and implementing offset compression is something that is not difficult to test for correctness. We only need this field for time measurements (stats).
4. Finally, we can merge the 5 bits of flags with the 3 bits needed for the datatype. We've already done this for StoredValue, saving on padding, and making us fit perfectly in the 80 byte jemalloc bin.
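One way to read the offset compression in note 3 is to store queuedTime as a 48-bit count of microseconds elapsed since some base time point. The sketch below assumes that reading; `QueuedTime48`, `kMax48` and the caller-supplied `base` are hypothetical names, and how the base would be chosen (for example per checkpoint or per process) is left open.

```cpp
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;
using Micros = std::chrono::microseconds;

// Largest value that fits in a 48-bit unsigned field.
constexpr std::uint64_t kMax48 = (std::uint64_t{1} << 48) - 1;

// Range check: 2^48 microseconds is a little under nine 365-day years.
static_assert(kMax48 / 1'000'000 / 86'400 / 365 == 8, "whole years of range");

// Hypothetical 6-byte field holding a time point as the number of
// microseconds elapsed since a caller-supplied base time point.
class QueuedTime48 {
public:
    // Stores `tp` relative to `base`. Returns false and leaves the field
    // unchanged if `tp` precedes `base` or the offset exceeds 48 bits.
    bool store(Clock::time_point tp, Clock::time_point base) {
        const auto offset =
                std::chrono::duration_cast<Micros>(tp - base).count();
        if (offset < 0 || static_cast<std::uint64_t>(offset) > kMax48) {
            return false;
        }
        auto value = static_cast<std::uint64_t>(offset);
        for (auto& byte : bytes) {
            byte = static_cast<std::uint8_t>(value & 0xff);
            value >>= 8;
        }
        return true;
    }

    // Reconstructs the time point, truncated to microsecond resolution.
    Clock::time_point load(Clock::time_point base) const {
        std::uint64_t value = 0;
        for (int i = 5; i >= 0; --i) {
            value = (value << 8) | bytes[i];
        }
        return std::chrono::time_point_cast<Clock::duration>(
                base + Micros(static_cast<Micros::rep>(value)));
    }

private:
    std::uint8_t bytes[6] = {}; // least significant byte first
};

static_assert(sizeof(QueuedTime48) == 6, "no padding inside the field");
```

Because the field is only needed for stats, a failed `store` could plausibly be handled by clamping rather than by raising an error, but that is a design choice this sketch does not make.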
Besides the increase in complexity, the only other downside seems to be losing SSO; however, I suspect we do not always benefit from it.
The space savings seem worth it, and it would be interesting to test with a hacky toybuild before doing the full implementation.
Even if we did have to later add more metadata to Item, we have 16 bytes before the 96 byte jemalloc bin, which would still be a 25% space saving compared to 128 bytes, so I think it is worth investigating.
Gerrit Reviews
For Gerrit Dashboard: MB-62017

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
211244,2 | MB-62017: Use accessors for ItemMetaData members | master | kv_engine | NEW | +1 | +1 |
211245,2 | MB-62017: Use smaller integer type for exptime in ItemMetaData | master | kv_engine | NEW | +1 | +1 |
211246,2 | MB-62017: Use 48-bits to store the queuedTime of Item | master | kv_engine | NEW | +1 | +1 |
211247,2 | MB-62017: Use 3 bits to store the datatype in Item | master | kv_engine | NEW | +1 | +1 |
211248,2 | WIP MB-62017: Test with smaller DocKey | master | kv_engine | NEW | +1 | +1 |