Description
Observed Behaviour
When loading data using pillowfight with --json and --random-body, the body is random, but it seems to be the same for every document, e.g.:
/opt/couchbase/bin/cbc-pillowfight --username Administrator --password asdasd --num-items 1000000 --batch-size 10 --num-threads 4 --min-size 232 --max-size 232 --json --random-body --populate-only --collection _default.c_1M
Results in documents:
# /opt/couchbase/bin/cbc-cat -U localhost/default -u Administrator -P asdasd --collection c_1M 00000000000000997618
00000000000000997618 CAS=0x1755d6ca3b400000, Flags=0x0, Size=233, Datatype=0x01(JSON)
{"Field_1":"1o9YNP te mDEoEM","Field_2":"nzfAgywOHppc1i50","Field_3":"IbWt17VfehRTEuE2","Field_4":"2UBir889vvluOqvw","Field_5":"BpYCuTRI9IBOadOd","Field_6":"7nwyvCF281wXp0r1","Field_7":"oqCThrBr8bdjn0xs","Field_8":"o2ZSFESNDnI4na3K"}
# /opt/couchbase/bin/cbc-cat -U localhost/default -u Administrator -P asdasd --collection c_1M 00000000000000997788
00000000000000997788 CAS=0x1755d6ca3b760000, Flags=0x0, Size=233, Datatype=0x01(JSON)
{"Field_1":"1o9YNP te mDEoEM","Field_2":"nzfAgywOHppc1i50","Field_3":"IbWt17VfehRTEuE2","Field_4":"2UBir889vvluOqvw","Field_5":"BpYCuTRI9IBOadOd","Field_6":"7nwyvCF281wXp0r1","Field_7":"oqCThrBr8bdjn0xs","Field_8":"o2ZSFESNDnI4na3K"}
Expected Behaviour
When asking for "random" bodies I would expect to see each document having a different random value, not them all having the same one.
Analysis
Digging into the pillowfight code, this appears to be due to using a min and max value size which are the same (232 in this case) - only one "representative" random value is generated:
docgen.h

    static std::vector< size_t > gen_graded_sizes(uint32_t minsz, uint32_t maxsz, int grades = 10)
    {
        std::vector< size_t > ret;

        size_t diff = maxsz - minsz;
        size_t factor = diff / grades;
        if (factor == 0 || minsz == maxsz) {
            ret.push_back(maxsz);
        } else {
            for (int ii = 0; ii < grades + 1; ii++) {
                size_t size = minsz + (factor * ii);
                ret.push_back(size);
            }
        }
        return ret;
    }
Note the minsz == maxsz condition, which adds only one element to the set of returned values. (The factor == 0 check has the same effect whenever maxsz - minsz is smaller than grades.)
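To make the effect concrete, here is a small standalone sketch that copies the function quoted above and counts how many sizes it returns. count_graded_sizes is a hypothetical helper added only for illustration; it is not part of pillowfight.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy of gen_graded_sizes as quoted from docgen.h (behaviour unchanged).
static std::vector<size_t> gen_graded_sizes(uint32_t minsz, uint32_t maxsz, int grades = 10)
{
    std::vector<size_t> ret;

    size_t diff = maxsz - minsz;
    size_t factor = diff / grades;
    if (factor == 0 || minsz == maxsz) {
        // Single-size case: hit when min == max, or when the range is
        // narrower than the number of grades.
        ret.push_back(maxsz);
    } else {
        for (int ii = 0; ii < grades + 1; ii++) {
            ret.push_back(minsz + (factor * ii));
        }
    }
    return ret;
}

// Hypothetical helper (not in pillowfight): number of distinct sizes produced.
static size_t count_graded_sizes(uint32_t minsz, uint32_t maxsz)
{
    return gen_graded_sizes(minsz, maxsz).size();
}
```

With the arguments from this report, count_graded_sizes(232, 232) is 1, whereas a wider range such as count_graded_sizes(100, 200) gives 11. If, as the analysis suggests, one body is generated per returned size, a single size would explain the single repeated body.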
I'm marking this as a bug: while one could argue this behaviour is "correct", IMO it breaks the principle of least surprise. If a user asks for "random" bodies, they probably expect multiple different random bodies, not just the same one over and over.
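As an illustration of that expectation, the following is a minimal sketch of generating several distinct bodies that all share one fixed size. It is not pillowfight's code or a proposed patch: gen_random_bodies, its parameters, the alphabet and the default seed are all assumptions made for the example.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper, not part of pillowfight: build `count` random
// alphanumeric bodies, each exactly `size` bytes long.
static std::vector<std::string> gen_random_bodies(size_t size, size_t count, unsigned seed = 42)
{
    static const std::string alphabet =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    std::mt19937 rng(seed);
    std::uniform_int_distribution<size_t> pick(0, alphabet.size() - 1);

    std::vector<std::string> bodies;
    bodies.reserve(count);
    for (size_t i = 0; i < count; i++) {
        std::string body(size, ' ');
        for (char &c : body) {
            c = alphabet[pick(rng)];  // one random character per position
        }
        bodies.push_back(body);
    }
    return bodies;
}
```

Calling gen_random_bodies(232, 10) would yield ten bodies of 232 bytes each, so a fixed size need not imply a single body.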