  1. Couchbase C client library libcouchbase
  2. CCBC-1592

pillowfight generates identical "random" values when min and max value size are identical

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 3.3.6
    • 3.3.5
    • tools
    • None
    • 0

    Description

      Observed Behaviour
      When loading data using pillowfight with --json and --random-body, each document body looks random, but every document ends up with the same body, e.g.:

      /opt/couchbase/bin/cbc-pillowfight --username Administrator --password asdasd --num-items 1000000 --batch-size 10 --num-threads 4 --min-size 232 --max-size 232 --json --random-body --populate-only --collection _default.c_1M
      

      Results in documents:

      # /opt/couchbase/bin/cbc-cat -U localhost/default -u Administrator -P asdasd --collection c_1M 00000000000000997618
      00000000000000997618 CAS=0x1755d6ca3b400000, Flags=0x0, Size=233, Datatype=0x01(JSON)
      {"Field_1":"1o9YNP te mDEoEM","Field_2":"nzfAgywOHppc1i50","Field_3":"IbWt17VfehRTEuE2","Field_4":"2UBir889vvluOqvw","Field_5":"BpYCuTRI9IBOadOd","Field_6":"7nwyvCF281wXp0r1","Field_7":"oqCThrBr8bdjn0xs","Field_8":"o2ZSFESNDnI4na3K"}
      # /opt/couchbase/bin/cbc-cat -U localhost/default -u Administrator -P asdasd --collection c_1M 00000000000000997788
      00000000000000997788 CAS=0x1755d6ca3b760000, Flags=0x0, Size=233, Datatype=0x01(JSON)
      {"Field_1":"1o9YNP te mDEoEM","Field_2":"nzfAgywOHppc1i50","Field_3":"IbWt17VfehRTEuE2","Field_4":"2UBir889vvluOqvw","Field_5":"BpYCuTRI9IBOadOd","Field_6":"7nwyvCF281wXp0r1","Field_7":"oqCThrBr8bdjn0xs","Field_8":"o2ZSFESNDnI4na3K"}
      

      Expected Behaviour
      When asking for "random" bodies I would expect each document to have a different random value, rather than all of them sharing the same one.

      Analysis

      Digging into the pillowfight code, this appears to be due to using a min and max value size which are the same (232 in this case) - only one "representative" random value is generated:

      docgen.h

          static std::vector< size_t > gen_graded_sizes(uint32_t minsz, uint32_t maxsz, int grades = 10)
          {
              std::vector< size_t > ret;

              size_t diff = maxsz - minsz;
              size_t factor = diff / grades;
              if (factor == 0 || minsz == maxsz) {
                  ret.push_back(maxsz);
              } else {
                  for (int ii = 0; ii < grades + 1; ii++) {
                      size_t size = minsz + (factor * ii);
                      ret.push_back(size);
                  }
              }
              return ret;
          }
      

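      Note the minsz == maxsz condition, which adds only one element to the set of returned values. The factor == 0 check has the same effect whenever maxsz - minsz is smaller than grades, because the integer division then yields zero.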
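
      To make the effect easier to see, here is a minimal standalone sketch that copies the grading logic quoted above out of its class. The helper count_sizes is hypothetical and exists only for illustration. The sketch assumes, as the analysis suggests, that pillowfight generates one body per returned size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Standalone copy of the grading logic quoted from docgen.h (moved out of its class).
std::vector<std::size_t> gen_graded_sizes(std::uint32_t minsz, std::uint32_t maxsz, int grades = 10)
{
    std::vector<std::size_t> ret;

    std::size_t diff = maxsz - minsz;
    std::size_t factor = diff / grades; // integer division: zero when diff < grades
    if (factor == 0 || minsz == maxsz) {
        ret.push_back(maxsz); // a single size is returned
    } else {
        for (int ii = 0; ii < grades + 1; ii++) {
            ret.push_back(minsz + (factor * ii));
        }
    }
    return ret;
}

// Hypothetical helper, for illustration only: how many sizes are produced
// for a given --min-size / --max-size pair.
std::size_t count_sizes(std::uint32_t minsz, std::uint32_t maxsz)
{
    return gen_graded_sizes(minsz, maxsz).size();
}
```

      With the values from the command above, count_sizes(232, 232) is 1, whereas a wider range such as count_sizes(100, 200) gives 11 graded sizes.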

      I'm marking this as a bug: while one could argue this behaviour is "correct", IMO it breaks the principle of least surprise. If a user asks for "random" bodies they probably expect multiple different random bodies, not the same one over and over.
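
      One possible way to meet that expectation, sketched here only as an illustration and not necessarily the change that resolved this issue, is to pad the list of sizes so that several bodies are generated even when only one size is requested. The function pad_sizes and its min_variants parameter are hypothetical names. The sketch assumes the caller generates one random body per entry in the list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of libcouchbase): repeat the existing sizes,
// in order, until the list holds at least min_variants entries. If one random
// body is generated per entry, a single fixed size then yields several
// distinct bodies of that size.
std::vector<std::size_t> pad_sizes(std::vector<std::size_t> sizes, std::size_t min_variants)
{
    if (sizes.empty()) {
        return sizes; // nothing to repeat
    }
    const std::size_t original = sizes.size();
    for (std::size_t ii = 0; sizes.size() < min_variants; ii++) {
        // Copy first so push_back never reads from storage it is reallocating.
        const std::size_t next = sizes[ii % original];
        sizes.push_back(next);
    }
    return sizes;
}
```

      For the case in this report, pad_sizes({232}, 11) would give eleven entries of 232, while a list that already has enough entries is returned unchanged.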

          People

            avsej Sergey Avseyev
            drigby Dave Rigby (Inactive)