Couchbase Server / MB-49348

HLC generated CAS value may be the same for multiple docs in a vBucket

Details

    • Triaged
    • 1
    • Unknown
    • KV 2021-Nov

    Description

      Need to update affects versions.

      The nextHLC() function (https://github.com/couchbase/kv_engine/blob/15b36716b5cd32f337b0a58251071ad953b9911c/engines/ep/src/hlc.h#L81) may be called concurrently for two (or more) different docs in a vBucket if those docs belong to different HashBuckets. Generally the CAS is updated under the checkpoint manager lock, but getLocked calls also update the CAS in some situations, which is where we hit this. If two threads call nextHLC() concurrently, read the same time, and we are not in logical clock mode, then both items end up with the same CAS (and maxHLC is updated only once). We should instead check the result of the maxHLC update and return the time-based CAS only if that update succeeded.

      This is probably benign: we can already set the CAS freely via set with meta, and this affects two different documents. It's still worth correcting before something falls foul of it.
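The fix suggested in the description can be sketched as follows. This is a minimal illustration and not the kv_engine code: `SketchHLC` and `setIfBigger` are hypothetical names, the physical time is passed in as a parameter, and the real HLC's bit masking is omitted. The point it shows is that a thread returns the time-based value only when its own update of `maxHLC` succeeded, and otherwise falls back to a logical increment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: store `desired` into `target` only if it is larger
// than the current value, and report whether this call performed the store.
inline bool setIfBigger(std::atomic<uint64_t>& target, uint64_t desired) {
    uint64_t current = target.load();
    while (desired > current) {
        // On failure compare_exchange_weak reloads `current`, so the loop
        // condition is re-evaluated against the latest value.
        if (target.compare_exchange_weak(current, desired)) {
            return true;
        }
    }
    return false;
}

// Simplified stand-in for the hybrid logical clock (assumption: the real
// class reads the time itself and masks the low bits; both are left out).
class SketchHLC {
public:
    explicit SketchHLC(uint64_t initial = 0) : maxHLC(initial) {
    }

    // `now` is the physical time reading supplied by the caller.
    uint64_t nextHLC(uint64_t now) {
        // Time-based mode: hand out `now` only if this thread was the one
        // that advanced maxHLC to it.
        if (now > maxHLC.load() && setIfBigger(maxHLC, now)) {
            return now;
        }
        // Logical clock mode (or a lost race): take the next logical tick.
        return maxHLC.fetch_add(1) + 1;
    }

private:
    std::atomic<uint64_t> maxHLC;
};
```

In this sketch every returned value is one that the calling thread itself wrote into `maxHLC`, and `maxHLC` only ever increases, so two callers cannot receive the same value.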

        Activity

          build-team Couchbase Build Team added a comment:
          Build couchbase-server-7.1.0-1648 contains kv_engine commit 878bce6 with commit message:
          MB-49348: Template HLC on Clock type

          build-team Couchbase Build Team added a comment:
          Build couchbase-server-7.1.0-1649 contains kv_engine commit 90c1d67 with commit message:
          MB-49348: Correct logical clock race in setting CAS

          build-team Couchbase Build Team added a comment:
          Build couchbase-server-7.1.0-1649 contains platform commit 004a18a with commit message:
          MB-49348: Have atomic_setIfBigger return status

          build-team Couchbase Build Team added a comment:
          Build couchbase-server-7.1.0-1651 contains kv_engine commit 8f13825 with commit message:
          MB-49348: Fix race in time based CAS

          ritam.sharma Ritam Sharma added a comment:
          Ben Huddleston, please help with steps to validate the defect.
          ben.huddleston Ben Huddleston added a comment (edited):

          Ritam Sharma, this will be very hard to hit in a QE test. There are two specific issues here. It might be possible to test the logical clock bug if you can make the system clock return the same value for every call and then run a lot of operations against the same vBucket as quickly as possible. You'd then compare the CAS results of all the operations and check that they're unique. If two are the same, then we hit the bug. The steps to hit the time-based CAS issue are the same, except that it's much harder to hit, because after the first CAS generation we drop into logical clock mode.

          Let me know if you want to try to reproduce the logical clock issue, or if you'd rather I close this out with the unit tests as verification.
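The validation approach described in this comment (a clock that always returns the same value, many concurrent operations, then a uniqueness check over the resulting CAS values) could be sketched as the harness below. Everything in it is an assumption made for illustration: `FixedClock`, `SketchHLC` and `countDuplicateCas` are hypothetical names, and the generator is a simplified stand-in rather than the kv_engine HLC or its unit tests.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical clock that reports the same time on every call.
struct FixedClock {
    static uint64_t now() {
        return 1000;
    }
};

// Simplified, hypothetical CAS generator templated on the clock type.
template <class Clock>
class SketchHLC {
public:
    uint64_t nextHLC() {
        const uint64_t timeNow = Clock::now();
        uint64_t current = maxHLC.load();
        // Time-based mode: return timeNow only if this thread stored it.
        while (timeNow > current) {
            if (maxHLC.compare_exchange_weak(current, timeNow)) {
                return timeNow;
            }
        }
        // Logical clock mode: take the next logical tick.
        return maxHLC.fetch_add(1) + 1;
    }

private:
    std::atomic<uint64_t> maxHLC{0};
};

// Generate CAS values from several threads at once and return how many of
// them duplicate an earlier value (0 means every CAS was unique).
template <class Clock>
std::size_t countDuplicateCas(std::size_t numThreads,
                              std::size_t opsPerThread) {
    SketchHLC<Clock> hlc;
    std::vector<std::vector<uint64_t>> perThread(numThreads);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t) {
        workers.emplace_back([&hlc, &perThread, t, opsPerThread] {
            perThread[t].reserve(opsPerThread);
            for (std::size_t i = 0; i < opsPerThread; ++i) {
                perThread[t].push_back(hlc.nextHLC());
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    // Merge all results, sort them, and count the entries that std::unique
    // would discard as repeats.
    std::vector<uint64_t> all;
    for (const auto& values : perThread) {
        all.insert(all.end(), values.begin(), values.end());
    }
    std::sort(all.begin(), all.end());
    const auto newEnd = std::unique(all.begin(), all.end());
    return static_cast<std::size_t>(all.end() - newEnd);
}
```

A call such as `countDuplicateCas<FixedClock>(4, 10000)` returning a non-zero count would correspond to "we hit the bug" in the comment above.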

          build-team Couchbase Build Team added a comment:
          Build couchbase-server-7.1.0-1711 contains kv_engine commit 111518f with commit message:
          MB-49348: Remove unnecessary loop

          ben.huddleston Ben Huddleston added a comment:
          Verified by unit tests.

          People

            ben.huddleston Ben Huddleston
            ben.huddleston Ben Huddleston