Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-52065

Slow operations and timeouts

    XMLWordPrintable

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 7.0.2
    • Morpheus
    • couchbase-bucket
    • None
    • Untriaged
    • 1
    • Unknown

    Description

      Server Build: 7.0.2 build 6703

      Java SDK: 3.2.6

      Scenario:

      • 3 nodes cluster, replicas=2, durability=majority
      • Cluster quota: 24GB, 8 CPU per VM
      • 2 collections, ~100m documents
      • kvTimeout=10 seconds
      • Replace or Insert all documents in a row one by one, 1000 transactions per second
      • On each VM: Average  %idle 35-40%, Average  %memused 65-75%
      • bucket resident 40%

      Observation:

      For some seconds the following exception is thrown multiple times:

      May  6 04:49:58 XXXza-profiling za-profiling-profiles[-]: log:com.couchbase.client.core.error.AmbiguousTimeoutException: SubdocMutateRequest, Reason: TIMEOUT {"cancelled":true,"clientContext": {"isIdempotent":true}

      ,"completed":true,"coreId":"0x3eb621a800000004","idempotent":false,"lastChannelId":"3EB621A800000004/00000000B45256BE","lastDispatchedFrom":"10.233.90.21:45798","lastDispatchedTo":"lhvmsrv3196-lb.lh.upstreamsystems.com:11210","reason":"TIMEOUT","requestId":76219536,"requestType":"SubdocMutateRequest","retried":0,"service":{"bucket":"profiles","collection":"user_profile","documentId":"_sXK-QJ8iJSOUq5aiW1sKQ","opaque":"0x48aabb2","scope":"data","type":"kv","vbucket":624},"timeoutMs":10000,"timings":{"encodingMicros":33,"totalMicros":10006238}}

       

      So, mutations were canceled after the configured timeout of 10 seconds without any retry. All timeouts originated from the same VM.

       

      At the same time, Couchbase server prints the following:

      memcached.log 

      2022-05-06T04:49:51.220646+00:00 WARNING 181: Slow operation: {"bucket":"profiles","cid":"3EB621A800000004/00000000B45256BE/48aabb2","command":"SUBDOC_MULTI_MUTATION","duration":"13 s","packet": {"bodylen":514,"cas":0,"datatype":"raw","extlen":0,"key":"<ud>._sXK-QJ8iJSOUq5aiW1sKQ</ud>","keylen":23,"magic":"ClientRequest","opaque":2997586436,"opcode":"SUBDOC_MULTI_MUTATION","vbucket":624}

      ,"peer":"{\"ip\":\"10.168.104.52\",\"port\":20810}","trace":"request=2124376153275375:12597635 execute=2124376153275375:30 execute=2124376172404327:4026 execute=2124388750889725:21","worker_tid":140409586050816}

      2022-05-06T04:49:51.225152+00:00 WARNING 181: Slow operation: {"bucket":"profiles","cid":"3EB621A800000004/00000000B45256BE/48ab206","command":"GET_CLUSTER_CONFIG","duration":"10 s","packet": {"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":112364036,"opcode":"GET_CLUSTER_CONFIG","vbucket":0}

      ,"peer":"{\"ip\":\"10.168.104.52\",\"port\":20810}","response":"Success","trace":"request=2124378363635916:10396399 execute=2124388760002294:32","worker_tid":140409586050816}

      2022-05-06T04:49:51.316574+00:00 INFO (profiles) Compaction of vb:980 done (ok). purged tombstones:0, prepares:3310, prepareBytes:2208372 collection_items_erased:alive:0,deleted:0, size/items/tombstones/purge_seqno pre{69107811, 100594, 0, 0}, post{38113379, 97284, 0, 0}

       

      Is this the expected behavior? Could you please elaborate on how could be solved? Do we need more CPU or RAM?

       

      Other memcached.logs during timeouts:

      2022-05-06T04:49:51.220646+00:00 WARNING 181: Slow operation: {"bucket":"profiles","cid":"3EB621A800000004/00000000B45256BE/48aabb2","command":"SUBDOC_MULTI_MUTATION","duration":"13 s","packet": {"bodylen":514,"cas":0,"datatype":"raw","extlen":0,"key":"<ud>._sXK-QJ8iJSOUq5aiW1sKQ</ud>","keylen":23,"magic":"ClientRequest","opaque":2997586436,"opcode":"SUBDOC_MULTI_MUTATION","vbucket":624}

      ,"peer":"{\"ip\":\"10.168.104.52\",\"port\":20810}","trace":"request=2124376153275375:12597635 execute=2124376153275375:30 execute=2124376172404327:4026 execute=2124388750889725:21","worker_tid":140409586050816}

      2022-05-06T04:49:51.225152+00:00 WARNING 181: Slow operation: {"bucket":"profiles","cid":"3EB621A800000004/00000000B45256BE/48ab206","command":"GET_CLUSTER_CONFIG","duration":"10 s","packet": {"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":112364036,"opcode":"GET_CLUSTER_CONFIG","vbucket":0}

      ,"peer":"{\"ip\":\"10.168.104.52\",\"port\":20810}","response":"Success","trace":"request=2124378363635916:10396399 execute=2124388760002294:32","worker_tid":140409586050816}

      2022-05-06T01:51:16.912275+00:00 WARNING 278: Slow operation: {"bucket":"profiles","cid":"922D88E200000004/00000000CC379779/10d0826","command":"SUBDOC_MULTI_MUTATION","duration":"13 s","packet": {"bodylen":447,"cas":0,"datatype":"raw","extlen":0,"key":"<ud>.NSAuHgG_mOysWsdmZtGFXQ</ud>","keylen":23,"magic":"ClientRequest","opaque":638061825,"opcode":"SUBDOC_MULTI_MUTATION","vbucket":618}

      ,"peer":"{\"ip\":\"10.168.104.34\",\"port\":23027}","trace":"request=2113661864500353:12579017 execute=2113661864500353:29 execute=2113661869268788:3369 execute=2113674443490934:26","worker_tid":140409681430272}

      2022-05-06T01:51:16.915075+00:00 WARNING 278: Slow operation: {"bucket":"profiles","cid":"922D88E200000004/00000000CC379779/10d084f","command":"GET_CLUSTER_CONFIG","duration":"12 s","packet": {"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":1325927681,"opcode":"GET_CLUSTER_CONFIG","vbucket":0}

      ,"peer":"{\"ip\":\"10.168.104.34\",\"port\":23027}","response":"Success","trace":"request=2113662086593708:12363373 execute=2113674449949489:18","worker_tid":140409681430272}

      2022-05-06T02:46:48.799027+00:00 WARNING 206: Slow operation: {"bucket":"profiles","cid":"DD2D9FC700000004/000000006EF5156E/4300fee","command":"SUBDOC_MULTI_MUTATION","duration":"1742 ms","packet": {"bodylen":581,"cas":0,"datatype":"raw","extlen":0,"key":"<ud>.OLb7xePfXaZ_xhrjaMg1oQ</ud>","keylen":23,"magic":"ClientRequest","opaque":3993972740,"opcode":"SUBDOC_MULTI_MUTATION","vbucket":507}

      ,"peer":"{\"ip\":\"10.168.104.37\",\"port\":21623}","trace":"request=2117004586707730:1741527 execute=2117004586707730:274 execute=2117004595922489:1900 execute=2117006328214807:20","worker_tid":140409681430272}

      2022-05-06T02:46:48.800708+00:00 WARNING 206: Slow operation: {"bucket":"profiles","cid":"DD2D9FC700000004/000000006EF5156E/4301216","command":"GET_CLUSTER_CONFIG","duration":"970 ms","packet": {"bodylen":0,"cas":0,"datatype":"raw","extlen":0,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":370290692,"opcode":"GET_CLUSTER_CONFIG","vbucket":0}

      ,"peer":"{\"ip\":\"10.168.104.37\",\"port\":21623}","response":"Success","trace":"request=2117005365845751:969758 execute=2117006335587993:16","worker_tid":140409681430272}

      2022-05-06T02:47:34.723142+00:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:394' on thread NonIoPool0: 8144 ms

      Attachments

        1. bucket_resident.png
          bucket_resident.png
          78 kB
        2. screenshot-1.png
          screenshot-1.png
          44 kB
        3. screenshot-2.png
          screenshot-2.png
          58 kB
        4. screenshot-3.png
          screenshot-3.png
          65 kB
        5. SUBDOC-10d0826.png
          SUBDOC-10d0826.png
          29 kB
        6. vb_active_resident_items_ratio.png
          vb_active_resident_items_ratio.png
          42 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            Build couchbase-server-7.2.0-1271 contains kv_engine commit 3ec1263 with commit message:
            MB-52334: Revert "MB-52065: Add notify_io_complete trace span"

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.2.0-1271 contains kv_engine commit 3ec1263 with commit message: MB-52334: Revert " MB-52065 : Add notify_io_complete trace span"

            Build couchbase-server-8.0.0-1007 contains kv_engine commit 3ec1263 with commit message:
            MB-52334: Revert "MB-52065: Add notify_io_complete trace span"

            build-team Couchbase Build Team added a comment - Build couchbase-server-8.0.0-1007 contains kv_engine commit 3ec1263 with commit message: MB-52334: Revert " MB-52065 : Add notify_io_complete trace span"

            Any update here Trond Norbye? Thanks!

            lefteris Lefteris Katiforis added a comment - Any update here Trond Norbye ? Thanks!
            trond Trond Norbye added a comment -

            Unfortunately not. I'm busy investigating a different issue right now...

            trond Trond Norbye added a comment - Unfortunately not. I'm busy investigating a different issue right now...
            aaron.labeau Aaron LaBeau added a comment -

            This question was also posted in the forums by the same company back on May 9th.  Given this has been over a month anything to report on this?  

            https://forums.couchbase.com/t/slow-operations-and-timeouts/33531

            aaron.labeau Aaron LaBeau added a comment - This question was also posted in the forums by the same company back on May 9th.  Given this has been over a month anything to report on this?   https://forums.couchbase.com/t/slow-operations-and-timeouts/33531

            People

              owend Daniel Owen
              lefteris Lefteris Katiforis
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty