Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-37188

Processing DCP_SEQNO_ACKNOWLEDGEMENT before the topology is setup

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 6.5.0
    • 6.5.1
    • couchbase-bucket
    • None
    • Triaged
    • No
    • KV-Engine Mad-Hatter GA, KV Sprint 2020-January

    Description

      Related to MB-37161

      Attempting to process a DCP_SEQNO_ACK before the first chain is set removes the connection, and so will result in rebalance failure.
      Dependent on what ns_server does we might never manage to get past this, because re-triggering the rebalance could hit exactly the same situation.

      2019-12-04T17:57:52.233463-08:00 ERROR 56: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 172.23.121.225:47004 - 172.23.121.226:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007fca9c06a010","ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw","extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":1,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":854},"refcount":1}] - closing connection ([ 172.23.121.225:47004 - 172.23.121.226:11209 (<ud>@ns_server</ud>) ]): ActiveDurabilityMonitor::State::processSeqnoAck vb:854 FirstChain not set
      2019-12-04T17:57:52.233542-08:00 INFO 56: (No Engine) DCP (Producer) eq_dcpq:replication:ns_1@172.23.121.226->ns_1@172.23.121.225:default - Removing connection [ 172.23.121.225:47004 - 172.23.121.226:11209 (<ud>@ns_server</ud>) ]
      2019-12-04T17:57:52.233568-08:00 WARNING 56: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.121.226->ns_1@172.23.121.225:default - (vb:854) Stream closing, sent until seqno 2 remaining items 0, reason: The stream closed early because the conn was disconnected
      ... all other streams on this connection also close ...
      ...
      # ns_server sets topology on all vBuckets, including vb:854 which encountered the error above:
      2019-12-04T17:57:56.845170-08:00 INFO (default) VBucket::setState: transitioning vb:854 with high seqno:2 from:active to:active meta:{"topology":[["ns_1@172.23.121.226","ns_1@172.23.121.225"]]}
      

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            james.harrison James Harrison added a comment - - edited

            Repro-ed upgrading vagrants; failed on the second repro attempt. The last node to upgrade was 101, and during the rebalance to add it back to the cluster (same routine as in MB-37161) the Expects failure is logged on 102 - but the rebalance was successful (according to the UI).

            2019-12-09T11:57:48.587372+00:00 INFO (default) VBucket::setState: transitioning vb:385 with high seqno:0 from:active to:active
            ...
            2019-12-09T12:00:51.885911+00:00 INFO 54: (default) DCP (Producer) eq_dcpq:replication:ns_1@10.112.192.102->ns_1@10.112.192.101:default - (vb:385) Creating stream with start seqno 1 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
            ...
            2019-12-09T12:00:52.055026+00:00 ERROR 54: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 10.112.192.101:47180 - 10.112.192.102:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f5df2162410","ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw","extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":44,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":385},"refcount":1}] - closing connection ([ 10.112.192.101:47180 - 10.112.192.102:11209 (<ud>@ns_server</ud>) ]): ActiveDurabilityMonitor::State::processSeqnoAck vb:385 FirstChain not set
            2019-12-09T12:00:52.560579+00:00 INFO (default) VBucket::setState: transitioning vb:385 with high seqno:1 from:active to:active meta:{"topology":[["ns_1@10.112.192.102","ns_1@10.112.192.101"]]}
            ...
            2019-12-09T12:00:52.657006+00:00 INFO 51: (default) DCP (Producer) eq_dcpq:replication:ns_1@10.112.192.102->ns_1@10.112.192.101:default - (vb:385) Creating stream with start seqno 1 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
            

            The topology is set about 1.6s after the stream is created, or about 0.5s after the failed seqno ack.

            Given that this did not fail the rebalance (I assumed it would, as it tears down the DCP connection) this might not be a critical issue.

            james.harrison James Harrison added a comment - - edited Repro-ed upgrading vagrants; failed on the second repro attempt. The last node to upgrade was 101, and during the rebalance to add it back to the cluster (same routine as in MB-37161 ) the Expects failure is logged on 102 - but the rebalance was successful (according to the UI). 2019-12-09T11:57:48.587372+00:00 INFO (default) VBucket::setState: transitioning vb:385 with high seqno:0 from:active to:active ... 2019-12-09T12:00:51.885911+00:00 INFO 54: (default) DCP (Producer) eq_dcpq:replication:ns_1@10.112.192.102->ns_1@10.112.192.101:default - (vb:385) Creating stream with start seqno 1 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none ... 2019-12-09T12:00:52.055026+00:00 ERROR 54: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 10.112.192.101:47180 - 10.112.192.102:11209 (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f5df2162410","ewouldblock":false,"packet":{"bodylen":8,"cas":0,"datatype":"raw","extlen":8,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":44,"opcode":"DCP_SEQNO_ACKNOWLEDGED","vbucket":385},"refcount":1}] - closing connection ([ 10.112.192.101:47180 - 10.112.192.102:11209 (<ud>@ns_server</ud>) ]): ActiveDurabilityMonitor::State::processSeqnoAck vb:385 FirstChain not set 2019-12-09T12:00:52.560579+00:00 INFO (default) VBucket::setState: transitioning vb:385 with high seqno:1 from:active to:active meta:{"topology":[["ns_1@10.112.192.102","ns_1@10.112.192.101"]]} ... 2019-12-09T12:00:52.657006+00:00 INFO 51: (default) DCP (Producer) eq_dcpq:replication:ns_1@10.112.192.102->ns_1@10.112.192.101:default - (vb:385) Creating stream with start seqno 1 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none The topology is set about 1.6s after the stream is created, or about 0.5s after the failed seqno ack. Given that this did not fail the rebalance (I assumed it would, as it tears down the DCP connection) this might not be a critical issue.
            james.harrison James Harrison added a comment - - edited

            For repros now 2 of 4 runs demo this behaviour. This is with rather small buckets; I'm unsure whether that is relevant. Again, the rebalance succeeded, and the cluster appears healthy.

            james.harrison James Harrison added a comment - - edited For repros now 2 of 4 runs demo this behaviour. This is with rather small buckets; I'm unsure whether that is relevant. Again, the rebalance succeeded, and the cluster appears healthy.

            Build couchbase-server-6.5.1-6040 contains kv_engine commit abc6aa4 with commit message:
            MB-37188: Tolerate seqno ack before topology received

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.1-6040 contains kv_engine commit abc6aa4 with commit message: MB-37188 : Tolerate seqno ack before topology received

            Build couchbase-server-7.0.0-1184 contains kv_engine commit abc6aa4 with commit message:
            MB-37188: Tolerate seqno ack before topology received

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-1184 contains kv_engine commit abc6aa4 with commit message: MB-37188 : Tolerate seqno ack before topology received

            not able to reproduced. verified by doing upgrade to 6.5.1-6224 from 5.5.1

            anitha.kuberan Anitha Kuberan added a comment - not able to reproduced. verified by doing upgrade to 6.5.1-6224 from 5.5.1

            Build couchbase-server-6.6.0-7519 contains kv_engine commit abc6aa4 with commit message:
            MB-37188: Tolerate seqno ack before topology received

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.0-7519 contains kv_engine commit abc6aa4 with commit message: MB-37188 : Tolerate seqno ack before topology received

            People

              james.harrison James Harrison
              owend Daniel Owen
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty