Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-38390

DCP Deletion V2 packets are sent regardless of whether the DELETE_TIMES feature is negotiated

    XMLWordPrintable

Details

    • Untriaged
    • Yes

    Description

      Steps to Reproduce

      1. Create a single node cluster of Couchbase Server 5.0.1
      2. Add a Couchbase Server 6.5.0 node and rebalance
      3. Create a bucket, my example uses test
      4. Create a document that resides on the 6.5.0 node, this command works if you followed my steps :

        /opt/couchbase/bin/cbc create -U "couchbase://localhost/test" -u Administrator -P password testing1234 -V test
        

      5. Delete the document that was just created:

        /opt/couchbase/bin/cbc rm -U "couchbase://localhost/test" -u Administrator -P password testing1234
        

      6. Add a new 6.5.0 node and mark the other 6.5.0 node for removal (so that you guarantee the document will move from one 6.5.x node to the other).
      7. Rebalance

      Expected Result
      Rebalance succeeds

      Actual Result

      Rebalance fails due to the error:

      Rebalance exited with reason {mover_crashed,
                                    {unexpected_exit,
                                     {'EXIT',<0.21330.0>,
                                      {{bulk_set_vbucket_state_failed,
                                        [{'ns_1@10.143.171.103',
                                          {'EXIT',
                                           {{{{{child_interrupted,
                                                {'EXIT',<24454.7397.0>,
                                                 socket_closed}},
      

      All subsequent attempts to rebalance all fail.

      Analysis

      This is because DCP deletion packets between the two 6.5.0 nodes are being incorrectly handled:

      2020-03-22T18:26:15.302067-07:00 WARNING 58: Invalid format specified for "DCP_DELETION" - Status: "Invalid arguments" - Closing connection. Packet:[{"bodylen":32,"cas":1584926732193038336,"datatype":"raw","extlen":21,"keylen":11,"magic":"ClientRequest","opaque":13,"opcode":"DCP_DELETION","vbucket":739}] Reason:"Request must include extras of length 18"
      

      This then closes the DCP connection and ultimately leads to the rebalance failure.

      The invalid DCP deletion packet is for the following doc:

      Doc seq: 2
           id: (collection:0x0:default) testing1234
           rev: 2
           content_meta: 3
           size (on disk): 0
           cas: 1584926732193038336, expiry: 1584926732, flags: 0, datatype: 0x00 (raw)
           doc deleted (explicit)
      

      DELETE_TIMES has not been negotiated as part of the DCP stream:

      2020-03-22T18:26:15.108609-07:00 INFO (test) EventuallyPersistentEngine::dcpOpen: opening new DCP Consumer handler - stream name:replication:ns_1@10.143.171.102->ns_1@10.143.171.103:test, opaque:0, seqno:0, flags:0b000000000000000000000000000
      00100 value:null
      2020-03-22T18:26:15.108615-07:00 INFO 58: DCP connection opened successfully. INCLUDE_XATTRS [ 127.0.0.1:36102 - 127.0.0.1:11209 (<ud>@ns_server</ud>) ]
      

      This is not negotiated because there is still a 5.0.x node in the cluster that is not DELETE_TIMES aware, however from a packet capture (MB-38390.pcap) it seems that the 6.5 nodes are sending DCP_DELETION_V2 packets, despite it not being negotiated:

      Couchbase Protocol, DCP (Key) Deletion Request, Opcode: 0x58, VBucket: 0x2e3
          Magic: Request (0x80)
          Opcode: DCP (Key) Deletion (0x58)
          Key Length: 11
          Extras Length: 21
          Data Type: 0x00
          VBucket: 739 (0x02e3)
          [Value Length: 0]
          Total Body Length: 32
          Opaque: 0x0000002a
          CAS: 0x15fe19cdc8b10000
          Extras
          Key: testing1234
      

      This is obviously rejected by the consumer as it is expected an extras length of 18, rather than 21, as covered in https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/commands/deletion.md.

      Note: Haven't attached cbcollect_infos as the issue is readily reproducible 100% of the time by following the steps, if needed I can go ahead and generate these though.

      Workaround
      This has the net effect that for any cluster where there are deleted documents, it is impossible to directly upgrade from Couchbase Server <5.5.0 to Couchbase Server 6.5.0.
      The workaround is to upgrade to an intermediate version before moving to 6.5.0, such as Couchbase Server 6.0.4, which through testing does not show the same issue.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            thuan Thuan Nguyen
            matt.carabine Matt Carabine (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty