Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-50087

Multi-node XDCR test doesn't report replication_changes_left correctly on build 7.1.0-1885

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 7.1.0
    • 7.1.0
    • XDCR
    • Untriaged
    • 1
    • Unknown

    Description

      The XDCR test will monitor replication_changes_left after XDCR starts. In this run, xdcr reported replication_changes_left = 0, and it resulted in wrong performance number reported. 

      Avg. initial XDCR rate (items/sec), 5 -> 5 (2 source nozzles, 4 target nozzles), 1 bucket x 1G x 1KB, DGM, s=1, c=1000

      build: 7.1.0-1885

      Job: http://perf.jenkins.couchbase.com/job/titan/12877/

      2021-12-13T23:33:11 [INFO] Adding a remote cluster: 172.23.96.100

      2021-12-13T23:33:11 [INFO] Starting replication with parameters {'replicationType': 'continuous', 'fromBucket': 'bucket-1', 'toBucket': 'bucket-1', 'toCluster': 'perf'}

      2021-12-13T23:33:11 [INFO] Getting remote clusters

      2021-12-13T23:33:11 [INFO] Monitoring XDCR queues: bucket-1

      2021-12-13T23:33:33 [INFO] replication_changes_left reached 0

       

      In a good run, replication_changes_left will decrease gradually. 

      http://perf.jenkins.couchbase.com/job/titan/12397/

      2021-11-01T17:41:31 [INFO] Adding a remote cluster: 172.23.96.100

      2021-11-01T17:41:31 [INFO] Starting replication with parameters {'replicationType': 'continuous', 'fromBucket': 'bucket-1', 'toBucket': 'bucket-1', 'toCluster': 'perf'}

      2021-11-01T17:41:31 [INFO] Getting remote clusters

      2021-11-01T17:41:31 [INFO] Monitoring XDCR queues: bucket-1

      2021-11-01T17:41:53 [INFO] replication_changes_left = 991,768,354

      2021-11-01T17:41:55 [INFO] replication_changes_left = 990,671,216

      2021-11-01T17:41:58 [INFO] replication_changes_left = 988,192,619

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          bo-chun.wang Bo-Chun Wang added a comment -

          I have a good run with build 7.1.0-2061. I close this issue.

          http://perf.jenkins.couchbase.com/job/titan/13220/

           

          bo-chun.wang Bo-Chun Wang added a comment - I have a good run with build 7.1.0-2061. I close this issue. http://perf.jenkins.couchbase.com/job/titan/13220/  

          Build couchbase-server-7.1.0-2069 contains goxdcr commit 95073a5 with commit message:
          MB-50087: fix unit test to use the right data file

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2069 contains goxdcr commit 95073a5 with commit message: MB-50087 : fix unit test to use the right data file

          Build couchbase-server-7.1.0-2061 contains goxdcr commit b1bb9cd with commit message:
          MB-50087: manifest unmarshalling after being serialize by p2p vbpayload

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2061 contains goxdcr commit b1bb9cd with commit message: MB-50087 : manifest unmarshalling after being serialize by p2p vbpayload
          neil.huang Neil Huang added a comment -

          I was able to reproduce and validate the fix locally. Please see the next build and verify.

          neil.huang Neil Huang added a comment - I was able to reproduce and validate the fix locally. Please see the next build and verify.
          neil.huang Neil Huang added a comment - - edited

          The problem is XDCR failed to parse the following payload when received manifests from peers:

              "SrcManifests": {
                "1": {
                  "uid": "1",
                  "scopes": [
                    {
                      "collections": null,
                      "name": "_default",
                      "uid": "0"
                    },
                    {
                      "collections": [
                        {
                          "name": "collection-1",
                          "uid": "8"
                        },
                        {
                          "name": "collection-10",
                          "uid": "9"
                        },
                        {
                          "name": "collection-100",
                          "uid": "a"
                        },
                        {
                          "name": "collection-1000",
                          "uid": "b"
          

          Specifically, we’re looking at the default scope without any default collections.

          Looking at the test set up, we see that scope-1 was set up with an empty collection set.
          I think this is being loaded via bulk collection loader… So I think this means that there is no default collection in the default scope for bucket-1.

          00:03:01     "collections": {
          00:03:01         "bucket-1": {
          00:03:01             "_default": {},
          00:03:01             "scope-1": {
          00:03:01                 "collection-1": {
          00:03:01                     "access": 1,
          00:03:01                     "load": 1
          00:03:01                 },
          00:03:01                 "collection-10": {
          00:03:01                     "access": 1,
          00:03:01                     "load": 1
          00:03:01                 },
          00:03:01                 "collection-100": {
          00:03:01                     "access": 1,
          00:03:01                     "load": 1
          00:03:01                 },
          00:03:01                 "collection-1000": {
          00:03:01                     "access": 1,
          00:03:01                     "load": 1
          00:03:01                 },
          00:03:01                 "collection-101": {
          00:03:01                     "access": 1,
          00:03:01                     "load": 1
          00:03:01                 },
          00:03:01                 "collection-102": {
          00:03:01                     "access": 1,
          00:03:01                     "load": 1
          00:03:01                 },
          

          Normally, this seems to work fine when I try to serialize / deserialize manifests directly.
          However, when manifests are serialized and deserialized as part of the payload, the collection key will have a <nil> value instead of an empty array.
          This seems to be causing error from being parsed.

          neil.huang Neil Huang added a comment - - edited The problem is XDCR failed to parse the following payload when received manifests from peers: … "SrcManifests": { "1": { "uid": "1", "scopes": [ { "collections": null, "name": "_default", "uid": "0" }, { "collections": [ { "name": "collection-1", "uid": "8" }, { "name": "collection-10", "uid": "9" }, { "name": "collection-100", "uid": "a" }, { "name": "collection-1000", "uid": "b" Specifically, we’re looking at the default scope without any default collections. Looking at the test set up, we see that scope-1 was set up with an empty collection set. I think this is being loaded via bulk collection loader… So I think this means that there is no default collection in the default scope for bucket-1. 00:03:01 "collections": { 00:03:01 "bucket-1": { 00:03:01 "_default": {}, 00:03:01 "scope-1": { 00:03:01 "collection-1": { 00:03:01 "access": 1, 00:03:01 "load": 1 00:03:01 }, 00:03:01 "collection-10": { 00:03:01 "access": 1, 00:03:01 "load": 1 00:03:01 }, 00:03:01 "collection-100": { 00:03:01 "access": 1, 00:03:01 "load": 1 00:03:01 }, 00:03:01 "collection-1000": { 00:03:01 "access": 1, 00:03:01 "load": 1 00:03:01 }, 00:03:01 "collection-101": { 00:03:01 "access": 1, 00:03:01 "load": 1 00:03:01 }, 00:03:01 "collection-102": { 00:03:01 "access": 1, 00:03:01 "load": 1 00:03:01 }, Normally, this seems to work fine when I try to serialize / deserialize manifests directly. However, when manifests are serialized and deserialized as part of the payload, the collection key will have a <nil> value instead of an empty array. This seems to be causing error from being parsed.

          People

            bo-chun.wang Bo-Chun Wang
            bo-chun.wang Bo-Chun Wang
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty