Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-45713

[Volume Test]: Doc mismatch for N1QL handler

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Not a Bug
    • Cheshire-Cat
    • 7.0.0
    • eventing

    Description

      Build 7.0.0 - 4955
      Test - tests/eventing/CC/test_eventing_volume.yml -scope tests/eventing/CC/scope_eventing_volume.yml
      Cycle : 1

      Document mismatch seen for N1QL handler

      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
      �No of docs in source and destination : Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']) : 287054
          EventingHelper().run()
      -  File "eventing_helper.py", line 67, in run
          self.verify_doc(options)
      5  File "eventing_helper.py", line 314, in verify_doc
      p    format(source_collection, source_count["results"][0], destination_collection, binding_count["results"][0]))
      �Exception: No of docs in source and destination don't match: Source Bucket(n1ql_op.scope_0.coll0) : 287136, Destination Bucket(['n1ql_op.scope_1.coll0']): 287054
      

      Attachments

        1. n1ql_error.png
          n1ql_error.png
          275 kB
        2. node_182.png
          node_182.png
          284 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            sujay.gad Sujay Gad added a comment -

            dup of MB-45967

            sujay.gad Sujay Gad added a comment - dup of MB-45967

            The difference in document count is coming from KV node .182 between n1ql_op:scope0:coll0 and n1ql_op:scope1:coll0 There's a difference of 5 documents - 1246782 vs 1246777 :

            Around the same time, we observe around 8 n1ql query execution errors being reported by query engine:

            On query node, we do see that those 8 errors are actually query service giving up with NMVB after 12 retries for a particular key (5 failures of which belong to n1ql_op:scope1:coll0 as destination bucket):

            _time=2021-04-27T13:42:44.085-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>CD075150-122_89131</ud> for Keyspace default:n1ql_op.scope_1.coll1. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            _time=2021-04-27T13:42:44.646-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>E7888C92-92_89275</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            _time=2021-04-27T13:42:46.718-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>4F7D8E46-93_89392</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            _time=2021-04-27T13:42:46.982-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>F11A45C5-101_89245</ud> for Keyspace default:n1ql_op.scope_1.coll1. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            _time=2021-04-27T13:43:27.081-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>47E3F194-103_93074</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            _time=2021-04-27T13:43:36.093-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>47E3F194-103_94083</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            _time=2021-04-27T13:43:49.055-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>CD075150-122_95553</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            _time=2021-04-27T13:43:49.877-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>9F647ACB-100_95709</ud> for Keyspace default:n1ql_op.scope_1.coll1. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud>
            

            NMVB might be expected for a few such documents as a rebalance in operation of a KV node was going on at this time.

            Sujay Gad I would recommend that for n1ql tests, the test runner should probably check for n1ql errors too just to ensure that any missing documents are a result of n1ql query execution failures.

            abhishek.jindal Abhishek Jindal added a comment - The difference in document count is coming from KV node .182 between n1ql_op:scope0:coll0 and n1ql_op:scope1:coll0 There's a difference of 5 documents - 1246782 vs 1246777 : Around the same time, we observe around 8 n1ql query execution errors being reported by query engine: On query node, we do see that those 8 errors are actually query service giving up with NMVB after 12 retries for a particular key (5 failures of which belong to n1ql_op:scope1:coll0 as destination bucket): _time=2021-04-27T13:42:44.085-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>CD075150-122_89131</ud> for Keyspace default:n1ql_op.scope_1.coll1. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> _time=2021-04-27T13:42:44.646-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>E7888C92-92_89275</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> _time=2021-04-27T13:42:46.718-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>4F7D8E46-93_89392</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> _time=2021-04-27T13:42:46.982-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>F11A45C5-101_89245</ud> for Keyspace default:n1ql_op.scope_1.coll1. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> _time=2021-04-27T13:43:27.081-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>47E3F194-103_93074</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> _time=2021-04-27T13:43:36.093-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>47E3F194-103_94083</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> _time=2021-04-27T13:43:49.055-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>CD075150-122_95553</ud> for Keyspace default:n1ql_op.scope_1.coll0. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> _time=2021-04-27T13:43:49.877-07:00 _level=ERROR _msg=Failed to perform UPSERT on key <ud>9F647ACB-100_95709</ud> for Keyspace default:n1ql_op.scope_1.coll1. Error - <ud>unable to complete action after 12 attemps: NOT_MY_VBUCKET</ud> NMVB might be expected for a few such documents as a rebalance in operation of a KV node was going on at this time. Sujay Gad I would recommend that for n1ql tests, the test runner should probably check for n1ql errors too just to ensure that any missing documents are a result of n1ql query execution failures.
            sujay.gad Sujay Gad added a comment - - edited Observed with the latest run on 7.0.0-5046. Logs attached. https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.105.190.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.106.154.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.106.182.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.106.242.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.106.243.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.106.253.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.106.255.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.107.89.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.97.213.zip https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-45713/collectinfo-2021-04-28T043341-ns_1%40172.23.97.214.zip

            Sujay Gad resolving this as a dup of MB-44324 . Feel free to reopen if the issue reoccurs even after MB-44324 is fixed.

            abhishek.jindal Abhishek Jindal added a comment - Sujay Gad  resolving this as a dup of  MB-44324 . Feel free to reopen if the issue reoccurs even after MB-44324 is fixed.

            There are few BUCKET_NOT_FOUND error on n1ql0 function which might cause loss of mutation on destination bucket MB-44324

            cbcollect_info_ns_1@172.23.105.186_20210416-035215/ns_server.eventing.log:321161:2021-04-15T20:54:43.231-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:46814] [lcb,server L:1108 I:2076843402] <172.23.106.243:11210> (CTX=0x7f93ec025850,memcached,SRV=0x7f93f401d1d0,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
            cbcollect_info_ns_1@172.23.105.186_20210416-035215/ns_server.eventing.log:321164:2021-04-15T20:54:43.232-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:46814] [lcb,server L:1108 I:1509480749] <172.23.106.243:11210> (CTX=0x7f93ec01e8c0,memcached,SRV=0x7f93f0021650,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:16248:2021-04-15T20:14:00.234-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,instance L:195 I:3629674106] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:21985:2021-04-15T20:14:00.682-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,instance L:195 I:2712145127] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:24137:2021-04-15T20:14:16.353-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,instance L:195 I:1150934072] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:29218:2021-04-15T20:14:16.702-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,instance L:195 I:2793021526] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208382:2021-04-15T20:54:42.269-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,server L:1108 I:2712145127] <172.23.106.243:11210> (CTX=0x7f60c801ce90,memcached,SRV=0x7f60d40147e0,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208385:2021-04-15T20:54:42.281-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,server L:961 I:2712145127] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f60d40147e0,IX=2) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208396:2021-04-15T20:54:42.294-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,server L:1108 I:2793021526] <172.23.106.243:11210> (CTX=0x7f5aa0003ec0,memcached,SRV=0x7f5aac029080,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208399:2021-04-15T20:54:42.299-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,server L:1108 I:1150934072] <172.23.106.243:11210> (CTX=0x7f5aa4004430,memcached,SRV=0x7f5ab4026710,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025)
            cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208402:2021-04-15T20:54:42.308-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,server L:961 I:2793021526] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f5aac029080,IX=2) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system
            

            ankit.prabhu Ankit Prabhu added a comment - There are few BUCKET_NOT_FOUND error on n1ql0 function which might cause loss of mutation on destination bucket  MB-44324 cbcollect_info_ns_1@172.23.105.186_20210416-035215/ns_server.eventing.log:321161:2021-04-15T20:54:43.231-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:46814] [lcb,server L:1108 I:2076843402] <172.23.106.243:11210> (CTX=0x7f93ec025850,memcached,SRV=0x7f93f401d1d0,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025) cbcollect_info_ns_1@172.23.105.186_20210416-035215/ns_server.eventing.log:321164:2021-04-15T20:54:43.232-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:46814] [lcb,server L:1108 I:1509480749] <172.23.106.243:11210> (CTX=0x7f93ec01e8c0,memcached,SRV=0x7f93f0021650,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025) cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:16248:2021-04-15T20:14:00.234-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,instance L:195 I:3629674106] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:21985:2021-04-15T20:14:00.682-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,instance L:195 I:2712145127] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:24137:2021-04-15T20:14:16.353-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,instance L:195 I:1150934072] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:29218:2021-04-15T20:14:16.702-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,instance L:195 I:2793021526] DNS SRV lookup failed: LCB_ERR_UNKNOWN_HOST (1049). Ignore this if not relying on DNS SRV records cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208382:2021-04-15T20:54:42.269-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,server L:1108 I:2712145127] <172.23.106.243:11210> (CTX=0x7f60c801ce90,memcached,SRV=0x7f60d40147e0,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025) cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208385:2021-04-15T20:54:42.281-07:00 [Info] eventing-consumer [worker_n1ql0_0_0:/tmp/127.0.0.1:8091_0_2873195899.sock:73937] [lcb,server L:961 I:2712145127] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f60d40147e0,IX=2) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208396:2021-04-15T20:54:42.294-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,server L:1108 I:2793021526] <172.23.106.243:11210> (CTX=0x7f5aa0003ec0,memcached,SRV=0x7f5aac029080,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025) cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208399:2021-04-15T20:54:42.299-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,server L:1108 I:1150934072] <172.23.106.243:11210> (CTX=0x7f5aa4004430,memcached,SRV=0x7f5ab4026710,IX=2) Got socket error LCB_ERR_SOCKET_SHUTDOWN (1025) cbcollect_info_ns_1@172.23.105.190_20210416-035215/ns_server.eventing.log:208402:2021-04-15T20:54:42.308-07:00 [Info] eventing-consumer [worker_n1ql1_0_0:/tmp/127.0.0.1:8091_0_1870918132.sock:83634] [lcb,server L:961 I:2793021526] <NOHOST:NOPORT> (CTX=(nil),,SRV=0x7f5aac029080,IX=2) Connection attempt failed. Received LCB_ERR_BUCKET_NOT_FOUND (210) from libcouchbase, received 0 from operating system

            People

              sujay.gad Sujay Gad
              sujay.gad Sujay Gad
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty