Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-40540

Data miss match with rebalance + curl with different topology

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.6.0
    • Fix Version/s: 6.6.0
    • Component/s: eventing
    • Labels:
      None
    • Triage:
      Untriaged
    • Story Points:
      1
    • Is this a Regression?:
      Yes

      Description

      Build : 6.6.0-7883 passed on 6.6.0-7873
      Topology 4 nodes : kv, eventing:fts, eventing:fts, index:n1q
      Rebalance in 1 node - eventing:fts

       ./testrunner -i /tmp/testexec.74012.ini -p get-cbcollect-info=True,GROUP=bucket_op_curl -t eventing.eventing_rebalance.EventingRebalance.test_rebalance_in_with_different_topologies,doc-per-day=20,GROUP=bucket_op_curl,services_in=eventing,fts,handler_code=bucket_op_curl_jenkins,reset_services=True,dataset=default,services_init=kv-eventing:fts-eventing:fts-index:n1ql,groups=simple,curl=True,get-cbcollect-info=True,nodes_init=4Exception: missing data in destination bucket. Current : 40310 Expected : 40320  dcp_backlog : 0  TIMERS_IN_PAST : 0 lcb_exceptions : {'172.23.98.16': {}, '172.23.97.90': {}, '172.23.98.164': {}}
      

        Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

          vikas.chaudhary Vikas Chaudhary created issue -
          jeelan.poola Jeelan Poola made changes -
          Field Original Value New Value
          Assignee Jeelan Poola [ jeelan.poola ] Ankit Prabhu [ ankit.prabhu ]
          Hide
          jeelan.poola Jeelan Poola added a comment -
          Show
          jeelan.poola Jeelan Poola added a comment - Vikas Chaudhary No changes in eventing in this area between the specified builds http://172.23.123.43:8000/getchangelog?product=couchbase-server&fromb=6.6.0-7873&tob=6.6.0-7883
          Hide
          ankit.prabhu Ankit Prabhu added a comment -

          I ran the test locally. In my test environment 9 documents mismatched.

          2020-07-20 14:43:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:43:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:44:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:44:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:45:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:45:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:46:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:46:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:47:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:47:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:48:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          2020-07-20 14:48:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ...
          

          Looking at the function code, if there is any timeout or exception, function is not writing it into the bucket.

          function OnUpdate(doc, meta) {
              var request = {
          	path : 'job/test_suite_executor/api/json?tree=jobs[component]'
              }
              try {
              	var response = curl("GET", server, request);
              	log('response body received from server:', response.body);
              	log('response headers received from server:', response.headers);
              	log('response status received from server:', response.status);
              	var res= new Uint8Array(response.body);
              	if(response.status == 200){
              	    dst_bucket[meta.id]=response.body;
              	}
              	else{
              	    dst_bucket[meta.id] = response.status;
              	}
              }
              catch (e) {
              	log('error:', e);
                  }
          }
          function OnDelete(meta) {
              var request = {
          	path : 'job/test_suite_executor/api/json?tree=jobs[component]'
              };
              try {
              	var response = curl("GET", server, request);
              	log('response body received from server:', response.body);
              	log('response headers received from server:', response.headers);
              	log('response status received from server:', response.status);
              	var res= new Uint8Array(response.body);
              	if(response.status == 200){
              	    delete dst_bucket[meta.id];
              	}
              	else{
              	    delete dst_bucket[meta.id];
              	}
              }
              catch (e) {
              	log('error:', e);
                  }
          }
          

          Looking at the function log file from all 3 eventing nodes, we got 9 request timeout messages.

          Function_335895254_test_rebalance_in_with_different_topologies.log
          6868:2020-07-20T14:22:03.035+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          35789:2020-07-20T14:36:00.596+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          35790:2020-07-20T14:36:00.813+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          35791:2020-07-20T14:36:00.937+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          

          Function_335895254_test_rebalance_in_with_different_topologies.log
          7039:2020-07-20T14:22:02.637+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          35942:2020-07-20T14:36:01.642+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          

          Function_335895254_test_rebalance_in_with_different_topologies.log
          35344:2020-07-20T14:36:00.366+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          35345:2020-07-20T14:36:01.132+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 
          35346:2020-07-20T14:36:01.144+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n    at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"}
          

          Show
          ankit.prabhu Ankit Prabhu added a comment - I ran the test locally. In my test environment 9 documents mismatched. 2020-07-20 14:43:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:43:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:44:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:44:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:45:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:45:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:46:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:46:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:47:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:47:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:48:02 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... 2020-07-20 14:48:32 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code Function_335895254_test_rebalance_in_with_different_topologies to complete bucket operations... Current : 40311 Expected : 40320 ... Looking at the function code, if there is any timeout or exception, function is not writing it into the bucket. function OnUpdate(doc, meta) { var request = { path : 'job/test_suite_executor/api/json?tree=jobs[component]' } try { var response = curl("GET", server, request); log('response body received from server:', response.body); log('response headers received from server:', response.headers); log('response status received from server:', response.status); var res= new Uint8Array(response.body); if(response.status == 200){ dst_bucket[meta.id]=response.body; } else{ dst_bucket[meta.id] = response.status; } } catch (e) { log('error:', e); } } function OnDelete(meta) { var request = { path : 'job/test_suite_executor/api/json?tree=jobs[component]' }; try { var response = curl("GET", server, request); log('response body received from server:', response.body); log('response headers received from server:', response.headers); log('response status received from server:', response.status); var res= new Uint8Array(response.body); if(response.status == 200){ delete dst_bucket[meta.id]; } else{ delete dst_bucket[meta.id]; } } catch (e) { log('error:', e); } } Looking at the function log file from all 3 eventing nodes, we got 9 request timeout messages. Function_335895254_test_rebalance_in_with_different_topologies.log 6868:2020-07-20T14:22:03.035+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 35789:2020-07-20T14:36:00.596+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 35790:2020-07-20T14:36:00.813+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 35791:2020-07-20T14:36:00.937+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} Function_335895254_test_rebalance_in_with_different_topologies.log 7039:2020-07-20T14:22:02.637+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 35942:2020-07-20T14:36:01.642+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} Function_335895254_test_rebalance_in_with_different_topologies.log 35344:2020-07-20T14:36:00.366+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 35345:2020-07-20T14:36:01.132+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"} 35346:2020-07-20T14:36:01.144+05:30 [INFO] "error:" {"message":"Unable to perform the request: Timeout was reached","stack":"Error\n at OnUpdate (Function_335895254_test_rebalance_in_with_different_topologies.js:7:21)"}
          Hide
          vikas.chaudhary Vikas Chaudhary added a comment -
          Show
          vikas.chaudhary Vikas Chaudhary added a comment - Ankit Prabhu  timeouts were not captured in the stats  http://qa.sc.couchbase.com/job/test_suite_executor/238116/consoleText
          Hide
          jeelan.poola Jeelan Poola added a comment -

          Vikas Chaudhary curl timeouts may not cause script timeouts. As of today, we do not print curl timeout counts in curl stats.

          "curl": {
                          "delete": 0, 
                          "get": 13434, 
                          "head": 0, 
                          "post": 0, 
                          "put": 0
                      }
          

          I remember something similar happened on another issue recently and we requested you to update the JS code to do bucket ops even on curl-op-exceptions like timeout. Could you please check if all of test-runner code is updated with this?

          Show
          jeelan.poola Jeelan Poola added a comment - Vikas Chaudhary curl timeouts may not cause script timeouts. As of today, we do not print curl timeout counts in curl stats. "curl" : { "delete" : 0 , "get" : 13434 , "head" : 0 , "post" : 0 , "put" : 0 } I remember something similar happened on another issue recently and we requested you to update the JS code to do bucket ops even on curl-op-exceptions like timeout. Could you please check if all of test-runner code is updated with this?
          jeelan.poola Jeelan Poola made changes -
          Assignee Ankit Prabhu [ ankit.prabhu ] Vikas Chaudhary [ vikas.chaudhary ]
          Hide
          vikas.chaudhary Vikas Chaudhary added a comment -

          Actual bug is MB-39862

          Show
          vikas.chaudhary Vikas Chaudhary added a comment - Actual bug is  MB-39862
          vikas.chaudhary Vikas Chaudhary made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Closed [ 6 ]

            People

            Assignee:
            vikas.chaudhary Vikas Chaudhary
            Reporter:
            vikas.chaudhary Vikas Chaudhary
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Gerrit Reviews

                There are no open Gerrit changes

                  PagerDuty