Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-45907

Eventing: Data miss match with SBM + multiple kv rebalance

    XMLWordPrintable

Details

    Description

      Build: 7.0.0-5017

      Test

      ./testrunner -i /tmp/testexec.18313.ini -p get-cbcollect-info=True,GROUP=bucket_op,skip_log_scan=False,bucket_storage=couchstore -t eventing.eventing_rebalance_collection.EventingRebalanceCollection.test_eventing_rebalance_with_multiple_kv_nodes,doc-per-day=20,dataset=default,nodes_init=5,services_init=kv-kv-kv-eventing-index:n1ql,groups=simple,reset_services=True,GROUP=bucket_op,java_sdk_client=True,get-cbcollect-info=True,skip_log_scan=False,bucket_storage=couchstore 
       
       
      Exception: missing data in destination bucket. Current : 94233 Expected : 120960  dcp_backlog : 0  TIMERS_IN_PAST : 0 lcb_exceptions : {'172.23.122.171': {}, '172.23.122.172': {}}

      Seeing no lcb exceptions in the stats

      Attachments

        1. 172.23.122.170-20210424-2153-diag.zip
          6.70 MB
        2. 172.23.122.171-20210424-2153-diag.zip
          13.87 MB
        3. 172.23.122.172-20210424-2153-diag.zip
          13.11 MB
        4. 172.23.122.173-20210424-2153-diag.zip
          12.29 MB
        5. 172.23.122.174-20210424-2153-diag.zip
          14.88 MB
        6. 172.23.122.175-20210424-2153-diag.zip
          12.66 MB
        7. 172.23.122.176-20210424-2153-diag.zip
          13.23 MB
        8. 172.23.122.177-20210424-2153-diag.zip
          11.77 MB
        9. mb-45907_testlogs.log
          579 kB
        10. mb-54907_stats
          67 kB
        11. Test_logs.txt
          1.02 MB
        12. test_output.log
          398 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            verified with 7.0.0-5102 build and issue is not observed

            chanabasappa.ghali Chanabasappa Ghali added a comment - verified with 7.0.0-5102 build and issue is not observed
            ankit.prabhu Ankit Prabhu added a comment -

            Looking at test log the data mismatch happened for curl test handler.

            2021-05-05 21:17:00 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code dst_bucket.scope_1.coll_3 to complete bucket operations... Current : 38032 Expected : 40320 ...
            

            If there is some exception while sending/receiving request then function skips writing to destination scope.

            function OnUpdate(doc, meta) {
                var request = {
                    path : 'job/test_suite_executor/api/json?tree=jobs[component]'
                }
                try {
                    var response = curl("GET", server, request);
                    log('response body received from server:', response.body);
                    log('response headers received from server:', response.headers);
                    log('response status received from server:', response.status);
                    var res= new Uint8Array(response.body);
                    if(response.status == 200){
                        var result= couchbase.insert(dst_bucket,meta,response.body);
                        log(result);
                    }
                    else{
                        var result= couchbase.insert(dst_bucket,meta,response.status);
                        log(result);
                    }
                }
                catch (e) {
                    log('error:', e);
                    }
            }
             
             
             
            function OnDelete(meta) {
                var request = {
                    path : 'job/test_suite_executor/api/json?tree=jobs[component]'
                };
                try {
                    var response = curl("GET", server, request);
                    log('response body received from server:', response.body);
                    log('response headers received from server:', response.headers);
                    log('response status received from server:', response.status);
                    var res= new Uint8Array(response.body);
                    var doc_meta={"id":meta.id};
                    var result = couchbase.delete(dst_bucket,doc_meta);
                    log(result);
                }
                catch (e) {
                    log('error:', e);
                    }
            }
            

            Looking at the curl function stats eventing executed all 40320 mutations and since function is in try catch block it won't increment on_update_failure if there is some exception. From stats there are curl failure count as 2286 and curl timeout 2 which gives total of 38032 mutations written to dst collection.

              "app_worker_setting_events_lost": 0,
                        "bkt_ops_cas_mismatch_count": 0,
                        "bucket_op_exception_count": 0,
                        "checkpoint_failure_count": 0,
                        "curl_failure_count": 2286,
                        "curl_max_resp_size_exceeded": 0,
                        "curl_non_200_response": 0,
                        "curl_timeout_count": 2,
                        "dcp_events_lost": 0,
                        "debugger_events_lost": 0,
                        "delete_events_lost": 0,
                        "mutation_events_lost": 0,
                        "n1ql_op_exception_count": 0,
                        "timeout_count": 0,
                        "timer_callback_missing_counter": 0,
                        "timer_context_size_exceeded_counter": 0,
                        "timer_events_lost": 0
            

            Function is logging curl error to application logs.
            Chanabasappa Ghali Could you please check application logs of curl function to check why these curl failure happened?

            One such error is unable to resolve host.

               "last_log": "<ud>\"error:\" {\"message\":\"Unable to perform the request: Couldn't resolve host name\",\"stack\":\"Error\\n    at OnUpdate (curl.js:7:21)\"} </ud>"
            

            ankit.prabhu Ankit Prabhu added a comment - Looking at test log the data mismatch happened for curl test handler. 2021-05-05 21:17:00 | INFO | MainProcess | test_thread | [basetestcase.sleep] sleep for 30 secs. Waiting for handler code dst_bucket.scope_1.coll_3 to complete bucket operations... Current : 38032 Expected : 40320 ... If there is some exception while sending/receiving request then function skips writing to destination scope. function OnUpdate(doc, meta) { var request = { path : 'job/test_suite_executor/api/json?tree=jobs[component]' } try { var response = curl("GET", server, request); log('response body received from server:', response.body); log('response headers received from server:', response.headers); log('response status received from server:', response.status); var res= new Uint8Array(response.body); if(response.status == 200){ var result= couchbase.insert(dst_bucket,meta,response.body); log(result); } else{ var result= couchbase.insert(dst_bucket,meta,response.status); log(result); } } catch (e) { log('error:', e); } }       function OnDelete(meta) { var request = { path : 'job/test_suite_executor/api/json?tree=jobs[component]' }; try { var response = curl("GET", server, request); log('response body received from server:', response.body); log('response headers received from server:', response.headers); log('response status received from server:', response.status); var res= new Uint8Array(response.body); var doc_meta={"id":meta.id}; var result = couchbase.delete(dst_bucket,doc_meta); log(result); } catch (e) { log('error:', e); } } Looking at the curl function stats eventing executed all 40320 mutations and since function is in try catch block it won't increment on_update_failure if there is some exception. From stats there are curl failure count as 2286 and curl timeout 2 which gives total of 38032 mutations written to dst collection. "app_worker_setting_events_lost": 0, "bkt_ops_cas_mismatch_count": 0, "bucket_op_exception_count": 0, "checkpoint_failure_count": 0, "curl_failure_count": 2286, "curl_max_resp_size_exceeded": 0, "curl_non_200_response": 0, "curl_timeout_count": 2, "dcp_events_lost": 0, "debugger_events_lost": 0, "delete_events_lost": 0, "mutation_events_lost": 0, "n1ql_op_exception_count": 0, "timeout_count": 0, "timer_callback_missing_counter": 0, "timer_context_size_exceeded_counter": 0, "timer_events_lost": 0 Function is logging curl error to application logs. Chanabasappa Ghali Could you please check application logs of curl function to check why these curl failure happened? One such error is unable to resolve host. "last_log": "<ud>\"error:\" {\"message\":\"Unable to perform the request: Couldn't resolve host name\",\"stack\":\"Error\\n at OnUpdate (curl.js:7:21)\"} </ud>"

            Exception: missing data in destination bucket. Current : 38032 Expected : 40320  dcp_backlog : 0  TIMERS_IN_PAST : 0 lcb_exceptions : {'10.112.206.104': {}}

            chanabasappa.ghali Chanabasappa Ghali added a comment - Exception: missing data in destination bucket. Current : 38032 Expected : 40320  dcp_backlog : 0  TIMERS_IN_PAST : 0 lcb_exceptions : {'10.112.206.104': {}}
            ankit.prabhu Ankit Prabhu added a comment -

            Chanabasappa Ghali, In the latest run, mutation phase of test failed with unable to access jar file

            Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar
            

            2021-05-05 10:56:40 | INFO | MainProcess | Cluster_Thread | [task.execute_for_collection] java -jar java_sdk_client/collections/target/javaclient/javaclient.jar -i 10.112.206.101 -u Administrator -p password -b src_bucket -s _default -c _default -n 40320 -pc 100 -pu 0 -pd 0 -l uniform -dsn 1 -dpx doc_ -dt Person -de 0 -ds 500 -ac False -st 0 -en 40319 -o False -sd False Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar [<FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 890 in _bootstrap>, <FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 932 in _bootstrap_inner>, <FrameSummary file lib/tasks/taskmanager.py, line 34 in run>, <FrameSummary file lib/tasks/task.py, line 102 in step>, <FrameSummary file lib/tasks/task.py, line 6436 in execute>, <FrameSummary file lib/tasks/task.py, line 6424 in execute_for_collection>, <FrameSummary file lib/tasks/future.py, line 265 in set_exception>] Wed May 5 10:56:40 2021 2021-05-05 10:56:43 | INFO | MainProcess | test_thread | [rest_client.rebalance_reached] rebalance reached >100% in 3.006394863128662 seconds 2021-05-05 10:57:10 | INFO | MainProcess | Cluster_Thread | [task.execute_for_collection] java -jar java_sdk_client/collections/target/javaclient/javaclient.jar -i 10.112.206.101 -u Administrator -p password -b src_bucket -s scope_1 -c coll_1 -n 40320 -pc 100 -pu 0 -pd 0 -l uniform -dsn 1 -dpx doc_ -dt Person -de 0 -ds 500 -ac False -st 0 -en 40319 -o False -sd False Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar [<FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 890 in _bootstrap>, <FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 932 in _bootstrap_inner>, <FrameSummary file lib/tasks/taskmanager.py, line 34 in run>, <FrameSummary file lib/tasks/task.py, line 102 in step>, <FrameSummary file lib/tasks/task.py, line 6436 in execute>, <FrameSummary file lib/tasks/task.py, line 6424 in execute_for_collection>, <FrameSummary file lib/tasks/future.py, line 265 in set_exception>]
            

             Could you please re run this test after fixing that error.

            ankit.prabhu Ankit Prabhu added a comment - Chanabasappa Ghali , In the latest run, mutation phase of test failed with unable to access jar file Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar 2021-05-05 10:56:40 | INFO | MainProcess | Cluster_Thread | [task.execute_for_collection] java -jar java_sdk_client/collections/target/javaclient/javaclient.jar -i 10.112.206.101 -u Administrator -p password -b src_bucket -s _default -c _default -n 40320 -pc 100 -pu 0 -pd 0 -l uniform -dsn 1 -dpx doc_ -dt Person -de 0 -ds 500 -ac False -st 0 -en 40319 -o False -sd False Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar [<FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 890 in _bootstrap>, <FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 932 in _bootstrap_inner>, <FrameSummary file lib/tasks/taskmanager.py, line 34 in run>, <FrameSummary file lib/tasks/task.py, line 102 in step>, <FrameSummary file lib/tasks/task.py, line 6436 in execute>, <FrameSummary file lib/tasks/task.py, line 6424 in execute_for_collection>, <FrameSummary file lib/tasks/future.py, line 265 in set_exception>] Wed May 5 10:56:40 2021 2021-05-05 10:56:43 | INFO | MainProcess | test_thread | [rest_client.rebalance_reached] rebalance reached >100% in 3.006394863128662 seconds 2021-05-05 10:57:10 | INFO | MainProcess | Cluster_Thread | [task.execute_for_collection] java -jar java_sdk_client/collections/target/javaclient/javaclient.jar -i 10.112.206.101 -u Administrator -p password -b src_bucket -s scope_1 -c coll_1 -n 40320 -pc 100 -pu 0 -pd 0 -l uniform -dsn 1 -dpx doc_ -dt Person -de 0 -ds 500 -ac False -st 0 -en 40319 -o False -sd False Error: Unable to access jarfile java_sdk_client/collections/target/javaclient/javaclient.jar [<FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 890 in _bootstrap>, <FrameSummary file /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/threading.py, line 932 in _bootstrap_inner>, <FrameSummary file lib/tasks/taskmanager.py, line 34 in run>, <FrameSummary file lib/tasks/task.py, line 102 in step>, <FrameSummary file lib/tasks/task.py, line 6436 in execute>, <FrameSummary file lib/tasks/task.py, line 6424 in execute_for_collection>, <FrameSummary file lib/tasks/future.py, line 265 in set_exception>]  Could you please re run this test after fixing that error.

            With Enterprise Edition 7.0.0 build 5102, same exception

            Exception: missing data in destination bucket. Current : 0 Expected : 40320  dcp_backlog : 0  TIMERS_IN_PAST : 0 lcb_exceptions : {'10.112.206.104': {}}

            chanabasappa.ghali Chanabasappa Ghali added a comment - With Enterprise Edition 7.0.0 build 5102, same exception Exception: missing data in destination bucket. Current : 0 Expected : 40320  dcp_backlog : 0  TIMERS_IN_PAST : 0 lcb_exceptions : {'10.112.206.104': {}}

            People

              chanabasappa.ghali Chanabasappa Ghali
              vikas.chaudhary Vikas Chaudhary
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty