Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-51122

Eventing: Handlers taking long time to bootstrap for RQG test

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      QE TEST

      ./testrunner -i /tmp/testexec.24609.ini -p get-cbcollect-info=False,get-cbcollect-info=True -t eventing.eventing_rqg.EventingRQG.test_random_n1ql,nodes_init=4,services_init=kv-eventing-index-n1ql,template_file=b/resources/rqg/simple_table_db/query_tests_using_templates/query_10000_fields.txt.zip,dataset=default,groups=simple,reset_services=True,skip_cleanup=True,number_of_handler=3,number_of_queries=50
      

      TEST STEP
      Following 3 handlers are created and deployed one by one.

      # Function_468433617_test_random_n1ql022222013147731261
      # Function_468433617_test_random_n1ql022222013147731573
      # Function_468433617_test_random_n1ql022222013147731413
      

      OBSERVATION
      All 3 handlers took approx. 3 and a half minutes to deploy.

      2022-02-22T01:31:48.825-08:00 [Info] SuperSupervisor::SettingsChangeCallback [0] Function: Function_468433617_test_random_n1ql022222013147731261 current state: 1 requested status for deployment: true processing: true
      2022-02-22T01:35:23.527-08:00 [Info] SuperSupervisor::SettingsChangeCallback [1] Function: Function_468433617_test_random_n1ql022222013147731261 deployment done
      

      2022-02-22T01:35:23.529-08:00 [Info] SuperSupervisor::SettingsChangeCallback [1] Function: Function_468433617_test_random_n1ql022222013147731573 current state: 1 requested status for deployment: true processing: true
      2022-02-22T01:39:00.259-08:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: Function_468433617_test_random_n1ql022222013147731573 deployment done
      

      2022-02-22T01:39:00.261-08:00 [Info] SuperSupervisor::SettingsChangeCallback [2] Function: Function_468433617_test_random_n1ql022222013147731413 current state: 1 requested status for deployment: true processing: true
      2022-02-22T01:42:36.743-08:00 [Info] SuperSupervisor::SettingsChangeCallback [3] Function: Function_468433617_test_random_n1ql022222013147731413 deployment done
      

      On 172.23.105.50
      eventing.log
      Following set of messages are logged for each Vbucket and for all 3 handlers as soon as deployment is triggered.
      (1024 * 3)

      2022-02-22T01:38:21.548-08:00 [Info] Connecting to KV on non-TLS port. 
      2022-02-22T01:38:21.550-08:00 [Info] DCPT[eventing:tbahHeMq-1885:getfailoverlog-src_bucket] ##abcd feed started ...
      2022-02-22T01:38:21.733-08:00 [Info] DCPT[eventing:tbahHeMq-1885:getfailoverlog-src_bucket] doReceive(): connection closed
      2022-02-22T01:38:21.733-08:00 [Info] DCPT[eventing:tbahHeMq-1885:getfailoverlog-src_bucket] ##abcd ... stopped
      2022-02-22T01:38:21.733-08:00 [Info] Consumer::startDcp [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 Created initial metadata blob
      2022-02-22T01:38:21.733-08:00 [Info] Consumer::checkIfAlreadyEnqueued [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 not enqueued
      2022-02-22T01:38:21.733-08:00 [Info] Consumer::addToEnqueueMap [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 enqueuing
      2022-02-22T01:38:21.733-08:00 [Info] Consumer::startDcp [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 Sending streamRequestInfo size: 0
      2022-02-22T01:38:21.733-08:00 [Info] Consumer::processReqStreamMessages [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 reqStreamCh size: 0 msg: &consumer.streamRequestInfo{startSeqNo:0x0, vb:0x356, vbBlob:(*consumer.vbucketKVBlob)(0xc4200d9520)} Got request to stream
      2022-02-22T01:38:21.733-08:00 [Info] Consumer::deleteFromEnqueueMap [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 deleting from enqueue list
      2022-02-22T01:38:21.733-08:00 [Info] Consumer::startDcp [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 855 isNoEnt: true
      2022-02-22T01:38:21.734-08:00 [Info] Consumer::dcpRequestStreamHandle [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 DCP stream start vbKvAddr: 172.23.105.51:11210 vbuuid: 256902485717204 startSeq: 0 snapshotStart: 0 snapshotEnd: 0
      2022-02-22T01:38:21.734-08:00 [Info] Consumer::dcpRequestStreamHandle [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 Going to make DcpRequestStream call
      2022-02-22T01:38:21.734-08:00 [Info] Consumer::dcpRequestStreamHandle [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 Adding entry into inflightDcpStreams
      2022-02-22T01:38:21.734-08:00 [Info] DCPT[eventing:tbahHeMq-1031:{eventing:tbahHeMq-1030:172.23.105.50:8096_172.23.105.51:11210_worker_Function_468433617_test_random_n1ql022222013147731573_0}/0] ##356 STREAMREQ(854) successful
      2022-02-22T01:38:21.734-08:00 [Info] Consumer::dcpRequestStreamHandle [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 Updated checkpoint blob to indicate STREAMREQ was issued
      2022-02-22T01:38:21.734-08:00 [Info] Consumer::processReqStreamMessages [worker_Function_468433617_test_random_n1ql022222013147731573_0:/tmp/127.0.0.1:8091_0_3757019552.sock:130587] vb: 854 DCP stream successfully requested
      

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            sujay.gad Sujay Gad added a comment - - edited

            For comparison 3 handlers in above mentioned test took approx. 20-25 secs each to deploy on 6.6.5 - 10080.

            sujay.gad Sujay Gad added a comment - - edited For comparison 3 handlers in above mentioned test took approx. 20-25 secs each to deploy on 6.6.5 - 10080.
            abhishek.jindal Abhishek Jindal added a comment - - edited

            Sujay Gad can you also please share the cbcollect / memcached.log from kv node 172.23.105.51 ? We also see metadata checkpoint docs set operations taking long time (~ 180 milliseconds) hence collectively causing time for set operations for 1024 docs to be ~ 184 seconds = 3 minutes.

            The first suspect is either kv is slow in terms of set ops of network issues between kv and eventing node.

            example - time gap here is 205 ms:

            2022-02-22T01:40:02.633-08:00 [Info] Consumer::startDcp [worker_Function_468433617_test_random_n1ql022222013147731413_0:/tmp/127.0.0.1:8091_0_3058001379.sock:130891] vb: 263 isNoEnt: true
            2022-02-22T01:40:02.838-08:00 [Info] Consumer::startDcp [worker_Function_468433617_test_random_n1ql022222013147731413_0:/tmp/127.0.0.1:8091_0_3058001379.sock:130891] vb: 263 Created initial metadata blob
            

            During this time, we request for:

            • Failover logs
            • one set op of checkpoint doc on metadata bucket

            These two operations take up 200 ms combined which are tied to kv or network performance.
            hence, kv logs will help

            abhishek.jindal Abhishek Jindal added a comment - - edited Sujay Gad can you also please share the cbcollect / memcached.log from kv node 172.23.105.51 ? We also see metadata checkpoint docs set operations taking long time (~ 180 milliseconds) hence collectively causing time for set operations for 1024 docs to be ~ 184 seconds = 3 minutes. The first suspect is either kv is slow in terms of set ops of network issues between kv and eventing node. — example - time gap here is 205 ms: 2022-02-22T01:40:02.633-08:00 [Info] Consumer::startDcp [worker_Function_468433617_test_random_n1ql022222013147731413_0:/tmp/127.0.0.1:8091_0_3058001379.sock:130891] vb: 263 isNoEnt: true 2022-02-22T01:40:02.838-08:00 [Info] Consumer::startDcp [worker_Function_468433617_test_random_n1ql022222013147731413_0:/tmp/127.0.0.1:8091_0_3058001379.sock:130891] vb: 263 Created initial metadata blob During this time, we request for: Failover logs one set op of checkpoint doc on metadata bucket These two operations take up 200 ms combined which are tied to kv or network performance. hence, kv logs will help
            jeelan.poola Jeelan Poola added a comment - - edited

            Wayne SiuLynn Straus This is another back port (of MB-45837) that is needed which is related to MB-50947. We fixed quite a few issues during 7.0 which were so far not fixed in 6.6.5. And as we try to backport MB-50947, all related tickets are showing up. Hence we need to approve MB-51122 as well for 6.6.5-MP2.

            jeelan.poola Jeelan Poola added a comment - - edited Wayne Siu Lynn Straus This is another back port (of MB-45837 ) that is needed which is related to MB-50947 . We fixed quite a few issues during 7.0 which were so far not fixed in 6.6.5. And as we try to backport MB-50947 , all related tickets are showing up. Hence we need to approve MB-51122 as well for 6.6.5-MP2.
            wayne Wayne Siu added a comment -

            Added to 6.6.5-MP2.

            wayne Wayne Siu added a comment - Added to 6.6.5-MP2.

            Build couchbase-server-6.6.5-10087 contains eventing commit 998244a with commit message:
            MB-51122 : Open failover log DCP stream for only the required vbucket

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.5-10087 contains eventing commit 998244a with commit message: MB-51122 : Open failover log DCP stream for only the required vbucket
            sujay.gad Sujay Gad added a comment -

            Closing as deployment time is back to normal for handlers in above mentioned test on 6.6.5-10087.

            sujay.gad Sujay Gad added a comment - Closing as deployment time is back to normal for handlers in above mentioned test on 6.6.5-10087.

            People

              sujay.gad Sujay Gad
              sujay.gad Sujay Gad
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty