Couchbase Server / MB-43061

Resume Time - 15x Increase


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: Cheshire-Cat
    • Fix Version/s: 7.0.0
    • Component/s: eventing, test-execution
    • Labels:
    • Environment:
      7.0.0-3874
    • Triage:
      Untriaged
    • Story Points:
      1
    • Is this a Regression?:
      Unknown

      Description

      We are observing a 15x increase in resume time for the following test:

      Resume time(sec), 1 bucket x 100M x 1KB, 4KV + 1Eventing node, 4KV + 1Eventing node, single function-bucket-op 

      Measurements:


          Activity

          prajwal.kirankumar Prajwal Kiran Kumar (Inactive) added a comment -

          This appears to originate in build 7.0.0-3854: http://172.23.123.43:10000/?product=couchbase-server&fromVersion=7.0.0&fromBuild=3853&toVersion=7.0.0&toBuild=3855&f_couchstore=on&f_kv_engine=on&f_tlm=on
          jeelan.poola Jeelan Poola added a comment -

          Prajwal Kiran Kumar, there is no change in Eventing between these two builds (3853 and 3855). I suspect it's a perf-test measurement anomaly. Resume should generally take the same amount of time as deploy. Could you please double-check how time is measured in the test?

          prajwal.kirankumar Prajwal Kiran Kumar (Inactive) added a comment - edited

          Jeelan Poola, Vinayaka Kamath: on build 3854 (where the observed resume time was 1.3 s), successive status calls made after resuming a paused function returned the following:

          prajwalkirankumar@Prajwals-MacBook-Pro ~ % curl -X GET http://172.23.104.242:8096/api/v1/status/ -u Administrator:password
          {
            "apps": [
              {"composite_status": "paused", "name": "test", "num_bootstrapping_nodes": 0, "num_deployed_nodes": 1, "deployment_status": true, "processing_status": false}
            ],
            "num_eventing_nodes": 1
          }
          prajwalkirankumar@Prajwals-MacBook-Pro ~ % curl -X GET http://172.23.104.242:8096/api/v1/status/ -u Administrator:password
          {
            "apps": [
              {"composite_status": "deployed", "name": "test", "num_bootstrapping_nodes": 0, "num_deployed_nodes": 1, "deployment_status": true, "processing_status": true}
            ],
            "num_eventing_nodes": 1
          }
          prajwalkirankumar@Prajwals-MacBook-Pro ~ % curl -X GET http://172.23.104.242:8096/api/v1/status/ -u Administrator:password
          {
            "apps": [
              {"composite_status": "deploying", "name": "test", "num_bootstrapping_nodes": 1, "num_deployed_nodes": 1, "deployment_status": true, "processing_status": true}
            ],
            "num_eventing_nodes": 1
          }

           

          The transient "deployed" status makes the test assume that deployment has finished, so it stops polling the status. This is why we observed very low resume times in builds before 3855. From 3855 onwards we no longer see this behaviour: the status returned is "deploying", and we measure higher (probably actual) resume times. Please let us know whether we need to change how we poll these APIs to avoid this anomaly.
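One way a test could avoid being fooled by a transient status is to treat the function as resumed only after several consecutive polls agree. The Python sketch below is a hypothetical illustration, not the perf harness's actual code: the helper names (`app_status`, `is_fully_deployed`, `wait_for_resume`, `http_fetch`), the `stable_polls` threshold and the polling interval are all assumptions, and requiring a stable streak is a workaround heuristic rather than a documented Eventing guarantee. It reads only the status fields that appear in the transcript above.

```python
import base64
import json
import time
import urllib.request
from typing import Callable


def app_status(payload: dict, name: str) -> dict:
    """Return the status entry for one function from an /api/v1/status/ payload."""
    for app in payload.get("apps", []):
        if app.get("name") == name:
            return app
    raise KeyError(f"function {name!r} not found in status payload")


def is_fully_deployed(app: dict) -> bool:
    """Hypothetical check: every reported field must agree, not just composite_status.

    On build 3854 a lone "deployed" reading was followed by "deploying"
    with num_bootstrapping_nodes == 1, so one field alone is not trusted.
    """
    return (
        app.get("composite_status") == "deployed"
        and app.get("num_bootstrapping_nodes") == 0
        and app.get("deployment_status") is True
        and app.get("processing_status") is True
    )


def wait_for_resume(fetch: Callable[[], dict], name: str,
                    stable_polls: int = 3, interval: float = 1.0,
                    timeout: float = 3600.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll until `name` is fully deployed for `stable_polls` consecutive reads.

    Returns seconds from the start of polling to the first read of the final
    stable streak, so the confirmation polls do not inflate the result.
    """
    start = clock()
    streak, streak_start = 0, start
    while True:
        now = clock()
        if is_fully_deployed(app_status(fetch(), name)):
            if streak == 0:
                streak_start = now  # remember when this streak began
            streak += 1
        else:
            streak = 0  # a transient "deployed" is discarded here
        if streak >= stable_polls:
            return streak_start - start
        if now - start > timeout:
            raise TimeoutError(f"{name!r} not stably deployed after {timeout}s")
        sleep(interval)


def http_fetch(host: str, user: str, password: str) -> Callable[[], dict]:
    """Build a fetcher for the Eventing status endpoint on port 8096, as in the transcript."""
    url = f"http://{host}:8096/api/v1/status/"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch() -> dict:
        req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return fetch
```

For example, `wait_for_resume(http_fetch("172.23.104.242", "Administrator", "password"), "test")` would be called immediately after issuing the resume request. Fetching, clock and sleep are injected so the polling logic can be exercised without a live cluster.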

          jeelan.poola Jeelan Poola added a comment -

          Thank you, Prajwal Kiran Kumar! Unfortunately, as nothing changed in Eventing between builds 3853, 3854 and 3855, this looks purely like a race condition. We are trying to improve the accuracy of the reported status as part of MB-36552. For the time being, we should stick with the new values, which are in line with the time it takes to deploy a function.

          prajwal.kirankumar Prajwal Kiran Kumar (Inactive) added a comment -

          Okay, thanks Jeelan Poola. Closing this as it is being tracked under MB-36552.


            People

            Assignee:
            jeelan.poola Jeelan Poola
            Reporter:
            prajwal.kirankumar Prajwal Kiran Kumar (Inactive)
            Votes:
            0
            Watchers:
            3

