  1. Couchbase Server
  2. MB-37085

[System Test]: Rebalance failed due to failed_to_update_vbucket_map

Details

    • Task
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 6.5.0
    • 6.5.0
    • test-execution

    Description

      Build: 6.5.0-4890, not seen on 6.5.0-4821

      Test: MH longevity

      Day: 2nd

      Cycle: 3rd

      Test Step: Rebalance in 2 KV nodes

      [2019-11-28T01:25:04-08:00, sequoiatools/couchbase-cli:6.5:1fa4ed] server-add -c 172.23.97.74:8091 --server-add https://172.23.96.14 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2019-11-28T01:26:19-08:00, sequoiatools/couchbase-cli:6.5:154a78] server-add -c 172.23.97.74:8091 --server-add https://172.23.96.190 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2019-11-28T01:26:37-08:00, sequoiatools/couchbase-cli:6.5:90411b] rebalance -c 172.23.97.74:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.97.74:8091 -u Administrator -p password]
       
      docker logs 90411b
      docker start 90411b
       
      Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2019-11-28T02:19:56-08:00, sequoiatools/cmd:d72260] 60 

      Error

      [user:error,2019-11-28T02:19:48.143-08:00,ns_1@172.23.97.74:<0.15848.0>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {mover_crashed,
                                    {unexpected_exit,
                                     {'EXIT',<0.834.965>,
                                      {failed_to_update_vbucket_map,"HISTORY",583,
                                       {error,[{'ns_1@172.23.96.191',timeout}]}}}}}. 
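For triage, the node that timed out can be pulled out of the exit reason mechanically. The following is a minimal sketch, assuming the log line is available as plain text; the regex and the helper name `timed_out_nodes` are illustrative and are not part of ns_server or any Couchbase tool.

```python
import re

# Hypothetical helper: matches Erlang tuples of the form {'node@host',timeout}.
NODE_TIMEOUT = re.compile(r"\{'([^']+)',\s*timeout\}")


def timed_out_nodes(log_text):
    """Return the node names that appear in {'<node>',timeout} tuples."""
    return NODE_TIMEOUT.findall(log_text)


# Exit reason copied from the error above.
reason = """{mover_crashed,
 {unexpected_exit,
  {'EXIT',<0.834.965>,
   {failed_to_update_vbucket_map,"HISTORY",583,
    {error,[{'ns_1@172.23.96.191',timeout}]}}}}}"""

print(timed_out_nodes(reason))
```

Here the only match is node .96.191, the node whose memory state is examined in the comments.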

          Activity

            ritam.sharma Ritam Sharma added a comment -

            Closing the issue, since not seen with latest build - 6.5.0-4960

            ritam.sharma Ritam Sharma added a comment - edited

            Will adjust the quota setting on the node and also check whether an additional node helps. Converting this to a task; if the issue persists, will convert it to a defect.

            Aliaksey Artamonau added a comment -

            Node .96.191 is in swap:

            MemTotal:       24514172 kB
            MemFree:          244924 kB
            MemAvailable:     112272 kB
            Buffers:               0 kB
            Cached:            98644 kB
            ...
            SwapTotal:       3670012 kB
            SwapFree:        1639304 kB
            ...

            Which makes ns_server slow and results in said timeout.

            Most memory is used by analytics. It's slightly above its quota of 21GiB (though it's hard to say by how much, since the only statistic available to us is "resident set size", which doesn't include memory that was swapped out). Regardless, it's quite probable that even if analytics adhered to its quota exactly, the system would still be struggling. I think it'd be great if we knew exactly what safe quotas we can run on, but I don't believe this is within the scope of this test. So I suggest that the analytics quota be lowered to give ns_server and the rest of the quota-less services more room.
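The memory figures in the comment above can be checked with a little arithmetic. The sketch below is illustrative only: it uses the values quoted from /proc/meminfo (where "kB" means KiB) and the stated 21GiB analytics quota; the function names `parse_meminfo` and `summarize` are made up for this example.

```python
# Values copied from the /proc/meminfo excerpt in the comment; "kB" there means KiB.
MEMINFO = """\
MemTotal:       24514172 kB
MemFree:          244924 kB
MemAvailable:     112272 kB
SwapTotal:       3670012 kB
SwapFree:        1639304 kB
"""

KIB_PER_GIB = 1024 * 1024
ANALYTICS_QUOTA_GIB = 21  # quota stated in the comment


def parse_meminfo(text):
    """Return {field: value_in_kib} for 'Name:  value kB' lines."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(info, quota_gib):
    """Compute swap in use, available-memory share, and RAM left beyond the quota."""
    swap_used = info["SwapTotal"] - info["SwapFree"]
    headroom = info["MemTotal"] - quota_gib * KIB_PER_GIB
    return {
        "swap_used_gib": swap_used / KIB_PER_GIB,
        "mem_available_pct": 100 * info["MemAvailable"] / info["MemTotal"],
        "headroom_gib": headroom / KIB_PER_GIB,
    }


info = parse_meminfo(MEMINFO)
summary = summarize(info, ANALYTICS_QUOTA_GIB)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

With these numbers, roughly 1.94 GiB is swapped out, under 0.5% of RAM is available, and only about 2.38 GiB of physical memory would remain for ns_server and every other service even if analytics stayed exactly at its quota.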

            Aliaksey Artamonau added a comment -

            Vikas Chaudhary, can you please attach the logs (the supportal link forces people to use a VPN where it's entirely unnecessary)?

            People

              ritam.sharma Ritam Sharma
              vikas.chaudhary Vikas Chaudhary