Couchbase Server / MB-1428

Rebalance is hung & beam is at 100% CPU


Details

    • Bug
    • Resolution: Fixed
    • 1.6.0 beta1.1
    • 1.6.0 beta4
    • ns_server
    • None
    • Operating System: All
      Platform: All

    Description

      build 1.6.0beta1-22

      10 - install servers A & B
      20 - load them with memslap/memcachetest, join them, and rebalance
      30 - remove one node and rebalance
      40 - goto 20
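      The loop above can be sketched as a small Python driver that stops at the first rebalance that does not finish in time. This is a minimal sketch, not the reporter's actual procedure: the step names ("install", "load", "join", "rebalance", "remove") and the `status` callable are hypothetical hooks the caller must supply, and the one-hour timeout in the usage note is an assumption borrowed from the meeting anecdote in this report.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_rebalance(status: Callable[[], str], timeout_s: float,
                       poll_s: float = 5.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll status() until it stops returning "running" or timeout_s elapses.

    Returns True if the rebalance finished in time, False if it was still
    "running" at the deadline (the symptom described in this report).
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if status() != "running":
            return True
        sleep(poll_s)
    return status() != "running"


def reproduce(steps: Dict[str, Callable[[], None]],
              wait: Callable[[], bool],
              max_iterations: int = 100) -> Optional[int]:
    """Run the install/join/rebalance/remove loop until a rebalance hangs.

    `steps` maps the hypothetical step names to caller-supplied actions.
    Returns the iteration at which a rebalance failed to finish, or None.
    """
    steps["install"]()                 # 10 - install servers A & B
    for iteration in range(1, max_iterations + 1):
        steps["load"]()                # 20 - memslap/memcachetest
        steps["join"]()                #      join them
        steps["rebalance"]()           #      rebalance
        if not wait():
            return iteration
        steps["remove"]()              # 30 - remove one node
        steps["rebalance"]()           #      rebalance
        if not wait():
            return iteration
    return None                        # 40 - goto 20 (bounded here)
```

      Usage might look like `reproduce(my_steps, wait=lambda: wait_for_rebalance(get_status, timeout_s=3600))`, where `my_steps` and `get_status` are hypothetical; `get_status` could, for example, return the "rebalanceStatus" field from /pools/default. The loop is bounded by `max_iterations` instead of an unconditional goto so that a run that never hangs still terminates.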

      Eventually I got into a state where Erlang (beam) was using 70% to 100% CPU, according to top.

      The rebalance never finishes (e.g., I went to a one-hour meeting and the rebalance was still running when I got back).

      Below are the /pools/default responses (via curl) and the running vbucketmigrator processes on each node.

      On node A:

      $ curl http://10.1.3.184:8080/pools/default
      {
        "name": "default",
        "nodes": [
          {
            "uptime": "7438",
            "memoryTotal": 1048186880,
            "memoryFree": 111570944,
            "mcdMemoryReserved": 799,
            "mcdMemoryAllocated": 799,
            "clusterMembership": "active",
            "status": "healthy",
            "hostname": "10.1.2.202:8080",
            "version": "1.6.0beta1_48_ga8219db",
            "os": "x86_64-unknown-linux-gnu",
            "ports": {"proxy": 11211, "direct": 11210},
            "otpNode": "ns_1@10.1.2.202",
            "otpCookie": "aidwzmjjxonmhhqy"
          },
          {
            "uptime": "7457",
            "memoryTotal": 1036898304,
            "memoryFree": 23715840,
            "mcdMemoryReserved": 791,
            "mcdMemoryAllocated": 791,
            "clusterMembership": "active",
            "status": "healthy",
            "hostname": "10.1.3.184:8080",
            "version": "1.6.0beta1_48_ga8219db",
            "os": "x86_64-unknown-linux-gnu",
            "ports": {"proxy": 11211, "direct": 11210},
            "otpNode": "ns_1@10.1.3.184",
            "otpCookie": "aidwzmjjxonmhhqy"
          }
        ],
        "buckets": {"uri": "/pools/default/buckets"},
        "controllers": {
          "addNode": {"uri": "/controller/addNode"},
          "rebalance": {"uri": "/controller/rebalance"},
          "failOver": {"uri": "/controller/failOver"},
          "reAddNode": {"uri": "/controller/reAddNode"},
          "ejectNode": {"uri": "/controller/ejectNode"},
          "testWorkload": {"uri": "/pools/default/controller/testWorkload"}
        },
        "balanced": true,
        "rebalanceStatus": "running",
        "rebalanceProgressUri": "/pools/default/rebalanceProgress",
        "stopRebalanceUri": "/controller/stopRebalance",
        "stats": {"uri": "/pools/default/stats"}
      }
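      The fields that matter for a stuck rebalance can be pulled out of a /pools/default response with a few lines of Python. This is a minimal sketch: `fetch_pool` and `summarize_pool` are hypothetical helper names, and the sketch only reads fields that appear in the responses in this report. In both snapshots "balanced" is true while "rebalanceStatus" is still "running", and the node at 10.1.3.184 reports roughly 22 MiB free in the node A snapshot.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_pool(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET <base_url>/pools/default, as the curl commands in this report do."""
    with urllib.request.urlopen(base_url + "/pools/default", timeout=timeout) as resp:
        return json.load(resp)


def summarize_pool(pool: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the cluster-level rebalance fields and per-node health."""
    mib = 1024 * 1024
    return {
        "rebalanceStatus": pool.get("rebalanceStatus"),
        "balanced": pool.get("balanced"),
        "nodes": [
            {
                "hostname": node["hostname"],
                "status": node["status"],
                "membership": node["clusterMembership"],
                # memoryFree is reported in bytes; convert to whole MiB.
                "memoryFreeMiB": node["memoryFree"] // mib,
            }
            for node in pool.get("nodes", [])
        ],
    }
```

      For example, `summarize_pool(fetch_pool("http://10.1.3.184:8080"))` would return the same information as the curl output above in a compact form that is easier to compare across repeated polls.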

      # ps -ef | grep vbuc
        106 6728 4863 0 14:51 ? 00:00:00 ./bin/vbucketmigrator/vbucketmigrator -h 10.1.3.184:11210 -d 10.1.2.202:11210 -v -b 255 -b 254 -b 253 -b 252 -b 251 -b 250 -b 249 -b 248 -b 247 -b 246 -b 245 -b 244 -b 243 -b 242 -b 241 -b 240 -b 239 -b 238 -b 237 -b 236 -b 235 -b 234 -b 233 -b 232 -b 231 -b 230 -b 229 -b 228 -b 227 -b 226 -b 225 -b 224 -b 223 -b 222 -b 221 -b 220 -b 219 -b 218 -b 217 -b 216 -b 215 -b 214 -b 213 -b 212 -b 211 -b 210 -b 209 -b 208 -b 207 -b 206 -b 205 -b 204 -b 203 -b 202 -b 201 -b 200 -b 199 -b 198 -b 197 -b 196 -b 195 -b 194 -b 193 -b 192 -b 191 -b 190 -b 189 -b 188 -b 187 -b 186 -b 185 -b 184 -b 183 -b 182 -b 181 -b 180 -b 179 -b 178 -b 177 -b 176 -b 175 -b 174 -b 173 -b 172 -b 171 -b 170 -b 169 -b 168 -b 167 -b 166 -b 165 -b 164 -b 163 -b 162 -b 161 -b 160 -b 159 -b 158 -b 157 -b 156 -b 155 -b 154 -b 153 -b 152 -b 151 -b 150 -b 149 -b 148 -b 147 -b 146 -b 145 -b 144 -b 143 -b 142 -b 141 -b 140 -b 139 -b 138 -b 137 -b 136 -b 135 -b 134 -b 133 -b 132 -b 131 -b 130 -b 129 -b 128

      On node B:

      $ curl http://10.1.2.202:8080/pools/default
      {
        "name": "default",
        "nodes": [
          {
            "uptime": "7822",
            "memoryTotal": 1048186880,
            "memoryFree": 99069952,
            "mcdMemoryReserved": 799,
            "mcdMemoryAllocated": 799,
            "clusterMembership": "active",
            "status": "healthy",
            "hostname": "10.1.2.202:8080",
            "version": "1.6.0beta1_48_ga8219db",
            "os": "x86_64-unknown-linux-gnu",
            "ports": {"proxy": 11211, "direct": 11210},
            "otpNode": "ns_1@10.1.2.202",
            "otpCookie": "aidwzmjjxonmhhqy"
          },
          {
            "uptime": "7841",
            "memoryTotal": 1036898304,
            "memoryFree": 30666752,
            "mcdMemoryReserved": 791,
            "mcdMemoryAllocated": 791,
            "clusterMembership": "active",
            "status": "healthy",
            "hostname": "10.1.3.184:8080",
            "version": "1.6.0beta1_48_ga8219db",
            "os": "x86_64-unknown-linux-gnu",
            "ports": {"proxy": 11211, "direct": 11210},
            "otpNode": "ns_1@10.1.3.184",
            "otpCookie": "aidwzmjjxonmhhqy"
          }
        ],
        "buckets": {"uri": "/pools/default/buckets"},
        "controllers": {
          "addNode": {"uri": "/controller/addNode"},
          "rebalance": {"uri": "/controller/rebalance"},
          "failOver": {"uri": "/controller/failOver"},
          "reAddNode": {"uri": "/controller/reAddNode"},
          "ejectNode": {"uri": "/controller/ejectNode"},
          "testWorkload": {"uri": "/pools/default/controller/testWorkload"}
        },
        "balanced": true,
        "rebalanceStatus": "running",
        "rebalanceProgressUri": "/pools/default/rebalanceProgress",
        "stopRebalanceUri": "/controller/stopRebalance",
        "stats": {"uri": "/pools/default/stats"}
      }

      # ps -ef | grep vbuc
        999 16126 14634 0 14:54 ? 00:00:00 ./bin/vbucketmigrator/vbucketmigrator -h 10.1.2.202:11210 -d 10.1.3.184:11210 -v -t -b 127
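      The two ps listings are easier to compare once each vbucketmigrator command line is reduced to a source, a destination, and a vbucket list. This is a minimal sketch under stated assumptions: it assumes `-h` names the source server, `-d` the destination, and each `-b` one vbucket id, as the argument layout suggests. The report does not say what `-v` and `-t` do, so the sketch only records them as flags; `Migration`, `parse_migrator`, and `describe` are hypothetical names.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Migration:
    source: str = ""
    destination: str = ""
    vbuckets: List[int] = field(default_factory=list)
    flags: Set[str] = field(default_factory=set)


def parse_migrator(ps_line: str) -> Migration:
    """Parse one `ps -ef` line that runs the vbucketmigrator binary."""
    args = shlex.split(ps_line)
    # Skip the ps columns and the binary path itself.
    i = next(n for n, a in enumerate(args) if a.endswith("vbucketmigrator")) + 1
    m = Migration()
    while i < len(args):
        arg = args[i]
        if arg in ("-h", "-d", "-b"):      # options that take a value
            value = args[i + 1]
            if arg == "-h":
                m.source = value
            elif arg == "-d":
                m.destination = value
            else:
                m.vbuckets.append(int(value))
            i += 2
        else:                              # bare flags such as -v or -t
            m.flags.add(arg)
            i += 1
    return m


def describe(m: Migration) -> str:
    """One-line summary; [lo-hi] gives the bounds, not proof of contiguity."""
    if not m.vbuckets:
        return f"{m.source} -> {m.destination}: no vbuckets"
    lo, hi = min(m.vbuckets), max(m.vbuckets)
    span = str(lo) if lo == hi else f"{lo}-{hi}"
    return f"{m.source} -> {m.destination}: {len(m.vbuckets)} vbucket(s) [{span}]"
```

      Applied to the listings above, node A's process would be summarized as 128 vbuckets (128 through 255) moving from 10.1.3.184:11210 to 10.1.2.202:11210, and node B's as vbucket 127 moving in the opposite direction with the additional `-t` flag.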

      Attachments

        1. logA.txt (552 kB)
        2. logB.txt (486 kB)


            People

              Assignee: Unassigned
              Reporter: Steve Yen (steve.yen@northscale.com)
              Votes: 0
              Watchers: 1
