Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-7637

bucket does not respond after rebalance: Server error during processing: ["web request failed", {path,"/pools/default/buckets/default"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.1
    • Fix Version/s: 3.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None

      Description

      http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions/175/consoleFull[2013-01-29 21:48:00,345] - [rest_client:896] INFO - rebalance percentage : 97.4609375 %
      [2013-01-29 21:48:10,384] - [rest_client:896] INFO - rebalance percentage : 98.28125 %
      [2013-01-29 21:48:20,432] - [rest_client:896] INFO - rebalance percentage : 99.1015625 %
      [2013-01-29 21:48:30,530] - [task:299] INFO - rebalancing was completed with progress: 100% in 2584.67042708 sec
      [2013-01-29 21:48:32,505] - [data_helper:289] INFO - creating direct client 10.3.121.93:11210 default
      [('testrunner', 325, '<module>', 'result = unittest.TextTestRunner(verbosity=2).run(suite)'), ('/opt/python2.7/lib/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/opt/python2.7/lib/python2.7/unittest/suite.py', 70, '_call', 'return self.run(*args, **kwds)'), ('/opt/python2.7/lib/python2.7/unittest/suite.py', 108, 'run', 'test(result)'), ('/opt/python2.7/lib/python2.7/unittest/case.py', 391, 'call_', 'return self.run(*args, **kwds)'), ('/opt/python2.7/lib/python2.7/unittest/case.py', 327, 'run', 'testMethod()'), ('pytests/rebalance/rebalancein.py', 382, 'rebalance_in_with_ddoc_compaction', 'self.verify_cluster_stats(self.servers[:self.nodes_in + 1])'), ('pytests/basetestcase.py', 368, 'verify_cluster_stats', 'self._wait_for_stats_all_buckets(servers, timeout=(timeout or 120))'), ('pytests/basetestcase.py', 287, '_wait_for_stats_all_buckets', 'task.result(timeout)'), ('lib/tasks/future.py', 161, 'result', 'self.set_exception(TimeoutError())'), ('lib/tasks/future.py', 263, 'set_exception', 'ERROR
      print traceback.extract_stack()')]
      Tue Jan 29 21:53:41 2013
      [2013-01-29 21:53:41,932] - [basetestcase:98] INFO - ============== basetestcase cleanup was started for test #11 rebalance_in_with_ddoc_compaction ==============
      [2013-01-29 21:53:42,080] - [task:351] INFO - Saw ep_queue_size 0 == 0 expected on '10.3.121.93:8091'
      [2013-01-29 21:53:42,147] - [data_helper:289] INFO - creating direct client 10.3.121.93:11210 default
      [2013-01-29 21:53:42,155] - [bucket_helper:127] INFO - deleting existing buckets on [ip:10.3.121.93 port:8091 ssh_username:root, ip:10.3.121.98 port:8091 ssh_username:root, ip:10.3.121.94 port:8091 ssh_username:root, ip:10.3.121.95 port:8091 ssh_username:root, ip:10.3.121.96 port:8091 ssh_username:root, ip:10.3.121.97 port:8091 ssh_username:root, ip:10.3.121.92 port:8091 ssh_username:root]
      [2013-01-29 21:53:42,353] - [rest_client:578] ERROR - http://10.3.121.93:8091/pools/default/buckets/default error 500 reason: unknown ["Unexpected server error, request logged."]
      Exception in thread Cluster_Thread:
      Traceback (most recent call last):
      File "/opt/python2.7/lib/python2.7/threading.py", line 552, in __bootstrap_inner
      self.run()
      File "lib/tasks/taskmanager.py", line 31, in run
      task.step(self)
      File "lib/tasks/task.py", line 55, in step
      self.check(task_manager)
      File "lib/tasks/task.py", line 331, in check
      client = self._get_connection(server)
      File "lib/tasks/task.py", line 365, in _get_connection
      self.conns[server] = MemcachedClientHelper.direct_client(server, self.bucket)
      File "lib/memcached/helper/data_helper.py", line 296, in direct_client
      client.vbucket_count = len(vBuckets)
      TypeError: object of type 'NoneType' has no len()

      [2013-01-29 21:53:45,711] - [bucket_helper:145] INFO - deleted bucket : default from 10.3.121.93
      [2013-01-29 21:53:45,712] - [bucket_helper:215] INFO - waiting for bucket deletion to complete....
      [2013-01-29 21:53:45,723] - [rest_client:109] INFO - existing buckets : []

      after rebalance we tried to check/wait stats ep_queue_size, ep_flusher_todo ( first stacktrace)
      then after 5 minutes we send request to delete bucket and get
      [2013-01-29 21:53:42,353] - [rest_client:578] ERROR - http://10.3.121.93:8091/pools/default/buckets/default error 500 reason: unknown ["Unexpected server error, request logged."]

      [ns_server:info,2013-01-29T21:58:24.639,ns_1@10.3.121.93:<0.32070.194>:compaction_daemon:spawn_bucket_compactor:544]Compacting bucket default with config:
      [{view_fragmentation_threshold,{undefined,undefined}},
      {database_fragmentation_threshold,{undefined,undefined}},

      {parallel_db_and_view_compaction,false}

      ]
      [ns_server:info,2013-01-29T21:58:28.792,ns_1@10.3.121.93:ns_config_rep<0.14294.131>:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.3.121.94'

      [menelaus:warn,2013-01-29T21:58:40.430,ns_1@10.3.121.93:<0.26423.194>:menelaus_web:loop:433]Server error during processing: ["web request failed",

      {path,"/pools/default/buckets/default"}

      ,

      {type,error}

      ,
      {what,{badmatch,not_present}},
      {trace,
      [

      {ns_bucket,live_bucket_nodes,1}

      ,

      {menelaus_stats,basic_stats,1}

      ,

      {menelaus_web_buckets,build_bucket_info,5}

      ,

      {menelaus_web_buckets,handle_bucket_info,5}

      ,

      {menelaus_web_buckets, checking_bucket_access,4}

      ,

      {menelaus_web,loop,3}

      ,

      {mochiweb_http,headers,5}

      ,

      {proc_lib,init_p_do_apply,3}

      ]}]
      [rebalance:info,2013-01-29T21:58:40.434,ns_1@10.3.121.93:<0.20984.194>:ebucketmigrator_srv:do_confirm_sent_messages:701]Got close ack!

      [rebalance:info,2013-01-29T21:58:40.441,ns_1@10.3.121.93:<0.19872.194>:ebucketmigrator_srv:do_confirm_sent_messages:701]Got close ack!

      [rebalance:info,2013-01-29T21:58:40.443,ns_1@10.3.121.93:<0.19834.194>:ebucketmigrator_srv:do_confirm_sent_messages:701]Got close ack!

      [user:info,2013-01-29T21:58:40.456,ns_1@10.3.121.93:ns_memcached-default<0.11408.184>:ns_memcached:terminate:661]Shutting down bucket "default" on 'ns_1@10.3.121.93' for deletion
      [ns_server:info,2013-01-29T21:58:40.641,ns_1@10.3.121.93:ns_port_memcached<0.14372.131>:ns_port_server:log:171]memcached<0.14372.131>: Tue Jan 29 21:58:40.440409 PST 3: TAP (Consumer) eq_tapq:anon_9978 - disconnected

      I believe that all this time bucket was not available, even though I haven't found anything suspicious in the logs and stats

      No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

        Show
        andreibaranouski Andrei Baranouski added a comment - https://s3.amazonaws.com/bugdb/jira/MB-7637/10.3.121.93-1292013-2239-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7637/10.3.121.92-1292013-2246-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7637/10.3.121.94-1292013-2242-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7637/10.3.121.95-1292013-2243-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7637/10.3.121.96-1292013-2244-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7637/10.3.121.97-1292013-2245-diag.zip https://s3.amazonaws.com/bugdb/jira/MB-7637/10.3.121.98-1292013-2240-diag.zip
        Hide
        Aliaksey Artamonau Aliaksey Artamonau added a comment -

        The test requested /pools/default/buckets/default in the middle of deleting the bucket. We don't handle this well.

        Show
        Aliaksey Artamonau Aliaksey Artamonau added a comment - The test requested /pools/default/buckets/default in the middle of deleting the bucket. We don't handle this well.
        Hide
        Aliaksey Artamonau Aliaksey Artamonau added a comment -

        See my comment above. Should we close it as "won't fix" or just postpone?

        Show
        Aliaksey Artamonau Aliaksey Artamonau added a comment - See my comment above. Should we close it as "won't fix" or just postpone?
        Hide
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Given Aliaksey's input I suggest moving that into 2.1

        Show
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - Given Aliaksey's input I suggest moving that into 2.1
        Hide
        farshid Farshid Ghods (Inactive) added a comment -

        is this something we can fix in the test ?

        Show
        farshid Farshid Ghods (Inactive) added a comment - is this something we can fix in the test ?
        Hide
        andreibaranouski Andrei Baranouski added a comment -

        Seems like test didn't get stats in the first time and it hanged on couple minutes by timeoutError. So we can't identify that something was wrong with bucket by test logs.
        added some changes:
        http://review.couchbase.org/#/c/24380/
        http://review.couchbase.org/#/c/24381/

        will keep this case in mind if it will reproduce later

        Show
        andreibaranouski Andrei Baranouski added a comment - Seems like test didn't get stats in the first time and it hanged on couple minutes by timeoutError. So we can't identify that something was wrong with bucket by test logs. added some changes: http://review.couchbase.org/#/c/24380/ http://review.couchbase.org/#/c/24381/ will keep this case in mind if it will reproduce later
        Hide
        andreibaranouski Andrei Baranouski added a comment -

        what did we decide regarding this bug?

        Show
        andreibaranouski Andrei Baranouski added a comment - what did we decide regarding this bug?
        Hide
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Lets make handling of bucket details during it's deletion part of another story about making in-process-to-be-deleted buckets visible

        Show
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - Lets make handling of bucket details during it's deletion part of another story about making in-process-to-be-deleted buckets visible
        Hide
        maria Maria McDuff (Inactive) added a comment -

        closing as dupes.

        Show
        maria Maria McDuff (Inactive) added a comment - closing as dupes.

          People

          • Assignee:
            alkondratenko Aleksey Kondratenko (Inactive)
            Reporter:
            andreibaranouski Andrei Baranouski
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes