Couchbase Server, MB-6049

Vbucket not marked as active in index or wrong vbucket map passed to view merger

Details

    • Bug
    • Resolution: Won't Fix
    • Blocker
    • 2.0-beta
    • None
    • ns_server
    • Security Level: Public
    • None

    Description

      While running the test suite py-view-pre-merge.conf, I hit (once, and only once so far) a case where queries were retried forever, or at least for too long: no progress seemed to be made and the system was basically idle. Vbucket 0 was not marked as active in an index, yet ns_server was passing the view merger a vbucket map in which vbucket 0 was listed as active:

      [couchdb:info] [2012-07-29 17:42:02] [n_0@192.168.1.80:<0.6954.1>:couch_log:info:39] Set view `default`, group `_design/dev_test_view-b2fa892`, missing partitions: [0]
      [couchdb:info] [2012-07-29 17:42:07] [n_0@192.168.1.80:<0.6978.1>:couch_log:info:39] Set view `default`, group `_design/dev_test_view-b2fa892`, missing partitions: [0]
      [couchdb:info] [2012-07-29 17:42:12] [n_0@192.168.1.80:<0.7006.1>:couch_log:info:39] Set view `default`, group `_design/dev_test_view-b2fa892`, missing partitions: [0]
      (.... repeated lots of times ...)
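
      To confirm that the same partition is reported missing over and over, the log can be scanned mechanically. The following is a minimal sketch in Python, assuming the log lines have exactly the shape of the three entries quoted above; the function name `missing_partition_counts` and the regular expression are illustrative and not part of Couchbase or its test suite.

```python
import re
from collections import Counter

# Pattern modeled only on the log lines quoted in this report (assumption);
# other couchdb log formats are not handled.
MISSING_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]"
    r".*Set view `(?P<bucket>[^`]+)`, group `(?P<group>[^`]+)`, "
    r"missing partitions: \[(?P<parts>[\d,\s]*)\]"
)


def missing_partition_counts(lines):
    """Count how often each (design document, partition) pair is reported missing."""
    counts = Counter()
    for line in lines:
        match = MISSING_RE.search(line)
        if match is None:
            continue  # not a "missing partitions" entry
        for part in match.group("parts").split(","):
            part = part.strip()
            if part:
                counts[(match.group("group"), int(part))] += 1
    return counts
```

      A count that keeps growing for a single pair, such as design document `_design/dev_test_view-b2fa892` and partition 0 here, would match the retry loop described in this report.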

      In the views.1 log (used by ns_server's capi_set_view_manager), vbucket 0 was marked for cleanup in the main index (where it had previously been marked as active) at timestamp "2012-07-29 17:41:55". Its removal from the replica index was requested as well, but that was a no-op because it was not marked as a replica.

      The queries started failing around timestamp "2012-07-29 17:42:02", shortly after vbucket 0 was marked for cleanup in the main index of node n_0.

      This can be seen at the end of the views.1 and couchdb.1 logs of node n_0 (the state transitions in both logs seem to match each other).

      Not sure whether this means that node n_0 was not supposed to mark vbucket 0 for cleanup, or whether it was supposed to mark it as active again later. Vbucket 0 doesn't seem to be marked as active on any of the other 3 nodes either.
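
      The inconsistency reported here amounts to a set difference between what the vbucket map claims and what the index has marked as active. Below is a minimal sketch in Python under that reading; `find_missing_partitions` and its arguments are hypothetical names used for illustration, and this is not how ns_server or the view merger actually computes the "missing partitions" list.

```python
def find_missing_partitions(map_active, index_active):
    """Return partitions the vbucket map lists as active on a node
    but that the node's index does not have marked as active.

    Both arguments are plain iterables of partition ids (hypothetical
    inputs; the real data structures are not shown in this report).
    """
    return sorted(set(map_active) - set(index_active))


# Situation from this report, with made-up neighbouring partitions:
# the map still lists vbucket 0 as active, the index has moved it to cleanup.
print(find_missing_partitions(map_active=[0, 1, 2, 3], index_active=[1, 2, 3]))
```

      As long as the result is non-empty and neither side changes, a query that retries on missing partitions cannot make progress.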

      Logs attached.

      Attachments

        1. 10.3.121.104-8091-diag.txt.gz
          1.17 MB
        2. 10.3.121.105-8091-diag.txt.gz
          1.06 MB
        3. 10.3.121.110-8091-diag.txt.gz
          1.08 MB
        4. 10.3.121.111-8091-diag.txt.gz
          3.02 MB
        5. logs-ns-server.tgz
          2.90 MB

          People

            alkondratenko Aleksey Kondratenko (Inactive)
            FilipeManana Filipe Manana (Inactive)