Description
While running the test suite py-view-pre-merge.conf, reached a case (once, and only once so far) where queries were being retried forever, or at least for so long that no progress seemed to happen and the system was basically idle. The cause: vbucket 0 was not marked as active in the index, yet ns_server was passing the view merger a vbucket map that listed vbucket 0 as active (see the sketch after the log excerpt below):
[couchdb:info] [2012-07-29 17:42:02] [n_0@192.168.1.80:<0.6954.1>:couch_log:info:39] Set view `default`, group `_design/dev_test_view-b2fa892`, missing partitions: [0]
[couchdb:info] [2012-07-29 17:42:07] [n_0@192.168.1.80:<0.6978.1>:couch_log:info:39] Set view `default`, group `_design/dev_test_view-b2fa892`, missing partitions: [0]
[couchdb:info] [2012-07-29 17:42:12] [n_0@192.168.1.80:<0.7006.1>:couch_log:info:39] Set view `default`, group `_design/dev_test_view-b2fa892`, missing partitions: [0]
(.... repeated lots of times ...)
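
The retry loop behind these log lines can be pictured with a minimal sketch in Erlang, assuming the merger simply waits until every vbucket listed as active in ns_server's map is also active in the set view index (module, function and parameter names below are hypothetical, not the actual couch_index_merger code):

-module(merge_retry_sketch).
-export([wait_for_partitions/2]).

%% WantedActive: vbuckets that ns_server's vbucket map lists as active on this node.
%% GetIndexActive: fun() returning the vbuckets currently marked active in the set view index.
wait_for_partitions(WantedActive, GetIndexActive) ->
    IndexActive = GetIndexActive(),
    Missing = ordsets:subtract(ordsets:from_list(WantedActive),
                               ordsets:from_list(IndexActive)),
    case Missing of
        [] ->
            ok;                           % every wanted vbucket is active in the index
        _ ->
            io:format("missing partitions: ~w~n", [Missing]),
            timer:sleep(5000),            % the log above shows roughly 5s between retries
            wait_for_partitions(WantedActive, GetIndexActive)
    end.

Under these assumptions, merge_retry_sketch:wait_for_partitions([0], fun() -> [] end) never returns, which matches what the log shows: as long as the index never activates vbucket 0, the loop keeps reporting the same missing partition.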
In the views.1 log (used by ns_server's capi_set_view_manager), vbucket 0 was marked for cleanup in the main index (where it had previously been marked as active) at timestamp "2012-07-29 17:41:55", and its removal from the replica index was requested as well (a no-op, since it was not marked as replica there).
The queries started failing around timestamp "2012-07-29 17:42:02", shortly after vbucket 0 was marked for cleanup in the main index of node n_0.
This can be seen in the logs of node n_0 at the end of views.1 and couchdb.1 (state transitions in both logs seem to match each other).
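
A minimal sketch of that state transition, assuming partition states are tracked per index as a map from vbucket to state (module name and data layout are made up for illustration only):

-module(partition_state_sketch).
-export([demo/0]).

demo() ->
    %% Before 17:41:55: vbucket 0 is active in the main index.
    Main0 = #{0 => active, 1 => active},
    %% 17:41:55: vbucket 0 is marked for cleanup in the main index.
    Main1 = Main0#{0 := cleanup},
    %% The matching removal from the replica index is a no-op,
    %% since vbucket 0 was never marked as replica there.
    Replica = #{},
    %% 17:42:02: a query that expects vbucket 0 to be active now reports it missing.
    Missing = [P || {P, S} <- maps:to_list(Main1), S =/= active],
    {Main1, Replica, Missing}.

demo() returns {#{0 => cleanup, 1 => active}, #{}, [0]}: once the cleanup transition happens, vbucket 0 is no longer in the active set, so every subsequent query that expects it to be active ends up in the retry loop sketched earlier.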
Not sure if this means that node n_0 was not supposed to mark vbucket 0 for cleanup, or if it was later supposed to mark it as active again. Vbucket 0 doesn't seem to be marked as active on any of the other 3 nodes either.
Logs attached.