Details
- Bug
- Resolution: Fixed
- Major
- 3.1.3
- None
- Untriaged
- Yes
Description
The bitmasks in the view group header can suddenly become 0.
Grep ns_server.couchdb.log for `"Partition 635 not in active nor passive set"`. You'll find that the index header in the group state logged during the crash doesn't contain the correct bitmasks for the partitions:
[couchdb:info,2016-04-15T12:54:44.432,ns_1@172.23.106.14:<0.9362.0>:couch_log:info:41]Set view `default`, main (prod) group `_design/scale`, signature `ba2626a1dc54a5ad249408fb183f1fc2`, terminating with reason: {function_clause,
...
{{badmatch,
{error,
{error,
<<"Partition 635 not in active nor passive set">>}}},
...
{set_view_index_header,
2,
1024,
0,
0,
0,
The active, passive and cleanup bitmasks are `0`, although a previous log message indicates that they were written shortly before:
[couchdb:info,2016-04-15T12:54:44.019,ns_1@172.23.106.14:<0.9362.0>:couch_log:info:41]Set view `default`, main (prod) group `_design/scale`, partition states updated
active partitions before: [23,24,39,44,45,46,49,50,53,60,63,65,78,79,80,82,83,86,90,99,101,118,119,141,152,153,155,156,166,173,175,176,185,191,200,201,204,211,212,213,214,234,242,246,248,253,268,270,282,291,293,294,325,347,352,362,375,397,404,426,428,429,438,446,448,449,457,462,477,502,503,527,528,539,543,544,549,561,582,593,602,640,643,644,645,661,662,664,675,683,697,723,724,726,743,757,766,769,790,833,877,878,883,890,891,899,900,920,924,931,935,948,949,976,977,986,1003,1007]
active partitions after: [23,24,39,44,45,46,49,50,53,60,63,65,78,79,80,82,83,86,90,99,101,118,119,141,152,153,155,156,166,173,175,176,185,191,200,201,204,211,212,213,214,234,242,246,248,253,268,270,282,291,293,294,325,347,352,362,375,397,404,426,428,429,438,446,448,449,457,462,477,502,503,527,528,539,543,544,549,561,582,593,602,640,643,644,645,661,662,664,675,683,697,723,724,726,743,757,766,769,790,833,877,878,883,890,891,899,900,920,924,931,935,948,949,976,977,986,1003,1007]
passive partitions before: [635,665,756,758,818,822]
passive partitions after: [558,635,665,756,758,818,822]
cleanup partitions before: []
cleanup partitions after: []
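To illustrate why a zeroed bitmask produces the "not in active nor passive set" error, here is a minimal Python sketch. It assumes the bitmask encodes partition N as bit N, which is an assumption about the header layout and not taken from the view engine source. The helper names `build_bitmask` and `in_mask` are hypothetical.

```python
def build_bitmask(partitions):
    """Set bit N for every partition ID N in the list (assumed encoding)."""
    mask = 0
    for part_id in partitions:
        mask |= 1 << part_id
    return mask


def in_mask(part_id, mask):
    """Return True if the bit for part_id is set in mask."""
    return (mask >> part_id) & 1 == 1


# Passive partitions after the update, copied from the log message above.
passive_after = [558, 635, 665, 756, 758, 818, 822]
passive_mask = build_bitmask(passive_after)

print(in_mask(635, passive_mask))  # True: the mask written shortly before
print(in_mask(635, 0))             # False: the zeroed mask seen in the crash
```

Under that assumption, partition 635 is found in the passive mask that was logged as written, but in neither set once the header carries `0` for all three bitmasks.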
This happened a short time after duplicated partition versions occurred:
[couchdb:error,2016-04-15T12:54:44.010,ns_1@172.23.106.14:<0.9362.0>:couch_log:error:44]set view `default`, mapreduce_view main (prod) group `_design/scale` have the duplicate partition versions [{23,
So it might be related to MB-19221.
Rest of the logs of the cluster:
https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.60.zip
https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.61.zip
https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.62.zip
https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.63.zip
https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.83.zip
https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.106.96.zip
The logs are from a 3.1.5 build, but a similar error also occurred in 3.1.3. The logs were collected as part of MB-19187.
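The grep step from the description can also be scripted when several collectinfo archives need checking. The following is a rough Python sketch, not an official tool. The function names are hypothetical, and the path passed to `scan_log` is whatever location the extracted ns_server.couchdb.log ends up at.

```python
import re
from pathlib import Path

# Error text from the description; the partition ID is captured as a group.
PATTERN = re.compile(r"Partition (\d+) not in active nor passive set")


def find_partition_errors(lines):
    """Return (line_number, partition_id) for every matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = PATTERN.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


def scan_log(path):
    """Scan one log file, tolerating undecodable bytes."""
    with Path(path).open(errors="replace") as handle:
        return find_partition_errors(handle)


# Two lines taken from the crash excerpt in the description.
sample = [
    "{error,",
    '<<"Partition 635 not in active nor passive set">>}}},',
]
print(find_partition_errors(sample))  # [(2, 635)]
```

For a real run, call `scan_log` with the path of an extracted ns_server.couchdb.log and look at the log lines that follow each reported line number to find the dumped `set_view_index_header`.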
Issue Links
- is triggered by MB-19187 rebalance exited {file_already_opened, "/data/@indexes/default (Open)