Couchbase Server / MB-19245

Partition bitmasks are empty

Details

    • Untriaged
    • Yes

    Description

      The partition bitmasks in the view group's index header can suddenly become 0.

      Grep ns_server.couchdb.log in

      https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.106.14.zip

      for `"Partition 635 not in active nor passive set"`. You'll find that the index header in the group state logged at the time of the crash doesn't contain the correct partition bitmasks:

      [couchdb:info,2016-04-15T12:54:44.432,ns_1@172.23.106.14:<0.9362.0>:couch_log:info:41]Set view `default`, main (prod) group `_design/scale`, signature `ba2626a1dc54a5ad249408fb183f1fc2`, terminating with reason: {function_clause,
       
      ...
       
      {{badmatch,
        {error,
         {error,
          <<"Partition 635 not in active nor passive set">>}}},
       
      ...
       
      {set_view_index_header,
       2,
       1024,
       0,
       0,
       0,
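
      The search described above can also be scripted. This is a minimal sketch, assuming the collectinfo archive has already been unpacked and `ns_server.couchdb.log` is readable as text; the helper name `find_partition_errors` is made up for illustration and is not part of any Couchbase tool.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the error text quoted in this report and captures the partition id.
PATTERN = re.compile(r"Partition (\d+) not in active nor passive set")


def find_partition_errors(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, partition_id) for every line containing the error."""
    for lineno, line in enumerate(lines, start=1):
        match = PATTERN.search(line)
        if match:
            yield lineno, int(match.group(1))


# Hypothetical usage against an unpacked log file:
#
#     with open("ns_server.couchdb.log", errors="replace") as log:
#         for lineno, partition in find_partition_errors(log):
#             print(f"line {lineno}: partition {partition}")
```

      The reported line numbers make it easier to locate the surrounding group state dump in the log.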
      

      The active, passive and cleanup bitmasks are all `0`, although an earlier log message indicates that they were written shortly before:

      [couchdb:info,2016-04-15T12:54:44.019,ns_1@172.23.106.14:<0.9362.0>:couch_log:info:41]Set view `default`, main (prod) group `_design/scale`, partition states updated
      active partitions before:    [23,24,39,44,45,46,49,50,53,60,63,65,78,79,80,82,83,86,90,99,101,118,119,141,152,153,155,156,166,173,175,176,185,191,200,201,204,211,212,213,214,234,242,246,248,253,268,270,282,291,293,294,325,347,352,362,375,397,404,426,428,429,438,446,448,449,457,462,477,502,503,527,528,539,543,544,549,561,582,593,602,640,643,644,645,661,662,664,675,683,697,723,724,726,743,757,766,769,790,833,877,878,883,890,891,899,900,920,924,931,935,948,949,976,977,986,1003,1007]
      active partitions after:     [23,24,39,44,45,46,49,50,53,60,63,65,78,79,80,82,83,86,90,99,101,118,119,141,152,153,155,156,166,173,175,176,185,191,200,201,204,211,212,213,214,234,242,246,248,253,268,270,282,291,293,294,325,347,352,362,375,397,404,426,428,429,438,446,448,449,457,462,477,502,503,527,528,539,543,544,549,561,582,593,602,640,643,644,645,661,662,664,675,683,697,723,724,726,743,757,766,769,790,833,877,878,883,890,891,899,900,920,924,931,935,948,949,976,977,986,1003,1007]
      passive partitions before:   [635,665,756,758,818,822]
      passive partitions after:    [558,635,665,756,758,818,822]
      cleanup partitions before:   []
      cleanup partitions after:    []
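
      To make the mismatch concrete, here is a small sketch of how a partition list would map to a bitmask. It assumes each header bitmask is an integer with bit N set when partition N belongs to the set; that encoding and the function names are assumptions for illustration, not taken from the view engine source. Under that assumption, the passive list logged above cannot encode to `0`, and a zero mask is exactly what would make a membership check for partition 635 fail.

```python
from typing import Iterable, List


def build_bitmask(partitions: Iterable[int]) -> int:
    """Encode partition ids as an integer with one bit per partition."""
    mask = 0
    for partition in partitions:
        mask |= 1 << partition
    return mask


def in_bitmask(mask: int, partition: int) -> bool:
    """Return True if the bit for the given partition is set."""
    return (mask >> partition) & 1 == 1


def decode_bitmask(mask: int, num_partitions: int = 1024) -> List[int]:
    """List the partition ids whose bits are set (1024 matches the header above)."""
    return [p for p in range(num_partitions) if in_bitmask(mask, p)]


# Passive set as logged after the partition state update.
passive_after = [558, 635, 665, 756, 758, 818, 822]
pbitmask = build_bitmask(passive_after)

# With the logged list, partition 635 is found in the passive mask.
found_with_logged_mask = in_bitmask(pbitmask, 635)

# With the zeroed mask from the crashed group's header, it is not.
found_with_zero_mask = in_bitmask(0, 635)
```

      Decoding the masks from a dumped header this way and comparing them with the most recent "partition states updated" message is one way to confirm whether the header was reset.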
      

      This happened shortly after duplicate partition versions were logged:

      [couchdb:error,2016-04-15T12:54:44.010,ns_1@172.23.106.14:<0.9362.0>:couch_log:error:44]set view `default`, mapreduce_view main (prod) group `_design/scale` have the duplicate partition versions [{23,
      

      So it might be related to MB-19221.

      Rest of the logs of the cluster:

      https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.60.zip
      https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.61.zip
      https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.62.zip
      https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.63.zip
      https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.105.83.zip
      https://s3.amazonaws.com/scalability-mcafee/MB-19187/1859-2hrs/collectinfo-2016-04-15T195855-ns_1%40172.23.106.96.zip

      The logs are from a 3.1.5 build, but a similar error also occurred in 3.1.3. The logs were collected as part of MB-19187.

            People

              ericcooper Eric Cooper (Inactive)
              vmx Volker Mische