  1. Couchbase Server
  2. MB-4500

Map/reduce view shows a different reduce count on each node

    Details

      Description

      Install Couchbase Server 2.0.0r-266 on 3 nodes on EC2.
      Use mcsoda to load 200000 items into the cluster.
      Create a view named "one" and run some queries.
      Shut down one node (A) and fail it over.
      Check the reduce count on view one. OK.
      Reinstall Couchbase Server 2.0.0r-266 on node A and add it back to the cluster.
      Rebalance. OK.
      Check the reduce count on view one. OK.
      Shut down node B and fail it over.
      Check the reduce count on the cluster. OK.
      Reinstall Couchbase Server 2.0.0r-266 on node B and add it back to the cluster.
      Rebalance. OK.
      Check the reduce count on the full cluster. Failed.
      Restart Couchbase Server on all 3 nodes of the cluster.
      The reduce count is now different on each node.

      1. log153.gz
        605 kB
        Thuan Nguyen
      2. log187.gz
        3.63 MB
        Thuan Nguyen
      3. log206.gz
        2.70 MB
        Thuan Nguyen

        Activity

        steve Steve Yen added a comment -

        Marking this resolved, as Aliaksey A (standing over my desk here) believes it's fixed.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Tony, can you retest with the latest UI? We think we're hitting indexing timeouts, and the newer UI will indicate that.

        Also, as part of fixing it, we need to be able to specify very large timeouts so that any indexing activity can complete.

        dipti Dipti Borkar added a comment -

        Is the reduce result incorrect only if a node fails over, the cluster gets rebalanced, and the node gets added back?

        filipe manana filipe manana added a comment -

        So, grepping each log for the last occurrence of "Set view `default`, group `_design/dev_one`, partition states updated", I can see that node 187 has the index without any active partitions defined.

        log187:

        [couchdb:info] [2011-12-01 18:28:54] [ns_1@10.98.186.187:<0.19609.9>:couch_log:info:39] Set view `default`, group `_design/dev_one`, partition states updated
        abitmask before 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, abitmask after 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        pbitmask before 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000111111111111111111111111111111111111111111, pbitmask after 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        cbitmask before 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000, cbitmask after 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000011111111111111111111111111111111111111111111000000000000000000000000000000000000000000111111111111111111111111111111111111111111

        log153:

        [couchdb:info] [2011-12-01 18:28:54] [ns_1@10.124.193.153:<0.32606.0>:couch_log:info:39] Set view `default`, group `_design/dev_one`, partition states updated
        abitmask before 1111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000, abitmask after 1111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        pbitmask before 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, pbitmask after 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        cbitmask before 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, cbitmask after 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

        log206:

        [couchdb:info] [2011-12-01 18:28:52] [ns_1@10.90.182.206:<0.31550.0>:couch_log:info:39] Set view `default`, group `_design/dev_one`, partition states updated
        abitmask before 1000000000000000000000000000000000000000000000000000000000000000000000000000001110000011111111111111111111111111111111111111111100000000000000000000000000000000000000000001111111111111111111111111111111111111111111000000000000000000000000000000000000000000, abitmask after 0000000000000000000000000000000000000000000000000000000000000000000000000000001110000011111111111111111111111111111111111111111100000000000000000000000000000000000000000001111111111111111111111111111111111111111111000000000000000000000000000000000000000000
        pbitmask before 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, pbitmask after 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
        cbitmask before 0111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, cbitmask after 1111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
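The "before" and "after" masks in these log lines can be compared mechanically. Below is a minimal Python sketch (not the Erlang code used by couchdb) that decodes a mask string and reports which partitions entered or left a state. The helper names `decode_bitmask` and `diff_masks` are illustrative only. The node 206 abitmask is rebuilt from run lengths that were reconstructed from the decoded partition list in this ticket, and the sketch assumes the "before" mask differs from the "after" mask only in its leading bit, as it appears in the log line.

```python
def decode_bitmask(bits: str) -> list[int]:
    """Return the partition IDs whose bit is set.

    The rightmost character is partition 0, the leftmost is the highest partition.
    """
    return [i for i, ch in enumerate(reversed(bits)) if ch == "1"]


def diff_masks(before: str, after: str) -> tuple[list[int], list[int]]:
    """Return (added, removed) partition IDs between two masks."""
    b, a = set(decode_bitmask(before)), set(decode_bitmask(after))
    return sorted(a - b), sorted(b - a)


# Node 206 abitmask "after", most significant bit first.
# Run lengths are reconstructed from the decoded list (assumption, not copied verbatim).
ABITMASK_206_AFTER = "0" * 78 + "1" * 3 + "0" * 5 + "1" * 42 + "0" * 43 + "1" * 43 + "0" * 42
# Assumed: "before" is identical except that the top bit (partition 255) is set.
ABITMASK_206_BEFORE = "1" + ABITMASK_206_AFTER[1:]

added, removed = diff_masks(ABITMASK_206_BEFORE, ABITMASK_206_AFTER)
print("added:", added, "removed:", removed)
```

Under these assumptions the only change to the active set on node 206 is that partition 255 is removed from it.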

        In an Erlang shell, we can see that in the whole cluster, we only have 174 active partitions instead of 256:

        2> N206 = couch_set_view_util:decode_bitmask(2#0000000000000000000000000000000000000000000000000000000000000000000000000000001110000011111111111111111111111111111111111111111100000000000000000000000000000000000000000001111111111111111111111111111111111111111111000000000000000000000000000000000000000000).
        [42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,
        61,62,63,64,65,66,67,68,69,70|...]
        3> N187 = couch_set_view_util:decode_bitmask(2#0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000).
        []
        4> N153 = couch_set_view_util:decode_bitmask(2#1111111111111111111111111111111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000011111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000).
        [85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,
        103,104,105,106,107,108,109,110,111,112,113|...]
        5>
        5> io:format("~w~n", [N206]).
        [42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,175,176,177]
        ok
        6> io:format("~w~n", [N153]).
        [85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255]
        ok
        7>
        7> O1 = ordsets:from_list(N153).
        [85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,
        103,104,105,106,107,108,109,110,111,112,113|...]
        8> O2 = ordsets:union(O1, ordsets:from_list(N206)).
        [42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,
        61,62,63,64,65,66,67,68,69,70|...]
        9>
        9> io:format("~p~n", [O2]).
        [42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,
        67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,
        92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,
        113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,
        132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,
        151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,
        175,176,177,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,
        229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,
        248,249,250,251,252,253,254,255]
        ok
        10> length(O2).
        174

        Is it possible ns_server missed an index state update on node 187?
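The same bookkeeping as the shell session above can be sketched in Python to list which partitions are not active anywhere. This is only an illustration: `decode_bitmask` and `to_ranges` are hypothetical helpers, not part of `couch_set_view_util`, and the mask strings are rebuilt from run lengths reconstructed from the decoded lists printed in the session rather than pasted from the logs.

```python
TOTAL_PARTITIONS = 256


def decode_bitmask(bits: str) -> list[int]:
    """Return the partition IDs whose bit is set (rightmost character is partition 0)."""
    return [i for i, ch in enumerate(reversed(bits)) if ch == "1"]


def to_ranges(ids: list[int]) -> list[tuple[int, int]]:
    """Collapse a sorted list of IDs into inclusive (first, last) ranges."""
    ranges: list[tuple[int, int]] = []
    for i in ids:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)
        else:
            ranges.append((i, i))
    return ranges


# Final abitmask per node, most significant bit first (run lengths are reconstructed).
ABITMASKS = {
    "206": "0" * 78 + "1" * 3 + "0" * 5 + "1" * 42 + "0" * 43 + "1" * 43 + "0" * 42,
    "187": "0" * 256,
    "153": "1" * 43 + "0" * 85 + "1" * 43 + "0" * 85,
}

active: set[int] = set()
for mask in ABITMASKS.values():
    active.update(decode_bitmask(mask))

missing = sorted(set(range(TOTAL_PARTITIONS)) - active)

print(len(active), "active,", len(missing), "missing")
print(to_ranges(missing))
```

With these inputs the union holds 174 partitions, matching `length(O2)`, and the 82 partitions that are not active on any node fall into the ranges 0-41, 170-174 and 178-212.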


          People

          • Assignee:
            Aliaksey Artamonau Aliaksey Artamonau
            Reporter:
            thuan Thuan Nguyen