Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0-developer-preview-4
    • Fix Version/s: 2.0-developer-preview-4
    • Component/s: ns_server
    • Security Level: Public
    • Labels: None

      Description

      Created a 10-node cluster. Created a view:

      {"map": "function (doc) {\n  emit(doc._id, null);\n}", "reduce": "_count"}

      Uploaded 100k JSON items using mcsoda and queried the view with stale=false; the result was correct. Then started removing nodes one by one from the cluster while running view queries. After the second node was removed, the view started returning more than 100k items. I figured out that all the duplicated rows come from a single node, and on this node they all come from three vbuckets: 215, 216 and 217. There was a period of time when these vbuckets were reported by the set views as both passive and replica:

      Set view `default`, main group `_design/dev_test`, partition states updated
      active partitions before: [73,74,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,101,102,103,240,241,242]
      active partitions after: [73,74,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,101,102,103,240,241,242]
      passive partitions before: [215,216,217]
      passive partitions after: [215,216,217]
      cleanup partitions before: []
      cleanup partitions after: []
      replica partitions before: [6,7,8,32,33,34,58,59,60,113,114,115,127,139,140,141,155,164,165,188,189,190,208,211,214,215,216,217,233,236,239,244,249]
      replica partitions after: [6,7,8,32,33,34,58,59,60,113,114,115,127,139,140,141,155,164,165,188,189,190,208,211,214,215,216,217,233,236,239,244,249]
      replicas on transfer before: [215,216,217]
      replicas on transfer after: [215,216,217]
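
      To make the overlap concrete, here is a minimal sketch (plain Python, not part of the original report) that takes the partition states from the snippet above and checks for vbuckets listed as both passive and replica, the condition that produced the duplicate rows:

      # Partition states copied from the log snippet above.
      passive = {215, 216, 217}
      replica = {6, 7, 8, 32, 33, 34, 58, 59, 60, 113, 114, 115, 127,
                 139, 140, 141, 155, 164, 165, 188, 189, 190, 208, 211,
                 214, 215, 216, 217, 233, 236, 239, 244, 249}
      replicas_on_transfer = {215, 216, 217}

      # A vbucket indexed as passive (part of the main index) while still
      # listed as a replica can have its rows served twice, once from each
      # index.
      overlap = passive & replica
      print(sorted(overlap))                          # [215, 216, 217]
      print(sorted(overlap & replicas_on_transfer))   # [215, 216, 217]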

      The sequence of calls performed by ns_server seems to be correct. I'm attaching the full logs and a diag from this node.
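
      For context, the verification step amounts to a short query check. This is a sketch only; the view name ("test") and port are assumptions, while the bucket ("default") and design document ("_design/dev_test") come from the log above:

      import json
      import urllib.request

      # stale=false forces the index to be updated before the query is
      # answered; with 100k unique items loaded, a correct cluster should
      # never report more than 100000 rows.
      url = ("http://localhost:8092/default/_design/dev_test"
             "/_view/test?stale=false")
      with urllib.request.urlopen(url) as resp:
          result = json.load(resp)

      count = result["rows"][0]["value"]  # _count reduce over all docs
      assert count <= 100000, "duplicate rows: %d" % count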

      1. add.py
        0.2 kB
        damien
      2. del.py
        0.1 kB
        damien
      3. incorrect_results.tar.bz2
        13.64 MB
        Aliaksey Artamonau
      4. logs.tar.bz2
        380 kB
        Aliaksey Artamonau
      5. ns-diag-20120124155027.txt.bz2
        792 kB
        Aliaksey Artamonau

        Activity

        karan Karan Kumar (Inactive) added a comment -

        Confirmed that test_count_reduce_x_docs passes.
        steve Steve Yen added a comment -

        need repro?
        Aliaksey Artamonau Aliaksey Artamonau added a comment -

        Results were permanently inconsistent after rebalancing out several nodes. All the nodes were built with the commit you're referring to.
        filipe manana filipe manana added a comment -

        @Aliaksey

        Need more info on how to reproduce this. Are the query results inconsistent during failover or rebalance (or both)? Are they temporary (only during rebalance or failover) or permanent?

        Please make sure all your nodes have the following couchdb commit:
        https://github.com/couchbase/couchdb/commit/43c6b744c8a110c5a1f6f9a2039fcc405cbff1a9

        @Farshid

        Farshid, I ran that test locally, and it sometimes fails for me too.
        One thing I notice is that the test's queries don't specify ?stale=false. I think this is what's making the test fail so often.
        I changed the test viewtests.ViewTests.test_count_reduce_100k_docs locally to add stale=false to all queries, and with that the test always passes for me:

        http://friendpaste.com/5OUPCfOUHxEG4HBB0qU7r9

        Can you verify that?
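
        (For illustration, the change amounts to something like the following sketch; the helper below is hypothetical and not the actual testrunner code:)

        import json
        import urllib.request

        def query_view(host, bucket, ddoc, view, stale="false"):
            # Passing stale=false makes the server bring the view index
            # fully up to date before answering, which is what the flaky
            # queries in the test were missing.
            url = "http://%s:8092/%s/_design/%s/_view/%s?stale=%s" % (
                host, bucket, ddoc, view, stale)
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)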

        farshid Farshid Ghods (Inactive) added a comment -

        ./testrunner -i b/resources/dev-4-nodes.ini -t viewtests.ViewTests.test_count_reduce_100k_docs

        It happens even with a single node, but less frequently than before.

          People

          • Assignee: karan Karan Kumar (Inactive)
          • Reporter: Aliaksey Artamonau Aliaksey Artamonau
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
            • Updated:
            • Resolved:

              Gerrit Reviews

              There are no open Gerrit changes