Couchbase Server / MB-6967

dev-views index seems not to be built (at least for _count reduce) during rebalance with consistent view (1DD, 4 views) - build 1868

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta-2
    • Fix Version/s: 2.0
    • Security Level: Public
    • Environment:

      Description

      I tried the following scenario on the latest build:
      1. Create a cluster of 2 nodes. Load 1M items, 1KB each, with cbworkloadgen: ./cbworkloadgen -n localhost:8091 -i10000000 -t6 -j -s1000
      2. Create 1 DD with 4 simple views, 2 of which had a _count reduce function so I could count all documents in that view.
      3. I built the index and confirmed that the _count views return 1M items, i.e. the index was built successfully.
      4. I added 2 more nodes to the cluster and rebalanced.
      5. During the rebalance, I refreshed the _count views (see the example query below).
      Expected behavior (with consistent views enabled by default): I would get the same 1M count throughout the rebalance.
      Observed behavior: the number kept changing, decreasing over time to 900K, 800K, ...
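      For reference, the check in step 5 boils down to a reduce query of this shape (a sketch only: the bucket name default and the design doc name dev_test appear in the attached diags, while the view name v1 and the host are placeholders, and the 8092 CAPI port is assumed):
      curl 'http://localhost:8092/default/_design/dev_test/_view/v1?full_set=true'
      With consistent views, the reduce row was expected to keep reporting {"rows":[{"key":null,"value":1000000}]} for the duration of the rebalance.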

      Iryna, can you collect the necessary logs from this cluster and post them on this bug?
      Cluster can be found at: http://184.169.209.178:8091/ (usual credentials)
      Here are all the IP addresses:
      ns_1@10.176.29.176 10.176.29.176 184.169.209.178 ns_1@10.176.9.41
      ns_1@10.176.9.41 10.176.9.41 50.18.23.114 ns_1@10.176.9.41
      ns_1@10.168.103.76 10.168.103.76 204.236.154.91 ns_1@10.176.9.41
      ns_1@10.176.145.104 10.176.145.104 54.241.117.117 ns_1@10.176.9.41

      I did a few things on this cluster, like creating and removing indexes/nodes, so look at the logs at the end.

      1. 10.3.3.104-8091-diag.txt.gz
        743 kB
        Deepkaran Salooja
      2. 10.3.3.106-8091-diag.txt.gz
        505 kB
        Deepkaran Salooja
      3. 10.3.3.107-8091-diag.txt.gz
        500 kB
        Deepkaran Salooja
      4. 10.3.3.95-8091-diag.txt.gz
        994 kB
        Deepkaran Salooja

        Activity

        sharon Sharon Barr (Inactive) added a comment -

        More info:
        At the end of the rebalance, there were 630K items on this view, and it seems that the index was then rebuilt until it reached the 1M number.

        It seems that the index is NOT being built during the rebalance.

        sharon Sharon Barr (Inactive) added a comment -

        The 4 logs are 70MB compressed. Let me know where you want them uploaded.
        I am continuing to mess around with this cluster.

        sharon Sharon Barr (Inactive) added a comment -

        One more important piece of information: there was NO load on the system, so the consistent views feature should not really be a factor here.

        sharon Sharon Barr (Inactive) added a comment -

        Attached are logs from the new nodes that entered the cluster.
        Logs from the original nodes are too large to attach.

        The accurate observation is that at least the reduce views are not being built (or are only partially built). The count drops during the rebalance and stays low after it's done. Once I trigger the full view, it is built up again.

        FilipeManana Filipe Manana (Inactive) added a comment -

        What I see in the logs is that the initial index build is being done during rebalance.

        During rebalance, vbucket state changes are happening all the time, which causes the updater to be stopped and restarted after each state transition. Remember that the initial index build is not resumable, so each restart means it starts from scratch.

        I don't think there's anything that can be done here. The only way I see is if it were possible for ns_server to know the final state of all vbuckets on each node, and then do a single bulk state transition request to the indexes.

        This is something that has always been possible since the faster initial index build method was added months ago.

        deepkaran.salooja Deepkaran Salooja added a comment -

        Couldn't access the cluster. Reproduced the same behavior on CentOS 64 with the latest build (1870). Attaching the diags from 4 nodes.

        Steps followed:

        Load 500K items in default bucket
        Rebalance 1->2 nodes
        Create 1 dev view:
        map:
        function (doc, meta) { emit(meta.id, null); }
        reduce:
        _count

        Query full_set to create index
        curl -X GET 'http://10.3.3.95:8092/default/_design/dev_d1/_view/v1?full_set=true&connection_timeout=60000&limit=10&skip=0'
        {"rows":[

        {"key":null,"value":500000}

        ]
        }

        Rebalance 2->4

        During the rebalance, the full_set query returns < 500K in the output:

        root@ubuntu1104-64:~# curl -X GET 'http://10.3.3.95:8092/default/_design/dev_d1/_view/v1?full_set=true&connection_timeout=60000&limit=10&skip=0'
        {"rows":[

        {"key":null,"value":472640}

        ]
        }
        root@ubuntu1104-64:~# curl -X GET 'http://10.3.3.95:8092/default/_design/dev_d1/_view/v1?full_set=true&connection_timeout=60000&limit=10&skip=0'
        {"rows":[

        {"key":null,"value":453125}

        ]
        }

        After rebalance & indexing is finished, full 500K are returned
        root@ubuntu1104-64:~# curl -X GET 'http://10.3.3.95:8092/default/_design/dev_d1/_view/v1?full_set=true&connection_timeout=60000&limit=10&skip=0'
        {"rows":[

        {"key":null,"value":500000}

        ]
        }
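        For anyone re-running this, the drift above can be captured with a simple polling loop (a sketch only, reusing the exact endpoint from the queries above; the 5-second interval is arbitrary):

        while true; do
          # print a timestamp followed by the current _count reduction value
          echo -n "$(date +%T) "
          curl -s 'http://10.3.3.95:8092/default/_design/dev_d1/_view/v1?full_set=true&connection_timeout=60000&limit=10&skip=0'
          echo
          sleep 5
        done

        During the rebalance the value drifts below 500000 and only climbs back to 500000 once the rebalance and indexing finish.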

        FilipeManana Filipe Manana (Inactive) added a comment -

        Actually, it's not doing the initial build here, but what I said before is still valid.

        For node .117 (node 1), what I see is that the index update is being stopped and restarted every 1 or 2 seconds due to vbucket state transition requests from ns_server:

        https://friendpaste.com/4qH8PqvWO4tpGANlSxp0n3

        In each transition, only one vbucket state changes.

        I don't know if things could be better batched or not in ns_server.

        farshid Farshid Ghods (Inactive) added a comment -

        Deep,

        In your testing, were you running load during rebalancing?
        Did you make sure the index update was completed before rebalancing?

        farshid Farshid Ghods (Inactive) added a comment -

        Deep, Iryna and I had a conversation about this scenario. We will discuss with Filipe and update the ticket to understand consistent views and index updating better.

        deepkaran.salooja Deepkaran Salooja added a comment -

        I didn't run any load during rebalancing. After the rebalance is finished, I can see indexing running if I do full_set queries.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Every 1 or 2 seconds should be impossible, Filipe. At least for the main index. We are supposed to wait for index building completion for each vbucket movement.

        Incoming vbucket transfers are not synchronized w.r.t. each other, and it's possible to have some config changes due to that: i.e. we start moving in vbucket 0 and 1 second later we can start moving in vbucket 1, which AFAIK will cause all indexing progress to be thrown away if it's the initial index build.

        But there's an inherent limit on concurrent incoming vbucket movements. Constant changes should be impossible.

        The replica index is somewhat different currently. My code doesn't wait for replica indexing completion, which I'll be happy to fix. Does couch_set_view:monitor_partition_update/3 work for replica indexes as well?

        FilipeManana Filipe Manana (Inactive) added a comment -

        Well, unless the log timestamps are not well calculated, I see at least 1 state transition per second on average:

        https://friendpaste.com/1Aoo4eeFhzLGFlpAW6lHQI

        And no, it's not possible to wait for replica indexing of particular vbuckets like it is for the main index.

        I see lots of transitions (if not all, then the majority) changing only one vbucket state. For example:

        [couchdb:info,2012-10-19T0:39:28.779,ns_1@10.176.145.104:<0.20650.1>:couch_log:info:39]Set view `default`, main group `_design/dev_test`, partition states updated
        active partitions before: [256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510]
        active partitions after: [256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510]
        passive partitions before: []
        passive partitions after: [511]
        cleanup partitions before: []
        cleanup partitions after: []
        unindexable partitions: []
        replica partitions before: []
        replica partitions after: []
        replicas on transfer before: []
        replicas on transfer after: []
        pending transition before:
        active: []
        passive: []
        unindexable: []
        pending transition after:
        active: []
        passive: []
        unindexable: []

        Here it made only one change: it added vbucket 511 to the passive state.

        Then, a few seconds later, it makes another single change, moving vbucket 511 from the passive state to the active state:

        [couchdb:info,2012-10-19T0:39:31.262,ns_1@10.176.145.104:<0.20650.1>:couch_log:info:39]Set view `default`, main group `_design/dev_test`, partition states updated
        active partitions before: [256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510]
        active partitions after: [256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511]
        passive partitions before: [511]
        passive partitions after: []
        cleanup partitions before: []
        cleanup partitions after: []
        unindexable partitions: []
        replica partitions before: []
        replica partitions after: []
        replicas on transfer before: []
        replicas on transfer after: []
        pending transition before:
        active: []
        passive: []
        unindexable: []
        pending transition after:
        active: []
        passive: []
        unindexable: []

        I understand it might not be possible for ns_server to batch things better, but this doesn't help.

        I can revert some optimizations done several months ago for query performance, which would increase the incremental indexing checkpoint frequency. While that would likely help for such small datasets (at the cost of query performance in high-concurrency scenarios), for large datasets, or when there's a lot of load on the system (many ddocs, XDCR, bucket compaction, etc.), we'll very likely run into the same scenario as here.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        When I said impossible, I was referring to the initial index build, in which case waiting for a single vbucket will obviously wait for all of them.

        For some future release we'll definitely seek better ways to interact with indexes. The problem, as I pointed out earlier, is that from our high-level perspective we're not very aware of the performance implications of what we do.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment - - edited

        BTW, an incorrect _count doesn't imply the index is not being built. I recommend actually checking whether the results you expect are there. We have that somewhat controversial "excluding" of reduction values during non-steady state, and I don't know if we heavily tested this path.
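        One way to do that check (a sketch only, reusing the endpoint from Deepkaran's repro above; reduce=false and limit are standard view query parameters):

        # fetch the same view without running the reduce: total_rows reports how
        # many keys are actually in the index, independent of the _count value
        curl -s 'http://10.3.3.95:8092/default/_design/dev_d1/_view/v1?full_set=true&reduce=false&limit=1'

        Comparing total_rows against the _count reduction shows whether the index itself is missing rows or only the reduction path is off.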

        farshid Farshid Ghods (Inactive) added a comment -

        The user in this case is running a reduce query, which should return the count as expected.

        When you say we have controversial excluding of reduction values during non-steady state, is that expected behavior in the view engine?

        Do we expect users to see the reduction return inconsistent results, while the view results are always consistent?

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        We expect it to be consistent. When I said 'somewhat controversial' I was pointing out that it actually has to do quite a bit of work to return a reduction value in non-steady state, compared to steady state. And I was trying to say I have no idea whether we used to test this path a lot.

        iryna iryna added a comment -

        Please take into account that the issue can be reproduced only with development views; production views are working OK.

        damien damien added a comment -

        Apparently this only happens for development views, which are not supposed to be accurate across a cluster and rebalances.

        farshid Farshid Ghods (Inactive) added a comment -

        Looked at the diags posted by Sharon and confirmed they are also using dev views:

        {design_documents,[<<"_design/dev_test">>]},
        {indexer_type,main},
        {set,<<"default">>},
        {signature,<<"701550d970bd93f8c7c9736d27d90b51">>},
        {started_on,1350606438},
        {type,blocked_indexer},
        {updated_on,1350606438}],
        [{pid,<<"<0.4324.6>">>},
        {design_documents,[<<"_design/dev_test">>]},
        {indexer_type,main},
        {set,<<"default">>},
        {signature,<<"701550d970bd93f8c7c9736d27d90b51">>},
        {started_on,1350606564},
        {type,blocked_indexer},
        {updated_on,1350606564}],
        [{pid,<<"<0.18271.6>">>},
        {design_documents,[<<"_design/dev_test">>]},

        farshid Farshid Ghods (Inactive) added a comment -

        I'm not quite sure why this only happens for dev views and not for production views, though.

        farshid Farshid Ghods (Inactive) added a comment -

        If this happens for dev views, we should add to the release notes that users might get inconsistent results during rebalancing with development views.

        FilipeManana Filipe Manana (Inactive) added a comment -

        From the view engine's perspective, there's no distinction between (or concept of) development and production views. They're all treated the same way.
        ns_server, on the other hand, treats them differently, for example by not triggering index updates for development views.

        farshid Farshid Ghods (Inactive) added a comment -

        Alk,

        From ns_server's point of view, w.r.t. rebalancing and consistent views, do we treat dev views any differently?

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Farshid, I could not completely understand your question.

        Pure dev views are not affected at all during rebalance. After all, they only cover a single vbucket. Even if that vbucket is moved, I believe it'll safely use another one.

        If we're speaking about dev views with the full_set option set to true, they are no different from production views.

        farshid Farshid Ghods (Inactive) added a comment -

        According to the bug description we are getting inconsistent results during rebalancing, but this does not happen for production views.

        The tester also waits until the index is built before running the rebalance, and full_set is used for all view queries.
        If there is no difference between dev views and production views during rebalancing, then we should not be seeing different behavior.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Well, there is no difference between dev views with full_set and production views. Thus it appears that this bug is genuine and needs to be fixed.

        farshid Farshid Ghods (Inactive) added a comment -

        ok thanks.

        yaseen Rahim Yaseen (Inactive) added a comment -

        Alk, if you think this is a genuine bug, can you please review and do some triage - let me know what we should do with this bug?

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        OK, I was somewhat wrong. Here's the full explanation.

        As per Frank's original idea, development design docs are supposed to be used on a subset of the data. People are supposed to play with map/reduce functions without throwing the entire cluster's power at building a full index. When that development process is done, people can try it out on the full set a few times, which builds the full index. We expect that once this is done, people are satisfied and will publish the dev_ design document to production. Our (and stock) indexes have a nice feature where identical ddocs are backed by the same index file, so the work spent on building the full index is not lost: the production design doc is 100% the same as the dev_ design doc that was just indexed in full_set mode.

        The key point is that the system explicitly avoids automagically building or updating that full index underneath dev_ design docs, because that would waste the cluster's resources without an explicit human request.

        Thus, as part of rebalance we trigger and wait for index updates for all non-dev design docs, explicitly skipping development ddocs. Even then, the user should still be able to see consistent results with stale=false queries, which force an index update before returning results.

        So this is indeed not a bug, and perhaps worth documenting.
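        For illustration, the workflow described above maps onto requests like these (a sketch only: the bucket, design doc and view names are taken from the repro earlier in this ticket, the host is a placeholder, and the 8092 CAPI port is assumed):

        # force a consistent result from the dev view during rebalance:
        # stale=false updates the index before returning results
        curl 'http://localhost:8092/default/_design/dev_d1/_view/v1?full_set=true&stale=false'

        # publish the design doc to production (same views, name without the
        # dev_ prefix); the full index already built for the dev_ ddoc is reused,
        # and ns_server then updates and waits on it during rebalance
        curl -X PUT 'http://localhost:8092/default/_design/d1' \
             -H 'Content-Type: application/json' \
             -d '{"views":{"v1":{"map":"function (doc, meta) { emit(meta.id, null); }","reduce":"_count"}}}'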

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        See above

        steve Steve Yen added a comment -

        Thanks, Aliaksey.

        Based on Aliaksey's explanation, marking this as a non-blocker and assigning it to MC to document this non-obvious but "working as designed" behavior for dev views.

        farshid Farshid Ghods (Inactive) added a comment -

        Please note that this behavior is only expected for development views (both full_set=true and partial set). It does not apply to production views.

        mccouch MC Brown (Inactive) added a comment -

        Documentation has been updated with this information in both the view types and auto-update sections of the manual.

        kzeller kzeller added a comment -

        Added to RN as:

        "If you are using development views, be aware you may see inconsistent results if you query a development view during rebalance. For production views, you are able to query during rebalance and get results consistent with those you would have received if no rebalance were occurring."


          People

          • Assignee:
            mccouch MC Brown (Inactive)
            Reporter:
            sharon Sharon Barr (Inactive)
          • Votes:
            0
            Watchers:
            0
