  Couchbase Server / MB-7270

Rebalance repeatedly exited with reason bad_replicas after rebalance with a warmup node (Bad replicators after rebalance: Missing = [{'ns_1@10.3.121.113','ns_1@10.3.121.112',205}, ...])

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Incomplete
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None

      Description

      build 1966
      steps:
      1. 5 nodes ('ns_1@10.3.121.116', 'ns_1@10.3.121.114', 'ns_1@10.3.121.115', 'ns_1@10.3.121.112', 'ns_1@10.3.121.113'), 10 buckets x 100K items
      2. swap rebalance: add 'ns_1@10.3.121.117', remove 'ns_1@10.3.121.116'
      3. reboot 10.3.121.113 and add 10.3.121.116 to nodes_wanted
      4. wait until warmup completes on 10.3.121.113, then start rebalance
      result: Rebalance exited with reason

      {not_all_nodes_are_ready_yet,['ns_1@10.3.121.112','ns_1@10.3.121.114','ns_1@10.3.121.115','ns_1@10.3.121.117']}

      (same as MB-7168, "Rebalance exited with reason {not_all_nodes_are_ready_yet after failover node")
      5. remove node 10.3.121.116 and restart rebalance

      result: every rebalance attempt fails with:

      Rebalance exited with reason bad_replicas
      ns_orchestrator002 ns_1@10.3.121.112 21:02:47 - Tue Nov 27, 2012

      Bad replicators after rebalance:
      Missing = [{'ns_1@10.3.121.113','ns_1@10.3.121.112',205},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',206},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',207},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',208},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',209},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',210},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',211},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',212},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',213},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',214},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',215},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',216},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',217},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',218},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',219},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',220},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',221},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',222},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',223},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',224},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',225},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',226},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',227},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',228},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',229},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',230},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',231},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',232},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',233},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',234},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',235},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',236},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',237},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',238},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',239},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',240},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',241},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',242},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',243},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',244},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',245},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',246},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',247},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',248},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',249},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',250},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',251},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',252},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',253},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',254},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',255},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.112',256},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',257},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',258},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',259},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',260},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',261},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',262},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',263},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',264},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',265},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',266},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',267},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',268},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',269},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',270},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',271},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',272},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',273},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',274},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',275},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',276},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',277},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',278},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',279},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',280},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',281},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',282},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',283},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',284},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',285},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',286},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',287},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',288},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',289},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',290},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',291},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',292},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',293},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',294},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',295},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',296},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',297},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',298},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',299},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',300},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',301},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',302},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',303},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',304},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',305},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',306},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.114',307},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',308},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',309},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',310},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',311},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',312},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',313},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',314},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',315},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',316},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',317},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',318},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',319},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',320},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',321},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',322},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',323},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',324},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',325},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',326},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',327},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',328},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',329},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',330},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',331},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',332},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',333},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',334},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',335},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',336},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',337},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',338},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',339},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',340},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',341},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',342},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',343},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',344},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',345},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',346},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',347},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',348},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',349},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',350},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',351},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',352},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',353},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',354},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',355},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',356},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',357},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.115',358},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',359},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',360},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',361},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',362},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',363},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',364},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',365},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',366},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',367},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',368},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',369},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',370},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',371},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',372},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',373},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',374},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',375},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',376},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',377},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',378},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',379},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',380},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',381},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',382},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',383},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',384},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',385},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',386},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',387},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',388},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',389},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',390},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',391},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',392},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',393},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',394},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',395},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',396},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',397},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',398},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',399},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',400},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',401},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',402},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',403},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',404},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',405},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',406},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',407},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',408},
                 {'ns_1@10.3.121.113','ns_1@10.3.121.117',409}]
      Extras = []
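Read as data, the Missing list is a set difference between the replicators that should exist and those actually running. Every entry has .113 as its source. The sketch below is a hypothetical model, not ns_server's code. It assumes the expected replicators are derived from a vbucket map, with one (source, destination, vbucket) triple per replica. The names `expected_replicators`, `diff_replicators` and `summarize` are illustrative only. `summarize` collapses the triples into contiguous vbucket runs, which makes a list like this one easier to read.

```python
from itertools import groupby


def expected_replicators(vbucket_map):
    """Hypothetical model: vbucket_map[vb] is a replication chain
    [active_node, replica_node, ...]; None marks an unassigned replica.
    Each replica yields one (source, destination, vbucket) replicator."""
    expected = set()
    for vb, chain in enumerate(vbucket_map):
        active = chain[0]
        for replica in chain[1:]:
            if replica is not None:
                expected.add((active, replica, vb))
    return expected


def _by_vbucket(triple):
    # Order like the report: vbucket first, then source and destination.
    src, dst, vb = triple
    return (vb, src, dst)


def diff_replicators(expected, actual):
    """Return (missing, extras) as sorted lists of (src, dst, vbucket) triples."""
    missing = sorted(expected - actual, key=_by_vbucket)
    extras = sorted(actual - expected, key=_by_vbucket)
    return missing, extras


def summarize(triples):
    """Collapse triples into (src, dst, first_vb, last_vb) contiguous runs."""
    runs = []
    ordered = sorted(triples)  # by source, then destination, then vbucket
    for (src, dst), group in groupby(ordered, key=lambda t: (t[0], t[1])):
        vbs = [vb for _, _, vb in group]
        start = prev = vbs[0]
        for vb in vbs[1:]:
            if vb != prev + 1:  # gap in vbucket ids: close the current run
                runs.append((src, dst, start, prev))
                start = vb
            prev = vb
        runs.append((src, dst, start, prev))
    return runs


if __name__ == "__main__":
    n112, n113 = "ns_1@10.3.121.112", "ns_1@10.3.121.113"
    vbmap = [[n112, n113], [n113, n112], [n113, n112]]
    actual = {(n112, n113, 0)}  # replicators sourced from .113 never started
    missing, extras = diff_replicators(expected_replicators(vbmap), actual)
    print(summarize(missing))  # [('ns_1@10.3.121.113', 'ns_1@10.3.121.112', 1, 2)]
    print(extras)  # []
```

Fed the Missing list above, `summarize` would return four runs, all sourced from .113:

- vbuckets 205-256 to .112
- vbuckets 257-307 to .114
- vbuckets 308-358 to .115
- vbuckets 359-409 to .117

That is 205 vbuckets in total.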

      Attachments
      1. 10.3.121.112-8091-diag.txt.gz (15.55 MB, Andrei Baranouski)
      2. 10.3.121.114-8091-diag.txt.gz (12.33 MB, Andrei Baranouski)
      3. 10.3.121.115-8091-diag.txt.gz (12.38 MB, Andrei Baranouski)
      4. 10.3.121.116-8091-diag.txt.gz (12.54 MB, Andrei Baranouski)
      5. 10.3.121.117-8091-diag.txt.gz (13.18 MB, Andrei Baranouski)

        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        I'd really like to have logs from .113. I have no idea why logs from every other node are attached but not from the obviously affected one.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        I don't have logs from .113, but it appears that the memcached port on it is firewalled, judging by this crash report:

        [error_logger:error,2012-11-27T10:08:12.931,ns_1@10.3.121.112:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
        =========================CRASH REPORT=========================
          crasher:
            initial call: ebucketmigrator_srv:init/1
            pid: <0.14856.18>
            registered_name: []
            exception error: no match of right hand side value {error,ehostunreach}
              in function ebucketmigrator_srv:connect/4
              in call from ebucketmigrator_srv:init/1
            ancestors: ['ns_vbm_new_sup-standard_bucket4',
                        'single_bucket_sup-standard_bucket4',<0.28058.0>]
            messages: []
            links: [<0.28110.0>]
            dictionary: []
            trap_exit: false
            status: running
            heap_size: 610
            stack_size: 24
            reductions: 563
          neighbours:

        That prevents replication from .113 from working. As in the past, we continue to ignore replicator failures. Because the first bucket is already balanced, there are no actual moves, so we never touch memcached on .113. That is why we fail in this somewhat unusual way: even when no moves are done, we still check replicators after rebalance, and naturally they are not established.
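The failure mode described in this comment can be modelled in a few lines. This is a simplified Python simulation under stated assumptions, not ns_server's Erlang code. The vbucket map shape, the `connect` callable and all function names are hypothetical. It combines the two ingredients named above. First, a replicator whose connection fails is only recorded and skipped. Second, the replicator check still runs when a balanced bucket needs zero moves. As a result, the unreachable source node surfaces only as bad_replicas.

```python
import errno


def expected_replicators(vbucket_map):
    """Hypothetical model: one (src, dst, vbucket) replicator per replica."""
    return {
        (chain[0], replica, vb)
        for vb, chain in enumerate(vbucket_map)
        for replica in chain[1:]
        if replica is not None
    }


def start_replicators(wanted, connect, errors):
    """Start each replicator by connecting to its source node.

    A failed connection is appended to `errors` and otherwise ignored,
    so the rebalance carries on as if nothing happened.
    """
    running = set()
    for src, dst, vb in sorted(wanted, key=lambda t: (t[2], t[0], t[1])):
        try:
            connect(src)  # stand-in for the real memcached connection
        except OSError as exc:
            errors.append((src, dst, vb, errno.errorcode.get(exc.errno, str(exc))))
            continue
        running.add((src, dst, vb))
    return running


def rebalance_bucket(current_map, target_map, connect):
    """Return (status, number_of_moves, missing_replicators, ignored_errors)."""
    moves = [vb for vb, (cur, tgt) in enumerate(zip(current_map, target_map)) if cur != tgt]
    # Vbucket moves would run here. A balanced bucket has none, so nothing
    # talks to the unreachable node before the final check.
    errors = []
    wanted = expected_replicators(target_map)
    running = start_replicators(wanted, connect, errors)
    # The replicator check runs whether or not any moves happened.
    missing = sorted(wanted - running, key=lambda t: (t[2], t[0], t[1]))
    status = "bad_replicas" if missing else "ok"
    return status, len(moves), missing, errors


if __name__ == "__main__":
    n112, n113 = "ns_1@10.3.121.112", "ns_1@10.3.121.113"

    def fake_connect(node):
        # Simulate the firewalled memcached port on .113.
        if node == n113:
            raise OSError(errno.EHOSTUNREACH, "No route to host")

    balanced = [[n112, n113], [n113, n112]]
    print(rebalance_bucket(balanced, balanced, fake_connect))
```

Running it reports bad_replicas with zero moves. The only missing replicator is the one sourced from .113, and the ignored EHOSTUNREACH error is the only trace of the real cause.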

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        See above. Reopen and reassign if you disagree for any reason.

        andreibaranouski Andrei Baranouski added a comment -

        Yes, I probably forgot to turn off the firewall after the reboot. Sorry about that. I tried the same scenario again, and the repeated rebalance passed.
        But it would be nice to handle these situations and tell the user more clearly what the real problem is. Note that in this case warmup had completed on the node, and the node was in the pending state (not down).
        So the user cannot understand why the node is pending, because neither the logs nor the stats on that node say anything about it. Please close the bug after giving your opinion on this.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Maybe we'll do something like that in the future; it has been raised a few times already. But I'm still not convinced, because there are zillions of ways to misconfigure an environment.


          People

          • Assignee: alkondratenko Aleksey Kondratenko (Inactive)
          • Reporter: andreibaranouski Andrei Baranouski
          • Votes: 0
          • Watchers: 0
