  1. Couchbase Server
  2. MB-11351

ns_server's ns_heart and janitor_agent may get totally stuck if some upr stuff inside ep-engine gets stuck

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 3.0
    • 3.0
    • ns_server
    • Security Level: Public
    • None
    • Untriaged
    • No

    Description

      See the subject line.

      See MB-11349, where the replication manager is stuck here:

      {<17750.1765.0>,
      [{registered_name,'replication_manager-saslbucket'},
      {status,waiting},
      {initial_call,{proc_lib,init_p,5}},
      {backtrace,[<<"Program counter: 0x00007fb96712ff00 (gen:do_call/4 + 392)">>,
      <<"CP: 0x0000000000000000 (invalid)">>,<<"arity = 0">>,
      <<>>,
      <<"0x00007fb901e0c498 Return addr 0x00007fb9638ced60 (gen_server:call/3 + 128)">>,
      <<"y(0) #Ref<0.0.189.58898>">>,<<"y(1) infinity">>,
      <<"y(2) {setup_replication,[566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583]}">>,
      <<"y(3) '$gen_call'">>,<<"y(4) <0.5452.176>">>,
      <<"y(5) []">>,<<>>,
      <<"0x00007fb901e0c4d0 Return addr 0x00007fb91a7616f0 (upr_sup:'set_desired_replications/2-lc$^3/1-3'/2 + 184)">>,
      <<"y(0) infinity">>,
      <<"y(1) {setup_replication,[566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583]}">>,
      <<"y(2) 'upr_replicator-saslbucket-ns_1@172.23.105.49'">>,
      <<"y(3) Catch 0x00007fb9638ced60 (gen_server:call/3 + 128)">>,
      <<>>,
      <<"0x00007fb901e0c4f8 Return addr 0x00007fb91a761750 (upr_sup:'set_desired_replications/2-lc$^3/1-3'/2 + 280)">>,
      <<"y(0) \"saslbucket\"">>,
      <<"y(1) [{'ns_1@172.23.105.50',[694,695,696,697,698,699,700,701,702,703,704,705,706,707,708,709,710,711]},{'ns_1@172.23.">>,
      <<>>,
      <<"0x00007fb901e0c510 Return addr 0x00007fb91a761750 (upr_sup:'set_desired_replications/2-lc$^3/1-3'/2 + 280)">>,
      <<"y(0) {errors,[{34,311},{34,310},{34,309},{34,308},{34,305},{34,304},{34,306},{34,307},{34,301},{34,303},{34,300},{34,">>,
      <<>>,
      <<"0x00007fb901e0c520 Return addr 0x00007fb91a761750 (upr_sup:'set_desired_replications/2-lc$^3/1-3'/2 + 280)">>,
      <<"y(0) ok">>,<<>>,
      <<"0x00007fb901e0c530 Return addr 0x00007fb91a798720 (replication_manager:handle_call/3 + 1104)">>,
      <<"y(0) {errors,[{34,55},{34,53},{34,54},{34,51},{34,52},{34,49},{34,50},{34,45},{34,47},{34,46},{34,43},{34,44},{34,42}">>,
      <<>>,
      <<"0x00007fb901e0c540 Return addr 0x00007fb9638d3558 (gen_server:handle_msg/5 + 272)">>,
      <<"y(0) {state,\"saslbucket\",upr,undefined,[{'ns_1@172.23.105.44',\"&'()*+,-./01234567\"},{'ns_1@172.23.105.45',\"¥¦§¨©ª«¬­®">>,
      <<"y(1) [{'ns_1@172.23.105.44',\"&'()*+,-./01234567\"},{'ns_1@172.23.105.45',\"¥¦§¨©ª«¬­®¯°±²³´µ¶·\"},{'ns_1@172.23.105.47',">>,
      <<"y(2) []">>,<<>>,
      <<"0x00007fb901e0c560 Return addr 0x00007fb9671339f8 (proc_lib:init_p_do_apply/3 + 56)">>,
      <<"y(0) replication_manager">>,
      <<"y(1) {state,\"saslbucket\",upr,undefined,[{'ns_1@172.23.105.44',\"&'()*+,-./01234567\"},{'ns_1@172.23.105.45',\"¥¦§¨©ª«¬­®">>,
      <<"y(2) 'replication_manager-saslbucket'">>,
      <<"y(3) <0.1735.0>">>,
      <<"y(4) {remove_undesired_replications,[{'ns_1@172.23.105.44',\"&'()*+,-./01234567\"},{'ns_1@172.23.105.45',\"¥¦§¨©ª«¬­®¯°±">>,
      <<"y(5) {<0.1786.0>,#Ref<0.0.189.58522>}">>,
      <<"y(6) Catch 0x00007fb9638d3558 (gen_server:handle_msg/5 + 272)">>,
      <<>>,
      <<"0x00007fb901e0c5a0 Return addr 0x000000000086aff8 (<terminate process normally>)">>,
      <<"y(0) Catch 0x00007fb967133a18 (proc_lib:init_p_do_apply/3 + 88)">>,
      <<>>]},
      {error_handler,error_handler},
      {garbage_collection,[{min_bin_vheap_size,46422},{min_heap_size,233},{fullsweep_after,512},{minor_gcs,1}]},
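A note on reading the state frames in the dump: quoted values such as "&'()*+,-./01234567" are most likely not text. Erlang prints a list of integers as a string when every element is a printable character code, so these are probably integer lists (plausibly vbucket IDs, matching the numeric lists that appear elsewhere in the backtrace). A minimal Python sketch for recovering the numbers follows; the helper name is hypothetical and not part of any Couchbase tooling.

```python
def decode_erlang_string(s):
    """Return the integer list that Erlang rendered as a string.

    Each character stands for one list element (its code point).
    """
    return [ord(ch) for ch in s]


# Value copied from the replication_manager state frame in the dump.
ids = decode_erlang_string("&'()*+,-./01234567")
print(ids[0], ids[-1], len(ids))  # first element, last element, count
```

The Latin-1 strings in the same frames (the ones starting with ¥) can be decoded the same way.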

      The stuck replication manager prevents ns_heart from working, as well as janitor_agent (though the latter may be acceptable).
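To illustrate the failure mode: the top frames of the backtrace show gen_server:call/3 with a timeout of infinity, so the caller waits indefinitely for a reply, and any process that in turn waits on the replication manager presumably hangs with it. The following is a Python analogy only, not ns_server code; `call`, `stuck_inbox`, and the 0.1 second timeout are hypothetical choices made for the sketch.

```python
import queue


def call(server_inbox, request, timeout=None):
    """Send a request and wait for the reply, like a synchronous call.

    timeout=None waits forever (the analogue of Erlang's `infinity`);
    a number bounds the wait and raises queue.Empty when it expires.
    """
    reply_box = queue.Queue(maxsize=1)
    server_inbox.put((request, reply_box))
    return reply_box.get(timeout=timeout)


# A "server" that is stuck: nothing ever reads its inbox, so no reply is sent.
stuck_inbox = queue.Queue()

try:
    call(stuck_inbox, "setup_replication", timeout=0.1)
    outcome = "replied"
except queue.Empty:
    outcome = "timed out"
print(outcome)
```

With timeout=None the same call would never return, which corresponds to the situation the backtrace shows.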

            People

              parag Parag Agarwal (Inactive)
              alkondratenko Aleksey Kondratenko (Inactive)