Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-44078

[Collections] : State::highCompletedSeqno) invariant failed: new value (2625) breaks invariant on current value

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_recovery,nodes_init=5,nodes_failover=1,recovery_type=full,override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud_durability_MAJORITY'
      

      Steps to Repro
      1 ) Create a 5 node cluster
      2021-02-02 11:03:09,250 | test | INFO | pool-13-thread-6 | [table_view:display:72] Rebalance Overview
      ----------------------++-------------

      Nodes Services Status

      ----------------------++-------------

      172.23.107.94 kv Cluster node
      172.23.107.95 None <--- IN —
      172.23.107.97 None <--- IN —
      172.23.107.98 None <--- IN —
      172.23.107.99 None <--- IN —

      ----------------------++-------------

      2) Create buckets/scope/collections/data
      2021-02-02 11:08:40,836 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      -----------------+--------------------------------------------------+----------

      Bucket Type Replicas Durability TTL Items RAM Quota RAM Used Disk Used

      -----------------+--------------------------------------------------+----------

      bucket1 couchbase 2 none 0 6000 524288000 17905625 576815208
      bucket2 ephemeral 2 none 0 6000 524288000 394912810 170
      default couchbase 2 none 0 10000 524288000 21256007 577933078

      -----------------+--------------------------------------------------+----------

      3) Start graceful failover on 172.23.107.99 and do a full recovery

      2021-02-02 11:11:26,023 | test  | INFO    | MainThread | [collections_rebalance:wait_for_failover_or_assert:213] 1 nodes failed over as expected in 0.0309998989105 seconds
      2021-02-02 11:11:57,217 | test  | WARNING | MainThread | [rest_client:get_nodes:1710] 172.23.107.99 - Node not part of cluster inactiveFailed
      

      4) Start rebalance and it fails as shown below

      2021-02-02 11:14:02,581 | test  | ERROR   | pool-13-thread-15 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.107.94', u'tstamp': 1612293239898L, u'shortText': u'message', u'serverTime': u'2021-02-02T11:13:59.898Z', u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.17273.10>,\n                                {{wait_seqno_persisted_failed,"bucket2",407,\n                                  844,\n                                  [{\'ns_1@172.23.107.99\',\n                                    {\'EXIT\',\n                                     {{{badmatch,{error,closed}},\n                                       [{mc_client_binary,cmd_vocal_recv,5,\n                                         [{file,"src/mc_client_binary.erl"},\n                                          {line,158}]},\n                                        {mc_client_binary,\n                                         wait_for_seqno_persistence,3,\n                                         [{file,"src/mc_client_binary.erl"},\n                                          {line,763}]},\n                                        {ns_memcached,\n                                         \'-wait_for_seqno_persistence/3-fun-0-\',\n                                         3,\n                                         [{file,"src/ns_memcached.erl"},\n                                          {line,1391}]},\n                                        {ns_memcached,\n                                         \'-perform_very_long_call/3-fun-0-\',2,\n                                         [{file,"src/ns_memcached.erl"},\n                                          {line,324}]},\n                                        {ns_memcached_sockets_pool,\n                                         \'-executing_on_socket/3-fun-0-\',3,\n                                         [{file,\n                                           "src/ns_memcached_sockets_pool.erl"},\n                                          {line,92}]},\n                                        {async,\'-async_init/4-fun-1-\',3,\n                                         [{file,"src/async.erl"},\n                                          {line,197}]}]},\n                                      {gen_server,call,\n                                       [{\'janitor_agent-bucket2\',\n                                         \'ns_1@172.23.107.99\'},\n                                        {if_rebalance,<0.10836.10>,\n                                         {wait_seqno_persisted,407,844}},\n                                        infinity]}}}}]},\n                                 [{ns_single_vbucket_mover,\n                                   \'-wait_seqno_persisted_many/5-fun-2-\',5,\n                                   [{file,"src/ns_single_vbucket_mover.erl"},\n                                    {line,488}]},\n                                  {proc_lib,init_p,3,\n                                   [{file,"proc_lib.erl"},{line,234}]}]}}}}.\nRebalance Operation Id = 4552e1ed4b74f427f47c1a4b524157c5'}
      2021-02-02 11:14:02,581 | test  | ERROR   | pool-13-thread-15 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.107.99', u'tstamp': 1612293239838L, u'shortText': u'message', u'serverTime': u'2021-02-02T11:13:59.838Z', u'text': u"Control connection to memcached on 'ns_1@172.23.107.99' disconnected. Check logs for details."}
      2021-02-02 11:14:02,582 | test  | ERROR   | pool-13-thread-15 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.107.99', u'tstamp': 1612293239833L, u'shortText': u'message', u'serverTime': u'2021-02-02T11:13:59.833Z', u'text': u"Service 'memcached' exited with status 134. Restarting. Messages:\n2021-02-02T11:13:59.495768-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so(_ZN5folly6detail8function14FunctionTraitsIFvvEE7callBigIZNS_21PriorityThreadFactory9newThreadEONS_8FunctionIS3_EEEUlvE_EEvRNS1_4DataE+0x109) [0x7f34b9e78000+0x2e1ac59]\n2021-02-02T11:13:59.495835-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5folly18NamedThreadFactory9newThreadEONS3_8FunctionIFvvEEEEUlvE_EEEEE6_M_runEv+0x154) [0x7f34b9e78000+0x2e15704]\n2021-02-02T11:13:59.495873-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f34b3c3b000+0xb9dcf]\n2021-02-02T11:13:59.495884-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f34b27e8000+0x7ea5]\n2021-02-02T11:13:59.495917-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f34b241a000+0xfe8dd]\n[*** LOG ERROR ***] [2021-02-02 11:13:59] [spdlog_file_logger] async log: thread pool doesn't exist anymore\n/home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/logger/custom_rotating_file_sink.cc:84:22: runtime error: member call on null pointer of type 'struct pattern_formatter'"}
       
      2021-02-02 11:14:02,582 | test  | ERROR   | pool-13-thread-15 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.107.94', u'tstamp': 1612293239833L, u'shortText': u'message', u'serverTime': u'2021-02-02T11:13:59.833Z', u'text': u'Worker <0.17169.10> (for action {move,{407,\n                                       [\'ns_1@172.23.107.95\',\n                                        \'ns_1@172.23.107.98\',undefined],\n                                       [\'ns_1@172.23.107.95\',\n                                        \'ns_1@172.23.107.99\',\n                                        \'ns_1@172.23.107.98\'],\n                                       []}}) exited with reason {unexpected_exit,\n                                                                 {\'EXIT\',\n                                                                  <0.17273.10>,\n                                                                  {{wait_seqno_persisted_failed,\n                                                                    "bucket2",\n                                                                    407,844,\n                                                                    [{\'ns_1@172.23.107.99\',\n                                                                      {\'EXIT\',\n                                                                       {{{badmatch,\n                                                                          {error,\n                                                                           closed}},\n                                                                         [{mc_client_binary,\n                                                                           cmd_vocal_recv,\n                                                                           5,\n                                                                           [{file,\n                                                                             "src/mc_client_binary.erl"},\n                                                                            {line,\n                                                                             158}]},\n                                                                          {mc_client_binary,\n                                                                           wait_for_seqno_persistence,\n                                                                           3,\n                                                                           [{file,\n                                                                             "src/mc_client_binary.erl"},\n                                                                            {line,\n                                                                             763}]},\n                                                                          {ns_memcached,\n                                                                           \'-wait_for_seqno_persistence/3-fun-0-\',\n                                                                           3,\n                                                                           [{file,\n                                                                             "src/ns_memcached.erl"},\n                                                                            {line,\n                                                                             1391}]},\n                                                                          {ns_memcached,\n                                                                           \'-perform_very_long_call/3-fun-0-\',\n                                                                           2,\n                                                                           [{file,\n                                                                             "src/ns_memcached.erl"},\n                                                                            {line,\n                                                                             324}]},\n                                                                          {ns_memcached_sockets_pool,\n                                                                           \'-executing_on_socket/3-fun-0-\',\n                                                                           3,\n                                                                           [{file,\n                                                                             "src/ns_memcached_sockets_pool.erl"},\n                                                                            {line,\n                                                                             92}]},\n                                                                          {async,\n                                                                           \'-async_init/4-fun-1-\',\n                                                                           3,\n                                                                           [{file,\n                                                                             "src/async.erl"},\n                                                                            {line,\n                                                                             197}]}]},\n                                                                        {gen_server,\n                                                                         call,\n                                                                         [{\'janitor_agent-bucket2\',\n                                                                           \'ns_1@172.23.107.99\'},\n                                                                          {if_rebalance,\n                                                                           <0.10836.10>,\n                                                                           {wait_seqno_persisted,\n                                                                            407,\n                                                                            844}},\n                                                                          infinity]}}}}]},\n                                                                   [{ns_single_vbucket_mover,\n                                                                     \'-wait_seqno_persisted_many/5-fun-2-\',\n                                                                     5,\n                                                                     [{file,\n                                                                       "src/ns_single_vbucket_mover.erl"},\n                                                                      {line,\n                                                                       488}]},\n                                                                    {proc_lib,\n                                                                     init_p,3,\n                                                                     [{file,\n                                                                       "proc_lib.erl"},\n                                                                      {line,\n                                                                       234}]}]}}}'}
      

      Was trying to repro MB-43923 when I ran into this. At this point we see c643ed28-c8fb-42ab-fa658992-797da005.dmp on 172.23.107.99. We did not see this on 7.0.0-4325.

      grep CRITICAL on 172.23.107.99

      memcached.log:2021-02-02T11:13:59.141661-08:00 CRITICAL *** Fatal error encountered during exception handling ***
      memcached.log:2021-02-02T11:13:59.141715-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): std::exception
      memcached.log:2021-02-02T11:13:59.494742-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-4356). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/c643ed28-c8fb-42ab-fa658992-797da005.dmp before terminating.
      memcached.log:2021-02-02T11:13:59.494794-08:00 CRITICAL Stack backtrace of crashed thread:
      memcached.log:2021-02-02T11:13:59.495038-08:00 CRITICAL     /opt/couchbase/bin/../lib/libdefault_engine.so(print_backtrace+0xd1) [0x7f34c0ac9000+0x1f4d41]
      memcached.log:2021-02-02T11:13:59.495096-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x852cdb]
      memcached.log:2021-02-02T11:13:59.495137-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x863a1a]
      memcached.log:2021-02-02T11:13:59.495170-08:00 CRITICAL     /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x863d58]
      memcached.log:2021-02-02T11:13:59.495185-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f34b27e8000+0xf630]
      memcached.log:2021-02-02T11:13:59.495213-08:00 CRITICAL     /lib64/libc.so.6(gsignal+0x37) [0x7f34b241a000+0x36387]
      memcached.log:2021-02-02T11:13:59.495244-08:00 CRITICAL     /lib64/libc.so.6(abort+0x148) [0x7f34b241a000+0x37a78]
      memcached.log:2021-02-02T11:13:59.495278-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f34b3c3b000+0x91195]
      memcached.log:2021-02-02T11:13:59.495311-08:00 CRITICAL     /opt/couchbase/bin/memcached() [0x400000+0x86104e]
      memcached.log:2021-02-02T11:13:59.495329-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f34b3c3b000+0x8ef86]
      memcached.log:2021-02-02T11:13:59.495348-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f34b3c3b000+0x8efd1]
      memcached.log:2021-02-02T11:13:59.495418-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so(_ZN10GlobalTask7executeEv+0x23f) [0x7f34b9e78000+0x2e5b60f]
      memcached.log:2021-02-02T11:13:59.495507-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so(_ZZN17FollyExecutorPool9TaskProxy18scheduleViaCPUPoolEvENKUlvE_clEv+0xf06) [0x7f34b9e78000+0x2e2dfc6]
      memcached.log:2021-02-02T11:13:59.495606-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f34b9e78000+0x370e5c6]
      memcached.log:2021-02-02T11:13:59.495658-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f34b9e78000+0x36f6a0a]
      memcached.log:2021-02-02T11:13:59.495706-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f34b9e78000+0x37116e9]
      memcached.log:2021-02-02T11:13:59.495768-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so(_ZN5folly6detail8function14FunctionTraitsIFvvEE7callBigIZNS_21PriorityThreadFactory9newThreadEONS_8FunctionIS3_EEEUlvE_EEvRNS1_4DataE+0x109) [0x7f34b9e78000+0x2e1ac59]
      memcached.log:2021-02-02T11:13:59.495835-08:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so(_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5folly18NamedThreadFactory9newThreadEONS3_8FunctionIFvvEEEEUlvE_EEEEE6_M_runEv+0x154) [0x7f34b9e78000+0x2e15704]
      memcached.log:2021-02-02T11:13:59.495873-08:00 CRITICAL     /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f34b3c3b000+0xb9dcf]
      memcached.log:2021-02-02T11:13:59.495884-08:00 CRITICAL     /lib64/libpthread.so.0() [0x7f34b27e8000+0x7ea5]
      memcached.log:2021-02-02T11:13:59.495917-08:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f34b241a000+0xfe8dd]
      

      cbcollect_info attached.

      Attachments

        1. bt_full_all_threads.txt
          146 kB
        2. bt_full.txt
          9 kB
        3. info_threads.txt
          4 kB
        4. test.log.txt
          83 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty