Couchbase Server: MB-7735

Memcached crash on a node on the destination cluster

    Details

      Description

      Live clusters:
      Source: http://10.6.2.37:8091/
      Destination: http://10.6.2.89:8091/

      On Source:
      default: ~>70% Resident ratio
      saslbucket: ~>40% Resident ratio

      System uptime: >36 hours

      On Destination:
      2 nodes went down: 10.6.2.68, 10.6.2.69

      • "ip seems to have changed" seen for both the nodes
      • ns_1@10.6.2.68
        Server error during processing: ["web request failed", {path,"/pools/default"}, {type,exit},
        {what,
        {timeout, {gen_server,call, [ns_node_disco,nodes_wanted]}}},
        {trace,
        [{gen_server,call,2}, {ns_orchestrator,needs_rebalance,0}, {ns_cluster_membership,is_balanced,0}, {menelaus_web,build_pool_info,4}, {menelaus_web,handle_pool_info,2}, {menelaus_web,loop,3}, {mochiweb_http,headers,5}, {proc_lib,init_p_do_apply,3}]}] (repeated 15 times)

      • ns_1@10.6.2.69
        Server error during processing: ["web request failed", {path,"/pools/default"}, {type,exit},
        {what,
        {{timeout, {gen_server,call, [{'stats_reader-default', 'ns_1@10.6.2.69'}, {latest,minute,1}]}},
        {gen_server,call, [menelaus_web_alerts_srv,fetch_alert]}}},
        {trace,
        [{gen_server,call,2}, {diag_handler,diagnosing_timeouts,1}, {menelaus_web,build_pool_info,4}, {menelaus_web,handle_pool_info,2}, {menelaus_web,loop,3}, {mochiweb_http,headers,5}, {proc_lib,init_p_do_apply,3}]}]

      • Memcached core on 10.6.2.68

      Backtrace:

      [root@pine-11803 data]# gdb /opt/couchbase/bin/memcached core.memcached.17636
      GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
      Copyright (C) 2010 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law. Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-redhat-linux-gnu".
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>...
      Reading symbols from /opt/couchbase/bin/memcached...done.
      [New Thread 17656]
      [New Thread 17645]
      [New Thread 17654]
      [New Thread 17646]
      [New Thread 17652]
      [New Thread 17653]
      [New Thread 17655]
      [New Thread 17636]
      Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done.
      Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
      Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
      Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
      Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libdl.so.2
      Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libm.so.6
      Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib64/librt.so.1
      Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done.
      Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4
      Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
      [Thread debugging using libthread_db enabled]
      Loaded symbols for /lib64/libpthread.so.0
      Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libc.so.6
      Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/ld-linux-x86-64.so.2
      Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
      Loaded symbols for /usr/lib64/libstdc++.so.6
      Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libgcc_s.so.1
      Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
      Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
      Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
      Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/ep.so
      Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
      Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
      Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
      Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
      Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libnss_files.so.2
      Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
      Program terminated with signal 6, Aborted.
      #0 0x00007fc8afec28a5 in raise () from /lib64/libc.so.6
      Missing separate debuginfos, use: debuginfo-install couchbase-server-2.0.1-153.x86_64
      (gdb) t a a bt

      Thread 8 (Thread 0x7fc8b1376720 (LWP 17636)):
      #0 0x00007fc8aff6a43d in write () from /lib64/libc.so.6
      #1 0x00007fc8aff01033 in _IO_new_file_write () from /lib64/libc.so.6
      #2 0x00007fc8aff00efa in _IO_new_file_xsputn () from /lib64/libc.so.6
      #3 0x00007fc8afef692c in fputs () from /lib64/libc.so.6
      #4 0x00007fc8aeb61143 in logger_log (severity=EXTENSION_LOG_WARNING, client_cookie=<value optimized out>, fmt=0x7fc8aab1bebb "Schedule cleanup of \"%s\"") at extensions/loggers/file_logger.c:275
      #5 0x00007fc8aaad06a4 in TapConnMap::shutdownAllTapConnections (this=0x636e240) at src/tapconnmap.cc:366
      #6 0x00007fc8aaa989d1 in EventuallyPersistentEngine::destroy (this=0x6374000, force=<value optimized out>) at src/ep_engine.cc:1401
      #7 0x00007fc8aaa98ace in EvpDestroy (handle=<value optimized out>, force=false) at src/ep_engine.cc:130
      #8 0x00007fc8adf52bb5 in bucket_shutdown_engine (key=<value optimized out>, nkey=<value optimized out>, val=0x63262a0, nval=<value optimized out>, args=<value optimized out>) at bucket_engine.c:1290
      #9 0x00007fc8adf5966c in genhash_iter (h=0x632a000, iterfunc=0x7fc8adf52b80 <bucket_shutdown_engine>, arg=0x0) at genhash.c:275
      #10 0x00007fc8adf53f46 in bucket_destroy (handle=0x7fc8ae15c640, force=<value optimized out>) at bucket_engine.c:1327
      #11 0x0000000000409777 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7927

      Thread 7 (Thread 0x7fc8a8627700 (LWP 17655)):
      #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007fc8aaa76f28 in wait (this=0x63a22d0, d=...) at src/syncobject.hh:58
      #2 IdleTask::run (this=0x63a22d0, d=...) at src/dispatcher.cc:336
      #3 0x00007fc8aaa795ea in Dispatcher::run (this=0x636b880) at src/dispatcher.cc:173
      #4 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636b880) at src/dispatcher.cc:28
      #5 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0
      #6 0x00007fc8aff776dd in clone () from /lib64/libc.so.6

      Thread 6 (Thread 0x7fc8a9a29700 (LWP 17653)):
      #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007fc8aaa76f28 in wait (this=0x63a2120, d=...) at src/syncobject.hh:58
      #2 IdleTask::run (this=0x63a2120, d=...) at src/dispatcher.cc:336
      #3 0x00007fc8aaa795ea in Dispatcher::run (this=0x636ac40) at src/dispatcher.cc:173
      #4 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636ac40) at src/dispatcher.cc:28
      #5 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0
      #6 0x00007fc8aff776dd in clone () from /lib64/libc.so.6

      Thread 5 (Thread 0x7fc8aa638700 (LWP 17652)):
      #0 0x00007fc8aff3b97d in nanosleep () from /lib64/libc.so.6
      #1 0x00007fc8aff70b34 in usleep () from /lib64/libc.so.6
      #2 0x00007fc8aaab67f5 in updateStatsThread (arg=0x1ab44c0) at src/memory_tracker.cc:31
      #3 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0
      #4 0x00007fc8aff776dd in clone () from /lib64/libc.so.6

      Thread 4 (Thread 0x7fc8aeb5d700 (LWP 17646)):
      #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007fc8aeb614d6 in logger_thead_main (arg=0x1ab4040) at extensions/loggers/file_logger.c:368
      #2 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0
      #3 0x00007fc8aff776dd in clone () from /lib64/libc.so.6

      Thread 3 (Thread 0x7fc8a9028700 (LWP 17654)):
      #0 0x00007fc8b022d7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00007fc8aaa76f28 in wait (this=0x63a2090, d=...) at src/syncobject.hh:58
      #2 IdleTask::run (this=0x63a2090, d=...) at src/dispatcher.cc:336
      #3 0x00007fc8aaa795ea in Dispatcher::run (this=0x636aa80) at src/dispatcher.cc:173
      #4 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636aa80) at src/dispatcher.cc:28
      #5 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0
      #6 0x00007fc8aff776dd in clone () from /lib64/libc.so.6

      Thread 2 (Thread 0x7fc8af772700 (LWP 17645)):
      #0 0x00007fc8aff6a3dd in read () from /lib64/libc.so.6
      #1 0x00007fc8aff01248 in _IO_new_file_underflow () from /lib64/libc.so.6
      #2 0x00007fc8aff02d4e in _IO_default_uflow_internal () from /lib64/libc.so.6
      #3 0x00007fc8afef753a in _IO_getline_info_internal () from /lib64/libc.so.6
      #4 0x00007fc8afef6399 in fgets () from /lib64/libc.so.6
      #5 0x00007fc8af773939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37
      #6 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0
      #7 0x00007fc8aff776dd in clone () from /lib64/libc.so.6

      Thread 1 (Thread 0x7fc8a7c26700 (LWP 17656)):
      #0 0x00007fc8afec28a5 in raise () from /lib64/libc.so.6
      #1 0x00007fc8afec4085 in abort () from /lib64/libc.so.6
      #2 0x0000000000404315 in release_cookie (cookie=<value optimized out>) at daemon/memcached.c:6707
      #3 0x00007fc8adf55009 in bucket_engine_release_cookie (cookie=0x62c3b80) at bucket_engine.c:2565
      #4 0x00007fc8aaa9461a in EventuallyPersistentEngine::releaseCookie (this=0x6374000, cookie=0x62c3b80) at src/ep_engine.cc:1230
      #5 0x00007fc8aaabf376 in TapConnection::releaseReference (this=0x638a000, force=<value optimized out>) at src/tapconnection.cc:110
      #6 0x00007fc8aaad2aeb in TapConnectionReaperCallback::callback(Dispatcher&, SingleThreadedRCPtr<Task>&) () from /opt/couchbase/lib/memcached/ep.so
      #7 0x00007fc8aaa795ea in Dispatcher::run (this=0x636b6c0) at src/dispatcher.cc:173
      #8 0x00007fc8aaa79eeb in launch_dispatcher_thread (arg=0x636b6c0) at src/dispatcher.cc:28
      #9 0x00007fc8b0229851 in start_thread () from /lib64/libpthread.so.0
      #10 0x00007fc8aff776dd in clone () from /lib64/libc.so.6

        Activity

        maria Maria McDuff (Inactive) added a comment -

        From: Chisheng Hong <chisheng@couchbase.com>
        Date: Thursday, May 23, 2013 4:29 PM
        To: Mike Wiederhold <mike@couchbase.com>
        Cc: Maria McDuff <maria@couchbase.com>, Ketaki Gangal <Ketaki@couchbase.com>
        Subject: Toy build issue

        Hi Mike,
        The issue with your toy build is that if I create a view in any bucket, that bucket goes down and never comes back up. You can try it on this node: http://10.3.2.52:8091/index.html

        -Chisheng

        maria Maria McDuff (Inactive) added a comment -

        CHIYOUNG to merge his fix today.

        maria Maria McDuff (Inactive) added a comment -

        Please verify in the new build coming today.

        ketaki Ketaki Gangal added a comment -

        Closing this; not seen on recent runs of 2.1.0-701, which has been running for 40+ hours.

        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #488 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/488/)
        MB-7735: Fix to a race in notifying / releasing TAP connections (Revision c23c350dd15e8aa4cbe8a81a8f6eadb28ab62dd9)

        Result = SUCCESS
        Chiyoung Seo:
        Files:

        • src/tapconnmap.cc
        • src/tapconnmap.hh
        • src/tapconnection.hh
        • src/ep_engine.cc
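
For readers unfamiliar with this class of bug: in the backtrace in the description, Thread 8 is in TapConnMap::shutdownAllTapConnections (called from engine destroy) while Thread 1 aborts in release_cookie, reached from TapConnectionReaperCallback via TapConnection::releaseReference. The sketch below is NOT the actual change in revision c23c350; it only illustrates one common way to make a "release" path safe when two threads can reach it at once. It assumes, without confirmation from the ticket, that the abort is a sanity check tripping on an unexpected release. `GuardedConnection` and `raceReleases` are hypothetical names, not ep-engine identifiers.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a connection holding one reference on a cookie.
// None of these names come from ep-engine.
class GuardedConnection {
public:
    // Returns true only for the single caller that performs the release.
    bool releaseReference() {
        // exchange() atomically sets the flag and returns the previous value,
        // so exactly one thread observes `false` and proceeds.
        if (released_.exchange(true)) {
            return false;  // another thread already released; do nothing
        }
        ++underlyingReleases_;  // stands in for the real cookie release call
        return true;
    }

    int underlyingReleases() const { return underlyingReleases_.load(); }

private:
    std::atomic<bool> released_{false};
    std::atomic<int> underlyingReleases_{0};
};

// Races `threads` callers against one connection and reports how many times
// the underlying release actually ran.
int raceReleases(int threads) {
    assert(threads > 0);
    GuardedConnection conn;
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&conn] { conn.releaseReference(); });
    }
    for (auto &t : pool) {
        t.join();
    }
    return conn.underlyingReleases();
}
```

Calling `raceReleases(8)` starts eight threads that all try to release the same connection; with the atomic flag in place the underlying release runs once regardless of thread interleaving. Without the flag, a shutdown path and a reaper path could both release the same cookie.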

          People

          • Assignee:
            abhinav Abhinav Dangeti
            Reporter:
            abhinav Abhinav Dangeti