  Couchbase Server / MB-15007

[system tests] memcached crash (core file is truncated)


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 4.0.0
    • 4.0.0
    • memcached
    • Security Level: Public
    • None
    • 4.0.0-2093
    • Untriaged
    • Unknown

    Description

      Steps:
      1. 3 nodes in cluster, 4 buckets. Run the data loader for more than a day.
      2. Set up replication from the SRC cluster to the DEST cluster for all buckets.
      3. Rebalance in at the SRC cluster; rebalance in at the DEST cluster.
      4. Graceful Fail Over (rebalance) of a node in the SRC cluster, then add it back (Delta Recovery).
      5. Click Failover: Hard Fail Over of a node in SRC cluster A, add it back (Full Recovery) and rebalance.
      6. Remove a node in the SRC cluster and stop the rebalance. Cancel the node removal and rebalance.
      7. Rebalance out 1 node on the SRC cluster.
      8. Rebalance out 1 node on the DEST cluster.
      9. Rebalance in 2 nodes on the SRC cluster.

      Please note: MB-14983 and MB-14984 have already been filed.

      Without any further actions on the cluster, memcached crashed on node root@172.23.105.157. The node then went into the pending state.

      I know the gdb stack trace is not useful here (the core file is truncated), but here it is anyway:

      gdb /opt/couchbase/bin/memcached -c core.memcached.6391
      GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
      Copyright (C) 2010 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law. Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-redhat-linux-gnu".
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>...
      Reading symbols from /opt/couchbase/bin/memcached...(no debugging symbols found)...done.
      BFD: Warning: /tmp/core.memcached.6391 is truncated: expected core file size >= 3021598720, found: 172032.
      [New Thread 7536]
      [New Thread 6391]
      [New Thread 6405]
      [New Thread 6400]
      [New Thread 6404]
      [New Thread 6402]
      [New Thread 7537]
      [New Thread 7526]
      [New Thread 7528]
      [New Thread 6407]
      [New Thread 7529]
      [New Thread 6401]
      [New Thread 6403]
      [New Thread 6408]
      [New Thread 7531]
      [New Thread 7534]
      [New Thread 6406]
      [New Thread 7530]
      [New Thread 7532]
      [New Thread 7533]
      [New Thread 6409]
      [New Thread 7535]
      Cannot access memory at address 0x7fe10914f168
      Cannot access memory at address 0x7fe10914f168
      Cannot access memory at address 0x7fe10914f168
      Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/ld-linux-x86-64.so.2
      Failed to read a valid object file image from memory.
      Core was generated by `/opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcach'.
      Program terminated with signal 6, Aborted.
      #0 0x00007fe10616e8a5 in ?? ()
      Missing separate debuginfos, use: debuginfo-install couchbase-server-4.0.0-2093.x86_64
      (gdb) bt
      #0 0x00007fe10616e8a5 in ?? ()
      Cannot access memory at address 0x7fe0f54336e8
      (gdb) t a a bt

      Thread 22 (Thread 7535):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe0f5e321c0

      Thread 21 (Thread 6409):
      #0 0x00007fe106224f03 in ?? ()
      Cannot access memory at address 0x7fe0fdde9c80

      Thread 20 (Thread 7533):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe0f72341c0

      Thread 19 (Thread 7532):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe0f7c351c0

      Thread 18 (Thread 7530):
      #0 0x00007fe1070867bb in ?? ()
      Cannot access memory at address 0x7fe0f9039ac0

      Thread 17 (Thread 6406):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe0ffbeaeb0

      Thread 16 (Thread 7534):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe0f68331c0

      Thread 15 (Thread 7531):
      #0 0x00007fe1070867bb in ?? ()
      Cannot access memory at address 0x7fe0f8638ac0

      Thread 14 (Thread 6408):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe0fe7e8eb0

      Thread 13 (Thread 6403):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe1019edeb0

      Thread 12 (Thread 6401):
      #0 0x00007fe10621766d in ?? ()
      Cannot access memory at address 0x7fe102ffec40

      Thread 11 (Thread 7529):
      #0 0x00007fe1070867bb in ?? ()
      Cannot access memory at address 0x7fe0f9a3aac0

      Thread 10 (Thread 6407):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe0ff1e9eb0

      Thread 9 (Thread 7528):
      #0 0x00007fe1070867bb in ?? ()
      Cannot access memory at address 0x7fe0fa43bac0

      Thread 8 (Thread 7526):
      #0 0x00007fe1061e8b8d in ?? ()
      Cannot access memory at address 0x7fe0fd3e8d50

      Thread 7 (Thread 7537):
      #0 0x00007fe107089054 in ?? ()
      Cannot access memory at address 0x7fe0f4a32a50

      Thread 6 (Thread 6402):
      #0 0x00007fe1070867bb in ?? ()
      Cannot access memory at address 0x7fe1025fdcc0

      Thread 5 (Thread 6404):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe100feceb0

      Thread 4 (Thread 6400):
      #0 0x00007fe10621760d in ?? ()
      Cannot access memory at address 0x7fe1041fdc50

      Thread 3 (Thread 6405):
      #0 0x00007fe10708643c in ?? ()
      Cannot access memory at address 0x7fe1005eb6a0

      Thread 2 (Thread 6391):
      #0 0x00007fe106224f03 in ?? ()
      Cannot access memory at address 0x7fff3919ee20

      Thread 1 (Thread 7536):
      #0 0x00007fe10616e8a5 in ?? ()
      Cannot access memory at address 0x7fe0f54336e8
      (gdb)
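      Because every frame above resolves to "?? ()", the useful signal is the BFD warning that the core is truncated. The sketch below shows one way to check a core file for truncation before spending time in gdb. It assumes a 64-bit ELF core (as on this x86_64 host) and assumes the "expected" size is the largest p_offset + p_filesz over all program headers, which is my understanding of how BFD derives the number it prints, not something confirmed in this ticket. The function names expected_core_size and check_core are hypothetical, and extended program-header numbering (e_phnum == 0xFFFF) is deliberately not handled.

```python
import os
import struct

PN_XNUM = 0xFFFF  # e_phnum value meaning "real count stored elsewhere" (not handled here)


def expected_core_size(path):
    """Smallest file size that covers every segment listed in an ELF64 core's program headers."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if len(ident) < 16 or ident[:4] != b"\x7fELF" or ident[4] != 2:
            raise ValueError("not an ELF64 file")
        endian = "<" if ident[5] == 1 else ">"  # EI_DATA: 1 = little-endian
        # Remaining 48 bytes of the ELF64 header after e_ident.
        fields = struct.unpack(endian + "HHIQQQIHHHHHH", f.read(48))
        e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]
        if e_phnum == PN_XNUM:
            raise NotImplementedError("extended program-header numbering")
        expected = 0
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            raw = f.read(56)  # one Elf64_Phdr
            if len(raw) < 56:
                raise ValueError("program header table itself is truncated")
            _, _, p_offset, _, _, p_filesz, _, _ = struct.unpack(endian + "IIQQQQQQ", raw)
            expected = max(expected, p_offset + p_filesz)
    return expected


def check_core(path):
    """Return (actual_size, expected_size, is_truncated) for a core file."""
    actual = os.path.getsize(path)
    expected = expected_core_size(path)
    return actual, expected, actual < expected
```

      For this ticket, check_core("/tmp/core.memcached.6391") would be expected to report the same mismatch BFD printed: 172032 bytes on disk versus at least 3021598720 bytes expected, i.e. well under 0.01% of the dump was written, which is why no thread's stack memory can be read.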

      Cluster membership after step 9 (rebalance in 2 nodes on SRC cluster):
      SRC cluster: 172.23.105.22, 172.23.105.156, 172.23.105.157, 172.23.105.158, 172.23.105.207
      DEST cluster: 172.23.105.159, 172.23.105.160, 172.23.105.206

      One more comment: the data is not in sync between the clusters (I can create a separate ticket if required).

      List of all crashes in the clusters: https://friendpaste.com/3relYTEwZZX44kxwOl2ezi


          People

            Andrei Baranouski