Couchbase Server / MB-5423

data goes missing, nodes drop into pending state with cluster in steady state

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.0-developer-preview-4
    • Fix Version/s: 2.0-beta
    • Component/s: None
    • Security Level: Public
    • Labels:
      None
    • Environment:
      At least on CentOS 6.2, and possibly one other platform

      Description

      There are reports of nodes in a reasonably large cluster (>5 nodes) going into a pending state, at least as shown in the Web Console.

      From the reporter:

      1. A node will fill its RAM, then go into a PEND state.
      2. I reboot the service on the node.
      3. The node starts up and loads its docs.
      4. It drops all its documents and goes back into PEND state.
      5. It loads its docs up again, then goes into the UP state.
      6. It then repeats the loss of docs and the PEND/UP cycle.

      See the thread here:
      http://www.couchbase.com/forums/thread/losing-data-when-restarting-node
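
      A minimal polling sketch for catching the PEND/UP cycle described above; the admin port, the placeholder credentials, and the /pools/default JSON field names (hostname, clusterMembership, status, interestingStats.curr_items) are assumptions here, so verify them against your cluster before relying on the output:

# Hedged sketch: poll the cluster REST API and log each node's membership,
# status, and item count over time to catch the PEND/UP flapping described
# above. Host, credentials, and the exact JSON field names are assumptions.
import base64
import json
import time
import urllib.request

CLUSTER = "http://127.0.0.1:8091"                             # assumed admin port
AUTH = base64.b64encode(b"Administrator:password").decode()   # placeholder credentials

def poll_nodes():
    req = urllib.request.Request(
        CLUSTER + "/pools/default",
        headers={"Authorization": "Basic " + AUTH},
    )
    with urllib.request.urlopen(req) as resp:
        pool = json.load(resp)
    for node in pool.get("nodes", []):
        print(
            time.strftime("%H:%M:%S"),
            node.get("hostname"),
            node.get("clusterMembership"),
            node.get("status"),
            node.get("interestingStats", {}).get("curr_items"),
        )

if __name__ == "__main__":
    while True:
        poll_nodes()
        time.sleep(10)   # sample every 10 seconds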


        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Having some logs would help here. This looks like a memcached crash, and even user-level logs would confirm that.

        SteveC Steven Cooke added a comment -

        My setup is a single node and pretty much what you get "out of the box", so cloning another cluster is not a possible cause. It is not apparent that RAM is filled: after the node comes up, RAM usage is often <10M per bucket.

        I believe there are two issues:

        1) The node dropping into pending state. The log messages I get are:

        Module Code: ns_port_server000

        Port server memcached on node 'ns_1@127.0.0.1' exited with status 134. Restarting. Messages: Failed to connect to: "localhost:11213"
        Connection closed by mccouch
        Preloaded 136076 keys (with metadata)
        Trying to connect to mccouch: "localhost:11213"
        Failed to connect to: "localhost:11213"
        Failed to connect to: "localhost:11213"
        Connection closed by mccouch
        Connection closed by mccouch
        Preloaded 3998 keys (with metadata)
        Preloaded 102201 keys (with metadata)
        Trying to connect to mccouch: "localhost:11213"
        Trying to connect to mccouch: "localhost:11213"
        ....

        Followed by
        Module Code: ns_memcached004

        Control connection to memcached on 'ns_1@127.0.0.1' disconnected:
        {{badmatch, {error, closed}},
         [{mc_client_binary, stats_recv, 4},
          {mc_client_binary, stats, 4},
          {ns_memcached, do_handle_call, 3},
          {ns_memcached, handle_call, 3},
          {gen_server, handle_msg, 5},
          {proc_lib, init_p_do_apply, 3}]}
        (repeated 15 times)

        2) The UI agent is either not collecting or not displaying summary stats correctly. The item count is almost always incorrect for all buckets when viewed from the Data Buckets page, but the view of the bucket on port 8092 seems to show the correct item count.
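
        A hedged way to cross-check the two counts mentioned in point 2. The bucket name, the placeholder credentials, and both field names (basicStats.itemCount from the 8091 REST API, doc_count from a GET on the bucket at the 8092 Couch API) are assumptions; adjust them for your setup:

# Hedged sketch: compare the summary item count the UI draws from the
# management REST API (port 8091) with the doc_count reported for the same
# bucket by the Couch API on port 8092. Bucket name, credentials, and the
# field names used below are assumptions, not taken from this ticket.
import base64
import json
import urllib.request

HOST = "127.0.0.1"
BUCKET = "default"                                            # assumed bucket name
AUTH = base64.b64encode(b"Administrator:password").decode()   # placeholder credentials

def get_json(url):
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + AUTH})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Count reported by the management REST API (what the Data Buckets page summarizes).
mgmt = get_json(f"http://{HOST}:8091/pools/default/buckets/{BUCKET}")
ui_count = mgmt.get("basicStats", {}).get("itemCount")

# Count reported by the Couch API on 8092 for the same bucket.
capi = get_json(f"http://{HOST}:8092/{BUCKET}")
capi_count = capi.get("doc_count")

print("8091 itemCount:", ui_count, "| 8092 doc_count:", capi_count)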

        farshid Farshid Ghods (Inactive) added a comment -

        Steven,

        Do you see something like this in the diags?

        memcached: stored-value.hh:281: size_t StoredValue::valLength(): Assertion `value->length() == sizeof(uval)' failed.
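
        Exit status 134 in the earlier log is 128 + SIGABRT, which is what a failed assert produces, so scanning the node's logs or unpacked diags for that assertion string (and for the exit message) would confirm it. A minimal sketch, assuming the default Linux log directory; point LOG_DIR at wherever your logs or diags actually live:

# Hedged sketch: scan log files for the assertion quoted above and for the
# "exited with status 134" messages. The log directory below is the usual
# Linux default install path and is an assumption, not from this ticket.
import glob
import os

LOG_DIR = "/opt/couchbase/var/lib/couchbase/logs"   # assumed default install path
PATTERNS = (
    "Assertion `value->length() == sizeof(uval)' failed",
    "exited with status 134",
)

for path in glob.glob(os.path.join(LOG_DIR, "*")):
    if not os.path.isfile(path):
        continue
    try:
        with open(path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                if any(p in line for p in PATTERNS):
                    print(f"{path}:{lineno}: {line.rstrip()}")
    except OSError:
        pass   # skip unreadable files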

        farshid Farshid Ghods (Inactive) added a comment -

        MB-5021

          People

          • Assignee:
            dipti Dipti Borkar
            Reporter:
            ingenthr Matt Ingenthron
    • Votes:
      0
      Watchers:
      1

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes