Couchbase Server / MB-4743

Totally messed up data files after night of constant data updating with compaction running

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0-developer-preview-4
    • Fix Version/s: 2.0-developer-preview-4
    • Component/s: view-engine
    • Security Level: Public
    • Labels:
      None

      Description

      To verify that CouchDB compaction cannot keep up with mutations, I ran a simple single-node test case overnight. The same 200k documents of 100 bytes each were constantly SET through memcached. The autocompaction threshold was lowered to 2%, so mccouch was constantly writing and compacting.

      In the morning I found that the data files had grown huge and that there was a constant stream of error messages coming from somewhere in couch. I stopped the server.

      Now, each time I start the server, the constant couch crashes resume and I cannot get my data back.

      I'm attaching the diag (search from the bottom for:

      [ns_server:info] [2012-02-01 13:15:14] [nonode@nohost:ns_server_cluster_sup:log_os_info:start_link:27] Manifest:

      to see events since the last start attempt). I have the data files archived as well and am ready to provide them on request.

      1. diag.bz2
        588 kB
        Aleksey Kondratenko

        Activity

        damien damien added a comment -

        Based on what I'm seeing in the logs, this is caused by a failure to rename the compaction file (foo.compact) after we've deleted the old storage file, possibly because of an out-of-disk error. Then, when opening the database on restart, we get a bad match error because we expect a properly named storage file to exist, but only the compact file is there.

        The patch here doesn't delete the old storage file until the new compacted storage file has been properly renamed, and it deletes the .compact file on open if one is found:
        http://review.couchbase.org/#change,12969

        Alk, your directory files should now open, but the fix will delete the .compact file, losing a vbucket's db. If you wish to preserve it, rename it from foo.couch.1.compact to foo.2.couch, then start the couchdb server. With the new fix, it should be impossible in the future to have a compact file without a storage file, assuming the FS hasn't been corrupted, so no data loss should be possible.
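
        For illustration only, the ordering described above (rename the compacted file first, delete the old storage file only once the rename has succeeded, and remove leftover .compact files on open) can be sketched in Python. This is not the actual patch, which lives in the CouchDB code base. The function names `finish_compaction` and `cleanup_on_open` and the file names in the demo are hypothetical.

```python
import os
import tempfile


def finish_compaction(old_path: str, compact_path: str, new_path: str) -> None:
    """Swap in the compacted file (hypothetical helper, not the real patch).

    The old storage file is deleted only after the rename has succeeded,
    so a failed rename leaves the old file in place.
    """
    # os.replace raises OSError on failure; nothing has been deleted yet.
    os.replace(compact_path, new_path)
    if old_path != new_path and os.path.exists(old_path):
        os.remove(old_path)


def cleanup_on_open(directory: str) -> list:
    """Delete leftover .compact files found when opening a data directory."""
    removed = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".compact"):
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demo with made-up file names in a throwaway directory.
    data_dir = tempfile.mkdtemp()
    old = os.path.join(data_dir, "old.couch")
    compact = old + ".compact"
    new = os.path.join(data_dir, "new.couch")
    for path in (old, compact):
        with open(path, "w") as handle:
            handle.write(path)
    finish_compaction(old, compact, new)
    print(os.listdir(data_dir))
```

        With this ordering, a failed rename raises before anything is deleted, so the directory never ends up holding only a .compact file.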


          People

          • Assignee:
            damien damien
            Reporter:
            alkondratenko Aleksey Kondratenko (Inactive)
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:
