Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-6844

[system test] failed to delete data, cannot locate database file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta
    • Fix Version/s: 2.0-beta-2
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
    • Environment:
      centos 6.2 64 bit on ec2 build 2.0.0-1808

      Description

      Create a 6 nodes cluster installed with couchbase server 2.0.0-1808. Consistent view is disable.
      Each node has 14 GB RAM and 2 ebs volumes, one for /data and another for /view
      Create 2 bucket and load 9 million items to each bucket.
      Create a doc for bucket.

      When add 2 nodes to cluster, rebalance failed due to one node down (segfault bug MB-6638).
      Stop all loads and restart couchbase server on down node
      During rebalance, got write commit failed error. Check memcached logs, see the following error

      memcached.log.3.txt:Tue Oct 9 00:18:58.901338 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/122.couch.1 option=0 rev=1 retried=2 error=no such file [none]
      memcached.log.3.txt:Tue Oct 9 00:18:58.901368 UTC 3: Warning: failed to open database, vbucketId = 122 fileRev = 1 numDocs = 68
      memcached.log.3.txt:Tue Oct 9 00:18:58.901378 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 122 rev = 1
      memcached.log.3.txt:Tue Oct 9 00:18:58.904685 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/55.couch.1 option=0 rev=1 retried=2 error=no such file [none]
      memcached.log.3.txt:Tue Oct 9 00:18:58.904700 UTC 3: Warning: failed to open database, vbucketId = 55 fileRev = 1 numDocs = 12
      memcached.log.3.txt:Tue Oct 9 00:18:58.904708 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 55 rev = 1
      memcached.log.3.txt:Tue Oct 9 00:23:24.492253 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/344.couch.1 option=0 rev=1 retried=2 error=no such file [none]
      memcached.log.3.txt:Tue Oct 9 00:23:24.492286 UTC 3: Warning: failed to open database, vbucketId = 344 fileRev = 1 numDocs = 67
      memcached.log.3.txt:Tue Oct 9 00:23:24.492295 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 344 rev = 1

      memcached.log.3.txt:Tue Oct 9 00:23:40.153322 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/499.couch.1
      memcached.log.3.txt:Tue Oct 9 00:23:40.153668 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/499.couch.1

      • Database file exists in node

      [root@ip-10-248-109-239 logs]# ls /data/saslbucket/ | grep 499
      499.couch.1
      [root@ip-10-248-109-239 logs]# ls /data/saslbucket/ | grep 122
      122.couch.1

      Will add collect info later

      No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

        Hide
        karan Karan Kumar (Inactive) added a comment -

        Chiyoung,

        Let me know if this does not fall into ep_engine project. I will re-assign it accordingly.

        Show
        karan Karan Kumar (Inactive) added a comment - Chiyoung, Let me know if this does not fall into ep_engine project. I will re-assign it accordingly.
        Hide
        chiyoung Chiyoung Seo added a comment -

        Jin,

        We still saw the commit failure due to "file not found". Please take a look at it to see if there are any regressions from our recent changes in couchkvstore.

        Show
        chiyoung Chiyoung Seo added a comment - Jin, We still saw the commit failure due to "file not found". Please take a look at it to see if there are any regressions from our recent changes in couchkvstore.
        Hide
        jin Jin Lim added a comment -

        Looking at the logs above, those warnings look harmless. Can't really tell without the full log but all looks apparently the case where subsequent retry must have succeeded.
        Please attache the full logs here. Thanks.

        Show
        jin Jin Lim added a comment - Looking at the logs above, those warnings look harmless. Can't really tell without the full log but all looks apparently the case where subsequent retry must have succeeded. Please attache the full logs here. Thanks.
        Hide
        jin Jin Lim added a comment -

        If the delete data was being issued after either flush/delete vbucket 499, it will be a noop. The logging behavior has slightly changed but the noop logic has been there since the early inception of 2.0.
        Also after flushing and deleting a given vbucket, ep engine recreates a empty vbucket (db file) with the same vbucket id. Thus I bet 499.couch.1 & 122.couch.1 are empty in the above case.

        If the system is still alive, please run the following and paste the result here as well.

        ~/install/bin/couch_dbinfo ~/data/saslbucket/499.couch.1 (and 122.couch.1)

        Thanks!

        Show
        jin Jin Lim added a comment - If the delete data was being issued after either flush/delete vbucket 499, it will be a noop. The logging behavior has slightly changed but the noop logic has been there since the early inception of 2.0. Also after flushing and deleting a given vbucket, ep engine recreates a empty vbucket (db file) with the same vbucket id. Thus I bet 499.couch.1 & 122.couch.1 are empty in the above case. If the system is still alive, please run the following and paste the result here as well. ~/install/bin/couch_dbinfo ~/data/saslbucket/499.couch.1 (and 122.couch.1) Thanks!
        Hide
        jin Jin Lim added a comment -

        Please see my comments requesting further information.

        Show
        jin Jin Lim added a comment - Please see my comments requesting further information.
        Show
        thuan Thuan Nguyen added a comment - Link to collect info of all nodes https://s3.amazonaws.com/packages.couchbase/collect_info/ec2/20121008/8nodes-1808-ec2-colinfo-write-commit-failed-20121008-180102.tgz
        Hide
        thuan Thuan Nguyen added a comment - - edited

        Here are the output from Jin command

        [root@ip-10-248-109-239 tmp]# /opt/couchbase/bin/couch_dbinfo /data/saslbucket/499.couch.1
        DB Info (/data/saslbucket/499.couch.1)
        file format version: 10
        update_seq: 8803
        doc count: 8743
        deleted doc count: 60
        data size: 8.63 MB
        B-tree size: 755.0 kB
        total disk size: 9.45 MB

        [root@ip-10-248-109-239 tmp]# /opt/couchbase/bin/couch_dbinfo /data/saslbucket/122.couch.1
        DB Info (/data/saslbucket/122.couch.1)
        file format version: 10
        update_seq: 8821
        doc count: 8753
        deleted doc count: 68
        data size: 8.60 MB
        B-tree size: 756.8 kB
        total disk size: 9.42 MB

        • Disk write commit also failed in other node
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.1.txt:Mon Oct 8 07:35:11.881001 UTC 3: Warning: failed to load the engine session stats due to IO exception "basic_ios::clear"
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.2.txt:Mon Oct 8 18:27:57.567943 UTC 3: Warning: failed to load the engine session stats due to IO exception "basic_ios::clear"
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.3.txt:Mon Oct 8 18:29:46.512624 UTC 3: Warning: failed to parse the session stats json doc!!!
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Mon Oct 8 21:24:24.698305 UTC 3: Warning: failed to load the engine session stats due to IO exception "basic_ios::clear"
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:26:30.516362 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/355.couch.1 option=0 rev=1 retried=2 error=no such file [none]
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:26:30.516422 UTC 3: Warning: failed to open database, vbucketId = 355 fileRev = 1 numDocs = 32
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:26:30.516433 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 355 rev = 1
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.725180 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.725562 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.725970 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1
          /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.726365 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1

        [root@ip-10-248-109-80 tmp]# /opt/couchbase/bin/couch_dbinfo /data/saslbucket/355.couch.1
        DB Info (/data/saslbucket/355.couch.1)
        file format version: 10
        update_seq: 8911
        doc count: 8601
        deleted doc count: 62
        data size: 8.50 MB
        B-tree size: 747.1 kB
        total disk size: 9.63 MB

        [root@ip-10-248-109-80 tmp]# /opt/couchbase/bin/couch_dbinfo /data/saslbucket/552.couch.1
        DB Info (/data/saslbucket/552.couch.1)
        file format version: 10
        update_seq: 8860
        doc count: 8714
        deleted doc count: 73
        data size: 8.59 MB
        B-tree size: 772.2 kB
        total disk size: 9.77 MB

        Show
        thuan Thuan Nguyen added a comment - - edited Here are the output from Jin command [root@ip-10-248-109-239 tmp] # /opt/couchbase/bin/couch_dbinfo /data/saslbucket/499.couch.1 DB Info (/data/saslbucket/499.couch.1) file format version: 10 update_seq: 8803 doc count: 8743 deleted doc count: 60 data size: 8.63 MB B-tree size: 755.0 kB total disk size: 9.45 MB [root@ip-10-248-109-239 tmp] # /opt/couchbase/bin/couch_dbinfo /data/saslbucket/122.couch.1 DB Info (/data/saslbucket/122.couch.1) file format version: 10 update_seq: 8821 doc count: 8753 deleted doc count: 68 data size: 8.60 MB B-tree size: 756.8 kB total disk size: 9.42 MB Disk write commit also failed in other node /opt/couchbase/var/lib/couchbase/logs/memcached.log.1.txt:Mon Oct 8 07:35:11.881001 UTC 3: Warning: failed to load the engine session stats due to IO exception "basic_ios::clear" /opt/couchbase/var/lib/couchbase/logs/memcached.log.2.txt:Mon Oct 8 18:27:57.567943 UTC 3: Warning: failed to load the engine session stats due to IO exception "basic_ios::clear" /opt/couchbase/var/lib/couchbase/logs/memcached.log.3.txt:Mon Oct 8 18:29:46.512624 UTC 3: Warning: failed to parse the session stats json doc!!! /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Mon Oct 8 21:24:24.698305 UTC 3: Warning: failed to load the engine session stats due to IO exception "basic_ios::clear" /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:26:30.516362 UTC 3: Warning: couchstore_open_db failed, name=/data/saslbucket/355.couch.1 option=0 rev=1 retried=2 error=no such file [none] /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:26:30.516422 UTC 3: Warning: failed to open database, vbucketId = 355 fileRev = 1 numDocs = 32 /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:26:30.516433 UTC 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 355 rev = 1 /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.725180 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1 /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.725562 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1 /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.725970 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1 /opt/couchbase/var/lib/couchbase/logs/memcached.log.4.txt:Tue Oct 9 00:28:48.726365 UTC 3: Warning: failed to delete data, cannot locate database file /data/saslbucket/552.couch.1 [root@ip-10-248-109-80 tmp] # /opt/couchbase/bin/couch_dbinfo /data/saslbucket/355.couch.1 DB Info (/data/saslbucket/355.couch.1) file format version: 10 update_seq: 8911 doc count: 8601 deleted doc count: 62 data size: 8.50 MB B-tree size: 747.1 kB total disk size: 9.63 MB [root@ip-10-248-109-80 tmp] # /opt/couchbase/bin/couch_dbinfo /data/saslbucket/552.couch.1 DB Info (/data/saslbucket/552.couch.1) file format version: 10 update_seq: 8860 doc count: 8714 deleted doc count: 73 data size: 8.59 MB B-tree size: 772.2 kB total disk size: 9.77 MB
        Hide
        chiyoung Chiyoung Seo added a comment -

        Jin,

        Tony just uploaded the log files and db_info output.

        Show
        chiyoung Chiyoung Seo added a comment - Jin, Tony just uploaded the log files and db_info output.
        Hide
        jin Jin Lim added a comment -

        Thanks Tony. So this is not the delete noop case since each file has doc count > 0. However, I don't still see the failures are persisting after retry (we always have a window of retry during the compaction). Anyway, will take a look at the full logs.

        Were there any data loss or rebalance failure due to the write commit failure?

        Show
        jin Jin Lim added a comment - Thanks Tony. So this is not the delete noop case since each file has doc count > 0. However, I don't still see the failures are persisting after retry (we always have a window of retry during the compaction). Anyway, will take a look at the full logs. Were there any data loss or rebalance failure due to the write commit failure?
        Hide
        jin Jin Lim added a comment -

        Thanks Tony. So this is not the delete noop case since each file has doc count > 0. However, I don't still see the failures are persisting after retry (we always have a window of retry during the compaction). Anyway, will take a look at the full logs.

        Were there any data loss or rebalance failure due to the write commit failure?

        Show
        jin Jin Lim added a comment - Thanks Tony. So this is not the delete noop case since each file has doc count > 0. However, I don't still see the failures are persisting after retry (we always have a window of retry during the compaction). Anyway, will take a look at the full logs. Were there any data loss or rebalance failure due to the write commit failure?
        Hide
        thuan Thuan Nguyen added a comment -

        Rebalance was able to complete without any issue. Cluster continued running load with mutation, deletion and expiration. So I could not say if it had any data loss.

        Show
        thuan Thuan Nguyen added a comment - Rebalance was able to complete without any issue. Cluster continued running load with mutation, deletion and expiration. So I could not say if it had any data loss.
        Hide
        jin Jin Lim added a comment -

        OK, thanks.

        Show
        jin Jin Lim added a comment - OK, thanks.
        Hide
        jin Jin Lim added a comment -

        There are two different warnings due to similar Couchstore file i/o access error:
        1) Write commit failure due to open_db error, no such file
        2) Delete data failure due to readdir error, no file is found for the given name

        The first case (1) is a normal behavior under the current design of how the compactor deleting a old db file and renaming a newly compacted db file with a new revision number. There is always a narrow window where CouchKVStore cannot open the old db file while the new db file is being compacted. The retry logic in EP engine takes care of this window. According to the log, the failed writes eventually got succeeded after being re-queued by the retry logic thus no persisting write failures on the given vbuckets.

        The second case is a real issue we need to continue to look at. Please either open a new bug (and close this bug) with the description of (2) or update the current bug accordingly.
        Thanks~

        Show
        jin Jin Lim added a comment - There are two different warnings due to similar Couchstore file i/o access error: 1) Write commit failure due to open_db error, no such file 2) Delete data failure due to readdir error, no file is found for the given name The first case (1) is a normal behavior under the current design of how the compactor deleting a old db file and renaming a newly compacted db file with a new revision number. There is always a narrow window where CouchKVStore cannot open the old db file while the new db file is being compacted. The retry logic in EP engine takes care of this window. According to the log, the failed writes eventually got succeeded after being re-queued by the retry logic thus no persisting write failures on the given vbuckets. The second case is a real issue we need to continue to look at. Please either open a new bug (and close this bug) with the description of (2) or update the current bug accordingly. Thanks~
        Hide
        farshid Farshid Ghods (Inactive) added a comment -

        Jin,

        we need to make that clear for the user so that they dont sense a major issue when they see this error.
        does the current log message in the UI mention which one of these cases we are hitting ?

        if thats the case then we should have this in the 2.0 release notes documentation

        Show
        farshid Farshid Ghods (Inactive) added a comment - Jin, we need to make that clear for the user so that they dont sense a major issue when they see this error. does the current log message in the UI mention which one of these cases we are hitting ? if thats the case then we should have this in the 2.0 release notes documentation
        Hide
        jin Jin Lim added a comment -

        Thanks for the comment. Unfortunately it isn't really easy to distinguish btw persisting write commit failure and temp. write commit failure.
        EP Engine made a modification that increase the number of retry for commit failure that eventually reduce the change of spitting out the false(temp) write commit failure message all over the place. http://review.couchbase.org/#/c/21454/

        Show
        jin Jin Lim added a comment - Thanks for the comment. Unfortunately it isn't really easy to distinguish btw persisting write commit failure and temp. write commit failure. EP Engine made a modification that increase the number of retry for commit failure that eventually reduce the change of spitting out the false(temp) write commit failure message all over the place. http://review.couchbase.org/#/c/21454/
        Hide
        jin Jin Lim added a comment -

        QE tested the toy build with the pending patch set below. No commit failure for delete was found.
        http://review.couchbase.org/#/c/21463/

        Show
        jin Jin Lim added a comment - QE tested the toy build with the pending patch set below. No commit failure for delete was found. http://review.couchbase.org/#/c/21463/
        Hide
        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #439 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/439/)
        MB-6844 requeue delete operation until the snapshot task is complete (Revision 03360987688eb408bdecefe844b843fb16702e04)
        MB-6844 Increase max retry number for couchstore_open_db (Revision 4f9646ca6ade4fc072165eb97571daf0bde6b9df)

        Result = SUCCESS
        Jin Lim :
        Files :

        • src/ep.cc

        Jin Lim :
        Files :

        • src/couch-kvstore/couch-kvstore.hh
        • src/couch-kvstore/couch-kvstore.cc
        Show
        thuan Thuan Nguyen added a comment - Integrated in github-ep-engine-2-0 #439 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/439/ ) MB-6844 requeue delete operation until the snapshot task is complete (Revision 03360987688eb408bdecefe844b843fb16702e04) MB-6844 Increase max retry number for couchstore_open_db (Revision 4f9646ca6ade4fc072165eb97571daf0bde6b9df) Result = SUCCESS Jin Lim : Files : src/ep.cc Jin Lim : Files : src/couch-kvstore/couch-kvstore.hh src/couch-kvstore/couch-kvstore.cc

          People

          • Assignee:
            jin Jin Lim
            Reporter:
            thuan Thuan Nguyen
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes