Couchbase Server: MB-51609

[System Test] Missing files on disk causing disk write failures

Details

    • Bug
    • Resolution: Software failure
    • Critical
    • None
    • 7.1.0
    • couchbase-bucket
    • Enterprise Edition 7.1.0 build 2534

    Description

      At first glance, this issue appears similar to MB-48616.

      QE TEST

      -test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
      

      Day - 2
      Cycle - 2
      Scale - 3

      TEST STEP
      Perform auto failover and rebalance out 3 nodes.

      [2022-03-27T11:31:19-07:00, sequoiatools/couchbase-cli:7.1:d2743c] setting-autofailover -c 172.23.108.103:8091 -u Administrator -p password --enable-auto-failover=1 --auto-failover-timeout=5 --max-failovers=3
      [2022-03-27T11:31:25-07:00, sequoiatools/cmd:c99ee9] 5
      [2022-03-27T11:31:35-07:00, sequoiatools/cbutil:8eb342] /cbinit.py 172.23.106.100 root couchbase stop
      [2022-03-27T11:32:12-07:00, sequoiatools/cbutil:768ca3] /cbinit.py 172.23.123.28 root couchbase stop
      [2022-03-27T11:32:23-07:00, sequoiatools/cbutil:38ef80] /cbinit.py 172.23.104.137 root couchbase stop
      [2022-03-27T11:32:30-07:00, sequoiatools/cmd:f08a99] 10
      [2022-03-27T11:32:45-07:00, sequoiatools/couchbase-cli:7.1:56fa38] rebalance -c 172.23.108.103:8091 -u Administrator -p password
      →  
       
      Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
      docker logs 56fa38
      docker start 56fa38
       
      *Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      

      NOTE
      The rebalance ultimately failed because the eventing rebalance exited; I'll file a separate ticket for that.

      2022-03-27T21:32:31.491-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {worker_died,
                                     {'EXIT',<0.6860.1221>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}.
      Rebalance Operation Id = 150b7f2f583ed475f683ba426bc46d17
      

      OBSERVATION
      diag.log on 172.23.108.103

      2022-03-27T20:36:04.239-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "default" on node 172.23.108.103.
      2022-03-27T20:36:04.240-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "bucket7" on node 172.23.108.103.
      2022-03-27T20:36:04.240-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "bucket1" on node 172.23.108.103.
      2022-03-27T20:36:04.241-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "bucket8" on node 172.23.108.103.
      2022-03-27T20:36:04.242-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "bucket2" on node 172.23.108.103.
      2022-03-27T20:36:04.242-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "N1QL_SYSTEM_BUCKET" on node 172.23.108.103.
      2022-03-27T20:36:04.243-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "bucket9" on node 172.23.108.103.
      2022-03-27T20:36:04.244-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Write Commit Failure. Disk write failed for item in Bucket "bucket3" on node 172.23.108.103.
      2022-03-27T20:36:04.251-07:00, menelaus_web_alerts_srv:0:info:message(ns_1@172.23.108.103) - Audit Write Failure. Attempt to write to audit log on node "172.23.108.103" was unsuccessful
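
To see at a glance which buckets raised the alert, the diag.log lines above can be grouped by bucket name. The following is a minimal sketch, assuming the alert text always matches the format shown in this excerpt; the function name `buckets_with_write_failures` is hypothetical and not part of any Couchbase tooling.

```python
import re
from collections import defaultdict

# Regex modelled on the "Write Commit Failure" lines quoted in this ticket.
# Assumption: other builds may word the alert differently.
ALERT_RE = re.compile(
    r'^(?P<ts>\S+), menelaus_web_alerts_srv:\d+:info:message\((?P<node>[^)]+)\) - '
    r'Write Commit Failure\. Disk write failed for item in Bucket "(?P<bucket>[^"]+)"'
)


def buckets_with_write_failures(lines):
    """Map each bucket name to the timestamps of its write-commit-failure alerts."""
    hits = defaultdict(list)
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match:
            hits[match.group("bucket")].append(match.group("ts"))
    return dict(hits)
```

Passing the lines of diag.log (for example, an open file object) returns a dictionary keyed by bucket. Lines that do not match the pattern, such as the "Audit Write Failure" alert, are ignored.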
      

      memcached.log on 172.23.108.103
      Files are missing on disk for vBuckets 994 and 995.

      grep "No such file" memcached.log
      2022-03-27T22:33:37.541633-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:995 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/995.couch.boot, option:17
      2022-03-27T22:33:46.273756-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:994 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/994.couch.boot, option:17
      2022-03-27T22:33:56.898158-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:995 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/995.couch.boot, option:17
      2022-03-27T22:34:00.413034-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:994 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/994.couch.boot, option:17
      2022-03-27T22:34:11.238371-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:995 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/995.couch.boot, option:17
      2022-03-27T22:34:11.467551-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:994 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/994.couch.boot, option:17
      2022-03-27T22:34:19.098829-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:994 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/994.couch.boot, option:17
      2022-03-27T22:34:22.946837-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:995 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/995.couch.boot, option:17
      2022-03-27T22:34:32.674250-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:994 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/994.couch.boot, option:17
      2022-03-27T22:34:45.195055-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:995 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/995.couch.boot, option:17
      2022-03-27T22:34:51.512954-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:994 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/994.couch.boot, option:17
      2022-03-27T22:35:05.047719-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:995 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/995.couch.boot, option:17
      2022-03-27T22:35:05.289336-07:00 WARNING (N1QL_SYSTEM_BUCKET) CouchKVStore::openOrCreate: vb:994 Open error:error opening file [No such file or directory], filename:/data/couchbase/N1QL_SYSTEM_BUCKET/994.couch.boot, option:17
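
The grep output above repeats the same two vBuckets many times, so a per-vBucket count can be easier to read than the raw lines. The following is a minimal sketch, assuming the warning format is exactly as quoted in this ticket; `missing_file_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Regex modelled on the CouchKVStore::openOrCreate warnings quoted above.
# Assumption: the message layout is stable within this build.
OPEN_ERR_RE = re.compile(
    r'\((?P<bucket>[^)]+)\) CouchKVStore::openOrCreate: vb:(?P<vb>\d+) '
    r'Open error:error opening file \[No such file or directory\], '
    r'filename:(?P<path>[^,]+),'
)


def missing_file_counts(lines):
    """Count 'No such file or directory' open errors per (bucket, vBucket id)."""
    counts = Counter()
    for line in lines:
        match = OPEN_ERR_RE.search(line)
        if match:
            counts[(match.group("bucket"), int(match.group("vb")))] += 1
    return counts
```

Feeding it the lines of memcached.log yields a `Counter` keyed by `(bucket, vbucket_id)`, which makes it quick to check whether any vBuckets other than 994 and 995 are affected.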
      

      This issue is not observed in the Magma longevity run on the same build.
      It was also not seen in the couchstore longevity run on 7.1.0-2506 (RC2 build).


          People

            sujay.gad Sujay Gad