Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-27963

Index create is hung with as one vbucket keeps closing

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 5.5.0
    • 5.0.0, 5.0.1, 5.5.0
    • couchbase-bucket
    • Untriaged
    • Yes

    Description

      An index create that begins at 2018-02-07T10:37:11.786 is stuck at 99% in our Continuous Integration (CI) test. Sequence of events from Indexer log:

      2018-02-07T10:37:11.786+05:30 [Info] Indexer::handleCreateIndex
      InstId: 9927851863223419080
      Defn: DefnId: 857862965098980483 Name: index_company Using: memory_optimized

      And the indexer keeps retsarting vbucket 904:

      2018-02-07T10:37:16.752+05:30 [Info] KVSender::sendRestartVbuckets Projector 127.0.0.1:10001 Topic MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9 default
      2018-02-07T10:37:16.805+05:30 [Info] KVSender::sendRestartVbuckets Success Projector 127.0.0.1:10001 Topic MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9 default
      2018-02-07T10:37:16.811+05:30 [Info] TK StreamBegin MAINT_STREAM default 904 280178086026004 0
      2018-02-07T10:37:16.811+05:30 [Info] TK StreamEnd MAINT_STREAM default 904 280178086026004 0
       
      From memcached.log.000004.txt log, below errors can be seen:
      2018-02-07T16:49:09.345656Z WARNING (default) CouchKVStore::openDB: error:no such file [No such file or directory], name:/opt/build/ns_server/data/n_1/data/default/904.couch.1, option:2, fileRev:1
      2018-02-07T16:49:09.346310Z WARNING (default) CouchKVStore::openDB: No such file, found:1 alternative files for /opt/build/ns_server/data/n_1/data/default/904.couch
      2018-02-07T16:49:09.346315Z WARNING (default) CouchKVStore::openDB: Found /opt/build/ns_server/data/n_1/data/default/904.couch.3
      2018-02-07T16:49:09.346320Z WARNING (default) CouchKVStore::initScanContext: openDB error:no such file, name:/opt/build/ns_server/data/n_1/data/default/904.couch.1
      2018-02-07T16:49:09.346328Z WARNING 53: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/3 - DCPBackfillDisk::create(): (vb:904) backfill create ended prematurely as the disk cannot be scanned. Associated stream is set to dead state. The vbucket state: active
      2018-02-07T16:49:09.346342Z INFO 53: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/3 - (vb 904) Stream closing, sent until seqno 0 remaining items 0, reason: The stream closed early due to backfill failure
      2018-02-07T16:49:09.346548Z INFO 53: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/3 - (vb 904) Stream closed, 0 items sent from backfill phase, 0 items sent from memory phase, 0 was last seqno sent
      2018-02-07T16:49:15.404606Z INFO 54: DCP connection opened successfully. flags:{INCLUDE_XATTRS } [ 127.0.0.1:38392 - 127.0.0.1:12002 (<ud>@projector</ud>) ]
      2018-02-07T16:49:15.406533Z INFO 54: (default) DCP (Producer) eq_dcpq:secidx:getfailoverlog-default-1518002355402003309 - Removing connection 0x7f63397b0c00
      2018-02-07T16:49:15.417167Z INFO 49: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/0 - (vb 904) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615
      2018-02-07T16:49:15.417187Z INFO 49: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/0 - (vb 904) Scheduling backfill from 1 to 49, reschedule flag : False
      

      These errors are continuously being logged. The CI test included latest changes in kv_engine upto:
      kv_engine is at 652004f, changes since last good build:
      652004f79 MB-27769: DCP Producer: add stats for CheckpointProcessorTask
      2a220a5b5 Merge "Merge remote-tracking branch 'couchbase/spock'"
      408023220 Merge remote-tracking branch 'couchbase/spock'
      3c8b5eb6c MB-27554: [BP] Move numTotalItems from HashTable -> VBucket
      46152ca14 MB-27554: [BP] Don't over-count VBucket item counts
      1ff902e50 MB-27554: [BP] Move flushing code from KVBucket to EPBucket
      3ecac16dd MB-27554: [BP] Make VBucket::getNumNonResidentItems virtual
      6a470ac3e MB-27554: [BP] Centralize HashTable count statistics
      793a949ef MB-27554: [BP] Add descriptions to magic StoredValue::bySeqno values
      fe355bc98 MB-27554: [BP] Move compaction code from KVBucket to EPBucket
      5b5e2d0a4 MB-27554: [BP] Revert 'Update total items using item count from the underlying KVStore'
      84d24a311 MB-27554: [BP] HashTable::setValue() - make private
      c11a6b04b MB-27554: [BP] Use NonNegativeCounter for HashTable::numTempItems
      0bdbe4450 MB-27554: [BP] Change HashTable::numTotalItems to NonNegativeCounter
      fef3ef9db MB-27554: [BP] HashTable: Clarify definition of numItems / getNumInMemoryItems()
      eadd72735 MB-27554: [BP] HashTable: Include numItems in operator<<
      260320a87 MB-27554: [BP] Move unnecesarily public HashTable members to private
      4855b0f2c MB-27554: [BP] Move code which directly changes HashTable state to member functions

      After pinning our CI test to 1823 build, tests seem to be passing now.

      Test environment:
      The tests are running on cluster_run using:
      ./cluster_connect -n2 -s 1500 -I 1500 -T n0:kv+n1ql,n1:kv+index

      Attachments

        1. full_logs.zip
          19.41 MB
        2. logs_Feb19.zip
          27.76 MB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            drigby Dave Rigby (Inactive)
            prathibha Prathibha Bisarahalli (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty