Details
-
Bug
-
Resolution: Fixed
-
Critical
-
5.0.0, 5.0.1, 5.5.0
-
Untriaged
-
Yes
Description
An index create that begins at 2018-02-07T10:37:11.786 is stuck at 99% in our Continuous Integration (CI) test. Sequence of events from Indexer log:
2018-02-07T10:37:11.786+05:30 [Info] Indexer::handleCreateIndex
InstId: 9927851863223419080
Defn: DefnId: 857862965098980483 Name: index_company Using: memory_optimized
And the indexer keeps retsarting vbucket 904:
2018-02-07T10:37:16.752+05:30 [Info] KVSender::sendRestartVbuckets Projector 127.0.0.1:10001 Topic MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9 default
|
2018-02-07T10:37:16.805+05:30 [Info] KVSender::sendRestartVbuckets Success Projector 127.0.0.1:10001 Topic MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9 default
|
2018-02-07T10:37:16.811+05:30 [Info] TK StreamBegin MAINT_STREAM default 904 280178086026004 0
|
2018-02-07T10:37:16.811+05:30 [Info] TK StreamEnd MAINT_STREAM default 904 280178086026004 0
|
|
From memcached.log.000004.txt log, below errors can be seen:
|
2018-02-07T16:49:09.345656Z WARNING (default) CouchKVStore::openDB: error:no such file [No such file or directory], name:/opt/build/ns_server/data/n_1/data/default/904.couch.1, option:2, fileRev:1
|
2018-02-07T16:49:09.346310Z WARNING (default) CouchKVStore::openDB: No such file, found:1 alternative files for /opt/build/ns_server/data/n_1/data/default/904.couch
|
2018-02-07T16:49:09.346315Z WARNING (default) CouchKVStore::openDB: Found /opt/build/ns_server/data/n_1/data/default/904.couch.3
|
2018-02-07T16:49:09.346320Z WARNING (default) CouchKVStore::initScanContext: openDB error:no such file, name:/opt/build/ns_server/data/n_1/data/default/904.couch.1
|
2018-02-07T16:49:09.346328Z WARNING 53: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/3 - DCPBackfillDisk::create(): (vb:904) backfill create ended prematurely as the disk cannot be scanned. Associated stream is set to dead state. The vbucket state: active
|
2018-02-07T16:49:09.346342Z INFO 53: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/3 - (vb 904) Stream closing, sent until seqno 0 remaining items 0, reason: The stream closed early due to backfill failure
|
2018-02-07T16:49:09.346548Z INFO 53: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/3 - (vb 904) Stream closed, 0 items sent from backfill phase, 0 items sent from memory phase, 0 was last seqno sent
|
2018-02-07T16:49:15.404606Z INFO 54: DCP connection opened successfully. flags:{INCLUDE_XATTRS } [ 127.0.0.1:38392 - 127.0.0.1:12002 (<ud>@projector</ud>) ]
|
2018-02-07T16:49:15.406533Z INFO 54: (default) DCP (Producer) eq_dcpq:secidx:getfailoverlog-default-1518002355402003309 - Removing connection 0x7f63397b0c00
|
2018-02-07T16:49:15.417167Z INFO 49: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/0 - (vb 904) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615
|
2018-02-07T16:49:15.417187Z INFO 49: (default) DCP (Producer) eq_dcpq:secidx:proj-default-MAINT_STREAM_TOPIC_c09638d6f32ba98c1f5fac437ffd03f9-8519938057506553894/0 - (vb 904) Scheduling backfill from 1 to 49, reschedule flag : False
|
These errors are continuously being logged. The CI test included latest changes in kv_engine upto:
kv_engine is at 652004f, changes since last good build:
652004f79 MB-27769: DCP Producer: add stats for CheckpointProcessorTask
2a220a5b5 Merge "Merge remote-tracking branch 'couchbase/spock'"
408023220 Merge remote-tracking branch 'couchbase/spock'
3c8b5eb6c MB-27554: [BP] Move numTotalItems from HashTable -> VBucket
46152ca14 MB-27554: [BP] Don't over-count VBucket item counts
1ff902e50 MB-27554: [BP] Move flushing code from KVBucket to EPBucket
3ecac16dd MB-27554: [BP] Make VBucket::getNumNonResidentItems virtual
6a470ac3e MB-27554: [BP] Centralize HashTable count statistics
793a949ef MB-27554: [BP] Add descriptions to magic StoredValue::bySeqno values
fe355bc98 MB-27554: [BP] Move compaction code from KVBucket to EPBucket
5b5e2d0a4 MB-27554: [BP] Revert 'Update total items using item count from the underlying KVStore'
84d24a311 MB-27554: [BP] HashTable::setValue() - make private
c11a6b04b MB-27554: [BP] Use NonNegativeCounter for HashTable::numTempItems
0bdbe4450 MB-27554: [BP] Change HashTable::numTotalItems to NonNegativeCounter
fef3ef9db MB-27554: [BP] HashTable: Clarify definition of numItems / getNumInMemoryItems()
eadd72735 MB-27554: [BP] HashTable: Include numItems in operator<<
260320a87 MB-27554: [BP] Move unnecesarily public HashTable members to private
4855b0f2c MB-27554: [BP] Move code which directly changes HashTable state to member functions
After pinning our CI test to 1823 build, tests seem to be passing now.
Test environment:
The tests are running on cluster_run using:
./cluster_connect -n2 -s 1500 -I 1500 -T n0:kv+n1ql,n1:kv+index