Details
- Bug
- Resolution: Fixed
- Critical
- 7.0.4, 7.1.1
- Untriaged
- 0
- Unknown
Description
There is a bug during upgrade from 6.6.x to 7.x when transactional ATR/client record documents are present.
1. An optimization in 7.0 (MB-43704) made the projector skip ATR/client records and generate an UpdateSeqno message for each skipped record instead.
projector/worker.go
    isTxn := (m.Opcode == mcd.DCP_MUTATION) && !m.IsJSON() && m.HasXATTR() && bytes.HasPrefix(m.Key, transactionMutationPrefix)
    if isTxn {
        worker.stats.txnSystemMut.Add(1)
    }

    // If the mutation belongs to a collection other than the
    // ones that are being processed at worker, send UpdateSeqno
    // message to indexer
    // The else case should get executed only incase of MAINT_STREAM
    // (or) when a transactional mutation is being processed in
    // INIT_STREAM
    if collEngines, ok := allEngines[m.CollectionID]; ok && !isTxn {
        processMutation(collEngines)
    } else {
        // Generate updateSeqno message and propagate it to indexer
        worker.stats.updateSeqno.Add(1)
        if data := v.makeUpdateSeqnoData(m, allEngines); data != nil {
            worker.broadcast2Endpoints(data, worker.runFinCh)
        } else {
            fmsg := "%v ##%x SYSTEM_EVENT: %v NOT PUBLISHED for vbucket %v\n"
            logging.Errorf(fmsg, logPrefix, m.Opaque, m, vbno)
        }
    }
2. In a mixed-mode cluster, the projector on 7.0.4 skips any document whose key has the prefix "_txn:" (i.e. ATR/client records, or any user-created document with the prefix "_txn:").
3. The problem is that a 6.6.5 indexer node cannot understand or process an UpdateSeqno message, so these messages are skipped on the indexer node. If a skipped UpdateSeqno message corresponds to the SnapEnd of a DCP snapshot marker, the indexer never sees the snapshot complete and waits indefinitely, which causes the index build to hang.
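The three steps above can be sketched as a small standalone Go program. This is a simplified model, not the projector or indexer code: the `msg` and `msgKind` types, the `toMsg` and `snapshotDone` helpers, and the key `_txn:atr-0` are hypothetical names introduced for illustration. Only the predicate in `isTxn` follows the snippet from projector/worker.go, and the model assumes the older indexer silently drops message types it does not know.

```go
package main

import (
	"bytes"
	"fmt"
)

// msgKind is a hypothetical stand-in for the projector-to-indexer message types.
type msgKind int

const (
	kindMutation    msgKind = iota // regular document mutation
	kindUpdateSeqno                // seqno-only message sent for skipped documents
)

// msg is a simplified message: only the type and the DCP seqno matter here.
type msg struct {
	kind  msgKind
	seqno uint64
}

// txnPrefix mirrors the "_txn:" prefix named in the description.
var txnPrefix = []byte("_txn:")

// isTxn restates the predicate from projector/worker.go on plain values.
func isTxn(key []byte, isJSON, hasXattr bool) bool {
	return !isJSON && hasXattr && bytes.HasPrefix(key, txnPrefix)
}

// toMsg models the projector: skipped documents become UpdateSeqno messages.
func toMsg(key []byte, isJSON, hasXattr bool, seqno uint64) msg {
	if isTxn(key, isJSON, hasXattr) {
		return msg{kind: kindUpdateSeqno, seqno: seqno}
	}
	return msg{kind: kindMutation, seqno: seqno}
}

// snapshotDone models an indexer waiting for the snapshot end seqno.
// An indexer that does not understand UpdateSeqno drops those messages.
func snapshotDone(msgs []msg, snapEnd uint64, understandsUpdateSeqno bool) bool {
	var last uint64
	for _, m := range msgs {
		if m.kind == kindUpdateSeqno && !understandsUpdateSeqno {
			continue // unknown message type: ignored, seqno never advances
		}
		if m.seqno > last {
			last = m.seqno
		}
	}
	return last >= snapEnd
}

func main() {
	// Snapshot [1, 3]; the last item is an ATR-like record (non-JSON, with XATTRs).
	msgs := []msg{
		toMsg([]byte("doc-1"), true, false, 1),
		toMsg([]byte("doc-2"), true, false, 2),
		toMsg([]byte("_txn:atr-0"), false, true, 3),
	}
	fmt.Println("7.x indexer done:", snapshotDone(msgs, 3, true))
	fmt.Println("6.6.x indexer done:", snapshotDone(msgs, 3, false))
}
```

In this model the indexer that understands UpdateSeqno reaches the snapshot end, while the one that drops the message never does, which corresponds to the indefinite wait described in step 3.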
Issue Links
- Clones: MB-54689 [BP 7.1.4] Index build can hang in mixed mode due to projector skipping transaction records (Closed)