  1. Couchbase Server
  2. MB-55216

[BP 7.2] Index build can hang in mixed mode due to projector skipping transaction records

    Description

      An index build can hang during an upgrade from 6.6.x to 7.x, while the cluster is in mixed mode, if transactional ATR/client record documents are present.

      1. An optimization introduced in 7.0 (MB-43704) makes the projector skip ATR/client records and generate an UpdateSeqno message for each skipped record instead.

      projector/worker.go

              isTxn := (m.Opcode == mcd.DCP_MUTATION) && !m.IsJSON() && m.HasXATTR() && bytes.HasPrefix(m.Key, transactionMutationPrefix)
              if isTxn {
                  worker.stats.txnSystemMut.Add(1)
              }
       
              // If the mutation belongs to a collection other than the
              // ones that are being processed at worker, send UpdateSeqno
              // message to indexer
              // The else case should get executed only in case of MAINT_STREAM
              // (or) when a transactional mutation is being processed in
              // INIT_STREAM
              if collEngines, ok := allEngines[m.CollectionID]; ok && !isTxn {
                  processMutation(collEngines)
              } else {
                  // Generate updateSeqno message and propagate it to indexer
                  worker.stats.updateSeqno.Add(1)
                  if data := v.makeUpdateSeqnoData(m, allEngines); data != nil {
                      worker.broadcast2Endpoints(data, worker.runFinCh)
                  } else {
                      fmsg := "%v ##%x SYSTEM_EVENT: %v NOT PUBLISHED for vbucket %v\n"
                      logging.Errorf(fmsg, logPrefix, m.Opaque, m, vbno)
                  }
              }
      

      2. In a mixed-mode cluster, a projector on 7.0.4 skips any mutation that passes the isTxn check above: a non-JSON document with XATTRs whose key has the prefix "_txn:" (i.e. ATR/client records, or any user-created document that matches the same conditions).

      3. The problem is that a 6.6.5 indexer node cannot understand or process an UpdateSeqno message, so these messages are skipped on the indexer node. If a skipped UpdateSeqno message corresponds to the SnapEnd of a DCP snapshot marker, the indexer never observes the end of that snapshot. It keeps waiting indefinitely, and the index build hangs.
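
The three steps above can be reproduced in a small, self-contained simulation. This is only a sketch under stated assumptions: the types `doc`, `message`, `indexer` and the functions `project` and `simulate` are hypothetical and are not part of the Couchbase code base, the document keys are made up, the DCP opcode check from the real `isTxn` expression is omitted, and the rule "a snapshot is complete once the last seen seqno reaches SnapEnd" is a simplification of what the indexer actually does.

```go
package main

import (
	"bytes"
	"fmt"
)

// NOTE: every name in this file is hypothetical. The code only models the
// behaviour described in the ticket; it is not the projector or indexer code.

type msgKind int

const (
	kindMutation msgKind = iota
	kindUpdateSeqno
)

// message is what the simulated projector sends to the simulated indexer.
type message struct {
	kind  msgKind
	seqno uint64
}

// doc is a simplified stand-in for a DCP mutation.
type doc struct {
	key      []byte
	seqno    uint64
	isJSON   bool
	hasXattr bool
}

// Prefix taken from the ticket text ("_txn:").
var txnPrefix = []byte("_txn:")

// project mimics the 7.x projector: a transaction record is not indexed,
// it is turned into an UpdateSeqno message instead.
func project(d doc) message {
	isTxn := !d.isJSON && d.hasXattr && bytes.HasPrefix(d.key, txnPrefix)
	if isTxn {
		return message{kind: kindUpdateSeqno, seqno: d.seqno}
	}
	return message{kind: kindMutation, seqno: d.seqno}
}

// indexer tracks the highest seqno seen for a single vbucket.
type indexer struct {
	understandsUpdateSeqno bool // false models a 6.6.x indexer
	lastSeqno              uint64
}

// receive drops UpdateSeqno messages on an indexer that cannot process them.
func (ix *indexer) receive(m message) {
	if m.kind == kindUpdateSeqno && !ix.understandsUpdateSeqno {
		return
	}
	if m.seqno > ix.lastSeqno {
		ix.lastSeqno = m.seqno
	}
}

// snapshotComplete is the assumed completion rule for one DCP snapshot.
func (ix *indexer) snapshotComplete(snapEnd uint64) bool {
	return ix.lastSeqno >= snapEnd
}

// simulate streams one snapshot [1, 3] whose last mutation is a
// transaction record, and reports whether the indexer sees SnapEnd.
func simulate(understandsUpdateSeqno bool) bool {
	docs := []doc{
		{key: []byte("user::1"), seqno: 1, isJSON: true},
		{key: []byte("user::2"), seqno: 2, isJSON: true},
		{key: []byte("_txn:example-atr"), seqno: 3, hasXattr: true},
	}
	ix := &indexer{understandsUpdateSeqno: understandsUpdateSeqno}
	for _, d := range docs {
		ix.receive(project(d))
	}
	return ix.snapshotComplete(3)
}

func main() {
	fmt.Println("7.x indexer reaches SnapEnd:", simulate(true))    // true
	fmt.Println("6.6.x indexer reaches SnapEnd:", simulate(false)) // false
}
```

In the second case the simulated indexer stops at seqno 2 and never reaches SnapEnd 3, which corresponds to the indefinite wait described in step 3. The hang only shows up when the skipped record is the last mutation of the snapshot; a skipped record in the middle is covered by the next regular mutation.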

            People

              yash.dodderi Yash Dodderi
              amit.kulkarni Amit Kulkarni