Couchbase Server / MB-54681

Index build can hang in mixed mode due to projector skipping transaction records

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Fixed
    • 7.1.1, 7.0.4
    • Morpheus
    • secondary-index
    • None
    • Untriaged
    • 0
    • Unknown

    Description

      Index builds can hang during an upgrade from 6.6.x to 7.x when transaction ATR/client record documents are present.

      1. An optimization in 7.0 (MB-43704) makes the projector skip ATR/client records and send an UpdateSeqno message to the indexer instead.

      projector/worker.go

              isTxn := (m.Opcode == mcd.DCP_MUTATION) && !m.IsJSON() && m.HasXATTR() && bytes.HasPrefix(m.Key, transactionMutationPrefix)
              if isTxn {
                  worker.stats.txnSystemMut.Add(1)
              }
       
              // If the mutation belongs to a collection other than the
              // ones that are being processed at worker, send UpdateSeqno
              // message to indexer
              // The else case should get executed only incase of MAINT_STREAM
              // (or) when a transactional mutation is being processed in
              // INIT_STREAM
              if collEngines, ok := allEngines[m.CollectionID]; ok && !isTxn {
                  processMutation(collEngines)
              } else {
                  // Generate updateSeqno message and propagate it to indexer
                  worker.stats.updateSeqno.Add(1)
                  if data := v.makeUpdateSeqnoData(m, allEngines); data != nil {
                      worker.broadcast2Endpoints(data, worker.runFinCh)
                  } else {
                      fmsg := "%v ##%x SYSTEM_EVENT: %v NOT PUBLISHED for vbucket %v\n"
                      logging.Errorf(fmsg, logPrefix, m.Opaque, m, vbno)
                  }
              }
      

      2. In a mixed-mode cluster, the projector on 7.0.4 skips any document whose key has the prefix "_txn:" (i.e. ATR/client records, or any user-created document with that prefix).

      3. The problem is that a 6.6.5 indexer node cannot understand or process an UpdateSeqno message, so it drops these messages. If a dropped UpdateSeqno message corresponds to the SnapEnd of a DCP snapshot marker, the indexer never sees the snapshot complete. It waits indefinitely and the index build hangs.
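
      The three points above can be illustrated with a small, self-contained sketch. It is a toy model, not the projector or indexer code: the `msg` type, `toMsg`, and `snapshotDone` are hypothetical names introduced only for illustration, and the real wire protocol is more involved. It assumes a snapshot whose last seqno belongs to a "_txn:" document.

```go
package main

import (
	"bytes"
	"fmt"
)

// Hypothetical message kinds; the real projector/indexer protocol differs.
type kind int

const (
	mutation kind = iota
	updateSeqno
)

type msg struct {
	kind  kind
	seqno uint64
}

var txnPrefix = []byte("_txn:")

// toMsg mimics the 7.0 projector decision described above: a non-JSON
// document with XATTRs whose key starts with "_txn:" is not indexed and
// becomes an UpdateSeqno message instead of a mutation.
func toMsg(key []byte, isJSON, hasXattr bool, seqno uint64) msg {
	if !isJSON && hasXattr && bytes.HasPrefix(key, txnPrefix) {
		return msg{updateSeqno, seqno}
	}
	return msg{mutation, seqno}
}

// snapshotDone reports whether an indexer that received msgs has reached
// snapEnd. understandsUpdateSeqno=false models a 6.6.x indexer that drops
// UpdateSeqno messages.
func snapshotDone(msgs []msg, snapEnd uint64, understandsUpdateSeqno bool) bool {
	var last uint64
	for _, m := range msgs {
		if m.kind == updateSeqno && !understandsUpdateSeqno {
			continue // message dropped: the seqno never advances
		}
		if m.seqno > last {
			last = m.seqno
		}
	}
	return last >= snapEnd
}

func main() {
	// Snapshot [1, 3]; the last document is a transaction record.
	msgs := []msg{
		toMsg([]byte("doc1"), true, false, 1),
		toMsg([]byte("doc2"), true, false, 2),
		toMsg([]byte("_txn:atr-1"), false, true, 3),
	}
	fmt.Println("7.x indexer done:", snapshotDone(msgs, 3, true))
	fmt.Println("6.6.x indexer done:", snapshotDone(msgs, 3, false))
}
```

      In this model the 6.6.x case never reaches the snapshot end, which corresponds to the indefinite wait described in point 3.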

          Activity

            Deepkaran Salooja added a comment (edited)

            Steps to reproduce:

            1. Set up a 3-node cluster running CB Server 6.6.5 with the kv, n1ql, and index services.
            2. Create a bucket with 10000 documents (non-JSON, with XATTRs, keys starting with the prefix "_txn:"):

            ./cbworkloadgen -n 127.0.0.1:9000 -i 10000 -u Administrator -p password --prefix "_txn:" --xattr
            

            3. Create a few indexes without replicas.
            4. Upgrade one node to CB Server 7.0.4 using swap rebalance.
            5. Upgrade a second node to CB Server 7.0.4 using swap rebalance.
            6. The index build during rebalance hangs.
            7. On a CB Server build with the fix, the same steps should complete without the hang.

            Couchbase Build Team added a comment

            Build couchbase-server-7.5.0-3371 contains indexing commit 0494f06 with commit message:
            MB-54681 Fix transaction record skipping at projector

            Couchbase Build Team added a comment

            Build couchbase-server-8.0.0-1185 contains indexing commit 0494f06 with commit message:
            MB-54681 Fix transaction record skipping at projector

            People

              deepkaran.salooja Deepkaran Salooja
              deepkaran.salooja Deepkaran Salooja