Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-43259

[System Test] : index building for a partitioned primary index is stuck

    XMLWordPrintable

Details

    Description

      Build : 6.6.1-9198
      Test : -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml
      Scale : 3
      Iteration : 2nd (Day 1)

      This seems to be the same issue as MB-37689 / MB-38232.

      Partitioned primary index on CUSTOMER bucket is stuck at 100%

      [2020-11-29T18:13:55-08:00, sequoiatools/cbq:8e2e15] -e=http://172.23.96.252:8093 -u=Administrator -p=password -script=create primary index on `CUSTOMER` partition by hash(rating,result,claim) using GSI with {"num_partition":4}
      

      This index is on nodes 172.23.104.69, 172.23.96.253 & 172.23.99.11

      Some errors in the indexer logs on 172.23.104.69 around the time of index creation:

      [root@cnt7-qe1 logs]# zgrep "2020-11-29T18:1" indexer.log* | grep CUSTOMER | grep "\[Error"
      indexer.log.2.gz:2020-11-29T18:14:25.162-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.104.5:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.162-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.104.5:9999
      indexer.log.2.gz:2020-11-29T18:14:25.176-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.108.103:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.176-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.108.103:9999
      indexer.log.2.gz:2020-11-29T18:14:25.180-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.96.148:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.181-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.148:9999
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.97.119:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.97.119:9999
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.97.121:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err projector.topicMissing
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received projector.topicMissing from 172.23.97.121:9999
      indexer.log.2.gz:2020-11-29T18:14:25.199-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.20:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.199-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.99.20:9999
      indexer.log.2.gz:2020-11-29T18:14:25.204-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.21:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err projector.topicMissing
      indexer.log.2.gz:2020-11-29T18:14:25.204-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received projector.topicMissing from 172.23.99.21:9999
      indexer.log.2.gz:2020-11-29T18:14:25.210-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.25:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.210-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.99.25:9999
      

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            With the toy build:
            http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/11901/

            which has additional debug information:
            http://review.couchbase.org/c/indexing/+/142036
            http://review.couchbase.org/c/indexing/+/141770/

            system test ran into the issue. Logs:

            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.137.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.155.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.157.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.67.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.69.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.70.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.106.188.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.108.103.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.120.245.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.121.3.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.123.27.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.148.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.251.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.252.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.253.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.95.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.119.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.121.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.122.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.239.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.242.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.98.135.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.11.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.21.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.25.zip

            deepkaran.salooja Deepkaran Salooja added a comment - With the toy build: http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/11901/ which has additional debug information: http://review.couchbase.org/c/indexing/+/142036 http://review.couchbase.org/c/indexing/+/141770/ system test ran into the issue. Logs: url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.137.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.155.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.157.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.67.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.69.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.70.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.106.188.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.108.103.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.120.245.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.121.3.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.123.27.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.148.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.251.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.252.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.253.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.95.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.119.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.121.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.122.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.239.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.242.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.98.135.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.11.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.21.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.25.zip

            Indexer node 172.23.104.137 records the exception:

            2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_
            STREAM. New seqno 284346 vbuuid 71426468951228.  Current Seqno 284317 vbuuid 71426468951228
            2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_
            STREAM. New seqno 284347 vbuuid 71426468951228.  Current Seqno 284346 vbuuid 71426468951228
             

            The exception can be traced back to projector node 172.23.97.119:

             
            2020-12-14T13:08:15.088-08:00 [Error] ENDP[<-(172.23.104.137:9105,3fba)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_77bdf73514f4a2338b5ced2c8a331a08]
            seq order violation for vb = default:410, seq = 284346, command = 2,
            orderState = {snapStart: 284274, snapEnd 284338, snapStarted true, prevSeq: 284317, prevSeqValid: true, errCount: 1},
            mutation = Docidx <ud>(12231D64-51_252392)</ud>, Seqno 284346, Ctime 0, Uuids [2407072873162967354 3243972289125088885 10810843671203773160], Commands [2 2 2]
            

            From the projector logs, it can be seen that the out of bound seqnum was generated within projector as there is no corresponding violation at DCPT.

            deepkaran.salooja Deepkaran Salooja added a comment - Indexer node 172.23.104.137 records the exception: 2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_ STREAM. New seqno 284346 vbuuid 71426468951228. Current Seqno 284317 vbuuid 71426468951228 2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_ STREAM. New seqno 284347 vbuuid 71426468951228. Current Seqno 284346 vbuuid 71426468951228 The exception can be traced back to projector node 172.23.97.119 :   2020-12-14T13:08:15.088-08:00 [Error] ENDP[<-(172.23.104.137:9105,3fba)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_77bdf73514f4a2338b5ced2c8a331a08] seq order violation for vb = default:410, seq = 284346, command = 2, orderState = {snapStart: 284274, snapEnd 284338, snapStarted true, prevSeq: 284317, prevSeqValid: true, errCount: 1}, mutation = Docidx <ud>(12231D64-51_252392)</ud>, Seqno 284346, Ctime 0, Uuids [2407072873162967354 3243972289125088885 10810843671203773160], Commands [2 2 2] From the projector logs, it can be seen that the out of bound seqnum was generated within projector as there is no corresponding violation at DCPT.

            Build couchbase-server-7.0.0-4237 contains indexing commit d7e592d with commit message:
            MB-43259 Stop mutation scatter before draining the mutch in kvdata

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4237 contains indexing commit d7e592d with commit message: MB-43259 Stop mutation scatter before draining the mutch in kvdata

            Closing this issue as the system test did 3.5 days with 6.6.2-9557, issue was not seen.

            mihir.kamdar Mihir Kamdar (Inactive) added a comment - Closing this issue as the system test did 3.5 days with 6.6.2-9557, issue was not seen.

            People

              varun.velamuri Varun Velamuri
              deepkaran.salooja Deepkaran Salooja
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty