Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-43259

[System Test] : index building for a partitioned primary index is stuck

    XMLWordPrintable

    Details

      Description

      Build : 6.6.1-9198
      Test : -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml
      Scale : 3
      Iteration : 2nd (Day 1)

      This seems to be the same issue as MB-37689 / MB-38232.

      Partitioned primary index on CUSTOMER bucket is stuck at 100%

      [2020-11-29T18:13:55-08:00, sequoiatools/cbq:8e2e15] -e=http://172.23.96.252:8093 -u=Administrator -p=password -script=create primary index on `CUSTOMER` partition by hash(rating,result,claim) using GSI with {"num_partition":4}
      

      This index is on nodes 172.23.104.69, 172.23.96.253 & 172.23.99.11

      Some errors in the indexer logs on 172.23.104.69 around the time of index creation:

      [root@cnt7-qe1 logs]# zgrep "2020-11-29T18:1" indexer.log* | grep CUSTOMER | grep "\[Error"
      indexer.log.2.gz:2020-11-29T18:14:25.162-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.104.5:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.162-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.104.5:9999
      indexer.log.2.gz:2020-11-29T18:14:25.176-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.108.103:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.176-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.108.103:9999
      indexer.log.2.gz:2020-11-29T18:14:25.180-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.96.148:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.181-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.148:9999
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.97.119:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.97.119:9999
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.97.121:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err projector.topicMissing
      indexer.log.2.gz:2020-11-29T18:14:25.186-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received projector.topicMissing from 172.23.97.121:9999
      indexer.log.2.gz:2020-11-29T18:14:25.199-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.20:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.199-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.99.20:9999
      indexer.log.2.gz:2020-11-29T18:14:25.204-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.21:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err projector.topicMissing
      indexer.log.2.gz:2020-11-29T18:14:25.204-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received projector.topicMissing from 172.23.99.21:9999
      indexer.log.2.gz:2020-11-29T18:14:25.210-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.99.25:9999 Topic INIT_STREAM_TOPIC_7f3c605a3eeeb5a18047b3e58f9a8373 Buckets [CUSTOMER]. Err genServer.closed
      indexer.log.2.gz:2020-11-29T18:14:25.210-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.99.25:9999
      

        Attachments

          Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

            Activity

            Hide
            deepkaran.salooja Deepkaran Salooja added a comment -

            With the toy build:
            http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/11901/

            which has additional debug information:
            http://review.couchbase.org/c/indexing/+/142036
            http://review.couchbase.org/c/indexing/+/141770/

            system test ran into the issue. Logs:

            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.137.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.155.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.157.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.67.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.69.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.70.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.106.188.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.108.103.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.120.245.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.121.3.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.123.27.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.148.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.251.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.252.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.253.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.95.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.119.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.121.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.122.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.239.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.242.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.98.135.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.11.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.21.zip
            url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.25.zip

            Show
            deepkaran.salooja Deepkaran Salooja added a comment - With the toy build: http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/11901/ which has additional debug information: http://review.couchbase.org/c/indexing/+/142036 http://review.couchbase.org/c/indexing/+/141770/ system test ran into the issue. Logs: url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.137.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.155.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.157.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.67.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.69.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.104.70.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.106.188.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.108.103.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.120.245.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.121.3.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.123.27.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.148.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.251.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.252.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.253.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.96.95.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.119.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.121.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.122.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.239.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.97.242.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.98.135.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.11.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.21.zip url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607981767/collectinfo-2020-12-14T213610-ns_1%40172.23.99.25.zip
            Hide
            deepkaran.salooja Deepkaran Salooja added a comment -

            Indexer node 172.23.104.137 records the exception:

            2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_
            STREAM. New seqno 284346 vbuuid 71426468951228.  Current Seqno 284317 vbuuid 71426468951228
            2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_
            STREAM. New seqno 284347 vbuuid 71426468951228.  Current Seqno 284346 vbuuid 71426468951228
             

            The exception can be traced back to projector node 172.23.97.119:

             
            2020-12-14T13:08:15.088-08:00 [Error] ENDP[<-(172.23.104.137:9105,3fba)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_77bdf73514f4a2338b5ced2c8a331a08]
            seq order violation for vb = default:410, seq = 284346, command = 2,
            orderState = {snapStart: 284274, snapEnd 284338, snapStarted true, prevSeq: 284317, prevSeqValid: true, errCount: 1},
            mutation = Docidx <ud>(12231D64-51_252392)</ud>, Seqno 284346, Ctime 0, Uuids [2407072873162967354 3243972289125088885 10810843671203773160], Commands [2 2 2]
            

            From the projector logs, it can be seen that the out of bound seqnum was generated within projector as there is no corresponding violation at DCPT.

            Show
            deepkaran.salooja Deepkaran Salooja added a comment - Indexer node 172.23.104.137 records the exception: 2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_ STREAM. New seqno 284346 vbuuid 71426468951228. Current Seqno 284317 vbuuid 71426468951228 2020-12-14T13:08:15.091-08:00 [Warn] MutationStreamReader::checkAndSetBucketFilter Out-of-bound Seqno. Snapshot 284274-284338 for vb 410 default MAINT_ STREAM. New seqno 284347 vbuuid 71426468951228. Current Seqno 284346 vbuuid 71426468951228 The exception can be traced back to projector node 172.23.97.119 :   2020-12-14T13:08:15.088-08:00 [Error] ENDP[<-(172.23.104.137:9105,3fba)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_77bdf73514f4a2338b5ced2c8a331a08] seq order violation for vb = default:410, seq = 284346, command = 2, orderState = {snapStart: 284274, snapEnd 284338, snapStarted true, prevSeq: 284317, prevSeqValid: true, errCount: 1}, mutation = Docidx <ud>(12231D64-51_252392)</ud>, Seqno 284346, Ctime 0, Uuids [2407072873162967354 3243972289125088885 10810843671203773160], Commands [2 2 2] From the projector logs, it can be seen that the out of bound seqnum was generated within projector as there is no corresponding violation at DCPT.
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4237 contains indexing commit d7e592d with commit message:
            MB-43259 Stop mutation scatter before draining the mutch in kvdata

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4237 contains indexing commit d7e592d with commit message: MB-43259 Stop mutation scatter before draining the mutch in kvdata
            Hide
            mihir.kamdar Mihir Kamdar added a comment -

            Closing this issue as the system test did 3.5 days with 6.6.2-9557, issue was not seen.

            Show
            mihir.kamdar Mihir Kamdar added a comment - Closing this issue as the system test did 3.5 days with 6.6.2-9557, issue was not seen.

              People

              Assignee:
              varun.velamuri Varun Velamuri
              Reporter:
              deepkaran.salooja Deepkaran Salooja
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes

                    PagerDuty