[BP 6.5.1] [System Test] : index building for a partitioned primary index is stuck on 1 indexer node

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.5.0
    • Fix Version/s: 6.5.1
    • Component/s: secondary-index

    Description

      Build : 7.0.0-1210
      Test : -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml
      Scale : 3
      Day : 3

      The build of a partitioned primary index on the CUSTOMER bucket is stuck at 100% on indexer node 172.23.96.223.

      Relevant stats for this index on 172.23.96.223:
      "CUSTOMER:#primary:avg_drain_rate":0,
      "CUSTOMER:#primary:avg_item_size":0,
      "CUSTOMER:#primary:build_progress":100,
      "CUSTOMER:#primary:items_count":0,
      "CUSTOMER:#primary:num_docs_indexed":0,
      "CUSTOMER:#primary:num_docs_pending":0,
      "CUSTOMER:mutation_queue_size":0,
      "CUSTOMER:num_mutations_queued":10462620,

      On the other indexer node, 172.23.96.216, where the build progress is also 100% for this index, the same stats have the following values:
      "CUSTOMER:#primary:avg_drain_rate":0,
      "CUSTOMER:#primary:avg_item_size":7,
      "CUSTOMER:#primary:build_progress":100,
      "CUSTOMER:#primary:items_count":148215,
      "CUSTOMER:#primary:num_docs_indexed":148215,
      "CUSTOMER:#primary:num_docs_pending":0,
      "CUSTOMER:mutation_queue_size":0,
      "CUSTOMER:num_mutations_queued":11830267,
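
      The difference between the two nodes can be checked mechanically. The following is a minimal sketch, not part of the original report: the stat values are copied from above, while the function name `looks_stuck` and the heuristic itself (build reported at 100% with nothing indexed although mutations were queued for the bucket) are assumptions made for illustration. `num_mutations_queued` is a bucket-level stat, so this is only a hint, and a partition that legitimately holds no documents would also be flagged.

```python
# Stats copied from this report for the two indexer nodes.
STATS_223 = {
    "CUSTOMER:#primary:build_progress": 100,
    "CUSTOMER:#primary:items_count": 0,
    "CUSTOMER:#primary:num_docs_indexed": 0,
    "CUSTOMER:#primary:num_docs_pending": 0,
    "CUSTOMER:num_mutations_queued": 10462620,
}
STATS_216 = {
    "CUSTOMER:#primary:build_progress": 100,
    "CUSTOMER:#primary:items_count": 148215,
    "CUSTOMER:#primary:num_docs_indexed": 148215,
    "CUSTOMER:#primary:num_docs_pending": 0,
    "CUSTOMER:num_mutations_queued": 11830267,
}


def looks_stuck(stats, bucket="CUSTOMER", index="#primary"):
    """Hypothetical heuristic: the build claims 100% but no document was
    indexed, even though mutations were queued for the bucket."""
    prefix = f"{bucket}:{index}:"
    return (
        stats.get(prefix + "build_progress") == 100
        and stats.get(prefix + "num_docs_indexed", 0) == 0
        and stats.get(prefix + "items_count", 0) == 0
        and stats.get(f"{bucket}:num_mutations_queued", 0) > 0
    )


if __name__ == "__main__":
    for node, stats in (("172.23.96.223", STATS_223), ("172.23.96.216", STATS_216)):
        print(node, "suspect" if looks_stuck(stats) else "ok")
```

      With the values above, only 172.23.96.223 matches the heuristic.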

      Index creation was initiated at 2020-01-27T03:11:45

      [2020-01-27T03:11:45-08:00, sequoiatools/cbq:ba233f] -e=http://172.23.97.148:8093 -u=Administrator -p=password -script=create primary index on `CUSTOMER` partition by hash(rating,result,claim) using GSI with {"num_partition":4}
      

      The indexer logs on 172.23.96.223 show errors both before and after the index creation time:

      2020-01-27T02:53:36.556-08:00 [Error] feed.DcpGetSeqnos(): read tcp 172.23.96.223:43674->172.23.96.14:11210: read: connection reset by peer
      2020-01-27T02:53:54.480-08:00 [Error] feed.DcpGetSeqnos(): read tcp 172.23.96.223:43728->172.23.96.14:11210: read: connection reset by peer
      2020-01-27T02:54:23.427-08:00 [Error] feed.DcpGetSeqnos(): EOF
      2020-01-27T02:54:54.467-08:00 [Error] DATP[->dataport ":9105"] worker "172.23.96.14:57938" exit: EOF
      2020-01-27T02:54:54.467-08:00 [Error] DATP[->dataport ":9105"] remote "172.23.96.14:57938" closed
      2020-01-27T02:54:59.550-08:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.96.216:43484.  Error = EOF. Kill Pipe.
      2020-01-27T03:12:04.432-08:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.96.216:9100.  Error = read tcp 172.23.96.223:45848->172.23.96.216:9100: use of closed network connection. Kill Pipe.
      2020-01-27T03:12:04.432-08:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.96.223:9100.  Error = read tcp 172.23.96.223:59288->172.23.96.223:9100: use of closed network connection. Kill Pipe.
      2020-01-27T03:12:04.432-08:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.96.223:59288.  Error = EOF. Kill Pipe.
      2020-01-27T03:12:06.958-08:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.96.216:43732.  Error = EOF. Kill Pipe.
      2020-01-27T03:13:06.827-08:00 [Error] DATP[->dataport ":9103"] Accept() Error: accept tcp [::]:9103: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.122:55304: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.122:55304" exit: read tcp 172.23.96.223:9103->172.23.96.122:55304: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.190:34134: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.190:34134" exit: read tcp 172.23.96.223:9103->172.23.96.190:34134: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.191:33560: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.191:33560" exit: read tcp 172.23.96.223:9103->172.23.96.191:33560: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.97.74:46710: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.97.74:46710" exit: read tcp 172.23.96.223:9103->172.23.97.74:46710: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.254:36816: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.254:36816" exit: read tcp 172.23.96.223:9103->172.23.96.254:36816: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.18:53666: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.18:53666" exit: read tcp 172.23.96.223:9103->172.23.96.18:53666: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.183:44420: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.183:44420" exit: read tcp 172.23.96.223:9103->172.23.96.183:44420: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.207:45080: use of closed network connection
      2020-01-27T03:13:06.828-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.207:45080" exit: read tcp 172.23.96.223:9103->172.23.96.207:45080: use of closed network connection
      2020-01-27T03:13:06.829-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.209:39492: use of closed network connection
      2020-01-27T03:13:06.829-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.209:39492" exit: read tcp 172.23.96.223:9103->172.23.96.209:39492: use of closed network connection
      2020-01-27T03:13:06.829-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.210:47340: use of closed network connection
      2020-01-27T03:13:06.829-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.210:47340" exit: read tcp 172.23.96.223:9103->172.23.96.210:47340: use of closed network connection
      2020-01-27T03:13:06.829-08:00 [Error] receiving packet: read tcp 172.23.96.223:9103->172.23.96.212:41140: use of closed network connection
      2020-01-27T03:13:06.829-08:00 [Error] DATP[->dataport ":9103"] worker "172.23.96.212:41140" exit: read tcp 172.23.96.223:9103->172.23.96.212:41140: use of closed network connection
      2020-01-27T03:13:07.818-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.96.18:9999 Topic INIT_STREAM_TOPIC_25becdd68259f2ef88ec15e3bb31eeed Buckets [CUSTOMER]. Err genServer.closed
      2020-01-27T03:13:07.818-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.18:9999
      2020-01-27T03:13:07.845-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.96.190:9999 Topic INIT_STREAM_TOPIC_25becdd68259f2ef88ec15e3bb31eeed Buckets [CUSTOMER]. Err genServer.closed
      2020-01-27T03:13:07.845-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.190:9999
      2020-01-27T03:13:07.861-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.96.191:9999 Topic INIT_STREAM_TOPIC_25becdd68259f2ef88ec15e3bb31eeed Buckets [CUSTOMER]. Err genServer.closed
      2020-01-27T03:13:07.861-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.191:9999
      2020-01-27T03:13:07.908-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.96.210:9999 Topic INIT_STREAM_TOPIC_25becdd68259f2ef88ec15e3bb31eeed Buckets [CUSTOMER]. Err genServer.closed
      2020-01-27T03:13:07.908-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.210:9999
      2020-01-27T03:13:07.952-08:00 [Error] KVSender::sendDelBucketsRequest Unexpected Error During Del Buckets Request Projector 172.23.97.74:9999 Topic INIT_STREAM_TOPIC_25becdd68259f2ef88ec15e3bb31eeed Buckets [CUSTOMER]. Err genServer.closed
      2020-01-27T03:13:07.952-08:00 [Error] KVSender::deleteBucketsFromStream INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.97.74:9999
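
      To see how these errors cluster around the index creation time, the log lines can be grouped by second and by emitting component. This is a rough sketch under stated assumptions: the line layout (`<timestamp> [<level>] <message>`) is inferred from the excerpt above, and the names `LOG_LINE`, `source_of` and `summarize` are hypothetical helpers, not part of any Couchbase tooling. The component is approximated crudely as the leading run of letters, dots and underscores in the message.

```python
import re
from collections import Counter

# Layout inferred from the excerpt: "<ts>.<ms><tz> [<Level>] <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d{3}[+-]\d{2}:\d{2} "
    r"\[(?P<level>\w+)\] (?P<msg>.*)$"
)


def source_of(msg):
    """Crude component guess: leading letters, dots and underscores."""
    m = re.match(r"[A-Za-z_.]+", msg)
    return m.group(0) if m else "unknown"


def summarize(lines):
    """Count [Error] lines per (timestamp truncated to the second, component)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") == "Error":
            counts[(m.group("ts"), source_of(m.group("msg")))] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = [
    "2020-01-27T02:54:23.427-08:00 [Error] feed.DcpGetSeqnos(): EOF",
    "2020-01-27T03:13:07.818-08:00 [Error] KVSender::deleteBucketsFromStream "
    "INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.18:9999",
    "2020-01-27T03:13:07.845-08:00 [Error] KVSender::deleteBucketsFromStream "
    "INIT_STREAM CUSTOMER Error Received genServer.closed from 172.23.96.190:9999",
]

if __name__ == "__main__":
    for (ts, source), n in sorted(summarize(SAMPLE).items()):
        print(ts, source, n)
```

      Run against the full indexer log, a summary like this makes it easier to compare the error bursts with the index creation time of 2020-01-27T03:11:45.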
      

          Activity

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.1-6249 contains indexing commit eb93926 with commit message: MB-38232 [BP 6.5.1] use sessionId in sync message

            build-team Couchbase Build Team added a comment - Build couchbase-server-1006.5.1-1086 contains indexing commit eb93926 with commit message: MB-38232 [BP 6.5.1] use sessionId in sync message

            girish.benakappa Girish Benakappa added a comment - Closing this based on a system test with 6.5.1-6284, which has run for 3 days 12 hrs.

            build-team Couchbase Build Team added a comment - Build couchbase-server-1006.5.1-1125 contains indexing commit eb93926 with commit message: MB-38232 [BP 6.5.1] use sessionId in sync message

            People

              girish.benakappa Girish Benakappa
              deepkaran.salooja Deepkaran Salooja
