Couchbase Server / MB-28595

Analytics System Test: With 4 CB buckets at 200 ops/sec each, the analytics cluster goes into a bad state and becomes too slow to access. Data ingestion also stopped.


Details

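    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: 5.5.0
    • Fix Version/s: 5.5.0
    • Component/s: analytics
    • Triage: Untriaged
    • Is this a Regression?: Unknown
    • Sprint: CX Sprint 94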

    Description

      With 4 CB buckets at 200 ops/sec each, the analytics cluster goes into a bad state and becomes too slow to access. Data ingestion also stopped.

      We ran the system test for 24 hours. There are 4 KV buckets, and for each KV bucket there is a corresponding CBAS bucket and an unfiltered dataset.

      This is a simple test: we continuously ingest data and run CBAS queries on all 4 datasets at 12 queries/sec for a set duration. The test does not involve any rebalance, bucket disconnection, or failover.

      I started the run yesterday morning, and by yesterday evening the cluster was in a bad state. I was not able to see the bucket insights in the Analytics Workbench.
      By this morning the cluster had recovered, and data ingestion had resumed at some point in between, but for one bucket the dataset's document count does not match the CB bucket's.

      Docs in CB bucket: 823,410
      Docs in CBAS dataset: [ { "$1": 823363 } ]  (47 fewer than the CB bucket)
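A minimal Python sketch of the mismatch check above, assuming the KV item count and the raw JSON result of the dataset count query (shaped like the `[ { "$1": N } ]` output shown, where `$1` is the name given to an unnamed select expression) have already been fetched. The function `count_mismatch` is hypothetical and not part of any Couchbase SDK:

```python
import json


def count_mismatch(kv_count: int, analytics_response: str) -> int:
    """Return how many documents the Analytics dataset is missing (hypothetical helper).

    `analytics_response` is assumed to be the JSON results array of a count
    query, shaped like [ { "$1": N } ]. A positive result means the dataset
    has fewer documents than the KV bucket.
    """
    rows = json.loads(analytics_response)
    if len(rows) != 1 or "$1" not in rows[0]:
        raise ValueError(f"unexpected count result: {analytics_response!r}")
    return kv_count - int(rows[0]["$1"])


if __name__ == "__main__":
    # Figures from this report: KV bucket count vs. CBAS dataset count.
    missing = count_mismatch(823_410, '[ { "$1": 823363 } ]')
    print(f"dataset is missing {missing} documents")  # dataset is missing 47 documents
```

Since Analytics ingests from KV asynchronously, a small gap can appear while ingestion is catching up; the symptom reported here is a gap that remains after ingestion has recovered.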

      Attaching the cbcollect_info logs from yesterday and today.

      The cluster is live if you want to debug:
      172.23.108.231 (kv)
      172.23.108.232 (cbas)
      172.23.108.233 (cbas)
      172.23.108.234 (kv)

      There are a lot of exceptions in the logs:
      analytics.log.2.gz:org.apache.asterix.common.exceptions.RuntimeDataException: ASX0023: 60.0s passed before getting back the responses from NCs

      analytics.log.1.gz:java.net.ConnectException: Connection refused
      analytics.log.1.gz:org.apache.hyracks.ipc.exceptions.IPCException: java.io.IOException: Connection failed to /172.23.108.233:9112
      analytics.log.1.gz:Caused by: java.io.IOException: Connection failed to /172.23.108.233:9112

      analytics.log.1.gz:org.apache.hyracks.api.exceptions.NetException: Socket Closed
      analytics.log.1.gz:org.apache.asterix.common.exceptions.ReplicationException: java.io.EOFException
      analytics.log.1.gz:Caused by: java.io.EOFException
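The excerpt above looks like grep output over the rotated `analytics.log.N.gz` files. A minimal Python sketch of the same triage, assuming the logs have first been extracted from a cbcollect_info zip into a local directory (the directory argument, the `analytics.log*` glob, and the regular expression are assumptions, not Couchbase tooling), tallies fully qualified exception class names per log file so the RuntimeDataException timeouts, connection failures, and replication failures can be compared across rotations:

```python
import gzip
import re
import sys
from collections import Counter
from pathlib import Path

# Fully qualified Java exception/error class names, e.g.
# org.apache.hyracks.ipc.exceptions.IPCException or java.io.EOFException.
EXCEPTION_RE = re.compile(r"\b(?:[a-z_]\w*\.)+[A-Z]\w*(?:Exception|Error)\b")


def tally_exceptions(lines):
    """Count every exception class name occurring in the given log lines.

    Each occurrence counts, so a cause repeated on a "Caused by:" line is
    counted again.
    """
    counts = Counter()
    for line in lines:
        counts.update(EXCEPTION_RE.findall(line))
    return counts


def open_log(path):
    # Rotated logs (analytics.log.N.gz) are gzipped; the active analytics.log is plain text.
    if path.suffix == ".gz":
        return gzip.open(path, "rt", errors="replace")
    return open(path, "rt", errors="replace")


def tally_log_dir(log_dir):
    """Return {file name: Counter of exception names} for analytics.log* in log_dir."""
    results = {}
    for path in sorted(Path(log_dir).glob("analytics.log*")):
        with open_log(path) as fh:
            results[path.name] = tally_exceptions(fh)
    return results


if __name__ == "__main__":
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, counts in tally_log_dir(log_dir).items():
        for exc, n in counts.most_common():
            print(f"{name}\t{n}\t{exc}")
```

Counting class names rather than whole lines keeps the output short even when the same stack trace repeats thousands of times.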

      Yesterday's logs:
      s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.232.zip
      s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.233.zip
      s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.234.zip
      s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.231.zip

      Today's logs:
      s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.232.zip
      s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.233.zip
      s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.234.zip
      s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.231.zip

    People

      Assignee: Ritesh Agarwal
      Reporter: Ritesh Agarwal
      Votes: 0
      Watchers: 3
