  1. Couchbase Server
  2. MB-19525

Bulk Load Failed on Watson Beta

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • 4.5.0
    • 4.5.0
    • couchbase-bucket
    • None
    • Untriaged
    • Unknown

    Description

      Logs here:
      https://s3.amazonaws.com/cb-engineering/davidH/collectinfo-2016-05-05T105949-ns_1%40dhaikney-server-1.c.cb-googbench-101.internal.zip
      https://s3.amazonaws.com/cb-engineering/davidH/collectinfo-2016-05-05T105949-ns_1%40dhaikney-server-2.c.cb-googbench-101.internal.zip
      https://s3.amazonaws.com/cb-engineering/davidH/collectinfo-2016-05-05T105949-ns_1%40dhaikney-server-3.c.cb-googbench-101.internal.zip
      https://s3.amazonaws.com/cb-engineering/davidH/collectinfo-2016-05-05T105949-ns_1%40dhaikney-server-4.c.cb-googbench-101.internal.zip

      4-node cluster, Watson Beta, Ubuntu-14, running on Google Compute Engine. Full eviction, 1 replica.
      Loading 100M x 512-byte docs using the following pillowfight invocation:

      cbc-pillowfight -U couchbase://dhaikney-server-1/charlie -I 102400000 -m 512 -M 512 -t 16 -B 10 --sequential --rate-limit 10000
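
As a rough sanity check on the size of this load, the arithmetic can be sketched as below. The item count and value size come from the `-I` and `-m`/`-M` arguments above. The even spread across the 4 nodes is an assumption, and the figures cover document values only (keys and per-item metadata are not counted).

```python
# Back-of-envelope sizing for the load above (values only; keys and
# per-item metadata are deliberately ignored).
DOCS = 102_400_000   # -I in the pillowfight invocation
DOC_BYTES = 512      # -m / -M in the pillowfight invocation
REPLICAS = 1         # from the cluster description
NODES = 4            # from the cluster description
GIB = 1024 ** 3


def sizing(docs, doc_bytes, replicas, nodes):
    """Return raw value bytes: active copy, active + replicas, and per node."""
    active = docs * doc_bytes
    total = active * (1 + replicas)
    # Assumption: data is spread evenly across all nodes.
    return {
        "active_bytes": active,
        "total_bytes": total,
        "per_node_bytes": total // nodes,
    }


result = sizing(DOCS, DOC_BYTES, REPLICAS, NODES)
for name, value in result.items():
    print(f"{name}: {value / GIB:.1f} GiB")
```

Under those assumptions this works out to roughly 48.8 GiB of active values, 97.7 GiB including the replica, and about 24.4 GiB per node.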
      

      Ops started steady (~100K inserts/sec), but at around 60M docs performance cratered and the insert rate dropped considerably. I initially suspected a client problem, but the memcached logs were rife with slow-op warnings, e.g.:

      2016-05-05T11:02:39.650695Z WARNING 117: Slow SET operation on connection: 679 ms ([ 10.240.0.10:59579 - 10.240.0.2:11210 ])
      2016-05-05T11:02:41.221269Z WARNING 117: Slow SET operation on connection: 1550 ms ([ 10.240.0.10:59579 - 10.240.0.2:11210 ])
      2016-05-05T11:02:41.247627Z WARNING 119: Slow SET operation on connection: 1466 ms ([ 10.240.0.10:59584 - 10.240.0.2:11210 ])
      2016-05-05T11:02:41.270983Z WARNING 101: Slow SET operation on connection: 2010 ms ([ 10.240.0.10:59629 - 10.240.0.2:11210 ])
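
For anyone digging through the collected logs, a minimal sketch of summarising these warnings is shown below. The regular expression is derived only from the four sample lines above, so other memcached log formats may not match. The group names (`id`, `src`, `dst`) and the function name `summarise_slow_ops` are illustrative and not part of any Couchbase tooling.

```python
import re
from collections import defaultdict

# Pattern inferred from the sample lines in this report only.
# Group names are illustrative; "id" is simply the number after WARNING.
SLOW_OP = re.compile(
    r"^(?P<ts>\S+) WARNING (?P<id>\d+): Slow (?P<op>\w+) operation on "
    r"connection: (?P<ms>\d+) ms \(\[ (?P<src>\S+) - (?P<dst>\S+) \]\)"
)


def summarise_slow_ops(lines):
    """Return {op: {"count", "max_ms", "total_ms"}} for matching lines."""
    stats = defaultdict(lambda: {"count": 0, "max_ms": 0, "total_ms": 0})
    for line in lines:
        match = SLOW_OP.match(line.strip())
        if match is None:
            continue  # not a slow-op warning in the expected format
        ms = int(match.group("ms"))
        entry = stats[match.group("op")]
        entry["count"] += 1
        entry["total_ms"] += ms
        entry["max_ms"] = max(entry["max_ms"], ms)
    return dict(stats)


SAMPLE = """\
2016-05-05T11:02:39.650695Z WARNING 117: Slow SET operation on connection: 679 ms ([ 10.240.0.10:59579 - 10.240.0.2:11210 ])
2016-05-05T11:02:41.221269Z WARNING 117: Slow SET operation on connection: 1550 ms ([ 10.240.0.10:59579 - 10.240.0.2:11210 ])
2016-05-05T11:02:41.247627Z WARNING 119: Slow SET operation on connection: 1466 ms ([ 10.240.0.10:59584 - 10.240.0.2:11210 ])
2016-05-05T11:02:41.270983Z WARNING 101: Slow SET operation on connection: 2010 ms ([ 10.240.0.10:59629 - 10.240.0.2:11210 ])
"""

summary = summarise_slow_ops(SAMPLE.splitlines())
for op, entry in summary.items():
    mean_ms = entry["total_ms"] / entry["count"]
    print(f"{op}: count={entry['count']} max={entry['max_ms']} ms mean={mean_ms:.0f} ms")
```

Run over a full memcached log instead of `SAMPLE`, the same summary would give a quick view of how many slow operations were recorded and how severe they were.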
      

      • Data set was 100% resident and well below the Low Water Mark. DWQ and DCP queue looked healthy.
      • A quick profile didn't show any obvious CPU bottlenecks (memcached threads were all comfortably under 100%).
      • The workload didn't recover and eventually had to be killed. Restarting with a lighter workload seemed OK (the logs above were captured whilst that subsequent workload was running).

          People

            dhaikney David Haikney (Inactive)