Couchbase Server / MB-14948

GoXDCR: DCP returns TMP_FAIL to goxdcr after warmup until load stops

Details

    • Bug
    • Resolution: Won't Fix
    • Critical
    • 4.0.0
    • 4.0.0
    • couchbase-bucket, DCP
    • Security Level: Public
    • None
    • centOS 6.x

    Description

      Build: 4.0.0-2109

      Found during manual testing.

      • XDCR from .186 to .188.
      • Bucket: default (41.7% active resident ratio).
      • I was doing some ad hoc manual testing on goxdcr when beam.smp crashed because .186 ran out of memory. I added 4 GB of RAM to that node and restarted the server. The node had close to 8.5M docs before the crash.
      • After the node came back up, I started loading. However, DCP did not return any keys to goxdcr. Outbound mutations kept increasing, as did ep_dcp_xdcr_items_remaining.
      • I then stopped the load when curr_items was ~2.15M (around 2015-05-12T15:54:37, as shown in goxdcr.log). DCP immediately started sending keys to goxdcr and replication resumed, as shown in the attached screenshot.

      memcached.log

      In memcached.log, I see some DCP streams in backfill state and some in in-memory state.

      :
      2015-05-12T15:49:48.460318-07:00 WARNING (default) metadata loaded in 163 s
      2015-05-12T15:51:43.015904-07:00 WARNING (default) Access Scanner task enabled
      2015-05-12T15:51:43.016033-07:00 WARNING (default) warmup completed in 277 s
      

      warmup stats for .186 in stats.log

      default
       
       ep_warmup:                       enabled
       ep_warmup_dups:                  0
       ep_warmup_estimate_time:         41058
       ep_warmup_estimated_key_count:   872373
       ep_warmup_estimated_value_count: 872373
       ep_warmup_item_expired:          0
       ep_warmup_key_count:             872373
       ep_warmup_keys_time:             163041924
       ep_warmup_min_item_threshold:    100
       ep_warmup_min_memory_threshold:  100
       ep_warmup_oom:                   0
       ep_warmup_state:                 done
       ep_warmup_thread:                complete
       ep_warmup_time:                  277597376
       ep_warmup_value_count:           872373
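
The warmup timings in the stats above can be cross-checked against the memcached.log lines. The sketch below assumes that ep_warmup_keys_time and ep_warmup_time are reported in microseconds; this is inferred from their agreement with the "163 s" and "277 s" log messages rather than stated in the ticket. The variable names are illustrative only.

```python
from datetime import datetime

# Values copied from the stats.log excerpt (assumed to be microseconds).
ep_warmup_keys_time = 163041924
ep_warmup_time = 277597376

# Timestamps copied from the memcached.log excerpt.
metadata_loaded = datetime.fromisoformat("2015-05-12T15:49:48.460318-07:00")
warmup_completed = datetime.fromisoformat("2015-05-12T15:51:43.016033-07:00")

keys_seconds = ep_warmup_keys_time / 1_000_000
total_seconds = ep_warmup_time / 1_000_000

# Time spent after metadata (keys) were loaded, according to the stats.
value_phase_seconds = total_seconds - keys_seconds

# The same interval measured from the two log timestamps.
log_gap_seconds = (warmup_completed - metadata_loaded).total_seconds()

print(int(keys_seconds))    # 163, matches "metadata loaded in 163 s"
print(int(total_seconds))   # 277, matches "warmup completed in 277 s"
print(round(value_phase_seconds, 3), round(log_gap_seconds, 3))
```

Under that unit assumption, both sources agree to within a millisecond on the roughly 114.6 s between metadata load and warmup completion.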
      

      in goxdcr.log

      XmemNozzle 2015-05-12T15:48:08.706-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 0 35 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:48:08.706-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 0 36 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:48:08.707-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 0 37 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:48:08.707-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 0 38 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:48:08.707-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 0 39 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:48:08.707-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0]
      :
      :
      :
      XmemNozzle 2015-05-12T15:54:37.162-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 1 184 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:54:37.162-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 1 185 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:54:37.162-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 1 186 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:54:37.162-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 1 187 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:54:37.162-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 1 188 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:54:37.162-07:00 [ERROR] xmem_b0a4b2ca4dbe46ff9c9a299b9d21cc19/default/default_10.3.4.188:11210_0 Received recoverable error in response. Response status=TMPFAIL, err = <nil>, response=[129 162 0 0 0 0 0 134 0 0 0 0 0 0 1 189 0 0 0 0 0 0 0 0]
      XmemNozzle 2015-05-12T15:54:37.172-07:00 [INFO] Expected 500 response, got all
      XmemNozzle 2015-05-12T15:54:37.175-07:00 [ERROR] received=26500, sent=1559 data buffered=24942
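
The response=[...] arrays in the goxdcr.log excerpt above look like 24-byte memcached binary protocol response headers printed as decimal bytes. A minimal Python sketch for decoding them follows. It assumes the standard header layout (magic, opcode, key length, extras length, data type, status, body length, opaque, CAS, all big-endian). The name tables cover only the values seen in this ticket: status 0x0086 is the protocol's temporary-failure code, and opcode 0xa2 is assumed to be Couchbase's SET_WITH_META extension. decode_response_header is a hypothetical helper, not part of goxdcr.

```python
import struct

# 24-byte binary protocol response header, big-endian:
# magic, opcode, key length, extras length, data type,
# status, total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")

# Only the values that appear in this ticket (labels are assumptions).
STATUS_NAMES = {0x0000: "SUCCESS", 0x0086: "TMPFAIL"}
OPCODE_NAMES = {0xA2: "SET_WITH_META"}


def decode_response_header(raw):
    """Decode a response header given as 24 ints (0-255) or bytes."""
    data = bytes(raw)
    if len(data) != HEADER.size:
        raise ValueError(f"expected {HEADER.size} bytes, got {len(data)}")
    (magic, opcode, key_len, extras_len, data_type,
     status, body_len, opaque, cas) = HEADER.unpack(data)
    return {
        "magic": magic,
        "opcode": opcode,
        "opcode_name": OPCODE_NAMES.get(opcode, hex(opcode)),
        "status": status,
        "status_name": STATUS_NAMES.get(status, hex(status)),
        "body_len": body_len,
        "opaque": opaque,
        "cas": cas,
    }


# First and one of the last TMPFAIL responses from the excerpt.
first = decode_response_header(
    [129, 162, 0, 0, 0, 0, 0, 134, 0, 0, 0, 0,
     0, 0, 0, 35, 0, 0, 0, 0, 0, 0, 0, 0])
later = decode_response_header(
    [129, 162, 0, 0, 0, 0, 0, 134, 0, 0, 0, 0,
     0, 0, 1, 184, 0, 0, 0, 0, 0, 0, 0, 0])

print(first["status_name"], first["opaque"])   # TMPFAIL 35
print(later["status_name"], later["opaque"])   # TMPFAIL 440
```

Read this way, every logged response carries magic 0x81 (a response), status 0x0086, and an empty body. Only the opaque field changes from line to line.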
      

          People

            apiravi Aruna Piravi (Inactive)