Couchbase Server / MB-32456

panic in go_xdcr.log while running high bucket density test

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 6.0.0
    • Fix Version: 6.5.0
    • Component: XDCR
    • Triage: Untriaged
    • No

    Description

      Build 6.0.0-1693

      While running high bucket density tests, we observed the following panic in one of the runs.
      Note that this panic occurred only once; rerunning the same test did not reproduce it.
      Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/231
      Logs- https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-231/172.23.97.12.zip

      Panic-

      2019-01-01T22:56:33.944-08:00 ERRO GOXDCR.XmemNozzle: xmem_de729116c99d5a3d3552c8e4deff1d21/bucket-19/bucket-19_172.23.96.16:11210_1 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=9634587, msg: </ud>
      2019-01-01T22:56:33.944-08:00 INFO GOXDCR.StatsMgr: de729116c99d5a3d3552c8e4deff1d21/bucket-10/bucket-10-623320309 message
      2019-01-01T22:56:33.944-08:00 INFO GOXDCR.ReplMgr: checkReplicationStatus exited
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x71a578]
       
      goroutine 54 [running]:
      expvar.(*Map).Get(0x0, 0xa3b4a2, 0xe, 0x0, 0x0)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.8.5/go/src/expvar/expvar.go:150 +0x38
      github.com/couchbase/goxdcr/pipeline_svc.updateStatsForReplication(0xc42094aa80, 0xc4a9fb16e0, 0xcc9a20, 0xc420146e00, 0xc420132ed0, 0xc4201503c0, 0xcd27c0, 0xc420172028, 0x3, 0x0)
              goproj/src/github.com/couchbase/goxdcr/pipeline_svc/statistics_manager.go:1347 +0x173
      github.com/couchbase/goxdcr/pipeline_svc.UpdateStats(0xcc47a0, 0xc42014e9e0, 0xccba80, 0xc420166930, 0xcc9a20, 0xc420146e00, 0xc420132ed0, 0xc4201503c0, 0xcd27c0, 0xc420172028)
              goproj/src/github.com/couchbase/goxdcr/pipeline_svc/statistics_manager.go:1248 +0x34f
      github.com/couchbase/goxdcr/replication_manager.(*replicationManager).checkReplicationStatus(0xcfeae0, 0xc420071110)
              goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:358 +0x312
      created by github.com/couchbase/goxdcr/replication_manager.StartReplicationManager.func1
              goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:166 +0x4ae
      [goport(/opt/couchbase/bin/goxdcr)] 2019/01/01 22:56:34 child process exited with status 2
      

        Activity

          yu Yu Sui (Inactive) added a comment - edited

          The root cause is a nil replication overview stats object introduced by the replication stop sequence. There were "xmem is stuck" errors due to MB-31764, which could have been the trigger. I am still trying to figure out how this happened. It does not look like a regression in the stats manager itself, though.

          lynn.straus Lynn Straus added a comment -

          Please assess for Mad Hatter.


          yu Yu Sui (Inactive) added a comment -

          Lynn Straus, this is fixed in Mad Hatter. Did you mean it should be backported to Alice?

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.5.0-2069 contains goxdcr commit 4f72ae4 with commit message:
          MB-32456 fix panic caused by nil overviewStats

          lynn.straus Lynn Straus added a comment -

          Yu Sui, thanks for the update. Fixed in Mad Hatter is fine. Thanks.


          arunkumar Arunkumar Senthilnathan added a comment -

          Mahesh Mandhare, can you please close this one out?

          mahesh.mandhare Mahesh Mandhare (Inactive) added a comment -

          Build 6.5.0-4558

          In the latest run we did not see this failure.

          Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/325

          People

            Assignee: Mahesh Mandhare (Inactive)
            Reporter: Mahesh Mandhare (Inactive)