Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-43413

[XDCR] panic observed in longevity

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 6.0.5
    • 6.0.5
    • XDCR
    • 1

    Description

      6.0.5-3333

      test: -test tests/integration/test_allFeatures_alice_timers.yml -scope tests/integration/scope_Xattrs_Alice.yml

      18th iteration

      Following panic observed in goxdcr (.103):

      2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 processData exits
      2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0: startUprStreams for [9 10 11 12 13 14 15 16 17 18 19 94 95 104 105 106 107 108 109 110 113 187 188 189 190 197 198 199 200 201 202 203 204 205 207 208 209 210 211 212 213 217 218 219 220 221 222]...
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x83982e]
       
      goroutine 12049313 [running]:
      github.com/couchbase/goxdcr/parts.(*dcpStreamReqHelper).getNewVersion(0x0, 0xc424693b60)
      	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:107 +0x3e
      github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStream(0xc42d3b9680, 0x1000d, 0xc42ccfc8d0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:1039 +0x67
      github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStreams_internal(0xc42d3b9680, 0xc42ae18000, 0x2f, 0x2f, 0x2f, 0x2f)
      	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:994 +0xec
      github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStreams(0xc42d3b9680, 0x0, 0x0)
      	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:954 +0x35b
      created by github.com/couchbase/goxdcr/parts.(*DcpNozzle).Start
      	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:579 +0x47b
      [goport(/opt/couchbase/bin/goxdcr)] 2020/12/22 07:11:18 child process exited with status 2
      2020-12-22T07:11:18.526-08:00 INFO GOXDCR.ReplMgr: GOMAXPROCS=4
      2020-12-22T07:11:18.947-08:00 INFO GOXDCR.AuditSvc: Created audit service.
      2020-12-22T07:11:18.947-08:00 INFO GOXDCR.UILogSvc: Created ui log service.
      

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.97.242.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.104.137.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.99.25.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.123.28.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.104.70.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.106.188.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.99.11.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.96.253.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.108.103.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.96.148.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.97.119.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.98.135.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.97.239.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.106.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.104.69.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.121.3.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.104.67.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.99.21.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.99.20.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.96.95.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.97.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.97.121.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1608650875/collectinfo-2020-12-22T152757-ns_1%40172.23.104.5.zip

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            From .103 diag.log:

            2020-12-22T07:11:18.309-08:00, ns_log:0:info:message(ns_1@172.23.108.103) - Service 'goxdcr' exited with status 2. Restarting. Messages:
            panic: runtime error: invalid memory address or nil pointer dereference
            [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x83982e]
             
            goroutine 12049313 [running]:
            github.com/couchbase/goxdcr/parts.(*dcpStreamReqHelper).getNewVersion(0x0, 0xc424693b60)
            	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:107 +0x3e
            github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStream(0xc42d3b9680, 0x1000d, 0xc42ccfc8d0, 0x0, 0x0)
            	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:1039 +0x67
            github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStreams_internal(0xc42d3b9680, 0xc42ae18000, 0x2f, 0x2f, 0x2f, 0x2f)
            	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:994 +0xec
            github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStreams(0xc42d3b9680, 0x0, 0x0)
            	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:954 +0x35b
            created by github.com/couchbase/goxdcr/parts.(*DcpNozzle).Start
            	goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:579 +0x47b
            [goport(/opt/couchbase/bin/goxdcr)] 2020/12/22 07:11:18 child process exited with status 2
            

            arunkumar Arunkumar Senthilnathan added a comment - From .103 diag.log: 2020-12-22T07:11:18.309-08:00, ns_log:0:info:message(ns_1@172.23.108.103) - Service 'goxdcr' exited with status 2. Restarting. Messages: panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x83982e]   goroutine 12049313 [running]: github.com/couchbase/goxdcr/parts.(*dcpStreamReqHelper).getNewVersion(0x0, 0xc424693b60) goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:107 +0x3e github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStream(0xc42d3b9680, 0x1000d, 0xc42ccfc8d0, 0x0, 0x0) goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:1039 +0x67 github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStreams_internal(0xc42d3b9680, 0xc42ae18000, 0x2f, 0x2f, 0x2f, 0x2f) goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:994 +0xec github.com/couchbase/goxdcr/parts.(*DcpNozzle).startUprStreams(0xc42d3b9680, 0x0, 0x0) goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:954 +0x35b created by github.com/couchbase/goxdcr/parts.(*DcpNozzle).Start goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go:579 +0x47b [goport(/opt/couchbase/bin/goxdcr)] 2020/12/22 07:11:18 child process exited with status 2
            neil.huang Neil Huang added a comment -

            From the logs it looks like there is concurrent start/stop going on in a busy system:

            2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 has been initialized
            2020-12-22T07:11:18.197-08:00 ERRO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 failed to start. err=Can't move to state 2 - dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0's current state is Stopped, can only move to state []
            2020-12-22T07:11:18.197-08:00 ERRO GOXDCR.GenericPipeline: Pipeline c1d2b29289f41dde31254f98ff83b136/default/remote failed to start, err=dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 : Can't move to state 2 - dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0's current state is Stopped, can only move to state []
            2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 processData starts..........
            2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 processData exits
            2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0: startUprStreams for [9 10 11 12 13 14 15 16 17 18 19 94 95 104 105 106 107 108 109 110 113 187 188 189 190 197 198 199 200 201 202 203 204 205 207 208 209 210 211 212 213 217 218 219 220 221 222]...
            panic: runtime error: invalid memory address or nil pointer dereference
            [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x83982e]
            

            There is a window that could potentially result in this situation. Since Start/Stop is not completely synchronized and can occur independently, this is theoretically possible since 5.0 and is now hit. Synchronization primitives can be added to prevent this very small window from happening again.

            neil.huang Neil Huang added a comment - From the logs it looks like there is concurrent start/stop going on in a busy system: 2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 has been initialized 2020-12-22T07:11:18.197-08:00 ERRO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 failed to start. err=Can't move to state 2 - dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0's current state is Stopped, can only move to state [] 2020-12-22T07:11:18.197-08:00 ERRO GOXDCR.GenericPipeline: Pipeline c1d2b29289f41dde31254f98ff83b136/default/remote failed to start, err=dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 : Can't move to state 2 - dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0's current state is Stopped, can only move to state [] 2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 processData starts.......... 2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0 processData exits 2020-12-22T07:11:18.197-08:00 INFO GOXDCR.DcpNozzle: dcp_c1d2b29289f41dde31254f98ff83b136/default/remote_172.23.108.103:11210_0: startUprStreams for [9 10 11 12 13 14 15 16 17 18 19 94 95 104 105 106 107 108 109 110 113 187 188 189 190 197 198 199 200 201 202 203 204 205 207 208 209 210 211 212 213 217 218 219 220 221 222]... panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x83982e] There is a window that could potentially result in this situation. Since Start/Stop is not completely synchronized and can occur independently, this is theoretically possible since 5.0 and is now hit. Synchronization primitives can be added to prevent this very small window from happening again.
            neil.huang Neil Huang added a comment -

            Actually, I spoke too soon. This is caused by MB-32780, which was fixed in 6.5.0 and not backported to 6.0

            neil.huang Neil Huang added a comment - Actually, I spoke too soon. This is caused by MB-32780 , which was fixed in 6.5.0 and not backported to 6.0
            arunkumar Arunkumar Senthilnathan added a comment - Tracked as MB-43416

            People

              neil.huang Neil Huang
              arunkumar Arunkumar Senthilnathan
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty