  Couchbase Server / MB-60291

[n1ql][upgrade] online upgrade with failover is failing due to panics


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.6.0
    • 7.6.0
    • query
    • None
    • 7.6.0-1980
    • Untriaged
    • 0
    • Unknown

    Description

      I am not quite sure why it is panicking, but our upgrade test panics only during this upgrade type (online upgrade using failovers): we fail over a server, upgrade it while it is failed over, then recover the node.
      The cluster has 4 nodes:
      172.23.217.148:8091 => {'services': ['index', 'kv', 'n1ql']
      172.23.217.149:8091 => {'services': ['fts', 'index', 'kv', 'n1ql'],
      172.23.217.150:8091 => {'services': ['index', 'kv', 'n1ql'],
      172.23.217.151:8091 => {'services': ['index', 'kv', 'n1ql'],

      The first node upgraded is .149 (mixed-mode testing takes place after it):
      2024-01-05 15:27:58,570 - root - INFO - Failing over 172.23.217.149:8091 with graceful=False
      2024-01-05 15:33:42,878 - root - INFO - rebalancing was completed with progress: 100% in 90.16561341285706 sec
      2024-01-05 15:33:42,878 - root - INFO - upgraded 1 servers: [ip:172.23.217.149 port:8091 ssh_username:root]

      Then .151 is upgraded:
      2024-01-05 15:34:30,966 - root - INFO - Failing over 172.23.217.151:8091 with graceful=False
      2024-01-05 15:40:15,858 - root - INFO - rebalancing was completed with progress: 100% in 90.23029613494873 sec

      Then .150 is upgraded:
      2024-01-05 15:40:16,913 - root - INFO - Failing over 172.23.217.150:8091 with graceful=False
      2024-01-05 15:46:02,700 - root - INFO - rebalancing was completed with progress: 100% in 90.2540225982666 sec

      Finally, .148 is upgraded:
      2024-01-05 15:46:03,754 - root - INFO - Failing over 172.23.217.148:8091 with graceful=False
      2024-01-05 15:52:06,830 - root - INFO - rebalancing was completed with progress: 100% in 111.37833738327026 sec
      2024-01-05 15:52:06,831 - root - INFO - successfully upgraded 3 remaining servers: [ip:172.23.217.151 port:8091 ssh_username:root, ip:172.23.217.150 port:8091 ssh_username:root, ip:172.23.217.148 port:8091 ssh_username:root]
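The failover-based upgrade path described above can be summarized as a per-node loop. The Go sketch below is hypothetical: the cluster interface and its Failover/Upgrade/Recover/Rebalance methods are invented for illustration and are not the test framework's or Couchbase's API, the recovery type used by the test is not stated in this report, and the mixed-mode tests that run after the first node are omitted. A recording fake stands in for a real cluster so the order of operations can be inspected.

```go
package main

import "fmt"

// cluster is a hypothetical abstraction over the test harness; these method
// names are illustrative and are not a real Couchbase SDK or REST API.
type cluster interface {
	Failover(node string, graceful bool) error
	Upgrade(node string) error
	Recover(node string) error
	Rebalance() error
}

// upgradeViaFailover upgrades one node at a time: hard failover, upgrade the
// node while it is failed over, mark it for recovery, then rebalance it back in.
func upgradeViaFailover(c cluster, order []string) error {
	for _, node := range order {
		if err := c.Failover(node, false); err != nil {
			return fmt.Errorf("failover %s: %w", node, err)
		}
		if err := c.Upgrade(node); err != nil {
			return fmt.Errorf("upgrade %s: %w", node, err)
		}
		if err := c.Recover(node); err != nil {
			return fmt.Errorf("recover %s: %w", node, err)
		}
		if err := c.Rebalance(); err != nil {
			return fmt.Errorf("rebalance after %s: %w", node, err)
		}
	}
	return nil
}

// recordingCluster records each call instead of touching a real cluster.
type recordingCluster struct{ steps []string }

func (r *recordingCluster) Failover(node string, graceful bool) error {
	r.steps = append(r.steps, fmt.Sprintf("failover %s graceful=%t", node, graceful))
	return nil
}

func (r *recordingCluster) Upgrade(node string) error {
	r.steps = append(r.steps, "upgrade "+node)
	return nil
}

func (r *recordingCluster) Recover(node string) error {
	r.steps = append(r.steps, "recover "+node)
	return nil
}

func (r *recordingCluster) Rebalance() error {
	r.steps = append(r.steps, "rebalance")
	return nil
}

func main() {
	// Upgrade order taken from the test log above.
	order := []string{"172.23.217.149", "172.23.217.151", "172.23.217.150", "172.23.217.148"}
	rc := &recordingCluster{}
	if err := upgradeViaFailover(rc, order); err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range rc.steps {
		fmt.Println(s)
	}
}
```

Running it prints 16 recorded steps (four per node), starting with "failover 172.23.217.149 graceful=false".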

      Here is the console log (inspecting it shows no obvious panics):

      http://qa.sc.couchbase.com/job/test_suite_executor/660280/console

      It is important to note that the other upgrade paths we test do not hit this panic:

      • offline upgrade
      • online upgrade via swap rebalance
      • online upgrade via rebalance

      I am not sure where in the test the panic is being introduced.

      Before upgrade: we create UDFs and some CBO stats.
      Mixed mode: we run various tests, including ones that use the above UDFs and CBO stats.
      Fully upgraded: we run various tests, including ones that use the UDFs and CBO stats created before the upgrade.

      Logs from each node will be attached.

      Let me know if more info is required. I am hoping there is something in the logs that points to exactly what is going wrong.

      According to our test there are 10 panics; here is an example:
      Stack:

      2024-01-05T15:50:29.852-08:00 [INFO] n1fty: NewFTSIndexer2, server: http://127.0.0.1:8091, namespace: default, bucket: N1QL_SYSTEM_BUCKET, scope: N1QL_SYSTEM_SCOPE, keyspace: N1QL_CBO_STATS
      2024-01-05T15:50:29.851-08:00 [Info] GSIC[default/N1QL_SYSTEM_BUCKET-N1QL_SYSTEM_SCOPE-N1QL_CBO_STATS-1704498629842454637] started ...
      2024-01-05T15:50:29.851-08:00 [Info] Receive security change notification. encryption=false
      2024-01-05T15:50:29.852-08:00 [Info] Certificate refreshed successfully with certFile /opt/couchbase/var/lib/couchbase/config/certs/chain.pem, keyFile /opt/couchbase/var/lib/couchbase/config/certs/pkey.pem, caFile /opt/couchbase/var/lib/couchbase/config/certs/ca.pem
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x2a3dfb5]
       
       
      goroutine 1 [running]:
      github.com/couchbase/query/datastore/couchbase.cbAuthorize({0x3a1d820?, 0xc0000524b0?}, 0x18?, 0x0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/datastore/couchbase/auth.go:263 +0x35
      github.com/couchbase/query/datastore/couchbase.(*store).Authorize(0x0?, 0x11e0c00?, 0xc00229f960?)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/datastore/couchbase/couchbase.go:483 +0x2c
      github.com/couchbase/query/planner.seqScanAuth({0xc00236c3c0?, 0xc000134d20?}, 0x4?)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/planner/build_scan.go:894 +0x2a8
      github.com/couchbase/query/planner.allIndexes({0x3a45ab0, 0xc001ccea00}, {0x0, 0x0, 0x70?}, {0x0, 0x0, 0x0?}, 0xc00229fbf8?, 0x0, ...)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/planner/build_scan.go:843 +0x474
      github.com/couchbase/query/planner.(*builder).buildPredicateScan(0xc0020e3600, {0x3a45ab0, 0xc001ccea00}, 0xc00056ac60, 0xc0013d9900, {0x3a54680?, 0xc002382c80}, {0x0, 0x0, 0x0}, ...)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/planner/build_scan.go:229 +0x6a5
      github.com/couchbase/query/planner.(*builder).buildScan(0xc0020e3600, {0x3a45ab0, 0xc001ccea00}, 0xc00056ac60)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/planner/build_scan.go:166 +0xd45
      github.com/couchbase/query/planner.(*builder).selectScan(0xc0020e3600, {0x3a45ab0?, 0xc001ccea00?}, 0xc00056ac60, 0xff?)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/planner/build_scan.go:61 +0x2da
      github.com/couchbase/query/planner.(*builder).VisitKeyspaceTerm(0xc0020e3600, 0xc00056ac60) 


            People

              ajay.bhullar Ajay Bhullar
              Votes: 0
              Watchers: 4
