Couchbase Server / MB-44749

[System Test] : Plasma crash - plasmaSlice::handleCommandsWorker: panic detected while processing mutation for operation 1 key

Details

    Description

      Build : 7.0.0-4603
      Test : Longevity (-test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml)
      Scale : 2
      Iteration : 1st

      On 172.23.96.252, the following errors led to an indexer crash while a rebalance was in progress. In-memory compression is not enabled in this test.

      2021-03-04T10:06:09.702-08:00 [Fatal] plasmaSlice::handleCommandsWorker: panic detected while processing mutation for operation 1 key = <ud>(^H^F_sync:unusedSeq:56555095^@^@^@)</ud> docid = <ud>(_sync:unusedSeq:56555095)</ud> Index sg_syncDocs_x1, Bucket default, IndexInstId 13320080993727085165, PartitionId 0
      ...
      ...
      2021-03-04T10:06:09.725-08:00 [Fatal] plasmaSlice::handleCommandsWorker: panic detected while processing mutation for operation 1 key = <ud>(^H^F_sync:unusedSeq:56555067^@^@^@)</ud> docid = <ud>(_sync:unusedSeq:56555067)</ud> Index sg_syncDocs_x1, Bucket default, IndexInstId 13320080993727085165, PartitionId 0
      2021-03-04T10:06:09.744-08:00 [Fatal] plasmaSlice::handleCommandsWorker: panic detected while processing mutation for operation 1 key = <ud>(^H^F_sync:unusedSeq:56555090^@^@^@)</ud> docid = <ud>(_sync:unusedSeq:56555090)</ud> Index sg_syncDocs_x1, Bucket default, IndexInstId 13320080993727085165, PartitionId 0
      2021-03-04T10:06:09.767-08:00 [Fatal] goroutine 176366914 [running]:
      github.com/couchbase/indexing/secondary/logging.(*destination).StackTraceAll(...)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/logging/logging.go:175
      github.com/couchbase/indexing/secondary/logging.StackTraceAll(0x20ff5f0, 0x2120001)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/logging/logging.go:317 +0x6d
      github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).handleCommandsWorker.func1(0xc03c7aded0, 0xc012e01700)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:596 +0x424
      panic(0x10a1e00, 0xc03c787c70)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/panic.go:679 +0x1b2
      github.com/couchbase/plasma.(*Shard).raisePanic(...)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/shard.go:1011
      github.com/couchbase/plasma.(*LSSCtx).raiseCorrupted(0xc0062c3040, 0x1488ee0, 0xc03c787c70)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/lssctx.go:179 +0x206
      github.com/couchbase/plasma.(*Plasma).fatalPanic(0xc00a9b0580, 0xc03c7822a0, 0x21)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/plasma.go:1354 +0x293
      github.com/couchbase/plasma.(*page).lookup(0xc01e86af40, 0xc03c7ad730, 0x7f0c5629c520, 0x0, 0xcbd800)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page.go:766 +0x1f4
      github.com/couchbase/plasma.(*page).Lookup(0xc01e86af40, 0x7f0c5629c520, 0x0, 0x7f0c53ae8800)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page.go:667 +0x105
      github.com/couchbase/plasma.(*Writer).Lookup(0xc0193a47b0, 0x7f0c5629c520, 0x18, 0x20, 0x0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/writer.go:135 +0x103
      github.com/couchbase/plasma.(*Writer).LookupKV(0xc0193a47b0, 0xc00be266c0, 0x18, 0x20, 0xc0073ef940, 0xc0073ef978, 0x434e81, 0x12f0150, 0xc0073ef988)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/mvcc.go:395 +0x10c
      github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).deleteSecIndex(0xc012e01700, 0xc00be266c0, 0x18, 0x20, 0xc00be267c0, 0x1d, 0x20, 0x2, 0x0, 0x100000000000000)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:1141 +0x13b
      github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).insertSecIndex(0xc012e01700, 0xc00be267c0, 0x1d, 0x20, 0xc00be266c0, 0x18, 0x20, 0x2, 0x4c2500, 0xc01c295480, ...)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:775 +0xb18
      github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).insert(0xc012e01700, 0xc00be267c0, 0x1d, 0x20, 0xc00be266c0, 0x18, 0x20, 0x2, 0xc01ea76200, 0xc01c295480, ...)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:733 +0x164
      github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).handleCommandsWorker(0xc012e01700, 0x2)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:610 +0x58a
      created by github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).initWriters
              /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:2980 +0x521
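
The trace shows the panic passing through a deferred function in handleCommandsWorker (handleCommandsWorker.func1), which logs the mutation context and a stack trace. A minimal sketch of that deferred-recover pattern follows. The type mutation and the function processMutation are hypothetical names, not the indexer's actual code, and the sketch returns the panic as an error so it can run standalone, whereas the indexer in this report crashed.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// mutation is a hypothetical stand-in for one index mutation command.
type mutation struct {
	op    int
	key   string
	docid string
}

// processMutation runs apply for a single mutation. If apply panics, the
// deferred function captures the mutation context and the current stack,
// and returns them as an error instead of letting the panic propagate.
func processMutation(m mutation, apply func(mutation)) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf(
				"panic detected while processing mutation for operation %d key = %s docid = %s: %v\n%s",
				m.op, m.key, m.docid, r, debug.Stack())
		}
	}()
	apply(m)
	return nil
}

func main() {
	// Simulate a storage layer that panics when it detects corruption.
	err := processMutation(mutation{op: 1, key: "k1", docid: "doc1"}, func(mutation) {
		panic("simulated storage corruption")
	})
	fmt.Println("recovered:", err != nil)
}
```

Logging the key and docid at the point of recovery, as the [Fatal] lines above do, ties each panic to the mutation that triggered it.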
      

          Activity

            jliang John Liang added a comment -

            In MB-45230, we saw that a page got corrupted and GetOp() crashed. If we see an unsupported delta, it can be caused by the same corruption (since GetOp() is called to identify the delta). I can reproduce the following stack in a unit test. The bottom of the stack looks different because there are many code paths that access the page (and execute GetOp()).

            unexpected fault address 0xb01dfacedebac1e
            fatal error: fault
            [signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e pc=0x429258e]
             
            goroutine 27 [running]:
            runtime.throw(0x447ea08, 0x5)
            	/usr/local/go/src/runtime/panic.go:608 +0x72 fp=0xc000a2b6b0 sp=0xc000a2b680 pc=0x402d6f2
            runtime.sigpanic()
            	/usr/local/go/src/runtime/signal_unix.go:397 +0x275 fp=0xc000a2b700 sp=0xc000a2b6b0 pc=0x4043095
            github.com/couchbase/plasma.(*pageOp).GetOp(...)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/page.go:110
            github.com/couchbase/plasma.(*pageDelta).GetOp(...)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/page.go:340
            github.com/couchbase/plasma.(*pageWalker).Op(...)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/page_walker.go:77
            github.com/couchbase/plasma.newPgOpIterator(0x4f27260, 0x449f628, 0x0, 0xd8936da, 0x4510b20, 0xc00084f500, 0xc0006521b0, 0xc000a2b8d8, 0xc000a2b9f8, 0x42b8a9d)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/iterator.go:464 +0x1ce fp=0xc000a2b870 sp=0xc000a2b700 pc=0x429258e
            github.com/couchbase/plasma.(*page).collectItems(0xc00084f4c0, 0x4f27260, 0x0, 0xd8936da, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/page.go:1002 +0xb4 fp=0xc000a2b928 sp=0xc000a2b870 pc=0x42a94c4
            github.com/couchbase/plasma.(*page).Compact(0xc00084f4c0, 0xc8, 0x4515c01)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/page.go:893 +0x79 fp=0xc000a2ba08 sp=0xc000a2b928 pc=0x42a7ce9
            github.com/couchbase/plasma.(*Plasma).trySMOs2(0xc0000fe580, 0x4d0a270, 0x4515c20, 0xc00084f4c0, 0xc0006521b0, 0xc000a2be01, 0x190, 0x19, 0xc8, 0x4, ...)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/plasma.go:1148 +0x84 fp=0xc000a2bdb8 sp=0xc000a2ba08 pc=0x42c0734
            github.com/couchbase/plasma.(*Plasma).trySMOs(0xc0000fe580, 0x4d0a270, 0x4515c20, 0xc00084f4c0, 0xc0006521b0, 0xc00084f401, 0x0)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/plasma.go:1138 +0x96 fp=0xc000a2be20 sp=0xc000a2bdb8 pc=0x42c0686
            github.com/couchbase/plasma.(*Writer).Insert(0xc0000a4150, 0x49028e0, 0xb, 0x10)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/writer.go:74 +0x10a fp=0xc000a2be98 sp=0xc000a2be20 pc=0x42eee0a
            github.com/couchbase/plasma.(*Writer).InsertKV(0xc0000a4150, 0xc0004e0720, 0xb, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/mvcc.go:364 +0x151 fp=0xc000a2bf30 sp=0xc000a2be98 pc=0x42a1d01
            github.com/couchbase/plasma.testCrash.func1(0xc0004e0150, 0xc0000fe580, 0x186a0, 0x0)
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/plasma_test.go:3070 +0x193 fp=0xc000a2bfc0 sp=0xc000a2bf30 pc=0x43831b3
            runtime.goexit()
            	/usr/local/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000a2bfc8 sp=0xc000a2bfc0 pc=0x405cd71
            created by github.com/couchbase/plasma.testCrash
            	/Users/johnliang/Source/cheshire/cheshire/goproj/src/github.com/couchbase/plasma/plasma_test.go:3061 +0x19e
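
To illustrate why a corrupted page can surface as an "unsupported delta", here is a minimal sketch of a lookup that walks a delta chain and dispatches on each delta's opcode. The types opType and delta and the function lookup are hypothetical and are not Plasma's data structures. The sketch only models the case where the opcode is garbage but the memory is still readable; a corrupted pointer would instead fault, as in the SIGSEGV above.

```go
package main

import "fmt"

// opType is a hypothetical delta opcode; the names are not Plasma's.
type opType uint8

const (
	opInsert opType = iota + 1
	opDelete
)

// delta is a hypothetical page delta record linked into a chain.
type delta struct {
	op   opType
	key  string
	next *delta
}

// lookup walks the chain newest-first and reports whether key is live.
// An opcode outside the known set means the page contents cannot be
// trusted, so it is reported as an error rather than skipped.
func lookup(head *delta, key string) (bool, error) {
	for d := head; d != nil; d = d.next {
		switch d.op {
		case opInsert:
			if d.key == key {
				return true, nil
			}
		case opDelete:
			if d.key == key {
				return false, nil
			}
		default:
			return false, fmt.Errorf("unsupported delta op %d", d.op)
		}
	}
	return false, nil
}

func main() {
	good := &delta{op: opInsert, key: "a"}
	// Simulate corruption: an opcode value that no valid delta carries.
	corrupted := &delta{op: opType(0xAB), key: "b", next: good}

	found, err := lookup(good, "a")
	fmt.Println(found, err)

	found, err = lookup(corrupted, "a")
	fmt.Println(found, err)
}
```

Because every code path that reads the page has to identify each delta first, the same corruption can be reported from different callers, which is consistent with the differing bottoms of the two stacks.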
            


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4789 contains plasma commit cadfb1e with commit message:
            MB-44749: Add assertion

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4797 contains plasma commit 06868c0 with commit message:
            MB-44749: Disable newSnapshot assertion

            mihir.kamdar Mihir Kamdar (Inactive) added a comment -

            Bulk closing non-fixed issues. Please reopen if necessary.

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4845 contains plasma commit 7933aa5 with commit message:
            MB-44749: Assert active writer when creating snapshot

            People

              jliang John Liang
              mihir.kamdar Mihir Kamdar (Inactive)
                Gerrit Reviews

                  There are 2 open Gerrit changes
