Couchbase Server / MB-32055

[5.5.3 System Test] Eventing producer crashed after undeploying a function

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 5.5.3
    • 5.5.3
    • eventing
    • centos1 cluster

    Description

      Build : 5.5.3-4029
      Test : -test tests/integration/test_allFeatures_vulcan.yml -scope tests/integration/scope_Xattrs_Vulcan.yml
      Scale : 3
      Iteration : 1st

      The test was undeploying the function bucket_op_complex_function_integration when the Eventing producer crashed. The following stack trace was seen in the eventing logs.

      2018-11-16T02:11:46.408-08:00 [Info] DCPT[eventing:tbahHeMq-138:{eventing:tbahHeMq-135:a1d5bb01f1ec555c016eb5dfe0071f30_bucket_op_complex_function_undeploy}/0] ##abcd ... stopped
      2018-11-16T02:11:46.408-08:00 [Info] DCPT[eventing:tbahHeMq-138:{eventing:tbahHeMq-135:a1d5bb01f1ec555c016eb5dfe0071f30_bucket_op_complex_function_undeploy}/0] doReceive(): connection closed
      2018-11-16T02:11:46.408-08:00 [Info] Producer::CleanupMetadataBucket [bucket_op_complex_function:0] Exiting cron timer cleanup routine, mutations till high vb seqnos received
      2018-11-16T02:11:46.408-08:00 [Info] Producer::CleanupMetadataBucket [bucket_op_complex_function:0] Closed dcpFeed spawned for cleaning up metadata bucket artifacts
      2018-11-16T02:11:46.408-08:00 [Info] SuperSupervisor::cleanupProducer [0] App: bucket_op_complex_function Purging timer entries from plasma
      panic: File sync failed: /data/@eventing/bucket_op_complex_function_timer.data/header.data error fsync: bad file descriptor
       
      goroutine 295 [running]:
      panic(0xc3b9a0, 0xc42b7b2b80)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:500 +0x1a1 fp=0xc42ba7b7b0 sp=0xc42ba7b720
      github.com/couchbase/plasma.syncFile(0xc4201f4058)
              goproj/src/github.com/couchbase/plasma/util.go:257 +0x197 fp=0xc42ba7b848 sp=0xc42ba7b7b0
      github.com/couchbase/plasma.(*multiFilelog).Commit(0xc420500400, 0x0, 0x15a5960)
              goproj/src/github.com/couchbase/plasma/log.go:398 +0x11e fp=0xc42ba7b8a0 sp=0xc42ba7b848
      github.com/couchbase/plasma.(*lsStore).flush(0xc4202662c0, 0xc4200121e0)
              goproj/src/github.com/couchbase/plasma/lss.go:205 +0x24a fp=0xc42ba7b920 sp=0xc42ba7b8a0
      github.com/couchbase/plasma.(*lsStore).(github.com/couchbase/plasma.flush)-fm(0xc4200121e0)
              goproj/src/github.com/couchbase/plasma/lss.go:147 +0x34 fp=0xc42ba7b940 sp=0xc42ba7b920
      github.com/couchbase/plasma.(*flushBuffer).Done(0xc4200121e0)
              goproj/src/github.com/couchbase/plasma/lss.go:708 +0xdd fp=0xc42ba7b988 sp=0xc42ba7b940
      github.com/couchbase/plasma.(*lsStore).Sync(0xc4202662c0, 0xc4292b2b00)
              goproj/src/github.com/couchbase/plasma/lss.go:472 +0x93 fp=0xc42ba7b9c0 sp=0xc42ba7b988
      github.com/couchbase/plasma.(*Plasma).PersistAll2(0xc4204a3800, 0x8)
              goproj/src/github.com/couchbase/plasma/persistor.go:187 +0xd0 fp=0xc42ba7ba00 sp=0xc42ba7b9c0
      github.com/couchbase/plasma.(*Plasma).PersistAll(0xc4204a3800)
              goproj/src/github.com/couchbase/plasma/persistor.go:172 +0x37 fp=0xc42ba7ba20 sp=0xc42ba7ba00
      github.com/couchbase/eventing/producer.(*Producer).persistPlasma(0xc4204fc380)
              goproj/src/github.com/couchbase/eventing/producer/plasma_ops.go:47 +0x128 fp=0xc42ba7bf98 sp=0xc42ba7ba20
      runtime.goexit()
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc42ba7bfa0 sp=0xc42ba7bf98
      created by github.com/couchbase/eventing/producer.(*Producer).Serve
              goproj/src/github.com/couchbase/eventing/producer/producer.go:204 +0x117f
              
      ...
      ...
      [goport(/opt/couchbase/bin/eventing-producer)] 2018/11/16 02:12:05 child process exited with status 134
      2018-11-16T02:12:05.303-08:00 [Info] Started eventing producer version: evt-5.5.3-4029-ee
      2018-11-16T02:12:05.303-08:00 [Info] Setting IP mode to ipv4
      

      Eventing nodes in the cluster during this time: 172.23.96.145, 172.23.96.56
      The panic above was seen on 172.23.96.56.

      I understand that with the 6.0 release we have moved to a plasma-less design, which avoids this issue altogether, but it is still worth looking into for the following reasons:
      1. We still support Eventing in 5.5.x, and some customers may not be willing to upgrade to 6.x yet.
      2. This could be a bug in Plasma that surfaced only through this scenario. It is worth investigating to prevent it from surfacing elsewhere.

      This wouldn't be a regression since there are no relevant eventing changes in 5.5.3.

        Activity

          mihir.kamdar Mihir Kamdar (Inactive) added a comment - Wayne Siu, this wouldn't be a regression since there are no relevant eventing changes in 5.5.3.

          asingh Abhishek Singh (Inactive) added a comment - The bug looks to be within Eventing, caused by the way two routines interact with each other (as Sundar pointed out) at the time of function undeploy.

          jeelan.poola Jeelan Poola added a comment - A potential fix is in the works.

          ritam.sharma Ritam Sharma added a comment - Vikas Chaudhary - The fix for this issue is merged. Please validate the fix in the next build.

          build-team Couchbase Build Team added a comment - Build couchbase-server-5.5.3-4031 contains eventing commit 89543fc with commit message: MB-32055 Skip spawning plasma persist routine

          vikas.chaudhary Vikas Chaudhary added a comment - Ritam Sharma, we will restart the test once the current run has completed 3 days, so that we can see whether there are any new issues.

          vikas.chaudhary Vikas Chaudhary added a comment - Not seen so far in the current run; will keep monitoring it. Run: http://qa.sc.couchbase.com/job/centos-systest-launcher/1654/console

          mihir.kamdar Mihir Kamdar (Inactive) added a comment - Not seen this crash in the latest system test run with 5.5.3-4033. The test has run for 4.5 days now. Closing the bug

          People

            asingh Abhishek Singh (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)