Couchbase Server / MB-30460

Avoid concurrent reads and writes to DCP opcode counters


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • 6.0.0
    • 5.5.0
    • Component: eventing
    • Untriaged
    • Unknown

    Description

      The eventing service exited during the CentOS longevity test against RC4 on the third day, along with a rebalance failure:

      2018-07-12T17:05:59.106-07:00, ns_log:0:info:message(ns_1@172.23.96.56) - Service 'eventing' exited with status 134. Restarting. Messages:
      runtime.gopark(0xe45b40, 0x0, 0xd8ed73, 0x6, 0x18, 0x2)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/proc.go:259 +0x13a fp=0xc42136aba8 sp=0xc42136ab78
      runtime.selectgoImpl(0xc42136af18, 0x0, 0x18)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/select.go:423 +0x1235 fp=0xc42136ae08 sp=0xc42136aba8
      runtime.selectgo(0xc42136af18)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/select.go:238 +0x1c fp=0xc42136ae30 sp=0xc42136ae08
      github.com/couchbase/eventing/consumer.(*Consumer).updateWorkerStats(0xc421f06a00)
      	goproj/src/github.com/couchbase/eventing/consumer/stats_updater.go:136 +0x4c2 fp=0xc42136afa8 sp=0xc42136ae30
      runtime.goexit()
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc42136afb0 sp=0xc42136afa8
      created by github.com/couchbase/eventing/consumer.(*Consumer).Serve
      	goproj/src/github.com/couchbase/eventing/consumer/v8_consumer.go:313 +0x1b9d
      [goport(/opt/couchbase/bin/eventing-producer)] 2018/07/12 17:05:59 child process exited with status 134
       
      2018-07-12T17:05:59.118-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {service_rebalance_failed,eventing,
                                       {lost_connection,shutdown}}
      

      This might be due to a network blip, since the rebalance failure reports a lost connection. However, the eventing log on .56 shows the following panic at the same timestamp:

      fatal error: concurrent map read and map write
      2018-07-12T17:05:58.632-07:00 [Info] client::Serve [worker_bucket_op_function_2:/tmp/127.0.0.1:8091_worker_bucket_op_function_2.sock:26746] Informing Eventing.Producer to stop Eventing.Consumer instance: consumer => app: bucket_op_function name: worker_bucket_op_function_2 tcpPort: /tmp/127.0.0.1:8091_worker_bucket_op_function_2.sock ospid: 26746 dcpEventProcessed: DCP_STREAMREQ:1762 DCP_STREAMEND:2049 DCP_SNAPSHOT:15394 DCP_MUTATION:297970 DCP_DELETION:924831 v8EventProcessed: THR_MAP:1 HANDLER_CODE:1 FAILURE_STATS:3760 LATENCY_STATS:3760 EXECUTION_STATS:3760 LCB_EXCEPTION_STATS:3760 LOG_LEVEL:1 THR_COUNT:1 V8_INIT:1 V8_LOAD:1 SOURCE_MAP:1
       
      goroutine 1499702 [running]:
      runtime.throw(0xda598e, 0x21)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:566 +0x95 fp=0xc430c65058 sp=0xc430c65038
      runtime.mapaccess2(0xc71a80, 0xc42f9be900, 0xc430c650db, 0x15bf560, 0x4f1a00)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/hashmap.go:340 +0x249 fp=0xc430c650a0 sp=0xc430c65058
      github.com/couchbase/eventing/util.SprintDCPCounts(0xc42f9be900, 0xc4455ca000, 0xa5, 0xc430c651b0, 0x4113ff, 0x15a5c20, 0x7fb14bcfa278)
      	goproj/src/github.com/couchbase/eventing/util/util.go:139 +0xb7 fp=0xc430c65160 sp=0xc430c650a0
      github.com/couchbase/eventing/consumer.(*Consumer).String(0xc42b623400, 0xe415c8, 0xc4410b9800)
      	goproj/src/github.com/couchbase/eventing/consumer/v8_consumer.go:551 +0x47 fp=0xc430c65250 sp=0xc430c65160
      fmt.(*pp).handleMethods(0xc4410b9800, 0x76, 0x644526c901)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/fmt/print.go:590 +0x2f6 fp=0xc430c65300 sp=0xc430c65250
      fmt.(*pp).printArg(0xc4410b9800, 0xd83420, 0xc42b623400, 0x76)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/fmt/print.go:665 +0x17b fp=0xc430c653f8 sp=0xc430c65300
      fmt.(*pp).doPrintf(0xc4410b9800, 0xc432730880, 0x7f, 0xc430c65880, 0x6, 0x6)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/fmt/print.go:985 +0x123d fp=0xc430c654e0 sp=0xc430c653f8
      fmt.Sprintf(0xc432730880, 0x7f, 0xc430c65880, 0x6, 0x6, 0xc432730880, 0x7f)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/fmt/print.go:196 +0x6a fp=0xc430c65538 sp=0xc430c654e0
      log.(*Logger).Printf(0xc42012e140, 0xc432730880, 0x7f, 0xc430c65880, 0x6, 0x6)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/log/log.go:173 +0x53 fp=0xc430c65580 sp=0xc430c65538
      github.com/couchbase/eventing/logging.printf(0x4, 0xdbcf21, 0x5a, 0xc430c65880, 0x6, 0x6)
      	goproj/src/github.com/couchbase/eventing/logging/logging.go:116 +0x27e fp=0xc430c65618 sp=0xc430c65580
      github.com/couchbase/eventing/logging.Infof(0xdbcf21, 0x5a, 0xc430c65880, 0x6, 0x6)
      	goproj/src/github.com/couchbase/eventing/logging/logging.go:133 +0x5c fp=0xc430c65658 sp=0xc430c65618
      github.com/couchbase/eventing/producer.(*Producer).KillAndRespawnEventingConsumer(0xc42b75e700, 0x157a840, 0xc42b623400)
      	goproj/src/github.com/couchbase/eventing/producer/producer.go:621 +0x596 fp=0xc430c658f0 sp=0xc430c65658
      github.com/couchbase/eventing/consumer.(*client).Serve(0xc43a4d0770)
      	goproj/src/github.com/couchbase/eventing/consumer/client.go:143 +0x1474 fp=0xc430c65f50 sp=0xc430c658f0
      github.com/couchbase/eventing/suptree.(*Supervisor).runService.func1(0xc42b934e00, 0x0, 0x154a7a0, 0xc43a4d0770)
      	goproj/src/github.com/couchbase/eventing/suptree/supervisor.go:398 +0x63 fp=0xc430c65f80 sp=0xc430c65f50
      runtime.goexit()
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc430c65f88 sp=0xc430c65f80
      created by github.com/couchbase/eventing/suptree.(*Supervisor).runService
      	goproj/src/github.com/couchbase/eventing/suptree/supervisor.go:401 +0x5b
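The trace shows `util.SprintDCPCounts` reading the DCP opcode counter map (reached via `(*Consumer).String` while `KillAndRespawnEventingConsumer` logs the consumer) at the same time another goroutine was writing to it. The Go runtime treats this as a fatal error, which `recover` cannot catch, so the whole eventing process goes down. A minimal sketch of one common remedy follows, assuming the counters are a `map[string]uint64` keyed by opcode name: guard the map with a `sync.RWMutex` and format from a copied snapshot. The type and method names (`dcpCounters`, `Incr`, `Snapshot`) are hypothetical, and this is not necessarily how MB-30460 was actually fixed.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
	"sync"
)

// dcpCounters is a hypothetical stand-in for the consumer's per-opcode DCP
// counters. The RWMutex serialises writers (the DCP event loop) against
// readers (e.g. a String() method invoked while logging).
type dcpCounters struct {
	mu     sync.RWMutex
	counts map[string]uint64
}

func newDCPCounters() *dcpCounters {
	return &dcpCounters{counts: make(map[string]uint64)}
}

// Incr bumps the counter for one DCP opcode; safe for concurrent use.
func (c *dcpCounters) Incr(opcode string) {
	c.mu.Lock()
	c.counts[opcode]++
	c.mu.Unlock()
}

// Snapshot copies the map under the read lock, so callers can format or
// inspect it afterwards without holding the lock or racing with Incr.
func (c *dcpCounters) Snapshot() map[string]uint64 {
	c.mu.RLock()
	defer c.mu.RUnlock()
	out := make(map[string]uint64, len(c.counts))
	for k, v := range c.counts {
		out[k] = v
	}
	return out
}

// String renders sorted "OPCODE:count" pairs from a snapshot, similar in
// shape to the dcpEventProcessed field in the eventing log.
func (c *dcpCounters) String() string {
	snap := c.Snapshot()
	keys := make([]string, 0, len(snap))
	for k := range snap {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b bytes.Buffer
	for i, k := range keys {
		if i > 0 {
			b.WriteByte(' ')
		}
		fmt.Fprintf(&b, "%s:%d", k, snap[k])
	}
	return b.String()
}

func main() {
	c := newDCPCounters()
	var wg sync.WaitGroup
	// Writers: simulate goroutines processing DCP events.
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Incr("DCP_MUTATION")
				c.Incr("DCP_DELETION")
			}
		}()
	}
	// Reader: simulate a log statement formatting the counters concurrently.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 100; i++ {
			_ = c.String()
		}
	}()
	wg.Wait()
	fmt.Println(c.String()) // DCP_DELETION:4000 DCP_MUTATION:4000
}
```

Running the sketch with `go run -race` should report no data races. Removing the locks would let the race detector flag the map accesses and can reproduce the same class of fatal error. Copying under the read lock keeps the critical section short, so slow log formatting never blocks the writers. When the set of opcodes is fixed, a fixed-size array of counters updated with `sync/atomic` is a lock-free alternative.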
      

      There has not been a rebalance since this failure, so we will have to see whether subsequent rebalances pass.

      supportal: https://supportal.couchbase.com/snapshot/047eea1c007e715b5d38d12d2c74589b::2
      test: http://172.23.109.231/job/centos-systest-launcher/1529/console
      cluster: http://172.23.108.103:8091/ui/index.html#!/logs

            People

              Vikas Chaudhary (vikas.chaudhary)
              Arunkumar Senthilnathan (arunkumar, inactive)