Couchbase Server / MB-27071

eventing-producer should close existing DCP connections, read the checkpoint from the metadata bucket, and open new DCP connections when eventing-consumers are killed


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 5.5.0
    • Affects Version/s: 5.5.0
    • Component/s: eventing
    • Environment: Enterprise Edition 5.1.0 build 1477
      4 node cluster : kv-eventing-index-n1ql

    Description

      Steps to Repro:
      ===========
      1) Deployed the following handler code

      function OnUpdate(doc, meta) {
          var doc_id = meta.id;
          log('creating document for : ', doc);
          dst_bucket[doc_id] = {'doc_id' : doc_id}; // SET operation
      }
      function OnDelete(meta) {
          log('deleting document', meta.id);
          delete dst_bucket[meta.id]; // DELETE operation
      }
      

      2) Created 4032 docs in the source bucket.
      3) While eventing was writing docs to the destination bucket, killed the eventing consumers using the following command:

      killall -9 eventing-consumer
      

      The destination bucket ended up with only 1,977 docs instead of the expected 4,032.

      Abhishek Singh did an initial triage and concluded that whenever an eventing-consumer is killed, the eventing-producer should close the existing DCP streams, read the checkpoint state from the metadata bucket, and restart the DCP streams from that state.
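
      The recovery sequence described in the triage can be sketched roughly as follows. This is only an in-memory illustration, not the eventing-producer implementation: FakeDcpFeed, recoverStreams, and the lastProcessedSeqno checkpoint field are hypothetical names invented for this example, and the real checkpoint layout in the metadata bucket may differ.

```javascript
// Hypothetical in-memory model of a DCP feed. None of these names come from
// the eventing code base; they only illustrate the close / read / restart order.
class FakeDcpFeed {
  constructor(mutationsByVb) {
    this.mutationsByVb = mutationsByVb; // vbucket id -> [{ seqno, id }]
    this.open = new Map();              // vbucket id -> last delivered seqno
  }

  // Open a stream that delivers only mutations with seqno > startSeqno.
  openStream(vb, startSeqno) {
    this.open.set(vb, startSeqno);
  }

  closeStream(vb) {
    this.open.delete(vb);
  }

  // Deliver every pending mutation on every open stream.
  drain(onMutation) {
    for (const [vb, startSeqno] of this.open) {
      for (const m of this.mutationsByVb[vb] || []) {
        if (m.seqno > startSeqno) {
          onMutation(vb, m);
          this.open.set(vb, m.seqno); // remember how far this stream got
        }
      }
    }
  }
}

// Assumed recovery step after a consumer dies: close the stale stream,
// read the checkpoint, and reopen from the last processed sequence number.
function recoverStreams(feed, checkpoints, vbs) {
  for (const vb of vbs) {
    feed.closeStream(vb);
    const cp = checkpoints.get(vb);
    const lastSeqno = cp ? cp.lastProcessedSeqno : 0; // no checkpoint: start over
    feed.openStream(vb, lastSeqno);
  }
}

// Usage: vbucket 0 has three mutations; the checkpoint says seqno 1 was handled.
const feed = new FakeDcpFeed({ 0: [
  { seqno: 1, id: 'a' }, { seqno: 2, id: 'b' }, { seqno: 3, id: 'c' },
] });
const checkpoints = new Map([[0, { lastProcessedSeqno: 1 }]]);
const dst = {};
recoverStreams(feed, checkpoints, [0]);
feed.drain((vb, m) => { dst[m.id] = { doc_id: m.id }; });
console.log(Object.keys(dst)); // [ 'b', 'c' ]
```

      Restarting from the checkpointed sequence number is what lets the mutations that were in flight when the consumer died be replayed instead of silently skipped.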

      Logs attached. Also pasting some useful logs used by Abhishek Singh to debug the issue.

      Balakumarans-MacBook-Pro:testrunner balakumaran.g$ curl http://Administrator:password@10.112.170.101:8092/metadata/_design/dev_d/_view/v?stale=false -s | jq ".rows[].value[0]" | awk '{sum+=$1} END {print sum}'
      1391
      [root@node2-cb500-centos7 ~]# ps aux | grep kvport
      couchba+ 12409 13.8  5.3 191056 54212 ?        Sl   08:49   1:47 /opt/couchbase/bin/eventing-producer -adminport=8096 -dir=/opt/couchbase/var/lib/couchbase/data/@eventing -kvport=11210 -restport=8091 -uuid=7448eef313e087076808908126eeae91 -adminsslport=18096 -certfile=/opt/couchbase/var/lib/couchbase/config/memcached-cert.pem -keyfile=/opt/couchbase/var/lib/couchbase/config/memcached-key.pem
      root     12542  0.0  0.0 112640   960 pts/0    R+   09:02   0:00 grep --color=auto kvport
      [root@node2-cb500-centos7 ~]# kill -9 12409
      Balakumarans-MacBook-Pro:testrunner balakumaran.g$ curl http://Administrator:password@10.112.170.101:8092/metadata/_design/dev_d/_view/v?stale=false -s | jq ".rows[].value[0]" | awk '{sum+=$1} END {print sum}'
      4032
      Balakumarans-MacBook-Pro:testrunner balakumaran.g$ 
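
      For readers unfamiliar with the pipeline in the log above: jq extracts the first element of each row's value array from the view response, and awk adds those numbers up. A minimal JavaScript equivalent of that summation is sketched below. The response shape is inferred only from the jq filter, the sample rows are made up, and sumFirstValues is a hypothetical helper name.

```javascript
// Sum the first element of every row's value array, mirroring
// jq ".rows[].value[0]" | awk '{sum+=$1} END {print sum}'.
function sumFirstValues(viewResponse) {
  return viewResponse.rows.reduce(
    (sum, row) => sum + Number(row.value[0]),
    0
  );
}

// Made-up sample response; real rows come from the metadata bucket view.
const sample = {
  rows: [
    { key: null, value: [3, 'x'] },
    { key: null, value: [4, 'y'] },
  ],
};
console.log(sumFirstValues(sample)); // 7
```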
      

      The following test can be used to validate the fix:

      ./testrunner -i b/temp_centos7.ini -t eventing.eventing_recovery.EventingRecovery.test_killing_eventing_consumer_when_eventing_is_processing_mutations,nodes_init=4,services_init=kv-eventing-index-n1ql,dataset=default,groups=simple,reset_services=True,skip_cleanup=True,doc-per-day=2
      


          People

            asingh Abhishek Singh (Inactive)
            Balakumaran.Gopal Balakumaran Gopal