Couchbase Server / MB-61965

Eventing: Handle checkpointing failures in cursor_aware functions gracefully

Details

    • Untriaged
    • 0
    • Unknown

    Description

      Problem:

      For cursor_aware functions, when checkpointing fails for any reason other than CAS_MISMATCH, the function handler silently skips to the next mutation.
      A user currently has no way to:

      • Know whether a failure is ongoing and mutations are being skipped because checkpointing is failing.
      • Find the document IDs of the mutations for which checkpointing failed.
      • Control the maximum time that checkpointing is allowed to run.

      Solutions:

      • Expose dcp_mutation_checkpoint_failure and dcp_deletion_checkpoint_failure via Prometheus endpoints so that ns_server can report them as failures in the Eventing stats and in the on-prem UI.
      • Push the document IDs of failed mutations to the application log.
      • Add a checkpoint_timeout setting (similar to the script timeout) to limit the time spent on checkpointing.
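
The three proposals above could fit together roughly as in the following sketch. It is illustrative only and is not the Eventing implementation: the two counter names and checkpoint_timeout come from this ticket, while CheckpointError, CheckpointStats, checkpoint_with_timeout, process_event, the retry loop, the checkpoint-before-handler ordering, and the unprefixed metric lines are all assumptions made for the example. CAS_MISMATCH handling is left out because the ticket only covers other failures.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

# Stand-in for the function's application log (assumption).
logger = logging.getLogger("eventing.app")


class CheckpointError(Exception):
    """Hypothetical: any checkpointing failure other than CAS_MISMATCH."""


@dataclass
class CheckpointStats:
    # Counter names are taken from the ticket.
    dcp_mutation_checkpoint_failure: int = 0
    dcp_deletion_checkpoint_failure: int = 0

    def to_prometheus(self) -> str:
        """Render the counters as Prometheus text-format sample lines.

        Real metric names would likely carry a service prefix (assumption).
        """
        return (
            f"dcp_mutation_checkpoint_failure {self.dcp_mutation_checkpoint_failure}\n"
            f"dcp_deletion_checkpoint_failure {self.dcp_deletion_checkpoint_failure}\n"
        )


def checkpoint_with_timeout(
    write_checkpoint: Callable[[str], None],
    doc_id: str,
    checkpoint_timeout: float,
    retry_delay: float = 0.0,
) -> None:
    """Retry write_checkpoint until it succeeds or checkpoint_timeout seconds pass."""
    deadline = time.monotonic() + checkpoint_timeout
    while True:
        try:
            write_checkpoint(doc_id)
            return
        except CheckpointError:
            if time.monotonic() >= deadline:
                raise  # time budget exhausted: surface the failure
            time.sleep(retry_delay)


def process_event(
    doc_id: str,
    is_deletion: bool,
    write_checkpoint: Callable[[str], None],
    handler: Callable[[str], None],
    stats: CheckpointStats,
    checkpoint_timeout: float,
) -> bool:
    """Checkpoint, then run the handler; count and log checkpoint failures."""
    try:
        checkpoint_with_timeout(write_checkpoint, doc_id, checkpoint_timeout)
    except CheckpointError as err:
        if is_deletion:
            stats.dcp_deletion_checkpoint_failure += 1
        else:
            stats.dcp_mutation_checkpoint_failure += 1
        # The skip is no longer silent: the document ID reaches the app log.
        logger.error("checkpointing failed for doc id %s: %s", doc_id, err)
        return False
    handler(doc_id)
    return True
```

With this shape, a skipped mutation is visible in two places: the counter that a stats endpoint can scrape, and the log line that carries the document ID. The timeout bounds how long a single event can be held up by checkpoint retries.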

          People

            barkha.goyal Barkha Goyal
            abhishek.jindal Abhishek Jindal