Couchbase Elasticsearch Connector / CBES-50

RoutingMissingException when deleting child documents


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.1
    • Labels: None

      Description

      When a parent-child relationship is configured in the Elasticsearch cluster and a child document is deleted on the Couchbase cluster, propagating that deletion via XDCR replication causes a "RoutingMissingException" to appear in the ES logs.


          Activity

          David Nault added a comment (edited)

          The ES plugin was failing to specify routing when deleting children, because the routing value is not readily available at deletion time (it is not included in the CAPI request).

          The solution has two parts:

          • When using RegexParentSelector, the parent ID is already embedded in the child ID. In that case, the routing can be inferred directly from the document ID.
          • When using DefaultParentSelector, a shadow document called a "routing signpost" is created for each child document. When deleting a child, the plugin discovers the correct routing by reading the contents of the signpost document.
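
The two strategies above can be illustrated with a minimal Python sketch. It is not the plugin's code: the `<parent>::<child>` ID pattern, the `signpost::` naming scheme, and the plain dict standing in for Elasticsearch are all assumptions made for the example.

```python
import re
from typing import Dict, Optional

# Hypothetical child ID format "<parent>::<child>". With RegexParentSelector
# the real pattern is whatever the user configured.
PARENT_PATTERN = re.compile(r"^(?P<parent>.+?)::.+$")


def routing_from_id(child_id: str) -> Optional[str]:
    """Regex case: the parent ID (used as routing) is embedded in the child ID."""
    match = PARENT_PATTERN.match(child_id)
    return match.group("parent") if match else None


def signpost_id(child_id: str) -> str:
    """Hypothetical naming scheme for the shadow document."""
    return "signpost::" + child_id


def record_signpost(store: Dict[str, dict], child_id: str, parent_id: str) -> None:
    """Default case, at index time: remember the routing next to the child."""
    store[signpost_id(child_id)] = {"routing": parent_id}


def routing_from_signpost(store: Dict[str, dict], child_id: str) -> Optional[str]:
    """Default case, at delete time: read the routing back from the signpost."""
    doc = store.get(signpost_id(child_id))
    return doc["routing"] if doc else None
```

In the regex case no extra state is needed, which is why the caveats that follow apply only to DefaultParentSelector.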

          Some caveats apply when using DefaultParentSelector:

          • Child documents created by previous versions of the plugin will not be eligible for deletion, since they do not have routing signposts.
          • Failure to delete a child due to a missing routing signpost is not treated as a fatal error. A warning is logged and replication continues. This is because the signpost and the document it refers to cannot be deleted in a single atomic operation. It's possible for the signpost to be successfully deleted while the child deletion fails. When the replication attempt is repeated, the signpost document will no longer exist, and it will no longer be possible to route the child deletion request correctly.
          • The routing signpost may live on a different Elasticsearch shard than the document it refers to. If the signpost's shard suffers data loss, it may become impossible for the plugin to delete the child document.
          • Loading the signpost document will trigger an Elasticsearch index refresh if the child document was both created and deleted within the same index refresh interval. It's fine if this happens occasionally, but constant rapid-fire creation and deletion may cause Elasticsearch performance issues associated with too-frequent refreshing.
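
The non-fatal handling of a missing signpost might look roughly like the sketch below. The function name, the dict-based stores, and the order of the two deletes are assumptions for illustration; the comment above does not say how the plugin sequences them.

```python
import logging
from typing import Dict, List, Tuple

log = logging.getLogger("signpost_sketch")


def delete_children(
    child_ids: List[str],
    signposts: Dict[str, dict],
    children: Dict[Tuple[str, str], dict],
) -> List[str]:
    """Delete each child whose signpost can be found; return the IDs skipped.

    `children` is keyed by (routing, child_id) to mimic the fact that a
    child cannot be addressed without its routing value.
    """
    skipped = []
    for child_id in child_ids:
        signpost = signposts.get(child_id)
        if signpost is None:
            # Not fatal: log a warning and keep replicating.
            log.warning("No routing signpost for %s; cannot delete", child_id)
            skipped.append(child_id)
            continue
        # Two separate operations, not one atomic step. If the first
        # succeeded and the second failed, a retry would see an
        # inconsistent pair, which is the situation the caveat describes.
        children.pop((signpost["routing"], child_id), None)
        signposts.pop(child_id, None)
    return skipped
```

A child indexed by an older plugin version behaves like `c2` in a call such as `delete_children(["c1", "c2"], signposts, children)` where only `c1` has a signpost: it is reported as skipped and stays in the index.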

          The alternative approach of using a query to locate the child documents (instead of using signpost documents) was considered, but it suffers from issues with document visibility. Documents only appear in query results after they have been indexed, and it's not practical to wait, or to force a refresh every time a document is deleted. The signposts, on the other hand, can be fetched by ID with a multi-get request regardless of whether they have been indexed yet.
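
The visibility difference can be shown with a toy model. `ToyIndex` is a hypothetical stand-in, not an Elasticsearch client: it only mimics the behavior described above, where a fetch by ID sees documents that a query cannot see until the next refresh.

```python
from typing import Callable, Dict, List, Optional


class ToyIndex:
    """Toy model: writes are pending until refresh() makes them searchable."""

    def __init__(self) -> None:
        self._pending: Dict[str, dict] = {}
        self._searchable: Dict[str, dict] = {}

    def index(self, doc_id: str, source: dict) -> None:
        self._pending[doc_id] = source

    def refresh(self) -> None:
        self._searchable.update(self._pending)
        self._pending.clear()

    def multi_get(self, ids: List[str]) -> Dict[str, Optional[dict]]:
        # Fetch by ID: sees pending writes as well as refreshed ones.
        return {i: self._pending.get(i, self._searchable.get(i)) for i in ids}

    def search(self, predicate: Callable[[dict], bool]) -> List[str]:
        # Query: sees only documents made visible by a refresh.
        return [i for i, s in self._searchable.items() if predicate(s)]
```

In this model a signpost written a moment ago is returned by `multi_get` immediately, while `search` returns nothing until `refresh()` runs.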

           

          David Nault added a comment

          It's possible that the solution could be improved by falling back to querying all the shards when a signpost document is missing. Not sure whether it would be worth the added complexity, so I'm going to leave things where they stand for now.


            People

            • Assignee: David Nault
            • Reporter: David Nault
