  1. Couchbase Server
  2. MB-7649

Elasticsearch: Additional Content from Review


Details

    • Improvement
    • Resolution: Fixed
    • Major
    • None
    • 2.0.1
    • None
    • Security Level: Public
    • None

    Description

      Hi Karen,

      Here are 2 sections to incorporate into the manual somewhere. Feel free to edit the text as necessary, and if any of this doesn't make sense let me know.

      marty

      Performance of the Couchbase Plugin for Elasticsearch in Practice
      -----------------------------------------------------------------

      The Couchbase plugin for Elasticsearch uses XDCR to transport data. One of the most
      important parameters controlling XDCR performance is "xdcrMaxConcurrentReps". This
      value is the maximum number of replication operations that take place concurrently
      from each node in the Couchbase cluster, and it defaults to 32.

      In practice, this means that when replicating from a 5-node Couchbase cluster to a
      1-node Elasticsearch cluster, up to 160 concurrent replications (5 nodes x 32) may
      target a single Elasticsearch node. Each replication may require multiple TCP
      connections, and this can overwhelm the Elasticsearch node.
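The arithmetic above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the only values taken from the text are the default of 32 and the 5-node example.

```python
# Default per-node limit as stated in the text above.
DEFAULT_MAX_CONCURRENT_REPS = 32


def max_concurrent_replications(couchbase_nodes,
                                max_concurrent_reps=DEFAULT_MAX_CONCURRENT_REPS):
    """Upper bound on replications leaving the whole Couchbase cluster at once.

    Hypothetical helper: each node may run up to `max_concurrent_reps`
    replications, so the cluster-wide ceiling is a simple product.
    """
    return couchbase_nodes * max_concurrent_reps


# The 5-node example from the text, using the default setting.
print(max_concurrent_replications(5))  # 160
```

Each of these replications may in turn open several TCP connections, so the load seen by Elasticsearch can be higher than this count suggests.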

      Once an Elasticsearch node is overwhelmed, a variety of errors may occur, including
      the following:

      Error replicating vbucket 7:
      {badmatch, {error,all_nodes_failed,
      <<"Failed to grab remote bucket info from any of known nodes">>}}

      Error replicating vbucket 7:
      {error,
      {error,timeout}
      }}

      These errors occur because Couchbase is unable to communicate with Elasticsearch in a
      reasonable amount of time. XDCR can recover from these types of errors, but replication
      may take longer to complete or operate with higher latency, because the failed
      operations must be retried later.

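      In circumstances such as this, it may help to lower the "xdcrMaxConcurrentReps" value so
      that the total number of concurrent replications for the whole cluster (the number of
      Couchbase nodes multiplied by "xdcrMaxConcurrentReps") stays within what the
      Elasticsearch cluster can handle.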
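As a rough sketch of how a lower value might be chosen and applied: the helper names, host name, credentials, and the target of 40 replications below are all hypothetical. The "/internalSettings" REST endpoint on port 8091 is an assumption about how Couchbase Server 2.0 exposes this setting, so check the REST API documentation for your version before relying on it. The code only builds the request; sending it is left commented out.

```python
import base64
import urllib.parse
import urllib.request


def reps_per_node(target_total, couchbase_nodes):
    """Largest per-node value keeping nodes * value within target_total."""
    if couchbase_nodes < 1:
        raise ValueError("couchbase_nodes must be at least 1")
    return max(1, target_total // couchbase_nodes)


def build_settings_request(host, reps, user, password):
    """Build (but do not send) a POST that sets xdcrMaxConcurrentReps.

    ASSUMPTION: the setting is changed through /internalSettings on the
    admin port 8091; verify this against your Couchbase Server version.
    """
    url = "http://%s:8091/internalSettings" % host
    body = urllib.parse.urlencode({"xdcrMaxConcurrentReps": reps}).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


# Hypothetical budget: at most 40 concurrent replications from 5 nodes.
reps = reps_per_node(40, 5)
request = build_settings_request("cb.example.com", reps, "Administrator", "password")
print(reps, request.full_url)
# urllib.request.urlopen(request)  # uncomment to actually apply the change
```

What counts as a reasonable total depends on the size of the Elasticsearch cluster, so treat the budget as something to tune by observation rather than a fixed rule.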

      Initial Elasticsearch Indexing of an Existing Couchbase Bucket
      --------------------------------------------------------------

      Often you have an existing Couchbase bucket in production with a large number of
      documents. When you first start to index this data with Elasticsearch, a large number
      of documents is transferred in bulk. While this should work with the default settings,
      some Elasticsearch settings can be tuned to make this initial indexing phase complete
      faster.

      The "refresh_interval" setting in Elasticsearch controls how frequently newly indexed
      items become available in search results. During a bulk load, you can disable index
      refresh, trading immediate access to newly indexed items for faster overall indexing,
      and re-enable it once the load completes.

      For full details about disabling and re-enabling index refresh, see this section of the
      Elasticsearch guide:

      http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
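A minimal sketch of disabling and later restoring index refresh through the Elasticsearch update-settings API follows. The index name "couchbase_docs", the localhost URL, and the helper name are hypothetical. A value of "-1" disables refresh and "1s" is the usual Elasticsearch default, but confirm both against the guide for your version. The requests are only built here, not sent.

```python
import json
import urllib.request


def build_refresh_request(es_url, index, interval):
    """Build (but do not send) a PUT that updates refresh_interval for an index."""
    url = "%s/%s/_settings" % (es_url.rstrip("/"), index)
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="PUT")
    # Recent Elasticsearch versions expect an explicit JSON content type.
    request.add_header("Content-Type", "application/json")
    return request


ES_URL = "http://localhost:9200"   # hypothetical Elasticsearch address
INDEX = "couchbase_docs"           # hypothetical index name

# Before the bulk load: turn refresh off.
disable = build_refresh_request(ES_URL, INDEX, "-1")
# After the bulk load: restore periodic refresh.
restore = build_refresh_request(ES_URL, INDEX, "1s")

print(disable.get_method(), disable.full_url)
# urllib.request.urlopen(disable)  # uncomment to send; do the same for `restore`
```

While refresh is disabled, newly indexed documents will not appear in search results, so remember to send the restoring request once the initial transfer has finished.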

          People

            kzeller kzeller
            kzeller kzeller