  Couchbase Server
  MB-46396

FTS: server groups don't work properly in case of group failure


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 7.0.0
    • Fix Version: 7.0.2
    • Component: fts

    Description

      Build: 7.0.0-5127

      While testing fts service behavior with respect to server groups, I found a significant difference between the fts and index services.

      The scenario is the following:

      • create cluster {kv} {fts, index, query} {fts, index, query}
      • create server group g1 (a sketch of the equivalent REST calls follows the index definition below)
      • put all nodes in g1
      • remove default server group Group 1
      • load `beer-sample` sample bucket
      • drop index `beer_primary`
      • create the fts index `fts_index`: 

        {
         "name": "fts_index",
         "type": "fulltext-index",
         "params": {
          "doc_config": {
           "docid_prefix_delim": "",
           "docid_regexp": "",
           "mode": "type_field",
           "type_field": "type"
          },
          "mapping": {
           "default_analyzer": "standard",
           "default_datetime_parser": "dateTimeOptional",
           "default_field": "_all",
           "default_mapping": {
            "dynamic": true,
            "enabled": true
           },
           "default_type": "_default",
           "docvalues_dynamic": false,
           "index_dynamic": true,
           "store_dynamic": false,
           "type_field": "_type"
          },
          "store": {
           "indexType": "scorch",
           "segmentVersion": 15
          }
         },
         "sourceType": "gocbcore",
         "sourceName": "beer-sample",
         "sourceUUID": "adc53e37f3906a5de153d05a718d8b47",
         "sourceParams": {},
         "planParams": {
          "maxPartitionsPerPIndex": 256,
          "indexPartitions": 4,
          "numReplicas": 1
         },
         "uuid": "540e0a1555cb507b"
        }
        

        The index is on `beer-sample`; everything is default except replica: 1 and partitions: 4.
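
      For reproducibility, the server-group setup and the index creation can be driven over REST as well. A minimal sketch, assuming default ports and the same credentials used elsewhere in this report; the group UUID is a placeholder that has to be taken from the serverGroups response:

      # create server group g1
      curl -X POST -u Administrator:password \
        http://172.23.107.56:8091/pools/default/serverGroups -d name=g1

      # list groups to obtain each group's uuid
      curl -u Administrator:password \
        http://172.23.107.56:8091/pools/default/serverGroups

      # remove the now-empty default group "Group 1" by its uuid (placeholder)
      curl -X DELETE -u Administrator:password \
        http://172.23.107.56:8091/pools/default/serverGroups/<group-1-uuid>

      # create the fts index from the definition above, saved as fts_index.json
      curl -X PUT -H "Content-Type: application/json" -u Administrator:password \
        http://172.23.107.56:8094/api/index/fts_index -d @fts_index.json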

      • create gsi index `gsi_index` on `beer-sample`

      create index gsi_index on `beer-sample`(type)
      PARTITION BY HASH(type) WITH {"num_partition":4, "num_replica":1}
      

      • check fts index partitions distribution on both fts nodes

      curl -X GET -u Administrator:password http://172.23.107.56:8094/api/nsstats | jq
      

      We see that both nodes hold 4 partitions each, which is correct since the fts index has replica=1 (4 active and 4 replica partitions spread over the two nodes).
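
      One way to script this check: nsstats exposes per-index partition counters. A sketch, assuming the stats are keyed as <bucket>:<index>:<stat> and that num_pindexes_actual is among them (verify against the actual output):

      # number of index partitions this node currently hosts for fts_index
      curl -s -u Administrator:password http://172.23.107.56:8094/api/nsstats \
        | jq '."beer-sample:fts_index:num_pindexes_actual"'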

      • check that fts index returns full set of results on both fts nodes:

      curl -XPOST -H "Content-Type: application/json" -d '{"explain": true,"fields": ["*"],"highlight": {},"query": {"query": "type:beer"}}' -u Administrator:password http://172.23.104.237:8094/api/index/fts_index/query
      

      Both fts nodes return the correct number of docs: "total_hits": 5891
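
      To compare hit counts across nodes without reading the full response, the result can be piped through jq (total_hits is a top-level field of the fts query response):

      curl -s -XPOST -H "Content-Type: application/json" \
        -d '{"query": {"query": "type:beer"}}' \
        -u Administrator:password \
        http://172.23.104.237:8094/api/index/fts_index/query | jq '.total_hits'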

      • check gsi index partitions distribution:

      gsi_index: Nodes: 172.23.104.237:8091 (1 partition), 172.23.107.56:8091 (3 partitions)
      gsi_index replica: Nodes: 172.23.104.237:8091 (3 partitions), 172.23.107.56:8091 (1 partition)

      So, gsi_index partitions are distributed across the g1 gsi nodes
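
      The same distribution can be read programmatically from the indexer status endpoint. A sketch, assuming the default indexer HTTP port 9102 and that the status entries carry hosts and partition information (field names should be checked against the actual payload):

      curl -s -u Administrator:password http://172.23.107.56:9102/getIndexStatus \
        | jq '.status[] | {name, hosts, partitionMap}'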

      • check that gsi index returns the correct number of docs on both gsi nodes:

      curl -v -u Administrator:password  -d 'statement=SELECT * from `beer-sample` where type="beer"' http://172.23.107.56:8093/query/service
      

      Both gsi nodes return the correct number of docs: "resultCount": 5891
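
      As with the fts check, the count can be extracted directly (resultCount sits under metrics in the N1QL response):

      curl -s -u Administrator:password \
        -d 'statement=SELECT * from `beer-sample` where type="beer"' \
        http://172.23.107.56:8093/query/service | jq '.metrics.resultCount'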

      • create the g2 server group
      • add the following nodes to g2: {kv} {fts, index, query} {fts, index, query}, so that g1 and g2 are symmetric
      • perform rebalance (a CLI sketch follows)
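
      A sketch of the equivalent CLI steps; the g2 node addresses other than 172.23.105.18 and 172.23.105.21 are not spelled out in this report, so treat the IPs and service lists as examples to adjust per node:

      # create the g2 group
      couchbase-cli group-manage -c 172.23.107.56:8091 -u Administrator -p password \
        --create --group-name g2

      # add a node into g2 (repeat per node with its services)
      couchbase-cli server-add -c 172.23.107.56:8091 -u Administrator -p password \
        --server-add 172.23.105.18:8091 \
        --server-add-username Administrator --server-add-password password \
        --services fts,index,query --group-name g2

      # rebalance the cluster
      couchbase-cli rebalance -c 172.23.107.56:8091 -u Administrator -p password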

      The resulting cluster is shown in the attached screenshot.

      • check fts index partitions distribution across the g1 and g2 fts nodes

      Here we notice that the partition distribution has changed: before adding g2 we had 4 partitions per node; now every fts node in g1 and g2 holds 2.

      • check gsi_index partitions distribution

      gsi_index: Nodes: 172.23.104.237:8091 (1 partition), 172.23.105.18:8091 (2 partitions), 172.23.107.56:8091 (1 partition)
      gsi_index replica: Nodes: 172.23.104.237:8091 (2 partitions), 172.23.105.18:8091 (1 partition), 172.23.105.21:8091 (1 partition)
      

      So, gsi_index partitions are distributed across all gsi nodes from g1 and g2

      • check that gsi index returns the correct number of docs on all gsi nodes from g1 and g2:

      All nodes return correct results: "resultCount": 5891

      • check that fts index returns full set of results on all fts nodes:

      All 4 fts nodes return the correct number of docs: "total_hits": 5891

      • Now let's fail the g1 fts and gsi nodes: shut down the Couchbase service on both non-kv nodes in g1. No rebalance afterwards.

      run the following command on both non-kv nodes in g1
      systemctl stop couchbase-server
      

      • check that gsi index returns the correct number of docs on the gsi nodes from g2:

      Both g2 nodes return correct results: "resultCount": 5891

      • check that fts index returns the correct number of docs on the fts nodes from g2:

      Both g2 nodes return an incorrect number of docs: "total_hits": 2941, roughly half of the expected 5891. This is consistent with only the active partitions hosted on g2 being searchable, since (as the comments below explain) fts cannot serve queries from replica partitions.

       

      Attachments


        Activity

          Sreekanth Sivasankaran added a comment (edited):

          Unlike GSI, FTS doesn't have the capability to read from replica partitions.

          So, to make use of the provisioned replicas, you need to fail over the nodes of the downed server group. FTS should then be able to promote the replica partitions to primary on the remaining server group and serve queries seamlessly.

          Please let me know whether it works after failing over the nodes.
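
          For reference, a hard failover of the downed g1 nodes can be triggered over REST from a surviving node. A minimal sketch, assuming 172.23.105.18 is a g2 node and that the otpNode names follow the default ns_1@<ip> convention (confirm via /pools/default):

          # hard-fail the two downed non-kv g1 nodes
          curl -u Administrator:password http://172.23.105.18:8091/controller/failOver \
            -d 'otpNode=ns_1@172.23.104.237'
          curl -u Administrator:password http://172.23.105.18:8091/controller/failOver \
            -d 'otpNode=ns_1@172.23.107.56'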


          Evgeny Makarenko (Inactive) added a comment:

          Failover of offline nodes helps.

          People

            Assignee: Evgeny Makarenko (Inactive)
            Reporter: Evgeny Makarenko (Inactive)
            Votes: 0
            Watchers: 4

