  1. Couchbase Server
  2. MB-19342

[FTS] MCP: After hard failover of fts node, pindexes become unavailable

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 4.5.0
    • Affects Version/s: 4.5.0
    • Component/s: cbft

    Description

      Build
      4.5.0-2151

      Testcase
      ./testrunner -i INI_FILE.ini -p skip-cleanup=True,get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,GROUP=P0 -t fts.moving_topology_fts.MovingTopFTS.hard_failover_no_rebalance_between_indexing_and_querying,items=10000,cluster=D,F,F,GROUP=P0

      Steps
      1. Cluster: D+F,F,F
      2. Load 10K docs onto default bucket and build index.
      3. Hard failover one fts node.
      4. Try to query: the pindexes are not available, and the doc count isn't available either.
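The query in step 4 can be sketched in Python as follows. This is a minimal illustration, not the testrunner code: the endpoint path, port 8094, and request body are copied from the log excerpt in this report, while the helper name `build_fts_query_request` and its parameters are hypothetical. The request is only built, not sent, and authentication (the log shows a Basic `Authorization` header) is left out.

```python
import json
import urllib.request


def build_fts_query_request(host, index_name, field, match, size=10000000):
    """Build (but do not send) a POST to the FTS query endpoint seen in the log.

    Hypothetical helper for illustration; the body mirrors the one logged by
    fts_base.run_fts_query in this report.
    """
    body = {
        "from": 0,
        "indexName": index_name,
        "fields": [],
        "explain": False,
        "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}},
        "query": {"field": field, "match": match},
        "size": size,
    }
    url = "http://%s:8094/api/index/%s/query" % (host, index_name)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same target node, index, and match query as in the failing run.
req = build_fts_query_request("172.23.106.139", "default_index_1", "type", "emp")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) against a cluster in the state described here would be expected to raise an HTTP 400 error, matching the log.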

      Sorry, this bug should have surfaced long ago, but a test bug caused most rebalance tests to run with index_replicas=1, in which case a replica was simply promoted, which completely hid this bug. I've fixed the test and restarted the rebalance tests on the latest build to see whether there are other failures.
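Why a replica hides the problem can be shown with a toy model. This is a sketch under stated assumptions, not cbft's planner: the plan is a plain dict from pindex name to the nodes holding a copy, and the node and pindex names are made up.

```python
def fail_over(plan, failed_node):
    """Return a new plan with failed_node removed from every pindex.

    Toy model: each pindex maps to an ordered list of nodes, first = primary.
    Dropping the failed node implicitly promotes the next copy, if any.
    """
    return {
        pindex: [node for node in nodes if node != failed_node]
        for pindex, nodes in plan.items()
    }


def unserved(plan):
    """List the pindexes that no remaining node can serve."""
    return sorted(pindex for pindex, nodes in plan.items() if not nodes)


# Three FTS nodes, one pindex each, no replicas (the scenario in this report).
no_replicas = {
    "pindex_a": ["node1"],
    "pindex_b": ["node2"],
    "pindex_c": ["node3"],
}

# Same layout with one replica per pindex (what the tests ran by mistake).
one_replica = {
    "pindex_a": ["node1", "node2"],
    "pindex_b": ["node2", "node3"],
    "pindex_c": ["node3", "node1"],
}

print(unserved(fail_over(no_replicas, "node2")))  # ['pindex_b']
print(unserved(fail_over(one_replica, "node2")))  # []
```

With one replica every pindex still has a surviving copy after the failover, so queries keep working and the missing reassignment goes unnoticed.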

      2016-04-22 18:24:18 | INFO | MainProcess | Cluster_Thread | [task._failover_nodes] Failing over 172.23.106.176:8091 with graceful=False
      2016-04-22 18:24:19 | INFO | MainProcess | Cluster_Thread | [rest_client.fail_over] fail_over node ns_1@172.23.106.176 successful
      2016-04-22 18:24:19 | INFO | MainProcess | Cluster_Thread | [task.execute] 0 seconds sleep after failover, for nodes to go pending....
      2016-04-22 18:24:19 | INFO | MainProcess | test_thread | [fts_base.sleep] sleep for 30 secs.  ...
      2016-04-22 18:24:49 | INFO | MainProcess | test_thread | [rest_client.get_nodes] Node 172.23.106.176 not part of cluster inactiveFailed
      2016-04-22 18:24:49 | INFO | MainProcess | test_thread | [fts_base.run_fts_query] Running query {"from": 0, "indexName": "default_index_1", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "type", "match": "emp"}, "size": 10000000} on node: 172.23.106.139
      2016-04-22 18:24:49 | ERROR | MainProcess | test_thread | [rest_client._http_request] http://172.23.106.139:8094/api/index/default_index_1/query error 400 reason: status: 400, content: rest_index: Query, indexName: default_index_1, requestBody: {"from": 0, "indexName": "default_index_1", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "type", "match": "emp"}, "size": 10000000}, req: &http.Request{Method:"POST", URL:(*url.URL)(0xc840844900), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept-Encoding":[]string{"identity"}, "Content-Length":[]string{"209"}, "Content-Type":[]string{"application/json"}, "Authorization":[]string{"Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="}, "Accept":[]string{"*/*"}, "User-Agent":[]string{"Python-httplib2/$Rev: 259 $"}}, Body:(*http.body)(0xc8419aa740), ContentLength:209, TransferEncoding:[]string(nil), Close:false, Host:"172.23.106.139:8094", Form:url.Values{}, PostForm:url.Values{}, MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"192.168.3.122:59923", RequestURI:"/api/index/default_index_1/query", TLS:(*tls.ConnectionState)(nil), Cancel:(<-chan struct {})(nil)}, err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
       rest_index: Query, indexName: default_index_1, requestBody: {"from": 0, "indexName": "default_index_1", "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "type", "match": "emp"}, "size": 10000000}, req: &http.Request{Method:"POST", URL:(*url.URL)(0xc840844900), Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"Accept-Encoding":[]string{"identity"}, "Content-Length":[]string{"209"}, "Content-Type":[]string{"application/json"}, "Authorization":[]string{"Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA=="}, "Accept":[]string{"*/*"}, "User-Agent":[]string{"Python-httplib2/$Rev: 259 $"}}, Body:(*http.body)(0xc8419aa740), ContentLength:209, TransferEncoding:[]string(nil), Close:false, Host:"172.23.106.139:8094", Form:url.Values{}, PostForm:url.Values{}, MultipartForm:(*multipart.Form)(nil), Trailer:http.Header(nil), RemoteAddr:"192.168.3.122:59923", RequestURI:"/api/index/default_index_1/query", TLS:(*tls.ConnectionState)(nil), Cancel:(<-chan struct {})(nil)}, err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
      2016-04-22 18:24:49 | ERROR | MainProcess | test_thread | [fts_base.execute_query] Error running query: 'NoneType' object is not iterable
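The final `'NoneType' object is not iterable` line appears to be a secondary symptom: the harness presumably iterated over an empty result after the HTTP 400. A hedged sketch of handling that case explicitly is below. The names `PIndexUnavailable` and `extract_hits` are hypothetical and not part of testrunner; the marker string is taken from the error in the log, and the `hits` key is assumed from the usual FTS search response shape.

```python
import json

# Substring of the error message returned by the FTS node in the log above.
NO_NODES_MARKER = "no nodes are enabled/allocated to serve queries"


class PIndexUnavailable(Exception):
    """Raised when the FTS node reports that no node serves a pindex."""


def extract_hits(status, content):
    """Return the list of hits, or raise a descriptive error on failure.

    status  -- HTTP status code of the query response
    content -- response body as text
    """
    if status != 200:
        if NO_NODES_MARKER in content:
            raise PIndexUnavailable("index partitions have no serving node")
        raise RuntimeError("FTS query failed with HTTP %d" % status)
    # Assumption: a successful response is JSON with a "hits" array.
    return json.loads(content).get("hits", [])


try:
    extract_hits(400, "err: pindex: queries may have been disabled; "
                      "no nodes are enabled/allocated to serve queries "
                      "for the index partition(s)")
except PIndexUnavailable as exc:
    print("query rejected:", exc)
```

Raising a specific exception makes the test failure point at the unavailable pindexes rather than at a `TypeError` further down the call chain.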
      

      Live cluster : http://172.23.106.175:8091/ui/index.html#/servers/active

          People

            mschoch Marty Schoch [X] (Inactive)
            apiravi Aruna Piravi (Inactive)