Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-24369

[FTS] "Unable to find given hostport in cbauth database" errors flood logs after rebalance-out of fts node during indexing

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 5.0.0
    • 5.0.0
    • fts
    • Untriaged
    • No

    Description

      Build
      5.0.0-2873

      Testcase
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,GROUP=P1 -t fts.moving_topology_fts.MovingTopFTS.update_index_during_rebalance,items=200000,GROUP=P1

      Steps:
      cluster: [D(.96), D+F(.190), F(.66)]
      1. Load 200K docs into bucket default
      2. Create a default index.
      3. When indexing crosses the 50% mark, start to rebalance out the D node(.96).
      4. As soon as rebalance starts, update the index definition to use 512 pindexes.
      5. Once rebalance is complete, issue a query on the other fts node (.190).
      See fts.log on .190, it's flooded with "Unable to find given hostport in cbauth database" errors.

      [2017-05-12 17:21:09,490] - [rest_client:2516] INFO - Index default_index_1 created
      [2017-05-12 17:21:09,491] - [fts_base:3449] INFO - Validating index distribution for default_index_1 ...
      [2017-05-12 17:21:09,623] - [fts_base:3467] INFO - Validated: Number of PIndexes = 6
      [2017-05-12 17:21:09,641] - [fts_base:3479] INFO - Validated: Every pIndex serves 171 partitions or lesser
      [2017-05-12 17:21:09,642] - [fts_base:3495] INFO - Validated: pIndexes are distributed across [u'66eb74c6df7ee54f7e0589f97fec4548', u'bf5b6bbcc32d4d9fc76e60713ea34663'] 
      [2017-05-12 17:21:09,642] - [fts_base:3501] INFO - Expecting num of partitions in each node in range 341-683
      [2017-05-12 17:21:09,642] - [fts_base:3516] INFO - Validated: Node 66eb74c6df7ee54f7e0589f97fec4548 houses 3 pindexes which serve 513 partitions
      [2017-05-12 17:21:09,643] - [fts_base:3516] INFO - Validated: Node bf5b6bbcc32d4d9fc76e60713ea34663 houses 3 pindexes which serve 511 partitions
      [2017-05-12 17:21:09,643] - [fts_base:3358] INFO - sleep for 10 secs.  ...
      [2017-05-12 17:21:19,654] - [moving_topology_fts:875] INFO - Index building has begun...
      [2017-05-12 17:21:19,813] - [moving_topology_fts:878] INFO - Index count for default_index_1: 9735
      [2017-05-12 17:21:20,012] - [fts_base:3379] INFO - Docs in bucket = 200000, docs in FTS index 'default_index_1': 9735
      [2017-05-12 17:21:26,448] - [fts_base:3379] INFO - Docs in bucket = 200000, docs in FTS index 'default_index_1': 24038
      [2017-05-12 17:21:32,839] - [fts_base:3379] INFO - Docs in bucket = 200000, docs in FTS index 'default_index_1': 44615
      [2017-05-12 17:21:39,325] - [fts_base:3379] INFO - Docs in bucket = 200000, docs in FTS index 'default_index_1': 63188
      [2017-05-12 17:21:45,720] - [fts_base:3379] INFO - Docs in bucket = 200000, docs in FTS index 'default_index_1': 83143
      [2017-05-12 17:21:52,096] - [fts_base:3379] INFO - Docs in bucket = 200000, docs in FTS index 'default_index_1': 103629
      [2017-05-12 17:21:52,097] - [fts_base:2465] INFO - Starting rebalance-out nodes:[ip:172.23.105.96 port:8091 ssh_username:root] at C1 cluster 172.23.105.96
      [2017-05-12 17:21:52,099] - [fts_base:3358] INFO - sleep for 15 secs.  ...
      [2017-05-12 17:21:52,605] - [rest_client:1394] INFO - rebalance params : {'password': 'password', 'ejectedNodes': u'ns_1@172.23.105.96', 'user': 'Administrator', 'knownNodes': u'ns_1@172.23.106.66,ns_1@172.23.105.96,ns_1@172.23.105.190'}
      [2017-05-12 17:21:52,617] - [rest_client:1399] INFO - rebalance operation started
      [2017-05-12 17:21:52,626] - [rest_client:1547] INFO - rebalance percentage : 0.00 %
      [2017-05-12 17:21:52,627] - [task:484] INFO - Rebalance - status: running, progress: 0
      [2017-05-12 17:22:02,683] - [rest_client:1547] INFO - rebalance percentage : 11.01 %
      [2017-05-12 17:22:02,683] - [task:484] INFO - Rebalance - status: running, progress: 11.0134548611
      [2017-05-12 17:22:07,397] - [fts_base:879] INFO - Updating fulltext-index default_index_1 on 172.23.106.66
      [2017-05-12 17:22:07,398] - [rest_client:2523] INFO - {
         "params": {}, 
         "name": "default_index_1", 
         "planParams": {
            "numReplicas": 0, 
            "maxPartitionsPerPIndex": 2
         }, 
         "sourceName": "default", 
         "sourceUUID": "", 
         "sourceType": "couchbase", 
         "type": "fulltext-index", 
         "sourceParams": {
            "authUser": "default", 
            "dataManagerSleepMaxMS": 20000, 
            "authSaslUser": "", 
            "clusterManagerSleepMaxMS": 20000, 
            "authSaslPassword": "", 
            "clusterManagerSleepInitMS": 0, 
            "dataManagerBackoffFactor": 0, 
            "authPassword": "", 
            "dataManagerSleepInitMS": 0, 
            "feedBufferAckThreshold": 0, 
            "feedBufferSizeBytes": 0, 
            "clusterManagerBackoffFactor": 0
         }, 
         "uuid": "12b2ad22adcf4677"
      }
      [2017-05-12 17:22:07,583] - [rest_client:2530] INFO - Index/alias default_index_1 updated
      [2017-05-12 17:22:07,781] - [moving_topology_fts:896] INFO - {u'sourceType': u'couchbase', u'sourceName': u'default', u'planParams': {u'maxPartitionsPerPIndex': 2}, u'uuid': u'58e4180229c2cb95', u'sourceParams': {u'authSaslUser': u'', u'authUser': u'default', u'feedBufferAckThreshold': 0, u'dataManagerSleepMaxMS': 20000, u'dataManagerSleepInitMS': 0, u'clusterManagerSleepMaxMS': 20000, u'clusterManagerBackoffFactor': 0, u'authSaslPassword': u'', u'clusterManagerSleepInitMS': 0, u'dataManagerBackoffFactor': 0, u'authPassword': u'', u'feedBufferSizeBytes': 0}, u'params': {}, u'type': u'fulltext-index', u'name': u'default_index_1'}
      [2017-05-12 17:22:12,746] - [rest_client:1547] INFO - rebalance percentage : 22.03 %
      [2017-05-12 17:22:12,747] - [task:484] INFO - Rebalance - status: running, progress: 22.0269097222
      [2017-05-12 17:22:22,783] - [rest_client:1547] INFO - rebalance percentage : 30.38 %
      [2017-05-12 17:22:22,784] - [task:484] INFO - Rebalance - status: running, progress: 30.3819444444
      [2017-05-12 17:22:32,826] - [rest_client:1547] INFO - rebalance percentage : 31.29 %
      [2017-05-12 17:22:32,827] - [task:484] INFO - Rebalance - status: running, progress: 31.2934027778
      [2017-05-12 17:22:42,869] - [rest_client:1547] INFO - rebalance percentage : 39.87 %
      [2017-05-12 17:22:42,870] - [task:484] INFO - Rebalance - status: running, progress: 39.8654513889
      [2017-05-12 17:22:52,906] - [rest_client:1547] INFO - rebalance percentage : 44.12 %
      [2017-05-12 17:22:52,906] - [task:484] INFO - Rebalance - status: running, progress: 44.1189236111
      [2017-05-12 17:23:02,939] - [rest_client:1547] INFO - rebalance percentage : 48.44 %
      [2017-05-12 17:23:02,940] - [task:484] INFO - Rebalance - status: running, progress: 48.4375
      [2017-05-12 17:23:12,981] - [rest_client:1547] INFO - rebalance percentage : 50.00 %
      [2017-05-12 17:23:12,982] - [task:484] INFO - Rebalance - status: running, progress: 50.0
      [2017-05-12 17:23:23,022] - [rest_client:1547] INFO - rebalance percentage : 50.00 %
      [2017-05-12 17:23:23,023] - [task:484] INFO - Rebalance - status: running, progress: 50.0
      [2017-05-12 17:23:33,051] - [rest_client:787] ERROR - socket error while connecting to http://172.23.105.96:8091/pools/default/rebalanceProgress error [Errno 111] Connection refused 
      [2017-05-12 17:23:36,066] - [task:484] INFO - Rebalance - status: none, progress: 100
      [2017-05-12 17:23:36,073] - [rest_client:783] ERROR - GET http://172.23.105.96:8091/pools/default body:  headers: {'Content-Type': 'application/json', 'Accept': '*/*', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n'} error: 404 reason: unknown "unknown pool" auth: Administrator:password
      http://172.23.105.96:8091/pools/default with status False: unknown pool
      [2017-05-12 17:23:36,091] - [task:543] INFO - rebalancing was completed with progress: 100% in 103.474075079 sec
      

      On .190, I see

      2017-05-12T16:26:18.296-07:00 INFO feed_dcp: OnError, name: default_index_8a901acf1c17ff46_18572d87: bucketName: default, bucketUUID: , err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      2017-05-12T16:26:18.296-07:00 INFO main: meh.OnFeedError, srcType: couchbase, err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      2017-05-12T16:26:18.317-07:00 INFO feed_dcp: OnError, name: default_index_8a901acf1c17ff46_6ddbfb54: bucketName: default, bucketUUID: , err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      2017-05-12T16:26:18.317-07:00 INFO main: meh.OnFeedError, srcType: couchbase, err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      2017-05-12T16:26:18.634-07:00 INFO feed_dcp: OnError, name: default_index_8a901acf1c17ff46_18572d87: bucketName: default, bucketUUID: , err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      2017-05-12T16:26:18.634-07:00 INFO main: meh.OnFeedError, srcType: couchbase, err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      2017-05-12T16:26:18.655-07:00 INFO feed_dcp: OnError, name: default_index_8a901acf1c17ff46_6ddbfb54: bucketName: default, bucketUUID: , err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      2017-05-12T16:26:18.655-07:00 INFO main: meh.OnFeedError, srcType: couchbase, err: worker auth, AuthenticateMemcachedConn, server: 172.23.106.66:11210, err: Unable to find given hostport in cbauth database: `172.23.106.66:11210'
      :
      :
      2017-05-12T17:29:15.185-07:00 INFO feed_dcp: OnError, name: default_index_1_58e4180229c2cb95_6559adb4: bucketName: default, bucketUUID: , err: worker auth, AuthenticateMemcachedConn, server: 172.23.105.96:11210, err: Unable to find given hostport in cbauth database: `172.23.105.96:11210'
      2017-05-12T17:29:15.185-07:00 INFO main: meh.OnFeedError, srcType: couchbase, err: worker auth, AuthenticateMemcachedConn, server: 172.23.105.96:11210, err: Unable to find given hostport in cbauth database: `172.23.105.96:11210'
      

      I did a cbcollect after 4 mins(after rebalance completed and I still see the log flooded with these errors.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            Sreekanth Sivasankaran Sreekanth Sivasankaran (Inactive)
            apiravi Aruna Piravi (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty