Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-43899

"Unexpected server error, request logged." upon adding a node to the cluster

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Yes

    Description

      Build : 7.0.0-4291

      This issue is seen in multiple tests across various components. Upon adding a new node to the cluster during test setup, or within the test, the rest api returns an error - Unexpected server error, request logged

      In this case, the timezone set on 172.23.106.116 was WSDT somehow, hence the difference of time in the test log vs. server log.

      [2021-01-26 14:55:32,148] - [rest_client:1174] INFO - settings/indexes params : storageMode=plasma
      [2021-01-26 14:55:32,171] - [fts_base:2215] INFO - 172.23.100.14 will be configured with services fts
      [2021-01-26 14:55:33,172] - [task:775] INFO - adding node 172.23.100.14:8091 to cluster
      [2021-01-26 14:55:33,173] - [rest_client:1485] INFO - adding remote node @172.23.100.14:8091 to this cluster @172.23.106.116:8091
      [2021-01-26 14:55:48,219] - [rest_client:1016] ERROR - POST http://172.23.106.116:8091/controller/addNode body: hostname=http%3A%2F%2F172.23.100.14%3A8091&user=Administrator&password=password&services=fts headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 500 reason: unknown b'["Unexpected server error, request logged."]' auth: Administrator:password
      [2021-01-26 14:55:48,246] - [rest_client:3732] INFO - Latest logs from UI on 172.23.106.116:
      [2021-01-26 14:55:48,247] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 0, 'module': 'ns_cluster', 'tstamp': 1611701733178, 'shortText': 'message', 'text': 'Change of address to "172.23.106.116" is requested.', 'serverTime': '2021-01-27T12:55:33.178Z'}
      [2021-01-26 14:55:48,247] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1611701689413, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.116' saw that node 'ns_1@172.23.100.14' went down. Details: [{nodedown_reason,\n                                                                                    connection_closed}]", 'serverTime': '2021-01-27T12:54:49.413Z'}
      [2021-01-26 14:55:48,247] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 0, 'module': 'mb_master', 'tstamp': 1611701689283, 'shortText': 'message', 'text': "I'm now the only node, so I'm the master.", 'serverTime': '2021-01-27T12:54:49.283Z'}
      [2021-01-26 14:55:48,247] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 1, 'module': 'ns_cluster', 'tstamp': 1611701689255, 'shortText': 'message', 'text': "Node 'ns_1@172.23.100.14' is leaving cluster.", 'serverTime': '2021-01-26T14:54:49.255Z'}
      [2021-01-26 14:55:48,247] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 4, 'module': 'ns_cluster', 'tstamp': 1611701689252, 'shortText': 'message', 'text': 'Node ns_1@172.23.100.14 left cluster', 'serverTime': '2021-01-26T14:54:49.252Z'}
      [2021-01-26 14:55:48,248] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1611701689173, 'shortText': 'message', 'text': 'Rebalance completed successfully.\nRebalance Operation Id = 1229c0041625fdca721630ae8b3887d6', 'serverTime': '2021-01-26T14:54:49.173Z'}
      [2021-01-26 14:55:48,248] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1611701678942, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.106.116'], EjectNodes = ['ns_1@172.23.100.14'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 1229c0041625fdca721630ae8b3887d6", 'serverTime': '2021-01-26T14:54:38.942Z'}
      [2021-01-26 14:55:48,248] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'warning', 'code': 102, 'module': 'menelaus_web', 'tstamp': 1611701678935, 'shortText': 'client-side error report', 'text': 'Client-side error-report for user "Administrator" on node \'ns_1@172.23.106.116\':\nUser-Agent:Python-httplib2/0.13.1 (gzip)\nStarting rebalance from test, ejected nodes [\'ns_1@172.23.100.14\']', 'serverTime': '2021-01-27T12:54:38.935Z'}
      [2021-01-26 14:55:48,248] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 11, 'module': 'menelaus_web', 'tstamp': 1611701678893, 'shortText': 'message', 'text': 'Deleted bucket "default"\n', 'serverTime': '2021-01-27T12:54:38.893Z'}
      [2021-01-26 14:55:48,248] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1611701678248, 'shortText': 'message', 'text': 'Shutting down bucket "default" on \'ns_1@172.23.106.116\' for deletion', 'serverTime': '2021-01-27T12:54:38.248Z'}
      [2021-01-26 14:55:48,270] - [rest_client:3732] INFO - Latest logs from UI on 172.23.100.14:
      [2021-01-26 14:55:48,270] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 0, 'module': 'ns_cluster', 'tstamp': 1611701733178, 'shortText': 'message', 'text': 'Change of address to "172.23.106.116" is requested.', 'serverTime': '2021-01-27T12:55:33.178Z'}
      [2021-01-26 14:55:48,271] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1611701689413, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.116' saw that node 'ns_1@172.23.100.14' went down. Details: [{nodedown_reason,\n                                                                                    connection_closed}]", 'serverTime': '2021-01-27T12:54:49.413Z'}
      [2021-01-26 14:55:48,271] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 0, 'module': 'mb_master', 'tstamp': 1611701689283, 'shortText': 'message', 'text': "I'm now the only node, so I'm the master.", 'serverTime': '2021-01-27T12:54:49.283Z'}
      [2021-01-26 14:55:48,271] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 1, 'module': 'ns_cluster', 'tstamp': 1611701689255, 'shortText': 'message', 'text': "Node 'ns_1@172.23.100.14' is leaving cluster.", 'serverTime': '2021-01-26T14:54:49.255Z'}
      [2021-01-26 14:55:48,271] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 4, 'module': 'ns_cluster', 'tstamp': 1611701689252, 'shortText': 'message', 'text': 'Node ns_1@172.23.100.14 left cluster', 'serverTime': '2021-01-26T14:54:49.252Z'}
      [2021-01-26 14:55:48,271] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1611701689173, 'shortText': 'message', 'text': 'Rebalance completed successfully.\nRebalance Operation Id = 1229c0041625fdca721630ae8b3887d6', 'serverTime': '2021-01-26T14:54:49.173Z'}
      [2021-01-26 14:55:48,272] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.100.14', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1611701678942, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.106.116'], EjectNodes = ['ns_1@172.23.100.14'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 1229c0041625fdca721630ae8b3887d6", 'serverTime': '2021-01-26T14:54:38.942Z'}
      [2021-01-26 14:55:48,272] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'warning', 'code': 102, 'module': 'menelaus_web', 'tstamp': 1611701678935, 'shortText': 'client-side error report', 'text': 'Client-side error-report for user "Administrator" on node \'ns_1@172.23.106.116\':\nUser-Agent:Python-httplib2/0.13.1 (gzip)\nStarting rebalance from test, ejected nodes [\'ns_1@172.23.100.14\']', 'serverTime': '2021-01-27T12:54:38.935Z'}
      [2021-01-26 14:55:48,272] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 11, 'module': 'menelaus_web', 'tstamp': 1611701678893, 'shortText': 'message', 'text': 'Deleted bucket "default"\n', 'serverTime': '2021-01-27T12:54:38.893Z'}
      [2021-01-26 14:55:48,272] - [rest_client:3733] ERROR - {'node': 'ns_1@172.23.106.116', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1611701678248, 'shortText': 'message', 'text': 'Shutting down bucket "default" on \'ns_1@172.23.106.116\' for deletion', 'serverTime': '2021-01-27T12:54:38.248Z'}
      [2021-01-26 14:55:48,272] - [rest_client:1529] ERROR - add_node error : b'["Unexpected server error, request logged."]'

      In the http_access.log on 172.23.106.116, I see the following entry :

      127.0.0.1 - - [27/Jan/2021:12:24:11 +1400] "GET /pools/default/buckets?v=54714639&uuid=a1517b4b7398afedb21f52f4aa4bcbef HTTP/1.1" 500 44 - "Go-http-client/1.1-indexer-projector" 8
      

      and then in the projector log on 172.23.106.116, the following is seen -

      2021-01-27T12:24:11.556+14:00 [Info] serviceChangeNotifier: Removing the bucket: standard_bucket_2 from book-keeping
      2021-01-27T12:24:11.556+14:00 [Warn] servicesChangeNotifier: Connection terminated for collection manifest notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default, bucket: standard_bucket_2, (invalid byte in chunk length)
      2021-01-27T12:24:11.564+14:00 [Warn] servicesChangeNotifier: Connection terminated for collection manifest notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default, bucket: standard_bucket_2, (invalid byte in chunk length)
      2021-01-27T12:24:11.580+14:00 [Warn] servicesChangeNotifier: Connection terminated for pool notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default (Notifier invalidated due to internal error)
      2021-01-27T12:24:12.129+14:00 [Warn] servicesChangeNotifier: Connection terminated for buckets notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default (Notifier invalidated due to internal error)
      2021-01-27T12:24:12.133+14:00 [Warn] servicesChangeNotifier: Connection terminated for collection manifest notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default, bucket: standard_bucket_3, (invalid byte in chunk length)
      2021-01-27T12:24:12.135+14:00 [Warn] servicesChangeNotifier: Connection terminated for collection manifest notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default, bucket: standard_bucket_3, (invalid byte in chunk length)
      2021-01-27T12:24:12.404+14:00 [Info] memstats {"Alloc":3408200, "TotalAlloc":557821224, "Sys":72286456, "Lookups":0, "Mallocs":7043015,"Frees":7021561, "HeapAlloc":3408200, "HeapSys":65994752, "HeapIdle":60997632, "HeapInuse":4997120,"HeapReleased":60080128, "HeapObjects":21454,"GCSys":2381824, "LastGC":1611699850739680355,"PauseTotalNs":141783487, "PauseNs":[101652, 36073, 7019367, 22701, 50865, 31555, 32278], "NumGC":320}
      2021-01-27T12:24:13.117+14:00 [Warn] servicesChangeNotifier: Connection terminated for services notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default (Notifier invalidated due to internal error)
      2021-01-27T12:24:13.118+14:00 [Warn] servicesChangeNotifier: Connection terminated for services notifier instance of http://%40projector-cbauth@127.0.0.1:8091, default (Notifier invalidated due to internal error)
      2021-01-27T12:24:13.499+14:00 [Info] Error occured during cluster info update (HTTP error 500 Internal Server Error getting "http://127.0.0.1:8091/pools/default/buckets?v=54714639&uuid=a1517b4b7398afedb21f52f4aa4bcbef": ["Unexpected server error, request logged."]) .. Retrying(1)
      2021-01-27T12:24:14.535+14:00 [Info] servicesChangeNotifier: Creating new notifier instance for http://%40projector-cbauth@127.0.0.1:8091, default
      

      Logs attached.

      Attachments

        1. 172.23.100.14.zip
          8.64 MB
        2. 172.23.100.14-20210126-1457-diag.zip
          16.70 MB
        3. 172.23.100.15.zip
          7.52 MB
        4. 172.23.100.15-20210126-1500-diag.zip
          6.98 MB
        5. 172.23.100.16-20210126-1459-diag.zip
          18.10 MB
        6. 172.23.100.17-20210126-1502-diag.zip
          10.63 MB
        7. 172.23.100.24.zip
          24.14 MB
        8. 172.23.106.116.zip
          54.50 MB
        9. 172.23.106.116-20210126-1456-diag.zip
          32.59 MB

        Issue Links

          For Gerrit Dashboard: MB-43899
          # Subject Branch Project Status CR V

          Activity

            People

              artem Artem Stemkovski
              mihir.kamdar Mihir Kamdar (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty