  1. Couchbase Server
  2. MB-45965

[System Test] : Fatal error seen in query logs - Fatal Error default : dial tcp 172.23.97.119:11210: i/o timeout

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • 7.0.0
    • Cheshire-Cat
    • qe

    Description

      Build : 7.0.0-5017
      Test : -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas_scale3.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
      Scale : 3
      Iteration : 2
      Day : 4th

      On the query node 172.23.120.245, the following errors appear in the query log. They seem benign, but it is not clear whether there is any functional impact, so the severity is set to Major.

      2021-04-26T18:52:29.147-07:00 [INFO] Retrying Memcached error (MCResponse status=ENOMEM, opcode=GET, opaque=1, msg: ) FOR default(vbid:347, keys:<ud>[73448AAA-00_8102999]</ud>) 
      2021-04-26T18:52:30.230-07:00 [Info] Refreshing indexer list due to cluster changes or auto-refresh.
      2021-04-26T18:52:30.230-07:00 [Info] Refreshed Indexer List: [172.23.105.107:9100 172.23.121.117:9100 172.23.96.252:9100 172.23.96.253:9100 172.23.99.11:9100]
      2021-04-26T18:52:30.260-07:00 [ERROR] Fatal Error default : dial tcp 172.23.97.119:11210: i/o timeout 
      2021-04-26T18:52:30.267-07:00 [Info] switched currmeta from 127599 -> 127599 force true 
      2021-04-26T18:52:30.269-07:00 [INFO] Retrying Memcached error (MCResponse status=ENOMEM, opcode=GET, opaque=1, msg: ) FOR default(vbid:347, keys:<ud>[73448AAA-00_8102999]</ud>) 
      2021-04-26T18:52:30.275-07:00 [Info] GsiClient::UpdateUsecjson: using collatejson as data format between indexer and GsiClient
      2021-04-26T18:52:30.335-07:00 [INFO] Retrying Memcached error (MCResponse status=ENOMEM, opcode=GET, opaque=1, msg: ) FOR default(vbid:347, keys:<ud>[73448AAA-00_8102999]</ud>) 
      2021-04-26T18:52:30.374-07:00 [INFO] Retrying Memcached error (MCResponse status=ENOMEM, opcode=GET, opaque=1, msg: ) FOR default(vbid:347, keys:<ud>[73448AAA-00_8102999]</ud>) 
      2021-04-26T18:52:30.403-07:00 [ERROR] Fatal Error default : dial tcp 172.23.97.119:11210: i/o timeout 
      _time=2021-04-26T18:52:30.403-07:00 _level=ERROR _msg={1 errors, starting with dial tcp 172.23.97.119:11210: i/o timeout} 
      2021-04-26T18:52:30.404-07:00 [ERROR] Fatal Error default : dial tcp 172.23.97.119:11210: i/o timeout 
      2021-04-26T18:52:30.410-07:00 [INFO] Retrying Memcached error (MCResponse status=ENOMEM, opcode=GET, opaque=1, msg: ) FOR default(vbid:347, keys:<ud>[73448AAA-00_8102999]</ud>) 
      2021-04-26T18:52:30.410-07:00 [INFO] Retrying Memcached error (MCResponse status=ENOMEM, opcode=GET, opaque=1, msg: ) FOR default(vbid:347, keys:<ud>[73448AAA-00_8102999]</ud>) 
      2021-04-26T18:52:30.513-07:00 [INFO] Retrying Memcached error (MCResponse status=ENOMEM, opcode=GET, opaque=1, msg: ) FOR default(vbid:347, keys:<ud>[73448AAA-00_8102999]</ud>) 
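
To gauge how often the dial timeouts occur relative to the ENOMEM retries, the query log can be tallied with a short script. The following is a minimal sketch, not part of the test framework: the function name `classify` and the category names are hypothetical, and the patterns are based only on the excerpt above.

```python
import re
from collections import Counter

# Patterns derived from the log excerpt above; the category names are illustrative.
ENOMEM_RETRY = re.compile(r"Retrying Memcached error \(MCResponse status=ENOMEM")
DIAL_TIMEOUT = re.compile(r"dial tcp (\d+\.\d+\.\d+\.\d+):(\d+): i/o timeout")


def classify(lines):
    """Count ENOMEM retries and dial timeouts, and tally timeout targets."""
    counts = Counter()
    targets = Counter()
    for line in lines:
        if ENOMEM_RETRY.search(line):
            counts["enomem_retry"] += 1
        match = DIAL_TIMEOUT.search(line)
        if match:
            counts["dial_timeout"] += 1
            # Record which host:port the query service failed to reach.
            targets[f"{match.group(1)}:{match.group(2)}"] += 1
    return counts, targets
```

Calling `classify(open(path))` on a query log (the path is whatever the log collection provides) returns the two counters. Note that a timeout logged through both the `[ERROR]` and `_level=ERROR` formats is counted once per line.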
      

      On the KV node 172.23.97.119, around the same time, the following is seen in the memcached logs:

      2021-04-26T18:52:30.406004-07:00 INFO 1601: HELO [GoMemcached] XATTR, JSON, Collections [ {"ip":"172.23.120.245","port":46148} - {"ip":"172.23.97.119","port":11210} (not authenticated) ]
      2021-04-26T18:52:30.408388-07:00 INFO 1601: Client {"ip":"172.23.120.245","port":46148} authenticated as <ud>@cbq-engine</ud>
      2021-04-26T18:52:30.408689-07:00 INFO 1510: HELO [GoMemcached] XATTR, JSON, Collections [ {"ip":"172.23.120.245","port":46150} - {"ip":"172.23.97.119","port":11210} (not authenticated) ]
      2021-04-26T18:52:30.409402-07:00 INFO 1601: Unrecoverable error encountered: ["reading","error"], socket_error: 104:Connection reset by peer, shutting down connection
      2021-04-26T18:52:30.409419-07:00 INFO 1510: Client {"ip":"172.23.120.245","port":46150} authenticated as <ud>@cbq-engine</ud>
      2021-04-26T18:52:30.410283-07:00 INFO 1510: Unrecoverable error encountered: ["reading","error"], socket_error: 104:Connection reset by peer, shutting down connection
      2021-04-26T18:52:30.410402-07:00 INFO 1597: Client {"ip":"172.23.97.119","port":47938} authenticated as <ud>@projector</ud>
      2021-04-26T18:52:30.410413-07:00 INFO 1601: HELO [GoMemcached] XATTR, JSON, Collections [ {"ip":"172.23.120.245","port":46108} - {"ip":"172.23.97.119","port":11210} (not authenticated) ]
      2021-04-26T18:52:30.410933-07:00 INFO 1510: HELO [GoMemcached] XATTR, JSON, Collections [ {"ip":"172.23.120.245","port":46152} - {"ip":"172.23.97.119","port":11210} (not authenticated) ]
      2021-04-26T18:52:30.411527-07:00 INFO 1510: Client {"ip":"172.23.120.245","port":46152} authenticated as <ud>@cbq-engine</ud>
      2021-04-26T18:52:30.411589-07:00 INFO 1601: Client {"ip":"172.23.120.245","port":46108} authenticated as <ud>@cbq-engine</ud>
      2021-04-26T18:52:30.412032-07:00 INFO 1602: HELO [GoMemcached] XATTR, JSON, Collections [ {"ip":"172.23.120.245","port":46154} - {"ip":"172.23.97.119","port":11210} (not authenticated) ]
      2021-04-26T18:52:30.412678-07:00 INFO 1602: Client {"ip":"172.23.120.245","port":46154} authenticated as <ud>@cbq-engine</ud>
      2021-04-26T18:52:30.413104-07:00 INFO 1597: DCP connection opened successfully. PRODUCER, INCLUDE_XATTRS [ {"ip":"172.23.97.119","port":47938} - {"ip":"172.23.97.119","port":11210} (System, <ud>@projector</ud>) ]
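
To see which client connections were reset, each "Connection reset by peer" line can be matched to the most recent HELO line carrying the same number. The following is a rough sketch under two assumptions: that the number after `INFO` identifies the connection, and that the first JSON object in a HELO line is the client peer. The function name `reset_peers` is hypothetical.

```python
import json
import re

# Assumption: the number after "INFO" identifies the connection, and the first
# JSON object inside the "[ ... - ... ]" pair of a HELO line is the client peer.
HELO = re.compile(r"INFO (\d+): HELO .*?\[ (\{.*?\}) - ")
RESET = re.compile(
    r"INFO (\d+): Unrecoverable error encountered: .*Connection reset by peer"
)


def reset_peers(lines):
    """Return (connection id, client ip, client port) for each reset connection."""
    peers = {}   # connection id -> peer from the latest HELO on that id
    resets = []
    for line in lines:
        match = HELO.search(line)
        if match:
            peers[match.group(1)] = json.loads(match.group(2))
            continue
        match = RESET.search(line)
        if match:
            peer = peers.get(match.group(1))
            if peer:
                resets.append((match.group(1), peer["ip"], peer["port"]))
            else:
                # Reset seen without a preceding HELO in the given lines.
                resets.append((match.group(1), None, None))
    return resets
```

Because the numbers are reused (1601 and 1510 each appear with more than one client port in the excerpt), the sketch keeps only the latest HELO per number.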
      

      These errors have not been seen in this system test before, so this is marked as a regression. The last build on which the test was run is 7.0.0-4955.

          People

            mihir.kamdar Mihir Kamdar (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)