Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-56290

[System Test] I/O timeout errors seen in query log

    XMLWordPrintable

Details

    • Untriaged
    • 0
    • Unknown

    Description

      Supportal: https://supportal.couchbase.com/snapshot/2f8a3f63cbb7b06601ba1dc4d878a8e2::0

      Seeing quite a few timeout messages.

      172.23.97.100 : query
       
      Sample result for error message ' read tcp 172.23.97.67:38926->172.23.106.171:9101: i/o timeout read tcp 172.23.97.67:41948->172.23.97.108:9101: i/o timeout from [172.23.97.108:9101 172.23.106.171:9101]' at time 2023-04-03T04:42:23.532-07:00: [{u'completed_requests': {u'optimizerEstimates': None, u'n1qlFeatCtrl': 76, u'userAgent': u'python-requests/2.7.0 Jython/2.7.0 Java/1.8.0_332', u'errorCount': 1, u'errors': [{u'code': 5000, u'icause': u' read tcp 172.23.97.67:38926->172.23.106.171:9101: i/o timeout read tcp 172.23.97.67:41948->172.23.97.108:9101: i/o timeout from [172.23.97.108:9101 172.23.106.171:9101]', u'message': u' read tcp 172.23.97.67:38926->172.23.106.171:9101: i/o timeout read tcp 172.23.97.67:41948->172.23.97.108:9101: i/o timeout from [172.23.97.108:9101 172.23.106.171:9101]', u'caller': u'secondary_index:1898', u'key': u'Internal Error'}], u'phaseOperators': {u'authorize': 1, u'stream': 1, u'filter': 1, u'indexScan': 1, u'fetch': 1, u'delete': 1}, u'txid': u'06649152
       -081f-482b-bbd4-7adce60ea81d', u'elapsedTime': u'12m2.745912226s', u'state': u'completed', u'serviceTime': u'12m2.745819059s', u'statement': u'delete from `default`:`bucket3`.`scope_3`.`coll_4` where free_breakfast=True limit 200', u'requestId': u'6e5eb4ed-70c9-4b2f-8d03-b490bef0a78d', u'node': u'172.23.97.67:8091', u'users': u'Administrator', u'statementType': u'DELETE', u'resultSize': 0, u'resultCount': 0, u'requestTime': u'2023-04-03T04:42:23.532-07:00', u'useCBO': True, u'transactionElapsedTime': u'12m2.769019605s', u'scanConsistency': u'scan_plus', u'phaseTimes': {u'authorize': u'1.306234ms', u'parse': u'787.287\xb5s', u'run': u'12m2.739964432s', u'stream': u'4.272\xb5s', u'instantiate': u'34.692\xb5s', u'filter': u'16.377\xb5s', u'plan': u'4.979972ms', u'indexScan': u'12m2.73841821s', u'fetch': u'9.956\xb5s', u'delete': u'9.937\xb5s'}, u'remoteAddr': u'172.23.106.127:33684', u'transactionRemainingTime': u'7m3.230914054s'}}]
       
       
      172.23.97.100 : query
       
      Sample result for error message ' read tcp 172.23.97.67:56982->172.23.105.122:9101: i/o timeout from [172.23.105.122:9101]' at time 2023-04-03T04:45:45.159-07:00: [{u'completed_requests': {u'optimizerEstimates': {u'cardinality': 137624.8190468445, u'cost': 1173459.8513670836}, u'n1qlFeatCtrl': 76, u'userAgent': u'python-requests/2.7.0 Jython/2.7.0 Java/1.8.0_332', u'errorCount': 1, u'errors': [{u'code': 5000, u'icause': u' read tcp 172.23.97.67:56982->172.23.105.122:9101: i/o timeout from [172.23.105.122:9101]', u'message': u' read tcp 172.23.97.67:56982->172.23.105.122:9101: i/o timeout from [172.23.105.122:9101]', u'caller': u'secondary_index:1898', u'key': u'Internal Error'}], u'phaseOperators': {u'authorize': 1, u'project': 1, u'stream': 1, u'filter': 1, u'indexScan': 1, u'fetch': 1}, u'txid': u'3c77cfc6-2209-49ff-b2aa-7b9b162e0fc4', u'elapsedTime': u'14m19.142758462s', u'state': u'completed', u'serviceTime': u'14m19.142622583s', u'statement': u'SELECT * FROM `default`:`bucket3`.
       `scope_5`.`coll_9` AS d WHERE ANY r IN d.reviews SATISFIES r.ratings.Rooms = 3 and r.ratings.Cleanliness > 1 END AND free_parking = True', u'requestId': u'1d5e3e25-02ee-454f-af0f-3029e1166de6', u'node': u'172.23.97.67:8091', u'users': u'Administrator', u'statementType': u'SELECT', u'resultSize': 0, u'resultCount': 0, u'requestTime': u'2023-04-03T04:45:45.159-07:00', u'useCBO': True, u'transactionElapsedTime': u'14m19.176703385s', u'scanConsistency': u'scan_plus', u'phaseTimes': {u'authorize': u'1.395324ms', u'parse': u'1.886124ms', u'run': u'14m5.953244961s', u'stream': u'4.355\xb5s', u'project': u'39.803\xb5s', u'instantiate': u'33.672\xb5s', u'filter': u'28.501\xb5s', u'plan': u'13.187383746s', u'indexScan': u'14m5.951512603s', u'fetch': u'11.942\xb5s'}, u'remoteAddr': u'172.23.106.127:58286', u'transactionRemainingTime': u'2m7.823209871s'}}]
       
       
      172.23.97.100 : query
       
      Sample result for error message 'GSI Drop()' at time 2023-04-03T04:44:48.322-07:00: [{u'completed_requests': {u'node': u'172.23.97.100:8091', u'users': u'Administrator', u'state': u'completed', u'errors': [{u'code': 5000, u'icause': u'Fail to drop index on some indexer nodes.  Error=Request timed out. Index server may still be processing this request. Please check the status after sometime or retry.\n.  If cluster or indexer is currently unavailable, the operation will automatically retry after cluster is back to normal.', u'message': u'GSI Drop()', u'caller': u'secondary_index:1062', u'key': u'Internal Error'}], u'scanConsistency': u'unbounded', u'requestTime': u'2023-04-03T04:44:48.322-07:00', u'statementType': u'DROP_INDEX', u'phaseOperators': {u'authorize': 1, u'stream': 1}, u'serviceTime': u'2m40.180794894s', u'remoteAddr': u'172.23.106.127:36128', u'elapsedTime': u'2m40.180888525s', u'resultSize': 0, u'n1qlFeatCtrl': 76, u'statement': u'DROP INDEX `idx8_P8LQ1C7FZX_idxprefix` on
        `bucket4`.`_default`.`_default`;', u'requestId': u'b46140da-c030-42e4-9648-ec1b1681f139', u'clientContextID': u'2b3abea0d970209d', u'userAgent': u'libcouchbase/3.0.6 (Linux-5.10.76-linuxkit; x86_64; GNU 9.4.0) PYCBC/3.0.10', u'useCBO': True, u'resultCount': 0, u'errorCount': 1, u'phaseTimes': {u'authorize': u'2.336138ms', u'run': u'2m40.17611155s', u'stream': u'18.678\xb5s', u'instantiate': u'19.549\xb5s', u'parse': u'609.836\xb5s', u'plan': u'4.015229ms'}}}]
       
       
      172.23.97.100 : query
       
      Sample result for error message 'Index scan timed out' at time 2023-04-03T04:44:50.079-07:00: [{u'completed_requests': {u'optimizerEstimates': None, u'n1qlFeatCtrl': 76, u'userAgent': u'python-requests/2.7.0 Jython/2.7.0 Java/1.8.0_332', u'errorCount': 1, u'errors': [{u'code': 12015, u'icause': u' Index scan timed out', u'message': u'Index scan timed out', u'caller': u'secondary_index:1891', u'key': u'datastore.couchbase.index_scan_timeout'}], u'phaseOperators': {u'authorize': 1, u'project': 1, u'stream': 1, u'sort': 1, u'filter': 1, u'indexScan': 1}, u'txid': u'd283f2a5-db4d-4863-bc6b-fcd805925458', u'elapsedTime': u'2m5.551614262s', u'state': u'completed', u'serviceTime': u'2m5.551494155s', u'statement': u'SELECT h.name,h.country,h.city,h.price,DENSE_RANK() OVER (window1) AS `rank` FROM `default`:`bucket3`.`scope_0`.`coll_15` AS h WHERE h.price IS NOT NULL WINDOW window1 AS ( PARTITION BY h.country ORDER BY h.price NULLS LAST) LIMIT 10;', u'requestId': u'ef010326-6874-4ee2-89ed-e57
       f417e5f51', u'node': u'172.23.97.67:8091', u'users': u'Administrator', u'phaseCounts': {u'filter': 776343, u'sort': 776343, u'indexScan': 776343}, u'statementType': u'SELECT', u'resultSize': 995, u'resultCount': 10, u'requestTime': u'2023-04-03T04:44:50.079-07:00', u'useCBO': True, u'transactionElapsedTime': u'2m41.143157319s', u'scanConsistency': u'unbounded', u'phaseTimes': {u'authorize': u'42.852\xb5s', u'parse': u'2.400815ms', u'run': u'2m5.535661424s', u'stream': u'175.055\xb5s', u'project': u'208.889\xb5s', u'instantiate': u'38.89\xb5s', u'sort': u'5.231616832s', u'filter': u'3.729127458s', u'plan': u'13.282393ms', u'indexScan': u'2m0.014885618s'}, u'remoteAddr': u'172.23.106.127:45985', u'transactionRemainingTime': u'16m42.856741153s'}}]
       
       
      172.23.97.100 : query
       
      Sample result for error message 'Error getting documents for update statistics.
      Failed to get random document.' at time 2023-04-03T04:24:49.353-07:00: [{u'completed_requests': {u'optimizerEstimates': None, u'n1qlFeatCtrl': 76, u'userAgent': u'libcouchbase/3.0.6 (Linux-5.10.76-linuxkit; x86_64; GNU 9.4.0) PYCBC/3.0.10', u'errorCount': 1, u'errors': [{u'code': 5360, u'message': u'Error getting documents for update statistics.\nFailed to get random document.', u'caller': u'upd_statistics:372', u'key': u'execution.update_statistics'}], u'phaseOperators': {u'authorize': 1, u'stream': 1}, u'elapsedTime': u'35.022005859s', u'state': u'completed', u'clientContextID': u'79459e958fc062ad', u'serviceTime': u'35.02191978s', u'statement': u"UPDATE STATISTICS FOR `bucket1`.`scope_3`.`coll_17` INDEX ALL WITH {'update_statistics_timeout': 0}  ;", u'requestId': u'71c14c45-10c5-46dc-a8be-a7a617f517d0', u'node': u'172.23.97.67:8091', u'users': u'Administrator', u'statementType': u'UPDATE_STATISTICS', u'resultSize': 0, u'resultCount': 0, u'requestTime': u'2023-04-03T04:24:49.353-07
       :00', u'useCBO': True, u'scanConsistency': u'unbounded', u'remoteAddr': u'172.23.106.127:35220', u'phaseTimes': {u'authorize': u'27.4\xb5s', u'run': u'35.021204684s', u'stream': u'3.383\xb5s', u'instantiate': u'28.339\xb5s', u'parse': u'632.071\xb5s', u'plan': u'25.454\xb5s'}}}]
      

      These timeout issues started around 2023-04-03T04:00:00 and we have been seeing these issues even in the latest iterations ( about 15 hours afterwards). I just checked the CPU usage and it's over 90% for about 5 hours starting from the time we started seeing i/o timeout errors.

      Not really sure if it's due to high memory usage alluded to in this comment

      cbcollect ->

               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.105.122.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.106.171.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.106.30.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.96.198.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.96.230.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.96.245.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.97.100.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.97.108.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.97.109.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.97.66.zip
               url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1680524083/collectinfo-2023-04-03T121447-ns_1%40172.23.97.67.zip
      

      Attachments

        1. ts_queue_size_122.png
          106 kB
          Sai Krishna Teja
        2. avg_resident_percent.png
          84 kB
          Sai Krishna Teja
        3. memory_stats.png
          243 kB
          Sai Krishna Teja
        4. cpu_utilization.png
          204 kB
          Sai Krishna Teja
        5. 96_198_cpu_utilization_rate.png
          157 kB
          Sai Krishna Teja
        6. 96_198_memory_utilization.png
          86 kB
          Sai Krishna Teja
        7. 96_198_ChannelLenghts.png
          224 kB
          Sai Krishna Teja
        8. projector_utilization.png
          503 kB
          Sai Krishna Teja

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              sai.teja Sai Krishna Teja
              pavan.pb Pavan PB
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty