Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-52516

[BP of MB-48409] - [sanity] intermitent query failure i/o timeout on Debian/Ubuntu

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Unknown

    Description

      during sanity test on single node we see intermittent failures only on Deban on Ubuntu with following I/O error with simple N1QL query:

      [2021-09-10 08:52:56,503] - [tuq:1462] INFO - RUN QUERY SELECT airportname FROM `travel-sample` WHERE type = "airport" AND lower(city) = "lyon" AND country = "France"
      [2021-09-10 08:52:56,503] - [rest_client:4067] INFO - query params : statement=SELECT+airportname+FROM+%60travel-sample%60+WHERE+type+%3D+%22airport%22+AND+lower%28city%29+%3D+%22lyon%22+AND+country+%3D+%22France%22&scan_consistency=REQUEST_PLUS
      [2021-09-10 08:58:56,552] - [rest_client:1055] ERROR - POST http://172.23.96.83:8093/query?statement=SELECT+airportname+FROM+%60travel-sample%60+WHERE+type+%3D+%22airport%22+AND+lower%28city%29+%3D+%22lyon%22+AND+country+%3D+%22France%22&scan_consistency=REQUEST_PLUS body:  headers: {'Authorization': 'Basic am9hb0RvZTpwYXNzd29yZDE='} error: 500 reason: unknown b'{\n"requestID": "494919cd-6fab-4edc-87ed-84284d51b90c",\n"signature": {"airportname":"json"},\n"results": [\n],\n"errors": [{"code":5000,"msg":" read tcp 127.0.0.1:56424-\\u003e127.0.0.1:9101: i/o timeout from [127.0.0.1:9101] - cause:  read tcp 127.0.0.1:56424-\\u003e127.0.0.1:9101: i/o timeout from [127.0.0.1:9101]"}],\n"status": "errors",\n"metrics": {"elapsedTime": "6m0.044188178s","executionTime": "6m0.044109343s","resultCount": 0,"resultSize": 0,"serviceLoad": 1,"errorCount": 1}\n}\n' auth: joaoDoe:password1
       

      This is not easy to repro on-demand ... I have attached [^query.log] and [^indexer.log].

      From query log:

       2021-09-10T08:54:13.201-07:00 [Info] [Queryport-connpool:127.0.0.1:9101] active conns 1, free conns 0
      2021-09-10T08:54:56.535-07:00 [Error] receiving packet: read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout
      2021-09-10T08:54:56.535-07:00 [Error] [GsiScanClient:"127.0.0.1:9101"] req(494919cd-6fab-4edc-87ed-84284d51b90c) connection "127.0.0.1:54772" response transport failed `read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout`
      2021-09-10T08:54:56.535-07:00 [Error] [GsiScanClient:"127.0.0.1:9101"] Scan3Primary(494919cd-6fab-4edc-87ed-84284d51b90c) response failed `read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout`
      2021-09-10T08:54:56.535-07:00 [Warn] scan failed: requestId 494919cd-6fab-4edc-87ed-84284d51b90c queryport 127.0.0.1:9101 inst 7748196516615584655 partition [0]
      2021-09-10T08:54:56.535-07:00 [Warn] Scan failed with error for index 15959379842585527863.  Trying scan again with replica, reqId:494919cd-6fab-4edc-87ed-84284d51b90c :  read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout from [127.0.0.1:9101] ...
      2021-09-10T08:54:56.535-07:00 [Error] PickRandom: Fail to find indexer for all index partitions. Num partition 1.  Partition with instances 0
      2021-09-10T08:54:56.535-07:00 [Warn] Fail to find indexers to satisfy query request.  Trying scan again for index 15959379842585527863, reqId:494919cd-6fab-4edc-87ed-84284d51b90c :  read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout from [127.0.0.1:9101] ...
      

      From index log

      2021-09-10T08:56:56.547-07:00 [Info] SCAN##8 Error in getRequestedIndexSnapshot Index scan timed out
      2021-09-10T08:56:56.547-07:00 [Info] SCAN##8 REQUEST defnId:15959379842585527863, instId:7748196516615584655, index:travel-sample/def_primary, type:scan, partitions:[0], scans: <ud>([{<nil> <nil> 0 scanAll [] <nil>}])</ud>, limit:9223372036854775807, consistency:session_consistency, requestId:494919cd-6fab-4edc-87ed-84284d51b90c
      2021-09-10T08:56:56.547-07:00 [Info] SCAN##8 RESPONSE status:(error = Index scan timed out), requestId: 494919cd-6fab-4edc-87ed-84284d51b90c
      2021-09-10T08:56:56.548-07:00 [Info] [Queryport ":9101"] connection "127.0.0.1:56424" doReceive() ...
      2021-09-10T08:56:56.549-07:00 [Error] transport error between 127.0.0.1:9101->127.0.0.1:56338: write tcp4 127.0.0.1:9101->127.0.0.1:56338: write: connection reset by peer
      2021-09-10T08:56:56.549-07:00 [Info] [Queryport ":9101"] connection 127.0.0.1:56338 closed 

       

      For reference jenkins job: http://server.jenkins.couchbase.com/job/build_sanity_matrix/DISTRO=debian9,TYPE=1node/lastCompletedBuild/testReport/tuqquery.tuq_advisor/QueryAdvisorTests/tuqquery_tuq_advisor_QueryAdvisorTests_test_session_all_bucket_size_300_bucket_name_travel_sample_load_sample_True_skip_primary_index_true_ini_node_conf_ini_cluster_name_node_conf_spec_py_1node_sanity_conf_file_conf_py_1node_sanity_conf_get_cbcollect_info_True_get_couch_dbinfo_True_skip_cleanup_False_skip_log_scan_False_skip_security_scan_False_num_nodes_1_case_number_18_logs_folder__home_jenkins_workspace_build_sanity_matrix_DISTRO_debian9_TYPE_1node_logs_testrunner_21_Sep_10_08_42_29_test_18/

       

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              amit.kulkarni Amit Kulkarni
              varun.velamuri Varun Velamuri
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty