Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-48409

[sanity] intermitent query failure i/o timeout on Debian/Ubuntu

    XMLWordPrintable

Details

    • Bug
    • Resolution: Incomplete
    • Critical
    • 7.1.0
    • 7.0.2, 7.0.4
    • secondary-index
    • 7.0.2-6667
    • Untriaged
    • 1
    • Unknown

    Description

      during sanity test on single node we see intermittent failures only on Deban on Ubuntu with following I/O error with simple N1QL query:

      [2021-09-10 08:52:56,503] - [tuq:1462] INFO - RUN QUERY SELECT airportname FROM `travel-sample` WHERE type = "airport" AND lower(city) = "lyon" AND country = "France"
      [2021-09-10 08:52:56,503] - [rest_client:4067] INFO - query params : statement=SELECT+airportname+FROM+%60travel-sample%60+WHERE+type+%3D+%22airport%22+AND+lower%28city%29+%3D+%22lyon%22+AND+country+%3D+%22France%22&scan_consistency=REQUEST_PLUS
      [2021-09-10 08:58:56,552] - [rest_client:1055] ERROR - POST http://172.23.96.83:8093/query?statement=SELECT+airportname+FROM+%60travel-sample%60+WHERE+type+%3D+%22airport%22+AND+lower%28city%29+%3D+%22lyon%22+AND+country+%3D+%22France%22&scan_consistency=REQUEST_PLUS body:  headers: {'Authorization': 'Basic am9hb0RvZTpwYXNzd29yZDE='} error: 500 reason: unknown b'{\n"requestID": "494919cd-6fab-4edc-87ed-84284d51b90c",\n"signature": {"airportname":"json"},\n"results": [\n],\n"errors": [{"code":5000,"msg":" read tcp 127.0.0.1:56424-\\u003e127.0.0.1:9101: i/o timeout from [127.0.0.1:9101] - cause:  read tcp 127.0.0.1:56424-\\u003e127.0.0.1:9101: i/o timeout from [127.0.0.1:9101]"}],\n"status": "errors",\n"metrics": {"elapsedTime": "6m0.044188178s","executionTime": "6m0.044109343s","resultCount": 0,"resultSize": 0,"serviceLoad": 1,"errorCount": 1}\n}\n' auth: joaoDoe:password1
       

      This is not easy to repro on-demand ... I have attached query.log and indexer.log.

      From query log:

       2021-09-10T08:54:13.201-07:00 [Info] [Queryport-connpool:127.0.0.1:9101] active conns 1, free conns 0
      2021-09-10T08:54:56.535-07:00 [Error] receiving packet: read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout
      2021-09-10T08:54:56.535-07:00 [Error] [GsiScanClient:"127.0.0.1:9101"] req(494919cd-6fab-4edc-87ed-84284d51b90c) connection "127.0.0.1:54772" response transport failed `read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout`
      2021-09-10T08:54:56.535-07:00 [Error] [GsiScanClient:"127.0.0.1:9101"] Scan3Primary(494919cd-6fab-4edc-87ed-84284d51b90c) response failed `read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout`
      2021-09-10T08:54:56.535-07:00 [Warn] scan failed: requestId 494919cd-6fab-4edc-87ed-84284d51b90c queryport 127.0.0.1:9101 inst 7748196516615584655 partition [0]
      2021-09-10T08:54:56.535-07:00 [Warn] Scan failed with error for index 15959379842585527863.  Trying scan again with replica, reqId:494919cd-6fab-4edc-87ed-84284d51b90c :  read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout from [127.0.0.1:9101] ...
      2021-09-10T08:54:56.535-07:00 [Error] PickRandom: Fail to find indexer for all index partitions. Num partition 1.  Partition with instances 0
      2021-09-10T08:54:56.535-07:00 [Warn] Fail to find indexers to satisfy query request.  Trying scan again for index 15959379842585527863, reqId:494919cd-6fab-4edc-87ed-84284d51b90c :  read tcp 127.0.0.1:54772->127.0.0.1:9101: i/o timeout from [127.0.0.1:9101] ...
      

      From index log

      2021-09-10T08:56:56.547-07:00 [Info] SCAN##8 Error in getRequestedIndexSnapshot Index scan timed out
      2021-09-10T08:56:56.547-07:00 [Info] SCAN##8 REQUEST defnId:15959379842585527863, instId:7748196516615584655, index:travel-sample/def_primary, type:scan, partitions:[0], scans: <ud>([{<nil> <nil> 0 scanAll [] <nil>}])</ud>, limit:9223372036854775807, consistency:session_consistency, requestId:494919cd-6fab-4edc-87ed-84284d51b90c
      2021-09-10T08:56:56.547-07:00 [Info] SCAN##8 RESPONSE status:(error = Index scan timed out), requestId: 494919cd-6fab-4edc-87ed-84284d51b90c
      2021-09-10T08:56:56.548-07:00 [Info] [Queryport ":9101"] connection "127.0.0.1:56424" doReceive() ...
      2021-09-10T08:56:56.549-07:00 [Error] transport error between 127.0.0.1:9101->127.0.0.1:56338: write tcp4 127.0.0.1:9101->127.0.0.1:56338: write: connection reset by peer
      2021-09-10T08:56:56.549-07:00 [Info] [Queryport ":9101"] connection 127.0.0.1:56338 closed 

       

      For reference jenkins job: http://server.jenkins.couchbase.com/job/build_sanity_matrix/DISTRO=debian9,TYPE=1node/lastCompletedBuild/testReport/tuqquery.tuq_advisor/QueryAdvisorTests/tuqquery_tuq_advisor_QueryAdvisorTests_test_session_all_bucket_size_300_bucket_name_travel_sample_load_sample_True_skip_primary_index_true_ini_node_conf_ini_cluster_name_node_conf_spec_py_1node_sanity_conf_file_conf_py_1node_sanity_conf_get_cbcollect_info_True_get_couch_dbinfo_True_skip_cleanup_False_skip_log_scan_False_skip_security_scan_False_num_nodes_1_case_number_18_logs_folder__home_jenkins_workspace_build_sanity_matrix_DISTRO_debian9_TYPE_1node_logs_testrunner_21_Sep_10_08_42_29_test_18/

       

      Attachments

        1. query.log
          1.41 MB
        2. indexer.log
          13.88 MB
        3. collectinfo-2021-09-10T214901-ns_1@cb.local.zip
          20.94 MB
        4. 172.23.120.229-20211216-0126-diag.zip
          10.44 MB
        5. 172.23.120.233-20220518-0154-diag.zip
          27.60 MB

        Issue Links

          For Gerrit Dashboard: MB-48409
          # Subject Branch Project Status CR V

          Activity

            People

              pierre.regazzoni Pierre Regazzoni
              pierre.regazzoni Pierre Regazzoni
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty