  1. Couchbase Server
  2. MB-44943

timeout reached should abort document bulkGet

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 7.0.0
    • Cheshire-Cat
    • query
    • None
    • Untriaged
    • 1
    • Yes

    Description

      When the query timeout is reached underneath bulkGet, the bulkGet must abort. Otherwise the query service can become unusable for some time.

      If a query timeout is set, we pass it as the deadline to go-couchbase, which in turn sets it as the deadline on the TCP connection (if nothing is set, we pass 2m).
      After the recent changes, a ReadTimeout discards the connection (which is right), but the error is then ignored and the operation is retried.
      Because the deadline has already been reached, we should not retry. The retry makes the new connection fail again and makes things worse: it takes all CPU resources and even leaves network ports unusable until the TIME_WAIT period expires.
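
The intended behavior can be sketched in Go roughly as follows. This is a minimal illustration and not the actual go-couchbase code: `fetchWithDeadline`, `fetchFunc`, `effectiveTimeout`, `simulate`, and `errDeadlineReached` are hypothetical names, and the attempt limit of 5 is arbitrary. The only details taken from this report are the 2m default and the rule that no retry should start once the deadline has passed.

```go
package main

import (
	"errors"
	"time"
)

// errDeadlineReached is a hypothetical sentinel error for this sketch;
// the real go-couchbase error values may differ.
var errDeadlineReached = errors.New("deadline reached, not retrying")

// fetchFunc stands in for one bulk-get attempt on a connection whose
// I/O deadline is the supplied time. It is not a go-couchbase type.
type fetchFunc func(deadline time.Time) error

// effectiveTimeout applies the 2m default mentioned in the report when
// no query timeout is set.
func effectiveTimeout(timeout time.Duration) time.Duration {
	if timeout <= 0 {
		return 2 * time.Minute
	}
	return timeout
}

// fetchWithDeadline retries fetch up to maxAttempts times, but never
// starts an attempt once the deadline has passed. It returns the number
// of attempts actually made and the final error.
func fetchWithDeadline(deadline time.Time, maxAttempts int, fetch fetchFunc) (int, error) {
	attempts := 0
	var lastErr error
	for attempts < maxAttempts {
		if !time.Now().Before(deadline) {
			// A retry on a fresh connection would time out immediately,
			// so abort instead of opening more connections.
			return attempts, errDeadlineReached
		}
		attempts++
		if lastErr = fetch(deadline); lastErr == nil {
			return attempts, nil
		}
	}
	return attempts, lastErr
}

// simulate drives fetchWithDeadline with a fake fetch that takes
// attemptMillis per attempt and fails the first `failures` times.
func simulate(timeoutMillis, failures, attemptMillis int) (int, error) {
	timeout := effectiveTimeout(time.Duration(timeoutMillis) * time.Millisecond)
	deadline := time.Now().Add(timeout)
	fetch := func(time.Time) error {
		time.Sleep(time.Duration(attemptMillis) * time.Millisecond)
		if failures > 0 {
			failures--
			return errors.New("i/o timeout")
		}
		return nil
	}
	return fetchWithDeadline(deadline, 5, fetch)
}
```

For example, `simulate(20, 10, 30)` models a 20ms timeout where every attempt takes 30ms and fails: the first attempt runs, and the loop then aborts with `errDeadlineReached` rather than retrying on a new connection.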

      repro:

      Install travel-sample
      \set -timeout "5ms";
      select type, country, x FROM `travel-sample` where type = "airline" limit 10;

      The first select will fail. Try it again and check query.log.

      A few of these errors are okay because the fetches run in parallel, but in this case they keep occurring.

      2021-03-13T18:08:42.200-08:00 [ERROR]  Transmit failed in GetBulkAll write tcp 127.0.0.1:57814->127.0.0.1:11210: i/o timeout
      2021-03-13T18:08:42.200-08:00 [ERROR]  Transmit failed in GetBulkAll write tcp 127.0.0.1:57816->127.0.0.1:11210: i/o timeout
      2021-03-13T18:08:42.200-08:00 [ERROR]  Transmit failed in GetBulkAll write tcp 127.0.0.1:57815->127.0.0.1:11210: i/o timeout
      2021-03-13T18:08:42.209-08:00 [ERROR]  Transmit failed in GetBulkAll write tcp 127.0.0.1:57821->127.0.0.1:11210: i/o timeout
      2021-03-13T18:08:42.209-08:00 [ERROR]  Transmit failed in GetBulkAll write tcp 127.0.0.1:57819->127.0.0.1:11210: i/o timeout
      2021-03-13T18:08:42.209-08:00 [ERROR]  Transmit failed in GetBulkAll write tcp 127.0.0.1:57824->127.0.0.1:11210: i/o timeout
      2021-03-13T18:08:42.209-08:00 [ERROR]  Transmit failed in GetBulkAll write tcp 127.0.0.1:57822->127.0.0.1:11210: i/o timeout
      

      After some time (No more usable ports)

      2021-03-13T18:09:31.182-08:00 [INFO] Pool Get returned travel-sample: dial tcp 127.0.0.1:11210: connect: can't assign requested address
      2021-03-13T18:09:31.184-08:00 [INFO] Pool Get returned travel-sample: dial tcp 127.0.0.1:11210: connect: can't assign requested address
      2021-03-13T18:09:31.185-08:00 [INFO] Pool Get returned travel-sample: dial tcp 127.0.0.1:11210: connect: can't assign requested address
      2021-03-13T18:09:31.186-08:00 [INFO] Pool Get returned travel-sample: dial tcp 127.0.0.1:11210: connect: can't assign requested address
      2021-03-13T18:09:31.193-08:00 [INFO] Pool Get returned travel-sample: dial tcp 127.0.0.1:11210: connect: can't assign requested address
      2021-03-13T18:09:31.195-08:00 [INFO] Pool Get returned travel-sample: dial tcp 127.0.0.1:11210: connect: can't assign requested address
      

      I think this may even make memcached crash.

            People

              Sitaram.Vemulapalli Sitaram Vemulapalli