Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-22184

cbq-engine chokes on pipelined /admin/ping while long running /query in-flight

    XMLWordPrintable

Details

    • Bug
    • Resolution: Won't Fix
    • Critical
    • feature-backlog
    • 4.5.1
    • query
    • None
    • Untriaged
    • Unknown

    Description

      Hi,

      While looking into a CBSE, I found weird behavior in cbq-engine (maybe even lower level go http) which is causing some trouble up the stack.

      What is going on is that first we issue a /query that is taking a long time, specifically deleting 2 million documents. While this is in-flight, I send a /admin/ping (or /admin/vitals, doesn't matter) GET request to the same socket . What I would expect is that first, the /query returns once its done and then subsequently I get the response from /admin/ping.

      This IS also what is happening, but instead of waiting for the /query to finish, it returns prematurely, stopping the deletion process and then also returning me the /admin/ping response.

      The question is: why is the pipelined /admin/ping request stopping the currently long-running deletion query prematurely?

      Since JIRA is not so good at formatting, I put the request response flow in also this gist: https://gist.github.com/daschl/b69a0bd93bb91bad1b1826c84d3cc3a9

       
      We send the DELETION:
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 50 4f 53 54 20 2f 71 75 65 72 79 20 48 54 54 50 |POST /query HTTP|
      |00000010| 2f 31 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74 |/1.1..User-Agent|
      |00000020| 3a 20 63 6f 75 63 68 62 61 73 65 2d 6a 61 76 61 |: couchbase-java|
      |00000030| 2d 63 6c 69 65 6e 74 2f 32 2e 33 2e 36 20 28 67 |-client/2.3.6 (g|
      |00000040| 69 74 3a 20 32 2e 33 2e 36 2c 20 63 6f 72 65 3a |it: 2.3.6, core:|
      |00000050| 20 31 2e 33 2e 36 29 20 28 4d 61 63 20 4f 53 20 | 1.3.6) (Mac OS |
      |00000060| 58 2f 31 30 2e 31 31 2e 36 20 78 38 36 5f 36 34 |X/10.11.6 x86_64|
      |00000070| 3b 20 4a 61 76 61 20 48 6f 74 53 70 6f 74 28 54 |; Java HotSpot(T|
      |00000080| 4d 29 20 36 34 2d 42 69 74 20 53 65 72 76 65 72 |M) 64-Bit Server|
      |00000090| 20 56 4d 20 31 2e 38 2e 30 5f 31 30 31 2d 62 31 | VM 1.8.0_101-b1|
      |000000a0| 33 29 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 |3)..Content-Type|
      |000000b0| 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6a 73 |: application/js|
      |000000c0| 6f 6e 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 |on..Content-Leng|
      |000000d0| 74 68 3a 20 36 37 0d 0a 48 6f 73 74 3a 20 31 32 |th: 67..Host: 12|
      |000000e0| 37 2e 30 2e 30 2e 31 3a 38 30 39 33 0d 0a 41 75 |7.0.0.1:8093..Au|
      |000000f0| 74 68 6f 72 69 7a 61 74 69 6f 6e 3a 20 42 61 73 |thorization: Bas|
      |00000100| 69 63 20 5a 47 56 6d 59 58 56 73 64 44 6f 3d 0d |ic ZGVmYXVsdDo=.|
      |00000110| 0a 0d 0a 7b 22 73 74 61 74 65 6d 65 6e 74 22 3a |...{"statement":|
      |00000120| 22 64 65 6c 65 74 65 20 66 72 6f 6d 20 60 64 65 |"delete from `de|
      |00000130| 66 61 75 6c 74 60 20 6c 69 6d 69 74 20 32 30 30 |fault` limit 200|
      |00000140| 30 30 30 30 22 2c 22 74 69 6d 65 6f 75 74 22 3a |0000","timeout":|
      |00000150| 22 31 35 6d 22 7d                               |"15m"}          |
      +--------+-------------------------------------------------+----------------+
       
      After 3 seconds we send the ping:
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 47 45 54 20 2f 61 64 6d 69 6e 2f 70 69 6e 67 20 |GET /admin/ping |
      |00000010| 48 54 54 50 2f 31 2e 31 0d 0a 55 73 65 72 2d 41 |HTTP/1.1..User-A|
      |00000020| 67 65 6e 74 3a 20 63 6f 75 63 68 62 61 73 65 2d |gent: couchbase-|
      |00000030| 6a 61 76 61 2d 63 6c 69 65 6e 74 2f 32 2e 33 2e |java-client/2.3.|
      |00000040| 36 20 28 67 69 74 3a 20 32 2e 33 2e 36 2c 20 63 |6 (git: 2.3.6, c|
      |00000050| 6f 72 65 3a 20 31 2e 33 2e 36 29 20 28 4d 61 63 |ore: 1.3.6) (Mac|
      |00000060| 20 4f 53 20 58 2f 31 30 2e 31 31 2e 36 20 78 38 | OS X/10.11.6 x8|
      |00000070| 36 5f 36 34 3b 20 4a 61 76 61 20 48 6f 74 53 70 |6_64; Java HotSp|
      |00000080| 6f 74 28 54 4d 29 20 36 34 2d 42 69 74 20 53 65 |ot(TM) 64-Bit Se|
      |00000090| 72 76 65 72 20 56 4d 20 31 2e 38 2e 30 5f 31 30 |rver VM 1.8.0_10|
      |000000a0| 31 2d 62 31 33 29 0d 0a 48 6f 73 74 3a 20 31 32 |1-b13)..Host: 12|
      |000000b0| 37 2e 30 2e 30 2e 31 3a 38 30 39 33 0d 0a 0d 0a |7.0.0.1:8093....|
      +--------+-------------------------------------------------+----------------+
       
      Immediately the query stops with a (200 OK!) and a status: closed. It contains the mutation count of the deleted rows up to this point.
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
      |00000010| 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a |.Content-Length:|
      |00000020| 20 33 31 37 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 | 317..Content-Ty|
      |00000030| 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f |pe: application/|
      |00000040| 6a 73 6f 6e 3b 20 76 65 72 73 69 6f 6e 3d 31 2e |json; version=1.|
      |00000050| 36 2e 30 0d 0a 44 61 74 65 3a 20 57 65 64 2c 20 |6.0..Date: Wed, |
      |00000060| 30 34 20 4a 61 6e 20 32 30 31 37 20 31 36 3a 31 |04 Jan 2017 16:1|
      |00000070| 33 3a 31 31 20 47 4d 54 0d 0a 0d 0a 7b 0a 20 20 |3:11 GMT....{.  |
      |00000080| 20 20 22 72 65 71 75 65 73 74 49 44 22 3a 20 22 |  "requestID": "|
      |00000090| 31 63 61 34 36 34 63 66 2d 65 61 61 34 2d 34 62 |1ca464cf-eaa4-4b|
      |000000a0| 37 31 2d 61 34 34 39 2d 63 33 31 66 33 34 39 65 |71-a449-c31f349e|
      |000000b0| 35 30 65 37 22 2c 0a 20 20 20 20 22 73 69 67 6e |50e7",.    "sign|
      |000000c0| 61 74 75 72 65 22 3a 20 6e 75 6c 6c 2c 0a 20 20 |ature": null,.  |
      |000000d0| 20 20 22 72 65 73 75 6c 74 73 22 3a 20 5b 0a 20 |  "results": [. |
      |000000e0| 20 20 20 5d 2c 0a 20 20 20 20 22 73 74 61 74 75 |   ],.    "statu|
      |000000f0| 73 22 3a 20 22 63 6c 6f 73 65 64 22 2c 0a 20 20 |s": "closed",.  |
      |00000100| 20 20 22 6d 65 74 72 69 63 73 22 3a 20 7b 0a 20 |  "metrics": {. |
      |00000110| 20 20 20 20 20 20 20 22 65 6c 61 70 73 65 64 54 |       "elapsedT|
      |00000120| 69 6d 65 22 3a 20 22 32 2e 39 37 36 31 35 37 34 |ime": "2.9761574|
      |00000130| 36 39 73 22 2c 0a 20 20 20 20 20 20 20 20 22 65 |69s",.        "e|
      |00000140| 78 65 63 75 74 69 6f 6e 54 69 6d 65 22 3a 20 22 |xecutionTime": "|
      |00000150| 32 2e 39 37 36 31 33 37 30 35 33 73 22 2c 0a 20 |2.976137053s",. |
      |00000160| 20 20 20 20 20 20 20 22 72 65 73 75 6c 74 43 6f |       "resultCo|
      |00000170| 75 6e 74 22 3a 20 30 2c 0a 20 20 20 20 20 20 20 |unt": 0,.       |
      |00000180| 20 22 72 65 73 75 6c 74 53 69 7a 65 22 3a 20 30 | "resultSize": 0|
      |00000190| 2c 0a 20 20 20 20 20 20 20 20 22 6d 75 74 61 74 |,.        "mutat|
      |000001a0| 69 6f 6e 43 6f 75 6e 74 22 3a 20 32 37 35 36 38 |ionCount": 27568|
      |000001b0| 0a 20 20 20 20 7d 0a 7d 0a                      |.    }.}.       |
      +--------+-------------------------------------------------+----------------+
       
      Then we also get the ping response properly.
       
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
      |00000010| 0a 44 61 74 65 3a 20 57 65 64 2c 20 30 34 20 4a |.Date: Wed, 04 J|
      |00000020| 61 6e 20 32 30 31 37 20 31 36 3a 31 33 3a 31 31 |an 2017 16:13:11|
      |00000030| 20 47 4d 54 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 | GMT..Content-Le|
      |00000040| 6e 67 74 68 3a 20 32 0d 0a 43 6f 6e 74 65 6e 74 |ngth: 2..Content|
      |00000050| 2d 54 79 70 65 3a 20 74 65 78 74 2f 70 6c 61 69 |-Type: text/plai|
      |00000060| 6e 3b 20 63 68 61 72 73 65 74 3d 75 74 66 2d 38 |n; charset=utf-8|
      |00000070| 0d 0a 0d 0a 7b 7d                               |....{}          |
      

      Because its returning with a 200 OK and no error section it looks like a fine response while of course the deletion is not complete.

      Note that I didn't find anything in the normal query logs which would indicate some failure or stack trace.

      Any idea what is going on here?

      Attachments

        Issue Links

          For Gerrit Dashboard: MB-22184
          # Subject Branch Project Status CR V

          Activity

            People

              johan.larson Johan Larson (Inactive)
              daschl Michael Nitschinger
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty