Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-22184

cbq-engine chokes on pipelined /admin/ping while long running /query in-flight

    XMLWordPrintable

Details

    • Bug
    • Resolution: Won't Fix
    • Critical
    • feature-backlog
    • 4.5.1
    • query
    • None
    • Untriaged
    • Unknown

    Description

      Hi,

      While looking into a CBSE, I found weird behavior in cbq-engine (maybe even lower level go http) which is causing some trouble up the stack.

      What is going on is that first we issue a /query that is taking a long time, specifically deleting 2 million documents. While this is in-flight, I send a /admin/ping (or /admin/vitals, doesn't matter) GET request to the same socket . What I would expect is that first, the /query returns once its done and then subsequently I get the response from /admin/ping.

      This IS also what is happening, but instead of waiting for the /query to finish, it returns prematurely, stopping the deletion process and then also returning me the /admin/ping response.

      The question is: why is the pipelined /admin/ping request stopping the currently long-running deletion query prematurely?

      Since JIRA is not so good at formatting, I put the request response flow in also this gist: https://gist.github.com/daschl/b69a0bd93bb91bad1b1826c84d3cc3a9

       
      We send the DELETION:
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 50 4f 53 54 20 2f 71 75 65 72 79 20 48 54 54 50 |POST /query HTTP|
      |00000010| 2f 31 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74 |/1.1..User-Agent|
      |00000020| 3a 20 63 6f 75 63 68 62 61 73 65 2d 6a 61 76 61 |: couchbase-java|
      |00000030| 2d 63 6c 69 65 6e 74 2f 32 2e 33 2e 36 20 28 67 |-client/2.3.6 (g|
      |00000040| 69 74 3a 20 32 2e 33 2e 36 2c 20 63 6f 72 65 3a |it: 2.3.6, core:|
      |00000050| 20 31 2e 33 2e 36 29 20 28 4d 61 63 20 4f 53 20 | 1.3.6) (Mac OS |
      |00000060| 58 2f 31 30 2e 31 31 2e 36 20 78 38 36 5f 36 34 |X/10.11.6 x86_64|
      |00000070| 3b 20 4a 61 76 61 20 48 6f 74 53 70 6f 74 28 54 |; Java HotSpot(T|
      |00000080| 4d 29 20 36 34 2d 42 69 74 20 53 65 72 76 65 72 |M) 64-Bit Server|
      |00000090| 20 56 4d 20 31 2e 38 2e 30 5f 31 30 31 2d 62 31 | VM 1.8.0_101-b1|
      |000000a0| 33 29 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 70 65 |3)..Content-Type|
      |000000b0| 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6a 73 |: application/js|
      |000000c0| 6f 6e 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 |on..Content-Leng|
      |000000d0| 74 68 3a 20 36 37 0d 0a 48 6f 73 74 3a 20 31 32 |th: 67..Host: 12|
      |000000e0| 37 2e 30 2e 30 2e 31 3a 38 30 39 33 0d 0a 41 75 |7.0.0.1:8093..Au|
      |000000f0| 74 68 6f 72 69 7a 61 74 69 6f 6e 3a 20 42 61 73 |thorization: Bas|
      |00000100| 69 63 20 5a 47 56 6d 59 58 56 73 64 44 6f 3d 0d |ic ZGVmYXVsdDo=.|
      |00000110| 0a 0d 0a 7b 22 73 74 61 74 65 6d 65 6e 74 22 3a |...{"statement":|
      |00000120| 22 64 65 6c 65 74 65 20 66 72 6f 6d 20 60 64 65 |"delete from `de|
      |00000130| 66 61 75 6c 74 60 20 6c 69 6d 69 74 20 32 30 30 |fault` limit 200|
      |00000140| 30 30 30 30 22 2c 22 74 69 6d 65 6f 75 74 22 3a |0000","timeout":|
      |00000150| 22 31 35 6d 22 7d                               |"15m"}          |
      +--------+-------------------------------------------------+----------------+
       
      After 3 seconds we send the ping:
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 47 45 54 20 2f 61 64 6d 69 6e 2f 70 69 6e 67 20 |GET /admin/ping |
      |00000010| 48 54 54 50 2f 31 2e 31 0d 0a 55 73 65 72 2d 41 |HTTP/1.1..User-A|
      |00000020| 67 65 6e 74 3a 20 63 6f 75 63 68 62 61 73 65 2d |gent: couchbase-|
      |00000030| 6a 61 76 61 2d 63 6c 69 65 6e 74 2f 32 2e 33 2e |java-client/2.3.|
      |00000040| 36 20 28 67 69 74 3a 20 32 2e 33 2e 36 2c 20 63 |6 (git: 2.3.6, c|
      |00000050| 6f 72 65 3a 20 31 2e 33 2e 36 29 20 28 4d 61 63 |ore: 1.3.6) (Mac|
      |00000060| 20 4f 53 20 58 2f 31 30 2e 31 31 2e 36 20 78 38 | OS X/10.11.6 x8|
      |00000070| 36 5f 36 34 3b 20 4a 61 76 61 20 48 6f 74 53 70 |6_64; Java HotSp|
      |00000080| 6f 74 28 54 4d 29 20 36 34 2d 42 69 74 20 53 65 |ot(TM) 64-Bit Se|
      |00000090| 72 76 65 72 20 56 4d 20 31 2e 38 2e 30 5f 31 30 |rver VM 1.8.0_10|
      |000000a0| 31 2d 62 31 33 29 0d 0a 48 6f 73 74 3a 20 31 32 |1-b13)..Host: 12|
      |000000b0| 37 2e 30 2e 30 2e 31 3a 38 30 39 33 0d 0a 0d 0a |7.0.0.1:8093....|
      +--------+-------------------------------------------------+----------------+
       
      Immediately the query stops with a (200 OK!) and a status: closed. It contains the mutation count of the deleted rows up to this point.
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
      |00000010| 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 3a |.Content-Length:|
      |00000020| 20 33 31 37 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 | 317..Content-Ty|
      |00000030| 70 65 3a 20 61 70 70 6c 69 63 61 74 69 6f 6e 2f |pe: application/|
      |00000040| 6a 73 6f 6e 3b 20 76 65 72 73 69 6f 6e 3d 31 2e |json; version=1.|
      |00000050| 36 2e 30 0d 0a 44 61 74 65 3a 20 57 65 64 2c 20 |6.0..Date: Wed, |
      |00000060| 30 34 20 4a 61 6e 20 32 30 31 37 20 31 36 3a 31 |04 Jan 2017 16:1|
      |00000070| 33 3a 31 31 20 47 4d 54 0d 0a 0d 0a 7b 0a 20 20 |3:11 GMT....{.  |
      |00000080| 20 20 22 72 65 71 75 65 73 74 49 44 22 3a 20 22 |  "requestID": "|
      |00000090| 31 63 61 34 36 34 63 66 2d 65 61 61 34 2d 34 62 |1ca464cf-eaa4-4b|
      |000000a0| 37 31 2d 61 34 34 39 2d 63 33 31 66 33 34 39 65 |71-a449-c31f349e|
      |000000b0| 35 30 65 37 22 2c 0a 20 20 20 20 22 73 69 67 6e |50e7",.    "sign|
      |000000c0| 61 74 75 72 65 22 3a 20 6e 75 6c 6c 2c 0a 20 20 |ature": null,.  |
      |000000d0| 20 20 22 72 65 73 75 6c 74 73 22 3a 20 5b 0a 20 |  "results": [. |
      |000000e0| 20 20 20 5d 2c 0a 20 20 20 20 22 73 74 61 74 75 |   ],.    "statu|
      |000000f0| 73 22 3a 20 22 63 6c 6f 73 65 64 22 2c 0a 20 20 |s": "closed",.  |
      |00000100| 20 20 22 6d 65 74 72 69 63 73 22 3a 20 7b 0a 20 |  "metrics": {. |
      |00000110| 20 20 20 20 20 20 20 22 65 6c 61 70 73 65 64 54 |       "elapsedT|
      |00000120| 69 6d 65 22 3a 20 22 32 2e 39 37 36 31 35 37 34 |ime": "2.9761574|
      |00000130| 36 39 73 22 2c 0a 20 20 20 20 20 20 20 20 22 65 |69s",.        "e|
      |00000140| 78 65 63 75 74 69 6f 6e 54 69 6d 65 22 3a 20 22 |xecutionTime": "|
      |00000150| 32 2e 39 37 36 31 33 37 30 35 33 73 22 2c 0a 20 |2.976137053s",. |
      |00000160| 20 20 20 20 20 20 20 22 72 65 73 75 6c 74 43 6f |       "resultCo|
      |00000170| 75 6e 74 22 3a 20 30 2c 0a 20 20 20 20 20 20 20 |unt": 0,.       |
      |00000180| 20 22 72 65 73 75 6c 74 53 69 7a 65 22 3a 20 30 | "resultSize": 0|
      |00000190| 2c 0a 20 20 20 20 20 20 20 20 22 6d 75 74 61 74 |,.        "mutat|
      |000001a0| 69 6f 6e 43 6f 75 6e 74 22 3a 20 32 37 35 36 38 |ionCount": 27568|
      |000001b0| 0a 20 20 20 20 7d 0a 7d 0a                      |.    }.}.       |
      +--------+-------------------------------------------------+----------------+
       
      Then we also get the ping response properly.
       
       
               +-------------------------------------------------+
               |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
      +--------+-------------------------------------------------+----------------+
      |00000000| 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
      |00000010| 0a 44 61 74 65 3a 20 57 65 64 2c 20 30 34 20 4a |.Date: Wed, 04 J|
      |00000020| 61 6e 20 32 30 31 37 20 31 36 3a 31 33 3a 31 31 |an 2017 16:13:11|
      |00000030| 20 47 4d 54 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 | GMT..Content-Le|
      |00000040| 6e 67 74 68 3a 20 32 0d 0a 43 6f 6e 74 65 6e 74 |ngth: 2..Content|
      |00000050| 2d 54 79 70 65 3a 20 74 65 78 74 2f 70 6c 61 69 |-Type: text/plai|
      |00000060| 6e 3b 20 63 68 61 72 73 65 74 3d 75 74 66 2d 38 |n; charset=utf-8|
      |00000070| 0d 0a 0d 0a 7b 7d                               |....{}          |
      

      Because its returning with a 200 OK and no error section it looks like a fine response while of course the deletion is not complete.

      Note that I didn't find anything in the normal query logs which would indicate some failure or stack trace.

      Any idea what is going on here?

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              johan.larson Johan Larson (Inactive)
              daschl Michael Nitschinger
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty