Uploaded image for project: 'Couchbase C client library libcouchbase'
  1. Couchbase C client library libcouchbase
  2. CCBC-1445

Spurious segfaults due to query metrics

    XMLWordPrintable

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.2.0
    • 3.2.1
    • library
    • None
    • 1
    • SDK32: Eventing Mgmt, SDK34: CB++, OBO, Connect Prep

    Description

      It appears that under certain circumstances, lcb_QUERY_HANDLE's are left active after the associated instance is destroyed. This causes the query destructor which indirectly calls the metrics interface to fail and segfault. Interestingly, there is a related bug I've seen (but also cannot reliably reproduce) where executed queries never invoke the callback, even though wireshark shows the full response was received.

      * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x13c)
          frame #0: 0x000000010c40f875 couchbase_impl.node`record_http_op_latency(char const*, char const*, lcb_st*, unsigned long long) [inlined] record_op_latency(op=<unavailable>, svc=<unavailable>, settings=0x0000000000000000, start=<unavailable>) at metrics-internal.cc:41:9 [opt]
         38
         39  	void record_op_latency(const char *op, const char *svc, lcb_settings_st *settings, hrtime_t start)
         40  	{
      -> 41  	    if (settings->op_metrics_enabled && settings->meter) {
         42  	        lcbmetrics_TAG tags[2] = {{METRICS_SVC_TAG_NAME, svc ? svc : ""}, {METRICS_OP_TAG_NAME, op ? op : ""}};
         43  	        auto recorder = settings->meter->value_recorder_(settings->meter, METRICS_OPS_METER_NAME, tags, 2);
         44  	        if (recorder) {
      Target 0: (node) stopped.
      (lldb) bt
      * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x13c)
        * frame #0: 0x000000010c40f875 couchbase_impl.node`record_http_op_latency(char const*, char const*, lcb_st*, unsigned long long) [inlined] record_op_latency(op=<unavailable>, svc=<unavailable>, settings=0x0000000000000000, start=<unavailable>) at metrics-internal.cc:41:9 [opt]
          frame #1: 0x000000010c40f875 couchbase_impl.node`record_http_op_latency(op=0x0000000000000000, svc="query", instance=0x000000010314ed30, start=0) at metrics-internal.cc:62 [opt]
          frame #2: 0x000000010c41e7a2 couchbase_impl.node`lcb_QUERY_HANDLE_::invoke_row(this=0x0000000102f87880, resp=0x00007ffeefbff150, is_last=<unavailable>) at query_handle.cc:223:13 [opt]
          frame #3: 0x000000010c421061 couchbase_impl.node`lcb_QUERY_HANDLE_::~lcb_QUERY_HANDLE_(this=0x0000000102f87880) at query_handle.cc:555:9 [opt]
          frame #4: 0x000000010c4213be couchbase_impl.node`lcb_QUERY_HANDLE_::~lcb_QUERY_HANDLE_() [inlined] lcb_QUERY_HANDLE_::~lcb_QUERY_HANDLE_(this=0x0000000102f87880) at query_handle.cc:552:1 [opt]
          frame #5: 0x000000010c4213b9 couchbase_impl.node`lcb_QUERY_HANDLE_::~lcb_QUERY_HANDLE_(this=0x0000000102f87880) at query_handle.cc:552 [opt]
          frame #6: 0x000000010c40dbd1 couchbase_impl.node`timer_callback(sock=<unavailable>, which=<unavailable>, arg=0x0000000102f67e40) at timer.cc:43:5 [opt]
       
      (lldb) f 1
      frame #1: 0x000000010c40f875 couchbase_impl.node`record_http_op_latency(op=0x0000000000000000, svc="query", instance=0x000000010314ed30, start=0) at metrics-internal.cc:62 [opt]
         59
         60  	void record_http_op_latency(const char *op, const char *svc, lcb_INSTANCE *instance, hrtime_t start)
         61  	{
      -> 62  	    record_op_latency(op, svc, instance->settings, start);
         63  	}
      (lldb) p instance->settings
      (lcb_settings *) $0 = nullptr
       
      (lldb) p *instance
      (lcb_INSTANCE) $2 = {
        cmdq = {
          pipelines = nullptr
          scheds = 0xd000000010318118 ""
          ctxenter = 8
          npipelines = 0
          _npipelines_ex = 0
          seq = 0
          config = nullptr
          cqdata = 0x0000000000000000
          fallback = nullptr
        }
        cookie = 0x0000000000000000
        confmon = nullptr
        mc_nodes = nullptr
        ht_nodes = nullptr
        cur_configinfo = nullptr
        bs_state = nullptr
        callbacks = {
          v3callbacks = {
            [0] = 0x0000000000000000
            [1] = 0x0000000000000000
            [2] = 0x0008000000000000 (0x0008000000000000)
            [3] = 0x0000000000000002 (0x0000000000000002)
            [4] = 0x0000000000000000
            [5] = 0x000000010c3ed460 (couchbase_impl.node`iops_lcb_dtor(lcb_io_opt_st*) at plugin-libuv.c:40)
            [6] = 0x0000000000000000
            [7] = 0x0000000000000059 (0x0000000000000059)
            [8] = 0x000000010c3ed4f0 (couchbase_impl.node`wire_iops2(int, lcb_loopprocs_st*, lcb_timerprocs_st*, lcb_bsdprocs_st*, lcb_evprocs_st*, lcb_completion_procs*, lcb_iomodel_t*) at plugin-libuv.c:630)
            [9] = 0x00000001031252b0 (0x00000001031252b0)
            [10] = 0x0000000000000000
            [11] = 0x0000000000000000
            [12] = 0x0000000000000000
            [13] = 0x0000000000000000
            [14] = 0x0000000000000000
            [15] = 0x0000000000000000
            [16] = 0x0000000000000000
            [17] = 0x0000000000000000
            [18] = 0x0000000000000000
            [19] = 0x0000000000000000
            [20] = 0x0000000000000000
            [21] = 0x0000000000000000
            [22] = 0x0000000000000000
            [23] = 0x0000000000000000
            [24] = 0x0000000000000000
          }
          errmap = 0x0000000000000000
          bootstrap = 0x0000000000000000
          pktfwd = 0x0000000101ce4d40 (node`default_loop_struct)
          pktflushed = 0x0000000100000001 (node`_mh_execute_header + 1)
          open = 0x0000000000000001 (0x0000000000000001)
        }
        kv_timings = nullptr
        pendops = {
          items = {
            [0] = 0x000000010315f460 size=4294967295
            [1] = 0x0000000101ce4d40 size=4294967295
            [2] = 0x0000000000000009 size=0
          }
          count = 0
        }
        wait = 53045520
        memd_sockpool = 0x000000010313b320
        http_sockpool = nullptr
        last_error = LCB_SUCCESS
        settings = nullptr
        iotable = nullptr
        retryq = nullptr
        scratch = ""
        vbguess = 0x000000010c3e93a0
        n1ql_cache = 0x000000010323cba8
        dcpinfo = 0x0000000101ce4f68
        dtor_timer = 0x100000001031667d
        btype = 0x1031149c
        collcache = 0xd000000010316b30
        crypto = 0x000000010c3eae10 size=4294967295
        deferred_operations = 0x000000010314a6b0 size=4294967295
      }
      

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            Note, per discussion earlier today, the Ottoman tests regularly trigger this issue. According to Brett Lawson, 90% of the time running that test suite will run into this.

            ingenthr Matt Ingenthron added a comment - Note, per discussion earlier today, the Ottoman tests regularly trigger this issue. According to Brett Lawson , 90% of the time running that test suite will run into this.

            So I've managed to build standalone reproduction of the issue by modifying libuv example main.c

            diff --git i/example/libuvdirect/main.c w/example/libuvdirect/main.c
            index 4e359648..ade86e04 100644
            --- i/example/libuvdirect/main.c
            +++ w/example/libuvdirect/main.c
            @@ -52,9 +52,17 @@ static void delete_timer()
                 uv_close((uv_handle_t *)&timer, timer_close_cb);
             }
             
            +static void query_callback(lcb_INSTANCE *instance, int cbtype, const lcb_RESPQUERY *resp)
            +{
            +    printf("invoke query callback: %s, final: %s\n", lcb_strerror_short(lcb_respquery_status(resp)),
            +           lcb_respquery_is_final(resp) ? "true" : "false");
            +    (void)instance;
            +    (void)cbtype;
            +    (void)resp;
            +}
            +
             static void bootstrap_callback(lcb_INSTANCE *instance, lcb_STATUS err)
             {
            -    lcb_CMDSTORE *cmd;
                 if (err != LCB_SUCCESS) {
                     fprintf(stderr, "bootstrap error: %s\n", lcb_strerror_short(err));
                     lcb_destroy_async(instance, NULL);
            @@ -62,16 +70,35 @@ static void bootstrap_callback(lcb_INSTANCE *instance, lcb_STATUS err)
                 }
                 printf("successfully bootstrapped\n");
                 fflush(stdout);
            -    /* Since we've got our configuration, let's go ahead and store a value */
            -    lcb_cmdstore_create(&cmd, LCB_STORE_UPSERT);
            -    lcb_cmdstore_key(cmd, key, nkey);
            -    lcb_cmdstore_value(cmd, val, nval);
            -    err = lcb_store(instance, NULL, cmd);
            -    lcb_cmdstore_destroy(cmd);
            -    if (err != LCB_SUCCESS) {
            -        fprintf(stderr, "failed to set up store request: %s\n", lcb_strerror_short(err));
            -        lcb_destroy_async(instance, NULL);
            -        return;
            +    {
            +        /* Since we've got our configuration, let's go ahead and store a value */
            +        lcb_CMDSTORE *cmd;
            +        lcb_cmdstore_create(&cmd, LCB_STORE_UPSERT);
            +        lcb_cmdstore_key(cmd, key, nkey);
            +        lcb_cmdstore_value(cmd, val, nval);
            +        err = lcb_store(instance, NULL, cmd);
            +        lcb_cmdstore_destroy(cmd);
            +        if (err != LCB_SUCCESS) {
            +            fprintf(stderr, "failed to set up store request: %s\n", lcb_strerror_short(err));
            +            lcb_destroy_async(instance, NULL);
            +            return;
            +        }
            +    }
            +
            +    {
            +        lcb_CMDQUERY *cmd;
            +        lcb_cmdquery_create(&cmd);
            +        const char *statement = "SELECT 'hello' AS greeting";
            +        lcb_cmdquery_statement(cmd, statement, strlen(statement));
            +        lcb_cmdquery_callback(cmd, query_callback);
            +        err = lcb_query(instance, NULL, cmd);
            +        lcb_cmdquery_destroy(cmd);
            +        if (err != LCB_SUCCESS) {
            +            fprintf(stderr, "failed to set up query request: %s\n", lcb_strerror_short(err));
            +            lcb_destroy_async(instance, NULL);
            +            return;
            +        }
            +        printf("scheduled query\n");
                 }
             }
            

            avsej Sergey Avseyev added a comment - So I've managed to build standalone reproduction of the issue by modifying libuv example main.c diff --git i/example/libuvdirect/main.c w/example/libuvdirect/main.c index 4e359648..ade86e04 100644 --- i/example/libuvdirect/main.c +++ w/example/libuvdirect/main.c @@ -52,9 +52,17 @@ static void delete_timer() uv_close((uv_handle_t *)&timer, timer_close_cb); } +static void query_callback(lcb_INSTANCE *instance, int cbtype, const lcb_RESPQUERY *resp) +{ + printf("invoke query callback: %s, final: %s\n", lcb_strerror_short(lcb_respquery_status(resp)), + lcb_respquery_is_final(resp) ? "true" : "false"); + (void)instance; + (void)cbtype; + (void)resp; +} + static void bootstrap_callback(lcb_INSTANCE *instance, lcb_STATUS err) { - lcb_CMDSTORE *cmd; if (err != LCB_SUCCESS) { fprintf(stderr, "bootstrap error: %s\n", lcb_strerror_short(err)); lcb_destroy_async(instance, NULL); @@ -62,16 +70,35 @@ static void bootstrap_callback(lcb_INSTANCE *instance, lcb_STATUS err) } printf("successfully bootstrapped\n"); fflush(stdout); - /* Since we've got our configuration, let's go ahead and store a value */ - lcb_cmdstore_create(&cmd, LCB_STORE_UPSERT); - lcb_cmdstore_key(cmd, key, nkey); - lcb_cmdstore_value(cmd, val, nval); - err = lcb_store(instance, NULL, cmd); - lcb_cmdstore_destroy(cmd); - if (err != LCB_SUCCESS) { - fprintf(stderr, "failed to set up store request: %s\n", lcb_strerror_short(err)); - lcb_destroy_async(instance, NULL); - return; + { + /* Since we've got our configuration, let's go ahead and store a value */ + lcb_CMDSTORE *cmd; + lcb_cmdstore_create(&cmd, LCB_STORE_UPSERT); + lcb_cmdstore_key(cmd, key, nkey); + lcb_cmdstore_value(cmd, val, nval); + err = lcb_store(instance, NULL, cmd); + lcb_cmdstore_destroy(cmd); + if (err != LCB_SUCCESS) { + fprintf(stderr, "failed to set up store request: %s\n", lcb_strerror_short(err)); + lcb_destroy_async(instance, NULL); + return; + } + } + + { + lcb_CMDQUERY *cmd; + lcb_cmdquery_create(&cmd); + const char *statement = "SELECT 'hello' AS greeting"; + lcb_cmdquery_statement(cmd, statement, strlen(statement)); + lcb_cmdquery_callback(cmd, query_callback); + err = lcb_query(instance, NULL, cmd); + lcb_cmdquery_destroy(cmd); + if (err != LCB_SUCCESS) { + fprintf(stderr, "failed to set up query request: %s\n", lcb_strerror_short(err)); + lcb_destroy_async(instance, NULL); + return; + } + printf("scheduled query\n"); } }

            libcouchbase does not invoke Query callback when the instance destroyed, so in case of external IO (like couchnode) the IO object lives longer that lcb_INSTANCE, and tries to execute N1QL timeout handler (by default 75 seconds) but at that point there is no lcb_INSTANCE around. Before metrics we were lucky not to touch any memory from lcb_INSTANCE, and the issue didn't generate segfault, but metrics made it visible.

            avsej Sergey Avseyev added a comment - libcouchbase does not invoke Query callback when the instance destroyed, so in case of external IO (like couchnode) the IO object lives longer that lcb_INSTANCE, and tries to execute N1QL timeout handler (by default 75 seconds) but at that point there is no lcb_INSTANCE around. Before metrics we were lucky not to touch any memory from lcb_INSTANCE, and the issue didn't generate segfault, but metrics made it visible.

            Build couchbase-server-7.1.0-1450 contains libcouchbase commit 1d2e0e4 with commit message:
            CCBC-1445: always invoke callbacks for pending HTTP operations

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-1450 contains libcouchbase commit 1d2e0e4 with commit message: CCBC-1445 : always invoke callbacks for pending HTTP operations

            People

              avsej Sergey Avseyev
              brett19 Brett Lawson
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty