C++ Couchbase Client / CXXCBC-484

Intermittent CXX crash on rebalancing PL clusters


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • 1.0.0
    • None
    • None

    Description

      I've seen a number of CXX SDK crashes in SDKD runs and, more rarely, in some FIT-SIT runs.

      Here is one with 7.1:
      https://sdk.jenkins.couchbase.com/view/Situational/job/c-cpp/job/cxx/job/centos-cxx-sdk-server-situational-tests/1504/consoleFull

      Log section covering the segfault: seg_fault.log

      The crash happens mid-rebalance.

       

      There is also an SDKD crash here: https://sdk.jenkins.couchbase.com/view/Situational/job/c-cpp/job/cxx/job/centos-cxx-sdk-server-situational-tests/1506/

      In this case the crash happens when asking sdkd-cxx for a new connection.

       

      Also, on the CXX performer against Capella, I've seen an occasional crash in the middle of a scaling operation against PL, though the run passes most of the time:

      2024-03-11T19:16:03.0048767Z   }
      2024-03-11T19:16:03.0050302Z [2024-03-11 19:09:29.448] 24798ms [warn] [12,18] [e7e3e3-51b0-8046-197f-17ae6dd6993067/default] unable to find connected session with GCCCP support, retry in 2500ms
      2024-03-11T19:16:03.0057164Z [2024-03-11 19:09:34.636] 5188ms [warn] [12,18] Operations over threshold: {"count":3,"service":"query","top":[{"last_local_id":"e9ae6c-ed65-954b-bfb8-73beecfe68d478","last_local_socket":"172.18.0.2:60920","last_operation_id":"fc9571-c939-5b4c-b4ec-073d593c633359","last_remote_socket":"10.0.102.72:18093","operation_name":"cb.query","total_duration_us":4241711},{"last_local_id":"983374-d02f-d045-a579-9e1132b7e3319a","last_local_socket":"172.18.0.2:60970","last_operation_id":"fccd52-e617-1f4c-0ea9-97ccae2d9216f7","last_remote_socket":"10.0.102.72:18093","operation_name":"cb.query","total_duration_us":4228740},{"last_local_id":"ae0048-245c-0848-705f-c219c818d599bb","last_local_socket":"172.18.0.2:60944","last_operation_id":"01e5fc-94a9-8a4d-fa73-7955937cfb4084","last_remote_socket":"10.0.102.72:18093","operation_name":"cb.query","total_duration_us":4226577}]}
      2024-03-11T19:16:03.0066667Z [2024-03-11 19:09:44.636] 10000ms [warn] [12,18] Operations over threshold: {"count":2,"service":"query","top":[{"last_local_id":"88845e-024b-574d-bd00-d3db08ab340849","last_local_socket":"172.18.0.2:60930","last_operation_id":"a0f02a-9308-f340-728f-ea7d0d9c4c4464","last_remote_socket":"10.0.102.72:18093","operation_name":"cb.query","total_duration_us":7949247},{"last_local_id":"59a075-bb19-5f41-1786-d2f60a364ba115","last_local_socket":"172.18.0.2:60958","last_operation_id":"5ef9e8-68e7-2747-20b0-9a67f858dce7b7","last_remote_socket":"10.0.102.72:18093","operation_name":"cb.query","total_duration_us":7946730}]}
      2024-03-11T19:16:03.0072703Z [2024-03-11 19:11:03.915] 79278ms [erro] [12,18] [e7e3e3-51b0-8046-197f-17ae6dd6993067/e419a7-a2b5-5b4d-0259-82a70d79761a7a/tls/default] <j6sccwbni3iekqns.pl.nonprod-project-avengers.com/10.0.102.72:11208> IO error while reading from the socket("669f53-3ad8-8a49-8619-b16d5a8f37720d"): 1 (stream truncated)
      2024-03-11T19:16:03.0076358Z [2024-03-11 19:11:35.678] 31762ms [erro] [12,18] [e7e3e3-51b0-8046-197f-17ae6dd6993067/ae0048-245c-0848-705f-c219c818d599bb] <10.0.102.72:18093> IO error while reading from the socket: Connection reset by peer
      2024-03-11T19:16:03.0079200Z [2024-03-11 19:11:41.864] 6185ms [erro] [12,18] [e7e3e3-51b0-8046-197f-17ae6dd6993067/e9ae6c-ed65-954b-bfb8-73beecfe68d478] <10.0.102.72:18093> IO error while writing to the socket: Connection reset by peer
      2024-03-11T19:16:03.0082970Z [2024-03-11 19:12:36.372] 54508ms [erro] [12,18] [e7e3e3-51b0-8046-197f-17ae6dd6993067/a6c530-3fd4-bd4e-6dc9-b683959a69f70d/tls/default] <j6sccwbni3iekqns.pl.nonprod-project-avengers.com/10.0.102.72:11210> IO error while reading from the socket("0e5864-509e-eb42-45af-cb520eeade215c"): 1 (stream truncated)
      2024-03-11T19:16:03.0086683Z [2024-03-11 19:13:18.533] 42161ms [erro] [12,18] [e7e3e3-51b0-8046-197f-17ae6dd6993067/59a075-bb19-5f41-1786-d2f60a364ba115] <10.0.102.72:18093> IO error while writing to the socket: Connection reset by peer
      2024-03-11T19:16:03.0089492Z [2024-03-11 19:13:18.539]    6ms [erro] [12,18] [e7e3e3-51b0-8046-197f-17ae6dd6993067/88845e-024b-574d-bd00-d3db08ab340849] <10.0.102.72:18093> IO error while writing to the socket: Connection reset by peer
      2024-03-11T19:16:03.0091325Z ./fit_cxx_wrapper.sh: line 17:    12 Segmentation fault      (core dumped) $FIT_CXX
      2024-03-11T19:16:03.0092220Z   horizontal_scaling { 

      Full FIT-SIT run log: 0_Build 2.txt.zip
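
      For triage, the error lines leading up to the segfault can be pulled out of a log like the excerpt above with a short script. This is a rough Python sketch, not part of the SDK or the test harness: the line layout in the regular expression is inferred from the excerpt only, and the names SDK_LINE and events_before_crash are hypothetical.

```python
import re

# Assumed layout of an SDK log line inside the CI output (inferred from the excerpt):
# <CI timestamp> [<SDK timestamp>] <delta>ms [<level>] [<pid>,<tid>] <message>
SDK_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"(?P<delta>\d+)ms\s+\[(?P<level>\w+)\]\s+\[\d+,\d+\]\s+(?P<msg>.*)"
)


def events_before_crash(lines, level="erro"):
    """Collect (SDK timestamp, message) pairs at `level` up to the first segfault line."""
    events = []
    for line in lines:
        # Stop at the shell's crash report; later lines belong to the test harness.
        if "Segmentation fault" in line:
            break
        match = SDK_LINE.search(line)
        if match and match.group("level") == level:
            events.append((match.group("ts"), match.group("msg")))
    return events


if __name__ == "__main__":
    import sys

    # Usage: python triage.py < console.log
    for ts, msg in events_before_crash(sys.stdin):
        print(ts, msg)
```

      Run against the full console log, this lists the socket errors in the order the SDK logged them, which makes it easier to compare the lead-up to the crash across runs.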


            People

              avsej Sergey Avseyev
              will.broadbelt Will Broadbelt
              Votes: 0
              Watchers: 1


                Gerrit Reviews

                  There are no open Gerrit changes