Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-45759

All timer tests with 512 bytes context failing with timer_msg_count = 1

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • Cheshire-Cat
    • 7.0.0
    • clients
    • 7.0.0-4907
    • Triaged
    • 1
    • Unknown

    Description

      All 512byte context tests are failing with this flooding the eventing logs - looks like a JSON parsing issue.

       678\"}","vb":800}}
      2021-04-17T02:01:33.125-07:00 [Info] eventing-consumer [worker_perf-test1_18:/tmp/127.0.0.1:8091_18_954300937.sock:92690] ReferenceKey json parse failed, ReferenceKey: eventing::954300937:tm:800:cx:HUsmYdlezRJAO*StVvNp*JqSr6F value :{"alr":"eventing::954300937:tm:800:al:1618643957:1488","ctx":{"callback":"timerCallback","context":"{\"docId\":\"eventing-000025096304\",\"field\":\"01234567890123456789...truncated due to encoding error
      2021-04-17T02:01:33.125-07:00 [Info] eventing-consumer [worker_perf-test1_4:/tmp/127.0.0.1:8091_4_954300937.sock:92392] ...truncated due to encoding error
      2021-04-17T02:01:33.125-07:00 [Info] eventing-consumer [worker_perf-test1_4:/tmp/127.0.0.1:8091_4_954300937.sock:92392] 6789...truncated due to encoding error
      2021-04-17T02:01:33.125-07:00 [Info] eventing-consumer [worker_perf-test1_4:/tmp/127.0.0.1:8091_4_954300937.sock:92392] 6789...truncated due to encoding error
      2021-04-17T02:01:33.125-07:00 [Info] eventing-consumer [worker_perf-test1_4:/tmp/127.0.0.1:8091_4_954300937.sock:92392] 6789...truncated due to encoding error
      2021-04-17T02:01:33.125-07:00 [Info] eventing-consumer [worker_perf-test1_4:/tmp/127.0.0.1:8091_4_954300937.sock:92392] ...truncated due to encoding error
      2021-04-17T02:01:33.125-07:00 [Info] eventing-consumer [worker_perf-test1_4:/tmp/127.0.0.1:8091_4_954300937.sock:92392] ...truncated due to encoding error
      

       The stats on the eventing node are : 

       on_update_success: 50000000
       timer_create_counter: 50000000
       timer_callback_success : 0

      The timers were all created but - none of the callbacks were received.

      Eventing Rebalance time has also increased in a similar test by 3x : http://perf.jenkins.couchbase.com/job/themis/9831/console 

      Test Log : http://perf.jenkins.couchbase.com/job/themis/9924/consoleFull 

      Logs : 

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-themis-9924/172.23.96.16.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-themis-9924/172.23.96.17.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-themis-9924/172.23.96.20.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-themis-9924/172.23.96.23.zip

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-themis-9924/172.23.97.177.zip - Eventing

       

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            jeelan.poola Jeelan Poola added a comment -

            Prajwal‌ Kiran Kumar‌ On which build did this test last pass? Trying to narrow down the root cause.

            jeelan.poola Jeelan Poola added a comment - Prajwal‌ Kiran Kumar‌ On which build did this test last pass? Trying to narrow down the root cause.

            7.0.0-4854 had a clean run 

            prajwal.kirankumar Prajwal‌ Kiran Kumar‌ (Inactive) added a comment - 7.0.0-4854 had a clean run 
            jeelan.poola Jeelan Poola added a comment - - edited

            Prajwal‌ Kiran Kumar‌ Could you please help identify the first build where this started failing? Also, 4907 is a bit older build now. Could you please re-run with latest build and confirm the issue still exists?

            jeelan.poola Jeelan Poola added a comment - - edited Prajwal‌ Kiran Kumar‌ Could you please help identify the first build where this started failing? Also, 4907 is a bit older build now. Could you please re-run with latest build and confirm the issue still exists?

            Jeelan Poola  - I have reproduced this issue on a smaller scale manually on a different cluster/env as well on 7.0.0-4861 which happens to be the build from which we start to see this issue. The diff between 4860 and 4861 does not have any KV/Eventing commits. But this is consistently reproduced on 4861 with the following function handler (that is used in the test as well) .The timer max context size set in this setup is 1024 bytes

            function OnUpdate(doc, meta) {
             
              var oneMinuteFromNow = new Date(); 
              oneMinuteFromNow.setSeconds(oneMinuteFromNow.getSeconds() + 1);
                var context = {docId : meta.id, field: "012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345670123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456701234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678"};
                createTimer(timerCallback, oneMinuteFromNow, meta.id, context);
            }
             
            function timerCallback(context) {
                bucket1[context.docId]=context;
            }
            

            When the context size is reduced to 10bytes, the timers begin to execute in this setup and we see the bucketop executing which is inline with the rest of the timer tests running fine on the recent builds.

            This is seen on the latest weekly build 4960 as well.

            prajwal.kirankumar Prajwal‌ Kiran Kumar‌ (Inactive) added a comment - - edited Jeelan Poola   - I have reproduced this issue on a smaller scale manually on a different cluster/env as well on 7.0.0-4861 which happens to be the build from which we start to see this issue. The diff between 4860 and 4861 does not have any KV/Eventing commits. But this is consistently reproduced on 4861 with the following function handler (that is used in the test as well) .The timer max context size set in this setup is 1024 bytes function OnUpdate(doc, meta) {   var oneMinuteFromNow = new Date(); oneMinuteFromNow.setSeconds(oneMinuteFromNow.getSeconds() + 1); var context = {docId : meta.id, field: "012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345670123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456701234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678"}; createTimer(timerCallback, oneMinuteFromNow, meta.id, context); }   function timerCallback(context) { bucket1[context.docId]=context; } When the context size is reduced to 10bytes, the timers begin to execute in this setup and we see the bucketop executing which is inline with the rest of the timer tests running fine on the recent builds. This is seen on the latest weekly build 4960 as well.
            jeelan.poola Jeelan Poola added a comment -

            Thank you Prajwal‌ Kiran Kumar‌!
            Abhishek can reproduce on latest builds as well, but it is very strange & intriguing why the issue shows up with 4861 but not with 4860 (as you mention there are no eventing/kv changes between these builds). Some component/library changed between these builds causing this problem. We are digging in.

            jeelan.poola Jeelan Poola added a comment - Thank you Prajwal‌ Kiran Kumar‌ ! Abhishek can reproduce on latest builds as well, but it is very strange & intriguing why the issue shows up with 4861 but not with 4860 (as you mention there are no eventing/kv changes between these builds). Some component/library changed between these builds causing this problem. We are digging in.
            abhishek.jindal Abhishek Jindal added a comment - - edited

            After some code walkthrough, can confirm that the "truncated due to encoding error" is being returned as part of the response from libcouchbase's lcb_get request.

            Ultimately down the stack we make a call to lcb_get here: https://github.com/couchbase/eventing/blob/b35dfcc1239b619a764b33246ce488db51e4cd4d/features/src/lcb_util.cc#L283

            As a result, we are getting a response : "01234567890123456789...truncated..."
            Now, it is pretty clear that the above string is not a json, which is why when timer_scanner tries to parse this string, the parsing fails.

            The question is - Why did libcouchbase return a non-json in the first place?
            Good news is that I am able to reproduce this via cbc cat (which internally makes a call to lcb_get).

            In order to reproduce the issue -

            • Install latest Couchbase Server build 4995 on a centos7 box.
            • Create a bucket with active compression enabled so that the inserted document's datatype is snappy,json (eventing inserts context timer documents with compression bit enabled).
            • Insert the following document (minimal document structure that repros the issue):

            {
              "fieldA": "valA",
              "fieldB": {
                "callback": "timerCallback",
                "context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789012345678901234567890123\"}",
                "fieldC": "valc"
              }
            }
            

            • Now cbc-cat this document:

               /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default
              

            • Returned document is the following (note that the returned data is garbled for the field - "field"):

              doc1                 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)
              {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789�
              \"6789�
              \"\"}","fieldC": "valc"}}
              

            • Now, install Couchbase Server build 4860 (last good build) and run the same steps as above.
            • Returned document the following.

            doc1                 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)
            {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"0123456789012345678901234567890123456789012\"}","fieldC": "valc"}}
            

            Note that there is no issue with KV here because cbc cat from build 4860 queries the same document from same cluster without issues.
            It seems like libcouchbase is facing issues while trying to uncompress the value: 01234567890123456789012345678901234567890123

            Note: The garbled value is returned only when the length of data in "field" is at least 44 characters where all characters are digits.

            abhishek.jindal Abhishek Jindal added a comment - - edited After some code walkthrough, can confirm that the "truncated due to encoding error" is being returned as part of the response from libcouchbase's lcb_get request. Ultimately down the stack we make a call to lcb_get here: https://github.com/couchbase/eventing/blob/b35dfcc1239b619a764b33246ce488db51e4cd4d/features/src/lcb_util.cc#L283 As a result, we are getting a response : "01234567890123456789...truncated..." Now, it is pretty clear that the above string is not a json, which is why when timer_scanner tries to parse this string, the parsing fails. — The question is - Why did libcouchbase return a non-json in the first place? Good news is that I am able to reproduce this via cbc cat (which internally makes a call to lcb_get). In order to reproduce the issue - Install latest Couchbase Server build 4995 on a centos7 box. Create a bucket with active compression enabled so that the inserted document's datatype is snappy,json (eventing inserts context timer documents with compression bit enabled). Insert the following document (minimal document structure that repros the issue): { "fieldA": "valA", "fieldB": { "callback": "timerCallback", "context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789012345678901234567890123\"}", "fieldC": "valc" } } Now cbc-cat this document: /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default Returned document is the following (note that the returned data is garbled for the field - "field"): doc1 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON) {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789� \"6789� \"\"}","fieldC": "valc"}} Now, install Couchbase Server build 4860 (last good build) and run the same steps as above. Returned document the following. doc1 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON) {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"0123456789012345678901234567890123456789012\"}","fieldC": "valc"}} Note that there is no issue with KV here because cbc cat from build 4860 queries the same document from same cluster without issues. It seems like libcouchbase is facing issues while trying to uncompress the value: 01234567890123456789012345678901234567890123 Note: The garbled value is returned only when the length of data in "field" is at least 44 characters where all characters are digits.

            Abhishek Jindal, if the cluster still running, could you pipe the output through xxd or something to get something portable? Looks like Jira is trying to interpret some bytes as characters here.

             /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default | xxd
            

            avsej Sergey Avseyev added a comment - Abhishek Jindal , if the cluster still running, could you pipe the output through xxd or something to get something portable? Looks like Jira is trying to interpret some bytes as characters here. /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default | xxd

            Bad run (from cbc cat of build 4995):

            [vagrant@node2-cb660-centos7 ~]$ /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default | xxd
            doc1                 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)
             
            0000000: 7b22 6669 656c 6441 223a 2022 7661 6c41  {"fieldA": "valA
            0000010: 222c 2266 6965 6c64 4222 3a20 7b22 6361  ","fieldB": {"ca
            0000020: 6c6c 6261 636b 223a 2022 7469 6d65 7243  llback": "timerC
            0000030: 616c 6c62 6163 6b22 2c22 636f 6e74 6578  allback","contex
            0000040: 7422 3a20 227b 5c22 646f 6349 645c 223a  t": "{\"docId\":
            0000050: 5c22 646f 6330 5c22 2c5c 2266 6965 6c64  \"doc0\",\"field
            0000060: 5c22 3a5c 2230 3132 3334 3536 3738 3930  \":\"01234567890
            0000070: 3132 3334 3536 3738 3982 0a00 085c 2236  123456789....\"6
            0000080: 3738 3982 0a00 085c 2200 0000 0000 0000  789....\".......
            0000090: 5c22 7d22 2c22 6669 656c 6443 223a 2022  \"}","fieldC": "
            00000a0: 7661 6c63 227d 7d                        valc"}}
            

            Good run (using cbc cat from build 4860):

            [vagrant@node1-cb660-centos7 ~]$ /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default | xxd
            doc1                 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)
             
            0000000: 7b22 6669 656c 6441 223a 2022 7661 6c41  {"fieldA": "valA
            0000010: 222c 2266 6965 6c64 4222 3a20 7b22 6361  ","fieldB": {"ca
            0000020: 6c6c 6261 636b 223a 2022 7469 6d65 7243  llback": "timerC
            0000030: 616c 6c62 6163 6b22 2c22 636f 6e74 6578  allback","contex
            0000040: 7422 3a20 227b 5c22 646f 6349 645c 223a  t": "{\"docId\":
            0000050: 5c22 646f 6330 5c22 2c5c 2266 6965 6c64  \"doc0\",\"field
            0000060: 5c22 3a5c 2230 3132 3334 3536 3738 3930  \":\"01234567890
            0000070: 3132 3334 3536 3738 3930 3132 3334 3536  1234567890123456
            0000080: 3738 3930 3132 3334 3536 3738 3930 3132  7890123456789012
            0000090: 5c22 7d22 2c22 6669 656c 6443 223a 2022  \"}","fieldC": "
            00000a0: 7661 6c63 227d 7d                        valc"}}
            

            abhishek.jindal Abhishek Jindal added a comment - Bad run (from cbc cat of build 4995): [vagrant@node2-cb660-centos7 ~]$ /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default | xxd doc1 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)   0000000: 7b22 6669 656c 6441 223a 2022 7661 6c41 {"fieldA": "valA 0000010: 222c 2266 6965 6c64 4222 3a20 7b22 6361 ","fieldB": {"ca 0000020: 6c6c 6261 636b 223a 2022 7469 6d65 7243 llback": "timerC 0000030: 616c 6c62 6163 6b22 2c22 636f 6e74 6578 allback","contex 0000040: 7422 3a20 227b 5c22 646f 6349 645c 223a t": "{\"docId\": 0000050: 5c22 646f 6330 5c22 2c5c 2266 6965 6c64 \"doc0\",\"field 0000060: 5c22 3a5c 2230 3132 3334 3536 3738 3930 \":\"01234567890 0000070: 3132 3334 3536 3738 3982 0a00 085c 2236 123456789....\"6 0000080: 3738 3982 0a00 085c 2200 0000 0000 0000 789....\"....... 0000090: 5c22 7d22 2c22 6669 656c 6443 223a 2022 \"}","fieldC": " 00000a0: 7661 6c63 227d 7d valc"}} Good run (using cbc cat from build 4860) : [vagrant@node1-cb660-centos7 ~]$ /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default | xxd doc1 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)   0000000: 7b22 6669 656c 6441 223a 2022 7661 6c41 {"fieldA": "valA 0000010: 222c 2266 6965 6c64 4222 3a20 7b22 6361 ","fieldB": {"ca 0000020: 6c6c 6261 636b 223a 2022 7469 6d65 7243 llback": "timerC 0000030: 616c 6c62 6163 6b22 2c22 636f 6e74 6578 allback","contex 0000040: 7422 3a20 227b 5c22 646f 6349 645c 223a t": "{\"docId\": 0000050: 5c22 646f 6330 5c22 2c5c 2266 6965 6c64 \"doc0\",\"field 0000060: 5c22 3a5c 2230 3132 3334 3536 3738 3930 \":\"01234567890 0000070: 3132 3334 3536 3738 3930 3132 3334 3536 1234567890123456 0000080: 3738 3930 3132 3334 3536 3738 3930 3132 7890123456789012 0000090: 5c22 7d22 2c22 6669 656c 6443 223a 2022 \"}","fieldC": " 00000a0: 7661 6c63 227d 7d valc"}}

            Thanks, I confirm there is a problem. Although it is not reproducing with current master of libcouchbase.

            avsej Sergey Avseyev added a comment - Thanks, I confirm there is a problem. Although it is not reproducing with current master of libcouchbase.

            $ /opt/couchbase/bin/cbc version
            cbc:
              Runtime: Version=3.0.7_8_g465188d3b1, Changeset=465188d3b1a85902e918a553d8ec71aeb903018a
              Headers: Version=3.0.7_8_g465188d3b1, Changeset=465188d3b1a85902e918a553d8ec71aeb903018a
            

            Any chance to upgrade to upgrade libcouchbase and re-run check?

            avsej Sergey Avseyev added a comment - $ /opt/couchbase/bin/cbc version cbc: Runtime: Version=3.0.7_8_g465188d3b1, Changeset=465188d3b1a85902e918a553d8ec71aeb903018a Headers: Version=3.0.7_8_g465188d3b1, Changeset=465188d3b1a85902e918a553d8ec71aeb903018a Any chance to upgrade to upgrade libcouchbase and re-run check?

            I wonder if it could be something build-related. I build it from sources using current chesire-cat.xml manifest, it has the same version of libcouchbase as above, but works as expected.

            avsej Sergey Avseyev added a comment - I wonder if it could be something build-related. I build it from sources using current chesire-cat.xml manifest, it has the same version of libcouchbase as above, but works as expected.

            Sergey Avseyev Last good build was 4860 and first bad build was 4861. Hence, something's def. changed between the two builds worth checking.

            abhishek.jindal Abhishek Jindal added a comment - Sergey Avseyev Last good build was 4860 and first bad build was 4861. Hence, something's def. changed between the two builds worth checking.
            abhishek.jindal Abhishek Jindal added a comment - - edited

            Sergey Avseyev Some more info that might help - This toy build - http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/12746/ is built using lcb 3.1.1 where issue gets reproduced:

            [vagrant@node3-cb660-centos7 ~]$ /opt/couchbase/bin/cbc version
            cbc:
              Runtime: Version=3.1.1_5_gc84648dbc3, Changeset=c84648dbc3bde08372de934053197d579a0fe5ca
              Headers: Version=3.1.1_5_gc84648dbc3, Changeset=c84648dbc3bde08372de934053197d579a0fe5ca
            

            /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default
            doc1                 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)
            {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789�
            \"6789�
            \"@\"}","fieldC": "valc"}}
            

            Looks like c84648dbc3bde08372de934053197d579a0fe5ca is the sha of current master too.

            abhishek.jindal Abhishek Jindal added a comment - - edited Sergey Avseyev Some more info that might help - This toy build - http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/12746/ is built using lcb 3.1.1 where issue gets reproduced: [vagrant@node3-cb660-centos7 ~]$ /opt/couchbase/bin/cbc version cbc: Runtime: Version=3.1.1_5_gc84648dbc3, Changeset=c84648dbc3bde08372de934053197d579a0fe5ca Headers: Version=3.1.1_5_gc84648dbc3, Changeset=c84648dbc3bde08372de934053197d579a0fe5ca /opt/couchbase/bin/cbc cat doc1 -u Administrator -P asdasd -U couchbase://10.112.205.102/src --scope _default --collection _default doc1 CAS=0x1677d8ac21ab0000, Flags=0x2000006, Size=167, Datatype=0x01(JSON) {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789� \"6789� \"@\"}","fieldC": "valc"}} Looks like c84648dbc3bde08372de934053197d579a0fe5ca is the sha of current master too.

            There is nothing in the diff between these two versions that might affect decompression. I stepped in GDB, and in fact the version of libcouchbase from Couchbase incorrectly decodes snappy-compressed bytestring.

            avsej Sergey Avseyev added a comment - There is nothing in the diff between these two versions that might affect decompression. I stepped in GDB, and in fact the version of libcouchbase from Couchbase incorrectly decodes snappy-compressed bytestring.

            Assigning it to you Sergey Avseyev as the build with upgraded lcb 3.1.1 by Abhishek Jindal also reproduced the issue.

            prajwal.kirankumar Prajwal‌ Kiran Kumar‌ (Inactive) added a comment - - edited Assigning it to you Sergey Avseyev as the build with upgraded lcb 3.1.1 by Abhishek Jindal also reproduced the issue.

            My idea right now is that snappy decompressor requires zero filled output buffer (although filling it with zeroes using GDB didn't help).

            $ valgrind --track-origins=yes /opt/couchbase/bin/cbc cat -u Administrator -P password doc1
            ==198733== Memcheck, a memory error detector
            ==198733== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
            ==198733== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info
            ==198733== Command: /opt/couchbase/bin/cbc cat -u Administrator -P password doc1
            ==198733== 
            doc1                 CAS=0x1677e655f2400000, Flags=0x2000006, Size=167, Datatype=0x01(JSON)
            {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789�
            \"6789�
            ==198733== Conditional jump or move depends on uninitialised value(s)
            ==198733==    at 0x6D645EE: _IO_file_overflow@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D65610: _IO_default_xsputn (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D63910: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D58A7B: fwrite (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x4156BC: get_callback (cbc.cc:115)
            ==198733==    by 0x4EB2130: void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) (handler.cc:376)
            ==198733==    by 0x4EAF2C6: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:448)
            ==198733==    by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054)
            ==198733==    by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567)
            ==198733==    by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625)
            ==198733==    by 0x4E7FB86: invoke_read_cb (ctx.cc:249)
            ==198733==    by 0x4E7FB86: E_handler(int, short, void*) (ctx.cc:276)
            ==198733==    by 0x4E67D44: run_loop (plugin-select.c:309)
            ==198733==  Uninitialised value was created by a heap allocation
            ==198733==    at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
            ==198733==    by 0x4E776AB: mcreq_inflate_value (compress.cc:160)
            ==198733==    by 0x4EADA8D: maybe_decompress(lcb_st*, lcb::MemcachedResponse const*, lcb_RESPGET_*, void**) (handler.cc:413)
            ==198733==    by 0x4EAF1B5: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:445)
            ==198733==    by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054)
            ==198733==    by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567)
            ==198733==    by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625)
            ==198733==    by 0x4E7FB86: invoke_read_cb (ctx.cc:249)
            ==198733==    by 0x4E7FB86: E_handler(int, short, void*) (ctx.cc:276)
            ==198733==    by 0x4E67D44: run_loop (plugin-select.c:309)
            ==198733==    by 0x4F0657D: lcb_wait (wait.cc:108)
            ==198733==    by 0x417046: cbc::GetHandler::run() (cbc.cc:844)
            ==198733==    by 0x41ABA3: cbc::Handler::execute(int, char**) (cbc.cc:709)
            ==198733== 
            ==198733== Syscall param write(buf) points to uninitialised byte(s)
            ==198733==    at 0x6DD4648: write (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D631FC: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D6256E: new_do_write (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D642B8: _IO_do_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D62417: _IO_file_sync@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D57C20: fflush (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x4156C8: get_callback (cbc.cc:116)
            ==198733==    by 0x4EB2130: void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) (handler.cc:376)
            ==198733==    by 0x4EAF2C6: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:448)
            ==198733==    by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054)
            ==198733==    by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567)
            ==198733==    by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625)
            ==198733==  Address 0x718a954 is 4 bytes inside a block of size 1,024 alloc'd
            ==198733==    at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
            ==198733==    by 0x6D5760F: _IO_file_doallocate (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D654BF: _IO_doallocbuf (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D64727: _IO_file_overflow@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D638CE: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x6D58A7B: fwrite (in /usr/lib64/libc-2.28.so)
            ==198733==    by 0x4156BC: get_callback (cbc.cc:115)
            ==198733==    by 0x4EB2130: void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) (handler.cc:376)
            ==198733==    by 0x4EAF2C6: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:448)
            ==198733==    by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054)
            ==198733==    by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567)
            ==198733==    by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625)
            ==198733==  Uninitialised value was created by a heap allocation
            ==198733==    at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
            ==198733==    by 0x4E776AB: mcreq_inflate_value (compress.cc:160)
            ==198733==    by 0x4EADA8D: maybe_decompress(lcb_st*, lcb::MemcachedResponse const*, lcb_RESPGET_*, void**) (handler.cc:413)
            ==198733==    by 0x4EAF1B5: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:445)
            ==198733==    by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054)
            ==198733==    by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567)
            ==198733==    by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625)
            ==198733==    by 0x4E7FB86: invoke_read_cb (ctx.cc:249)
            ==198733==    by 0x4E7FB86: E_handler(int, short, void*) (ctx.cc:276)
            ==198733==    by 0x4E67D44: run_loop (plugin-select.c:309)
            ==198733==    by 0x4F0657D: lcb_wait (wait.cc:108)
            ==198733==    by 0x417046: cbc::GetHandler::run() (cbc.cc:844)
            ==198733==    by 0x41ABA3: cbc::Handler::execute(int, char**) (cbc.cc:709)
            ==198733== 
            \"\"}","fieldC": "valc"}}
            ==198733== 
            ==198733== HEAP SUMMARY:
            ==198733==     in use at exit: 32 bytes in 1 blocks
            ==198733==   total heap usage: 2,797 allocs, 2,796 frees, 712,768 bytes allocated
            ==198733== 
            ==198733== LEAK SUMMARY:
            ==198733==    definitely lost: 0 bytes in 0 blocks
            ==198733==    indirectly lost: 0 bytes in 0 blocks
            ==198733==      possibly lost: 0 bytes in 0 blocks
            ==198733==    still reachable: 32 bytes in 1 blocks
            ==198733==         suppressed: 0 bytes in 0 blocks
            ==198733== Rerun with --leak-check=full to see details of leaked memory
            ==198733== 
            ==198733== For lists of detected and suppressed errors, rerun with: -s
            ==198733== ERROR SUMMARY: 7 errors from 2 contexts (suppressed: 0 from 0)
            

            These two lines says that snappy decompressor didn't update part of the buffer, and valgrind caught it when cbc tried to write the buffer to the output.

            ==198733==  Uninitialised value was created by a heap allocation
            ==198733==    at 0x4C34F0B: malloc (vg_replace_malloc.c:307)
            

            BTW when I filled the buffer with zeros (or with pattern) right before this line
            https://github.com/couchbase/libcouchbase/blob/465188d3b1a85902e918a553d8ec71aeb903018a/src/mc/compress.cc#L161
            The output had zeros (or my pattern) after decompression.

            avsej Sergey Avseyev added a comment - My idea right now is that snappy decompressor requires zero filled output buffer (although filling it with zeroes using GDB didn't help). $ valgrind --track-origins=yes /opt/couchbase/bin/cbc cat -u Administrator -P password doc1 ==198733== Memcheck, a memory error detector ==198733== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==198733== Using Valgrind-3.16.0 and LibVEX; rerun with -h for copyright info ==198733== Command: /opt/couchbase/bin/cbc cat -u Administrator -P password doc1 ==198733== doc1 CAS=0x1677e655f2400000, Flags=0x2000006, Size=167, Datatype=0x01(JSON) {"fieldA": "valA","fieldB": {"callback": "timerCallback","context": "{\"docId\":\"doc0\",\"field\":\"01234567890123456789� \"6789� ==198733== Conditional jump or move depends on uninitialised value(s) ==198733== at 0x6D645EE: _IO_file_overflow@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D65610: _IO_default_xsputn (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D63910: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D58A7B: fwrite (in /usr/lib64/libc-2.28.so) ==198733== by 0x4156BC: get_callback (cbc.cc:115) ==198733== by 0x4EB2130: void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) (handler.cc:376) ==198733== by 0x4EAF2C6: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:448) ==198733== by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054) ==198733== by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567) ==198733== by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625) ==198733== by 0x4E7FB86: invoke_read_cb (ctx.cc:249) ==198733== by 0x4E7FB86: E_handler(int, short, void*) (ctx.cc:276) ==198733== by 0x4E67D44: run_loop (plugin-select.c:309) ==198733== Uninitialised value was created by a heap allocation ==198733== at 0x4C34F0B: malloc (vg_replace_malloc.c:307) ==198733== by 0x4E776AB: mcreq_inflate_value (compress.cc:160) ==198733== by 0x4EADA8D: maybe_decompress(lcb_st*, lcb::MemcachedResponse const*, lcb_RESPGET_*, void**) (handler.cc:413) ==198733== by 0x4EAF1B5: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:445) ==198733== by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054) ==198733== by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567) ==198733== by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625) ==198733== by 0x4E7FB86: invoke_read_cb (ctx.cc:249) ==198733== by 0x4E7FB86: E_handler(int, short, void*) (ctx.cc:276) ==198733== by 0x4E67D44: run_loop (plugin-select.c:309) ==198733== by 0x4F0657D: lcb_wait (wait.cc:108) ==198733== by 0x417046: cbc::GetHandler::run() (cbc.cc:844) ==198733== by 0x41ABA3: cbc::Handler::execute(int, char**) (cbc.cc:709) ==198733== ==198733== Syscall param write(buf) points to uninitialised byte(s) ==198733== at 0x6DD4648: write (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D631FC: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D6256E: new_do_write (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D642B8: _IO_do_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D62417: _IO_file_sync@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D57C20: fflush (in /usr/lib64/libc-2.28.so) ==198733== by 0x4156C8: get_callback (cbc.cc:116) ==198733== by 0x4EB2130: void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) (handler.cc:376) ==198733== by 0x4EAF2C6: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:448) ==198733== by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054) ==198733== by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567) ==198733== by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625) ==198733== Address 0x718a954 is 4 bytes inside a block of size 1,024 alloc'd ==198733== at 0x4C34F0B: malloc (vg_replace_malloc.c:307) ==198733== by 0x6D5760F: _IO_file_doallocate (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D654BF: _IO_doallocbuf (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D64727: _IO_file_overflow@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D638CE: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib64/libc-2.28.so) ==198733== by 0x6D58A7B: fwrite (in /usr/lib64/libc-2.28.so) ==198733== by 0x4156BC: get_callback (cbc.cc:115) ==198733== by 0x4EB2130: void invoke_callback<lcb_RESPGET_>(mc_packet_st const*, lcb_st*, lcb_RESPGET_*, lcb_CALLBACK_TYPE) (handler.cc:376) ==198733== by 0x4EAF2C6: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:448) ==198733== by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054) ==198733== by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567) ==198733== by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625) ==198733== Uninitialised value was created by a heap allocation ==198733== at 0x4C34F0B: malloc (vg_replace_malloc.c:307) ==198733== by 0x4E776AB: mcreq_inflate_value (compress.cc:160) ==198733== by 0x4EADA8D: maybe_decompress(lcb_st*, lcb::MemcachedResponse const*, lcb_RESPGET_*, void**) (handler.cc:413) ==198733== by 0x4EAF1B5: H_get(mc_pipeline_st*, mc_packet_st*, lcb::MemcachedResponse*, lcb_STATUS) (handler.cc:445) ==198733== by 0x4EB1710: mcreq_dispatch_response (handler.cc:1054) ==198733== by 0x4EF61A7: lcb::Server::try_read(lcbio_CTX*, rdb_IOROPE*) (mcserver.cc:567) ==198733== by 0x4EF7E15: on_read(lcbio_CTX*, unsigned int) (mcserver.cc:625) ==198733== by 0x4E7FB86: invoke_read_cb (ctx.cc:249) ==198733== by 0x4E7FB86: E_handler(int, short, void*) (ctx.cc:276) ==198733== by 0x4E67D44: run_loop (plugin-select.c:309) ==198733== by 0x4F0657D: lcb_wait (wait.cc:108) ==198733== by 0x417046: cbc::GetHandler::run() (cbc.cc:844) ==198733== by 0x41ABA3: cbc::Handler::execute(int, char**) (cbc.cc:709) ==198733== \"\"}","fieldC": "valc"}} ==198733== ==198733== HEAP SUMMARY: ==198733== in use at exit: 32 bytes in 1 blocks ==198733== total heap usage: 2,797 allocs, 2,796 frees, 712,768 bytes allocated ==198733== ==198733== LEAK SUMMARY: ==198733== definitely lost: 0 bytes in 0 blocks ==198733== indirectly lost: 0 bytes in 0 blocks ==198733== possibly lost: 0 bytes in 0 blocks ==198733== still reachable: 32 bytes in 1 blocks ==198733== suppressed: 0 bytes in 0 blocks ==198733== Rerun with --leak-check=full to see details of leaked memory ==198733== ==198733== For lists of detected and suppressed errors, rerun with: -s ==198733== ERROR SUMMARY: 7 errors from 2 contexts (suppressed: 0 from 0) These two lines says that snappy decompressor didn't update part of the buffer, and valgrind caught it when cbc tried to write the buffer to the output. ==198733== Uninitialised value was created by a heap allocation ==198733== at 0x4C34F0B: malloc (vg_replace_malloc.c:307) BTW when I filled the buffer with zeros (or with pattern) right before this line https://github.com/couchbase/libcouchbase/blob/465188d3b1a85902e918a553d8ec71aeb903018a/src/mc/compress.cc#L161 The output had zeros (or my pattern) after decompression.

            On the other hand, this line says explicitly that version of Uncompress for std::string allows uninitialized memory in output buffer:
            https://github.com/couchbase/libcouchbase/blob/465188d3b1a85902e918a553d8ec71aeb903018a/contrib/snappy/snappy-stubs-internal.h#L466-L467

            avsej Sergey Avseyev added a comment - On the other hand, this line says explicitly that version of Uncompress for std::string allows uninitialized memory in output buffer: https://github.com/couchbase/libcouchbase/blob/465188d3b1a85902e918a553d8ec71aeb903018a/contrib/snappy/snappy-stubs-internal.h#L466-L467

            --- couchbase-server-7.0.0-4860-manifest.xml	2021-04-05 05:34:06.000000000 +0300
            +++ couchbase-server-7.0.0-4861-manifest.xml	2021-04-05 13:38:55.000000000 +0300
            @@ -20,7 +20,7 @@
               <project name="build" path="cbbuild" revision="6a2340fd7fd8bbcc376442baa6cdc8abe225d226" upstream="master" groups="notdefault,build">
                 <annotation name="RELEASE" value="cheshire-cat" />
                 <annotation name="PRODUCT" value="couchbase-server" />
            -    <annotation name="BLD_NUM" value="4860" />
            +    <annotation name="BLD_NUM" value="4861" />
                 <annotation name="VERSION" value="7.0.0" />
                 <annotation name="BSL_PRODUCT" value="Couchbase Server" />
                 <annotation name="BSL_VERSION" value="Cheshire-Cat" />
            @@ -104,7 +104,7 @@
               <project name="liner" path="godeps/src/github.com/peterh/liner" remote="couchbasedeps" revision="6f820f8f90ce9482ffbd40bb15f9ea9932f4942d" />
               <project name="liner" path="godeps/src/github.com/sbinet/liner" remote="couchbasedeps" revision="d9335eee40a45a4f5d74524c90040d6fe6013d50" />
               <project name="logstats" path="godeps/src/github.com/couchbase/logstats" revision="b6030b3d1dd07fc9fd047b699466ac2c90d48a5b" upstream="master" />
            -  <project name="magma" remote="couchbase-priv" revision="cbd1fb55e41d612afa32ded73cb611a1a0b9fbd2" upstream="master" groups="notdefault,enterprise,kv_ee" />
            +  <project name="magma" remote="couchbase-priv" revision="9d5b37c33ebbd7219193ab9dec070b49d64579d7" upstream="master" groups="notdefault,enterprise,kv_ee" />
               <project name="minify" path="godeps/src/github.com/tdewolff/minify" remote="couchbasedeps" revision="ede45cc53f43891267b1fe7c689db9c76d4ce0fb" />
               <project name="mmap-go" path="godeps/src/github.com/blevesearch/mmap-go" remote="blevesearch" revision="99940f54c59671cf69e10b2e4041fabce88eb9b2" />
               <project name="mmap-go" path="godeps/src/github.com/edsrzf/mmap-go" remote="couchbasedeps" revision="935e0e8a636ca4ba70b713f3e38a19e1b77739e8" />
            @@ -141,7 +141,7 @@
               <project name="strconv" path="godeps/src/github.com/tdewolff/strconv" remote="couchbasedeps" revision="9b189f5be77f33c46776f24dbddb2a7ab32af214" />
               <project name="subjson" revision="fe5d6b74ff58bda69993ebbe973f9c010692d215" upstream="master" groups="kv" />
               <project name="sys" path="godeps/src/golang.org/x/sys" remote="couchbasedeps" revision="12a6c2dcc1e4cb348b57847c73987099e261714b" />
            -  <project name="testrunner" revision="ee2f4fc42c00b2b66ef3ca757f0e8b02d5f36679" upstream="master" />
            +  <project name="testrunner" revision="de07646cc4742426e2a9cb96704e446199471423" upstream="master" />
               <project name="text" path="godeps/src/golang.org/x/text" remote="couchbasedeps" revision="88f656faf3f37f690df1a32515b479415e1a6769" />
               <project name="tlm" revision="0aed291a104dc4f94f37b262fedcc27929b9e410" upstream="master" groups="kv">
                 <copyfile src="GNUmakefile" dest="GNUmakefile" />
            

            avsej Sergey Avseyev added a comment - --- couchbase-server-7.0.0-4860-manifest.xml 2021-04-05 05:34:06.000000000 +0300 +++ couchbase-server-7.0.0-4861-manifest.xml 2021-04-05 13:38:55.000000000 +0300 @@ -20,7 +20,7 @@ <project name="build" path="cbbuild" revision="6a2340fd7fd8bbcc376442baa6cdc8abe225d226" upstream="master" groups="notdefault,build"> <annotation name="RELEASE" value="cheshire-cat" /> <annotation name="PRODUCT" value="couchbase-server" /> - <annotation name="BLD_NUM" value="4860" /> + <annotation name="BLD_NUM" value="4861" /> <annotation name="VERSION" value="7.0.0" /> <annotation name="BSL_PRODUCT" value="Couchbase Server" /> <annotation name="BSL_VERSION" value="Cheshire-Cat" /> @@ -104,7 +104,7 @@ <project name="liner" path="godeps/src/github.com/peterh/liner" remote="couchbasedeps" revision="6f820f8f90ce9482ffbd40bb15f9ea9932f4942d" /> <project name="liner" path="godeps/src/github.com/sbinet/liner" remote="couchbasedeps" revision="d9335eee40a45a4f5d74524c90040d6fe6013d50" /> <project name="logstats" path="godeps/src/github.com/couchbase/logstats" revision="b6030b3d1dd07fc9fd047b699466ac2c90d48a5b" upstream="master" /> - <project name="magma" remote="couchbase-priv" revision="cbd1fb55e41d612afa32ded73cb611a1a0b9fbd2" upstream="master" groups="notdefault,enterprise,kv_ee" /> + <project name="magma" remote="couchbase-priv" revision="9d5b37c33ebbd7219193ab9dec070b49d64579d7" upstream="master" groups="notdefault,enterprise,kv_ee" /> <project name="minify" path="godeps/src/github.com/tdewolff/minify" remote="couchbasedeps" revision="ede45cc53f43891267b1fe7c689db9c76d4ce0fb" /> <project name="mmap-go" path="godeps/src/github.com/blevesearch/mmap-go" remote="blevesearch" revision="99940f54c59671cf69e10b2e4041fabce88eb9b2" /> <project name="mmap-go" path="godeps/src/github.com/edsrzf/mmap-go" remote="couchbasedeps" revision="935e0e8a636ca4ba70b713f3e38a19e1b77739e8" /> @@ -141,7 +141,7 @@ <project name="strconv" path="godeps/src/github.com/tdewolff/strconv" remote="couchbasedeps" revision="9b189f5be77f33c46776f24dbddb2a7ab32af214" /> <project name="subjson" revision="fe5d6b74ff58bda69993ebbe973f9c010692d215" upstream="master" groups="kv" /> <project name="sys" path="godeps/src/golang.org/x/sys" remote="couchbasedeps" revision="12a6c2dcc1e4cb348b57847c73987099e261714b" /> - <project name="testrunner" revision="ee2f4fc42c00b2b66ef3ca757f0e8b02d5f36679" upstream="master" /> + <project name="testrunner" revision="de07646cc4742426e2a9cb96704e446199471423" upstream="master" /> <project name="text" path="godeps/src/golang.org/x/text" remote="couchbasedeps" revision="88f656faf3f37f690df1a32515b479415e1a6769" /> <project name="tlm" revision="0aed291a104dc4f94f37b262fedcc27929b9e410" upstream="master" groups="kv"> <copyfile src="GNUmakefile" dest="GNUmakefile" />

            It seems 4861 was the first build with the final updated version of gcc 10 (MB-45044). We had been building with gcc10 for a while, but a final change went in on April 5 to fix some RPATH issues with ASan builds. The main difference in the April 5 version is that all binaries - executables and .so files - will have an explicit /opt/gcc-10.2.0/lib64 at the beginning of their RPATH. I have confirmed that that is there in the binaries for 4861 and not for 4860; for instance, the RPATH on /opt/couchbase/lib/libcouchbase.so.6 is $ORIGIN/../lib:/opt/couchbase/lib in 4860, and /opt/gcc-10.2.0/lib64:$ORIGIN/../lib:/opt/couchbase/lib​ in 4861.

            I'm having trouble guessing how this could be causing this problem - especially on systems where /opt/gcc-10.2.0/lib64 doesn't exist, which should be virtually all of them.

            Sergey Avseyev It doesn't appear (from reading ldd output) that libcouchbase or cbc actually depend on snappy. Is it built into libcouchbase statically?

            ceej Chris Hillery added a comment - It seems 4861 was the first build with the final updated version of gcc 10 ( MB-45044 ). We had been building with gcc10 for a while, but a final change went in on April 5 to fix some RPATH issues with ASan builds. The main difference in the April 5 version is that all binaries - executables and .so files - will have an explicit /opt/gcc-10.2.0/lib64 at the beginning of their RPATH. I have confirmed that that is there in the binaries for 4861 and not for 4860; for instance, the RPATH on /opt/couchbase/lib/libcouchbase.so.6 is $ORIGIN/../lib:/opt/couchbase/lib in 4860, and /opt/gcc-10.2.0/lib64:$ORIGIN/../lib:/opt/couchbase/lib​ in 4861. I'm having trouble guessing how this could be causing this problem - especially on systems where /opt/gcc-10.2.0/lib64 doesn't exist, which should be virtually all of them. Sergey Avseyev  It doesn't appear (from reading ldd output) that libcouchbase or cbc actually depend on snappy. Is it built into libcouchbase statically?

            Chris Hillery, yes. Libcouchbase builds object file from snappy sources and link it statically

            avsej Sergey Avseyev added a comment - Chris Hillery , yes. Libcouchbase builds object file from snappy sources and link it statically

            Sergey Avseyev This ticket shouldn't be assigned to me. I don't have any clue how to fix it, I'm just helping narrow down what environmental things might have triggered the problem.

            Reverting the gcc-10 change isn't an option, so we need to understand how/why libcouchbase or snappy is responding badly to it. I can't do that.

            ceej Chris Hillery added a comment - Sergey Avseyev  This ticket shouldn't be assigned to me. I don't have any clue how to fix it, I'm just helping narrow down what environmental things might have triggered the problem. Reverting the gcc-10 change isn't an option, so we need to understand how/why libcouchbase or snappy is responding badly to it. I can't do that.

            Ok, assigning to me after all. After working with Sergey and Dave at some length, we've identified that the Snappy code inside libcouchbase doesn't like being built with gcc 10. (Build 4860, despite my earlier thoughts, was still built with gcc 7.3.0 - there were likely some earlier builds that were done with gcc 10 that would have also exhibited this problem, but they didn't happen to be caught. At some point we rolled back to gcc 7 due to ASan problems, and build 4861 was the first one after we finally rolled forward to gcc 10 for good.)

            Sergey is likely going to upgrade Snappy in libcouchbase to 1.1.8, which is much newer than the 1.1.1 that libcouchbase and Server currently use. However that is a larger change than we probably want to take in 7.0.0.

            I proposed a change to allow libcouchbase to build and link against an external snappy package rather than its vendored copy. This lets Server build libcouchbase using Server's own snappy library, which is still 1.1.1 but was compiled some time ago with gcc 7.3.0. In my local testing this did solve the reported problem.

            Sergey reviewed and merged this change, so it's ready to go. However we now need to update CC builds to use the latest libcouchbase. I have proposed the manifest change for this and requested Jeelan Poola to review - as eventing is the primary client of libcouchbase, they need to be aware of any updates to assess whether there may be other problems.

            ceej Chris Hillery added a comment - Ok, assigning to me after all. After working with Sergey and Dave at some length, we've identified that the Snappy code inside libcouchbase doesn't like being built with gcc 10. (Build 4860, despite my earlier thoughts, was still built with gcc 7.3.0 - there were likely some earlier builds that were done with gcc 10 that would have also exhibited this problem, but they didn't happen to be caught. At some point we rolled back to gcc 7 due to ASan problems, and build 4861 was the first one after we finally rolled forward to gcc 10 for good.) Sergey is likely going to upgrade Snappy in libcouchbase to 1.1.8, which is much newer than the 1.1.1 that libcouchbase and Server currently use. However that is a larger change than we probably want to take in 7.0.0. I proposed a change to allow libcouchbase to build and link against an external snappy package rather than its vendored copy. This lets Server build libcouchbase using Server's own snappy library, which is still 1.1.1 but was compiled some time ago with gcc 7.3.0. In my local testing this did solve the reported problem. Sergey reviewed and merged this change, so it's ready to go. However we now need to update CC builds to use the latest libcouchbase. I have proposed the manifest change for this and requested Jeelan Poola  to review - as eventing is the primary client of libcouchbase, they need to be aware of any updates to assess whether there may be other problems.

            Sounds like this will require a new libcouchbase release (3.1.2) as eventing policy is to only intake upstream releases, not mid-release SHAs. So this ticket is on hold until that release is available. So for now, assigning this back to Sergey Avseyev as he is working on the 3.1.2 release.

            ceej Chris Hillery added a comment - Sounds like this will require a new libcouchbase release (3.1.2) as eventing policy is to only intake upstream releases, not mid-release SHAs. So this ticket is on hold until that release is available. So for now, assigning this back to Sergey Avseyev  as he is working on the 3.1.2 release.

            Chris Hillery, libcouchbase 3.1.2 has been tagged.

            Prajwal‌ Kiran Kumar‌, you can proceed with manifest updates.

            avsej Sergey Avseyev added a comment - Chris Hillery , libcouchbase 3.1.2 has been tagged. Prajwal‌ Kiran Kumar‌ , you can proceed with manifest updates.

            Jeelan Poola I've rebased the manifest change http://review.couchbase.org/c/manifest/+/151855 and the CV run assured that it at least does not break the build. What testing do you need to do before submitting that change?

            ceej Chris Hillery added a comment - Jeelan Poola  I've rebased the manifest change http://review.couchbase.org/c/manifest/+/151855  and the CV run assured that it at least does not break the build. What testing do you need to do before submitting that change?
            jeelan.poola Jeelan Poola added a comment -

            Abhishek Jindal Request we make a Toy and run failing perf tests. And also trigger some 'rebalance' group (bucket ops, n1ql, timers, curl) of QE/test-runner tests on dev-cluster. Thank you!

            jeelan.poola Jeelan Poola added a comment - Abhishek Jindal Request we make a Toy and run failing perf tests. And also trigger some 'rebalance' group (bucket ops, n1ql, timers, curl) of QE/test-runner tests on dev-cluster. Thank you!

            FYI Ankit Prabhu put a +2 on the manifest change which I assume meant it was OK to submit, so that plus my tlm change are in now.

            ceej Chris Hillery added a comment - FYI Ankit Prabhu  put a +2 on the manifest change which I assume meant it was OK to submit, so that plus my tlm change are in now.
            abhishek.jindal Abhishek Jindal added a comment - - edited

            Apologies - Spoke too soon.
            Sergey Avseyev Unfortunately, the issue wasn't addressed with lcb 3.1.2.
            Here is the build using lcb 3.1.2 : http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/12812/

            • Create a bucket with active compression enabled.
            • Create this document:

            {
              "alr": "eventing::3429204974:tm:344:al:1619509983:128",
              "ctx": {
                "callback": "timerCallback",
                "context": "{\"docId\":\"doc1\",\"field\":\"01234567890123456789012345678901234567890123\"}",
                "vb": 344
              }
            }
            

            * Run the following cbc cat:

            /opt/couchbase/bin/cbc cat doc0 -u Administrator -P asdasd -U couchbase://localhost/test --scope _default --collection _default | xxd
            doc0                 CAS=0x1679a7db2dcb0000, Flags=0x2000006, Size=196, Datatype=0x01(JSON)
             
            0000000: 7b22 616c 7222 3a20 2265 7665 6e74 696e  {"alr": "eventin
            0000010: 673a 3a33 3432 3932 3034 3937 343a 746d  g::3429204974:tm
            0000020: 3a33 3434 3a61 6c3a 3136 3139 3530 3939  :344:al:16195099
            0000030: 3833 3a31 3238 222c 2263 7478 223a 207b  83:128","ctx": {
            0000040: 2263 616c 6c62 6163 6b22 3a20 2274 696d  "callback": "tim
            0000050: 6572 4361 6c6c 6261 636b 222c 2263 6f6e  erCallback","con
            0000060: 7465 7874 223a 2022 7b5c 2264 6f63 4964  text": "{\"docId
            0000070: 5c22 3a5c 2264 6f63 315c 222c 5c22 6669  \":\"doc1\",\"fi
            0000080: 656c 645c 223a 5c22 3031 3233 3435 3637  eld\":\"01234567
            0000090: 3839 3031 3233 3435 3637 3839 860a 003c  890123456789...<
            00000a0: 5c22 3637 3839 860a 003c 5c22 0000 0000  \"6789...<\"....
            00000b0: 0000 003c 5c22 7d22 2c22 7662 223a 2033  ...<\"}","vb": 3
            00000c0: 3434 7d7d                                44}}
            

             

            • cbc version:

            /opt/couchbase/bin/cbc version
            cbc:
              Runtime: Version=3.1.2, Changeset=f463b2eff72b016adcfc00cc2787e6d0201147fe
              Headers: Version=3.1.2, Changeset=f463b2eff72b016adcfc00cc2787e6d0201147fe
              Build Timestamp: 2021-04-27 05:28:08
              Default plugin directory: /opt/couchbase/lib/libcouchbase
              IO: Default=libevent, Current=select, Accessible=select
              SSL Runtime: OpenSSL 1.1.1k  25 Mar 2021
              SSL Headers: OpenSSL 1.1.1k  25 Mar 2021
              Snappy: 1.1.1
              Tracing: SUPPORTED
              System: Linux-4.15.0-66-generic; x86_64
              CC: GNU 10.2.0;  -fvisibility=hidden -pthread -fno-strict-aliasing -ggdb3 -pthread
              CXX: GNU 10.2.0;  -fvisibility=hidden -pthread -pedantic -Wall -Wredundant-decls -Werror=missing-braces -fno-strict-aliasing -Werror=switch -fno-strict-aliasing -ggdb3 -pthread
            

            Looks like snappy 1.1.1 is still being compiled with gcc 10.2.0 which is what has been causing a problem.

            abhishek.jindal Abhishek Jindal added a comment - - edited Apologies - Spoke too soon. Sergey Avseyev Unfortunately, the issue wasn't addressed with lcb 3.1.2. Here is the build using lcb 3.1.2 :  http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/12812/ Create a bucket with active compression enabled. Create this document: { "alr": "eventing::3429204974:tm:344:al:1619509983:128", "ctx": { "callback": "timerCallback", "context": "{\"docId\":\"doc1\",\"field\":\"01234567890123456789012345678901234567890123\"}", "vb": 344 } } * Run the following cbc cat: /opt/couchbase/bin/cbc cat doc0 -u Administrator -P asdasd -U couchbase://localhost/test --scope _default --collection _default | xxd doc0 CAS=0x1679a7db2dcb0000, Flags=0x2000006, Size=196, Datatype=0x01(JSON)   0000000: 7b22 616c 7222 3a20 2265 7665 6e74 696e {"alr": "eventin 0000010: 673a 3a33 3432 3932 3034 3937 343a 746d g::3429204974:tm 0000020: 3a33 3434 3a61 6c3a 3136 3139 3530 3939 :344:al:16195099 0000030: 3833 3a31 3238 222c 2263 7478 223a 207b 83:128","ctx": { 0000040: 2263 616c 6c62 6163 6b22 3a20 2274 696d "callback": "tim 0000050: 6572 4361 6c6c 6261 636b 222c 2263 6f6e erCallback","con 0000060: 7465 7874 223a 2022 7b5c 2264 6f63 4964 text": "{\"docId 0000070: 5c22 3a5c 2264 6f63 315c 222c 5c22 6669 \":\"doc1\",\"fi 0000080: 656c 645c 223a 5c22 3031 3233 3435 3637 eld\":\"01234567 0000090: 3839 3031 3233 3435 3637 3839 860a 003c 890123456789...< 00000a0: 5c22 3637 3839 860a 003c 5c22 0000 0000 \"6789...<\".... 00000b0: 0000 003c 5c22 7d22 2c22 7662 223a 2033 ...<\"}","vb": 3 00000c0: 3434 7d7d 44}}   cbc version: /opt/couchbase/bin/cbc version cbc: Runtime: Version=3.1.2, Changeset=f463b2eff72b016adcfc00cc2787e6d0201147fe Headers: Version=3.1.2, Changeset=f463b2eff72b016adcfc00cc2787e6d0201147fe Build Timestamp: 2021-04-27 05:28:08 Default plugin directory: /opt/couchbase/lib/libcouchbase IO: Default=libevent, Current=select, Accessible=select SSL Runtime: OpenSSL 1.1.1k 25 Mar 2021 SSL Headers: OpenSSL 1.1.1k 25 Mar 2021 Snappy: 1.1.1 Tracing: SUPPORTED System: Linux-4.15.0-66-generic; x86_64 CC: GNU 10.2.0; -fvisibility=hidden -pthread -fno-strict-aliasing -ggdb3 -pthread CXX: GNU 10.2.0; -fvisibility=hidden -pthread -pedantic -Wall -Wredundant-decls -Werror=missing-braces -fno-strict-aliasing -Werror=switch -fno-strict-aliasing -ggdb3 -pthread Looks like snappy 1.1.1 is still being compiled with gcc 10.2.0 which is what has been causing a problem.

            Build couchbase-server-7.0.0-5043 contains libcouchbase commit 83c0e28 with commit message:
            MB-45759: Allow building libcouchbase with external Snappy

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-5043 contains libcouchbase commit 83c0e28 with commit message: MB-45759 : Allow building libcouchbase with external Snappy

            Build couchbase-server-7.0.0-5043 contains tlm commit 08f04e8 with commit message:
            MB-45759: Set new LCB_SNAPPY_xxx vars so libcouchbase uses our Snappy

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-5043 contains tlm commit 08f04e8 with commit message: MB-45759 : Set new LCB_SNAPPY_xxx vars so libcouchbase uses our Snappy

            Chris Hillery, could you check that now libcouchbase is using proper snappy?

            avsej Sergey Avseyev added a comment - Chris Hillery , could you check that now libcouchbase is using proper snappy?

            Abhishek Jindal Your toy build didn't include my tlm change, which is required to activate the libcouchbase change. Also, FYI, you can't specify Gerrit changes to the "manifest" repo to the toy-unix-simple job; I've updated the parameter description to make that explicit.

            Since both changes are already merged now, you can re-try your tests with 7.0.0-5043.

            Sergey Avseyev I confirm that in 7.0.0-5043, libcouchbase is linked against Server's own libsnappy.so.

            [root@d3605a6896d2 lib]# ldd libcouchbase.so.7
                linux-vdso.so.1 =>  (0x00007ffeecffe000)
                libdl.so.2 => /lib64/libdl.so.2 (0x00007fa526c17000)
                libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fa5269fd000)
                libssl.so.1.1 => /opt/couchbase/lib/./../lib/libssl.so.1.1 (0x00007fa526769000)
                libcrypto.so.1.1 => /opt/couchbase/lib/./../lib/libcrypto.so.1.1 (0x00007fa52625c000)
                librt.so.1 => /lib64/librt.so.1 (0x00007fa526054000)
                libsnappy.so.1 => /opt/couchbase/lib/./../lib/libsnappy.so.1 (0x00007fa525e4d000)
                libm.so.6 => /lib64/libm.so.6 (0x00007fa525b4b000)
                libz.so.1 => /opt/couchbase/lib/./../lib/libz.so.1 (0x00007fa52592e000)
                libstdc++.so.6 => /opt/couchbase/lib/./../lib/libstdc++.so.6 (0x00007fa525561000)
                libgcc_s.so.1 => /opt/couchbase/lib/./../lib/libgcc_s.so.1 (0x00007fa525349000)
                libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa52512d000)
                libc.so.6 => /lib64/libc.so.6 (0x00007fa524d5f000)
                /lib64/ld-linux-x86-64.so.2 (0x00007fa527156000)
            

            ceej Chris Hillery added a comment - Abhishek Jindal  Your toy build didn't include my tlm change, which is required to activate the libcouchbase change. Also, FYI, you can't specify Gerrit changes to the "manifest" repo to the toy-unix-simple job; I've updated the parameter description to make that explicit. Since both changes are already merged now, you can re-try your tests with 7.0.0-5043. Sergey Avseyev  I confirm that in 7.0.0-5043, libcouchbase is linked against Server's own libsnappy.so. [root @d3605a6896d2 lib]# ldd libcouchbase.so. 7 linux-vdso.so. 1 => ( 0x00007ffeecffe000 ) libdl.so. 2 => /lib64/libdl.so. 2 ( 0x00007fa526c17000 ) libresolv.so. 2 => /lib64/libresolv.so. 2 ( 0x00007fa5269fd000 ) libssl.so. 1.1 => /opt/couchbase/lib/./../lib/libssl.so. 1.1 ( 0x00007fa526769000 ) libcrypto.so. 1.1 => /opt/couchbase/lib/./../lib/libcrypto.so. 1.1 ( 0x00007fa52625c000 ) librt.so. 1 => /lib64/librt.so. 1 ( 0x00007fa526054000 ) libsnappy.so. 1 => /opt/couchbase/lib/./../lib/libsnappy.so. 1 ( 0x00007fa525e4d000 ) libm.so. 6 => /lib64/libm.so. 6 ( 0x00007fa525b4b000 ) libz.so. 1 => /opt/couchbase/lib/./../lib/libz.so. 1 ( 0x00007fa52592e000 ) libstdc++.so. 6 => /opt/couchbase/lib/./../lib/libstdc++.so. 6 ( 0x00007fa525561000 ) libgcc_s.so. 1 => /opt/couchbase/lib/./../lib/libgcc_s.so. 1 ( 0x00007fa525349000 ) libpthread.so. 0 => /lib64/libpthread.so. 0 ( 0x00007fa52512d000 ) libc.so. 6 => /lib64/libc.so. 6 ( 0x00007fa524d5f000 ) /lib64/ld-linux-x86- 64 .so. 2 ( 0x00007fa527156000 )

            Chris Hillery Yes, looks like it was a problem with the toy. Can confirm that the lcb (and eventing) can now decompress the document correctly. Resolving.

            abhishek.jindal Abhishek Jindal added a comment - Chris Hillery Yes, looks like it was a problem with the toy. Can confirm that the lcb (and eventing) can now decompress the document correctly. Resolving.
            prajwal.kirankumar Prajwal‌ Kiran Kumar‌ (Inactive) added a comment - verified for 7.0.0-5043 http://perf.jenkins.couchbase.com/job/themis/10075/

            Build couchbase-lite-core-3.1.0-205 contains tlm commit 08f04e8 with commit message:
            MB-45759: Set new LCB_SNAPPY_xxx vars so libcouchbase uses our Snappy

            build-team Couchbase Build Team added a comment - Build couchbase-lite-core-3.1.0-205 contains tlm commit 08f04e8 with commit message: MB-45759 : Set new LCB_SNAPPY_xxx vars so libcouchbase uses our Snappy

            People

              prajwal.kirankumar Prajwal‌ Kiran Kumar‌ (Inactive)
              prajwal.kirankumar Prajwal‌ Kiran Kumar‌ (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty