Couchbase Server / MB-45615

[System Test] : OOM kills and crashes with errors "fatal error: runtime: out of memory" seen multiple times on 1 query node

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.0.0
    • Cheshire-Cat
    • query

    Description

      Build : 7.0.0-4916
      Test : -test tests/2i/cheshirecat/test_idx_clusterops_cheshire_cat.yml -scope tests/2i/cheshirecat/scope_idx_cheshire_cat_dgm.yml
      Scale : 2
      Iteration : 1st onwards

      On query node 172.23.97.236, cbq-engine was OOM-killed by the OS multiple times, and it also crashed several times during the test. Interestingly, the other query node showed none of these symptoms.

      172.23.97.236 : crash
      [user:info,2021-04-11T19:21:54.053-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-11T21:15:58.723-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-11T21:54:19.666-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-12T04:18:49.522-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 137. Restarting. Messages: --> this was intentionally done in the test
      [user:info,2021-04-12T05:06:43.218-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-12T15:18:11.996-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-12T15:44:24.123-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-12T16:11:57.985-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-12T16:40:47.016-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 137. Restarting. Messages: --> This is an OOM kill by the OS.
      [user:info,2021-04-12T17:06:13.708-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
      [user:info,2021-04-12T17:35:30.518-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 137. Restarting. Messages: --> This is an OOM kill by the OS.
      [user:info,2021-04-12T17:53:52.005-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 137. Restarting. Messages: --> This is an OOM kill by the OS.
      [user:info,2021-04-12T19:29:41.798-07:00,ns_1@172.23.97.236:<0.9317.0>:ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. Restarting. Messages:
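
      The list above contains nine exits with status 2 and four with status 137, one of which was intentional. Status 137 follows the usual 128 + signal convention (128 + 9, SIGKILL), so on its own it only shows that the process was killed, not why. Status 2 is what a Go program exits with after a runtime fatal error or an unrecovered panic. A minimal sketch for tallying such lines is below; the regular expression is an assumption derived only from the samples quoted here, and `describe_status` and `tally_crashes` are hypothetical helper names, not part of any Couchbase tool.

      ```python
      import re
      from collections import Counter

      # Pattern inferred from the ns_server lines quoted in this ticket (assumption).
      CRASH_RE = re.compile(
          r"\[user:info,(?P<ts>[^,]+),.*?"
          r"Service '(?P<svc>[^']+)' exited with status (?P<status>\d+)\."
      )


      def describe_status(status: int) -> str:
          """Interpret an exit status using the 128 + signal convention."""
          if status > 128:
              return f"killed by signal {status - 128}"
          return f"exited with code {status}"


      def tally_crashes(lines):
          """Count (service, status) pairs across the given log lines."""
          counts = Counter()
          for line in lines:
              m = CRASH_RE.search(line)
              if m:
                  counts[(m.group("svc"), int(m.group("status")))] += 1
          return counts


      # Three of the lines from this ticket, used as sample input.
      sample = [
          "[user:info,2021-04-11T19:21:54.053-07:00,ns_1@172.23.97.236:<0.9317.0>:"
          "ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. "
          "Restarting. Messages:",
          "[user:info,2021-04-11T21:15:58.723-07:00,ns_1@172.23.97.236:<0.9317.0>:"
          "ns_log:crash_consumption_loop:63]Service 'query' exited with status 2. "
          "Restarting. Messages:",
          "[user:info,2021-04-12T16:40:47.016-07:00,ns_1@172.23.97.236:<0.9317.0>:"
          "ns_log:crash_consumption_loop:63]Service 'query' exited with status 137. "
          "Restarting. Messages:",
      ]

      for (svc, status), n in sorted(tally_crashes(sample).items()):
          print(svc, status, describe_status(status), n)
      ```

      Confirming that a given status 137 was an OOM kill still needs the kernel log from the node, since the test itself also killed the process once.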
      

      Partial stack trace from the first crash :

      2021-04-11T19:21:46.102-07:00 [Info] GSIC[default/bucket2-scope_2-coll_10-1618179081787262788] request(666b4f57-197b-4a32-95cb-4be101efc750) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results6183086626225 ...
      2021-04-11T19:21:48.976-07:00 [Info] GSIC[default/bucket1-scope_1-coll_1-1618178914532332248] 66f2b73b-9db3-4b3b-baf7-33b83713092c new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results6183662078325
      2021-04-11T19:21:48.744-07:00 [Info] GSIC[default/bucket2-scope_2-coll_10-1618179081787262788] f50632c9-3888-4b81-9a2e-4feb21e3cf53 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results6183078927086
      fatal error: runtime: out of memory
       
      runtime stack:
      runtime.throw(0x250559d, 0x16)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/panic.go:774 +0x72
      runtime.sysMap(0xc648000000, 0x4000000, 0x3b26718)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/mem_linux.go:169 +0xc5
      runtime.(*mheap).sysAlloc(0x3b0d460, 0x2000, 0x2000, 0x135dd8da8)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/malloc.go:701 +0x1cd
      runtime.(*mheap).grow(0x3b0d460, 0x1, 0xffffffff)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/mheap.go:1255 +0xa3
      runtime.(*mheap).allocSpanLocked(0x3b0d460, 0x1, 0x3b26728, 0xc00005c720)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/mheap.go:1170 +0x266
      runtime.(*mheap).alloc_m(0x3b0d460, 0x1, 0xb, 0xc647fa4640)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/mheap.go:1022 +0xc2
      runtime.(*mheap).alloc.func1()
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/mheap.go:1093 +0x4c
      runtime.systemstack(0x0)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/asm_amd64.s:370 +0x66
      runtime.mstart()
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/runtime/proc.go:1146
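
      The `runtime.sysMap` frame gives a rough idea of how far the heap had grown when the allocation failed. The sketch below assumes that the first two words printed for that frame are the mapping address and size, and that heap arenas on linux/amd64 start at `0x00c0 << 32`, as in the Go runtime's `malloc.go`. Both are assumptions about go1.13 internals rather than anything stated in this ticket, and the offset is address space handed out by the runtime, not resident memory. `decode_sysmap` is a hypothetical helper name.

      ```python
      # Assumed first heap arena hint for 64-bit Go (runtime/malloc.go).
      GO_ARENA_BASE = 0x00C0 << 32


      def decode_sysmap(addr: int, size: int) -> dict:
          """Convert the raw sysMap arguments into human-readable units."""
          return {
              # Size of the mapping the runtime tried to commit.
              "map_size_mib": size / (1 << 20),
              # Distance of the failing address from the assumed arena base.
              "offset_from_base_gib": (addr - GO_ARENA_BASE) / (1 << 30),
          }


      # Values copied from the runtime.sysMap frame in the trace above.
      info = decode_sysmap(0xC648000000, 0x4000000)
      print(info)
      ```

      Under those assumptions, the failing request was for one 64 MiB arena located about 25 GiB past the arena base, which is consistent with a very large cbq-engine heap at the time of the crash.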

      The other query node in the cluster is 172.23.97.227.

      Attachments

        1. pmap_172.23.107.47.out
          71 kB
        2. pmap_172.23.107.54.out
          71 kB
        3. pmap.out
          93 kB
        4. query_dumps0413_toybuild.zip
          1.50 MB
        5. query_dumps0413.zip
          4.34 MB

            People

              mihir.kamdar Mihir Kamdar (Inactive)