Couchbase Server / MB-62205

Service memcached was OOM-killed and exited with status 137


Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • None
    • 7.6.2
    • couchbase-bucket
    • None
    • Enterprise Edition 7.6.2 build 3697

    Description

      Steps to reproduce:

      1. Create a cluster with 4 KV nodes, each with 22.6 GB of memory.
      2. Create 1 Magma bucket with 4 collections.
      3. Load 50M vector documents (~13 KB each) into collection 1.
      4. Load 10M vector documents into collection 2. memcached is OOM-killed during this step.

      /var/logs:

      [Wed Jun  5 06:12:20 2024] Out of memory: Kill process 10302 (memcached) score 951 or sacrifice child
      [Wed Jun  5 06:12:20 2024] Killed process 10302 (memcached) total-vm:36464424kB, anon-rss:23687388kB, file-rss:0kB, shmem-rss:0kB
       
      root@sd3806-deb10:~# dmesg | egrep -i 'killed process'
      [45681501.936904] Killed process 24300 (memcached) total-vm:36529960kB, anon-rss:23282512kB, file-rss:0kB, shmem-rss:0kB
      [46078344.986845] Killed process 10302 (memcached) total-vm:36464424kB, anon-rss:23687388kB, file-rss:0kB, shmem-rss:0kB
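      To compare the killed process's resident memory with the node's memory, the "Killed process" lines can be parsed and anon-rss converted to GiB. This is a minimal sketch: the regex is written against the exact line format shown above and is an assumption for other kernel versions, and `parse_oom_kill` / `OomKill` are hypothetical helper names. For PID 10302 it gives about 22.59 GiB of anonymous memory, roughly the node's entire 22.6 GB.

```python
import re
from typing import NamedTuple, Optional

# Written against the "Killed process ..." lines in the dmesg output above;
# other kernel versions may format this message differently (assumption).
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


class OomKill(NamedTuple):
    pid: int
    comm: str
    total_vm_kb: int
    anon_rss_kb: int

    @property
    def anon_rss_gib(self) -> float:
        # The kernel reports "kB" meaning KiB; 1 GiB = 1024 * 1024 KiB.
        return self.anon_rss_kb / (1024 * 1024)


def parse_oom_kill(line: str) -> Optional[OomKill]:
    """Return the parsed OOM kill record, or None if the line does not match."""
    m = KILLED_RE.search(line)
    if m is None:
        return None
    return OomKill(int(m["pid"]), m["comm"], int(m["total_vm"]), int(m["anon_rss"]))


line = ("[46078344.986845] Killed process 10302 (memcached) total-vm:36464424kB, "
        "anon-rss:23687388kB, file-rss:0kB, shmem-rss:0kB")
kill = parse_oom_kill(line)
print(kill.pid, kill.comm, f"{kill.anon_rss_gib:.2f} GiB")  # 10302 memcached 22.59 GiB
```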

      memcached crashed (was killed and restarted) on node `172.23.97.66`.


      ns_server log:

      [user:info,2024-06-05T06:18:42.996-07:00,ns_1@172.23.97.66:<0.24965.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages: 
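      Status 137 is consistent with the OOM kill above. By the POSIX shell convention, an exit status above 128 means the child was terminated by signal (status - 128), and 137 - 128 = 9 is SIGKILL on Linux. That ns_server reports child exits using this same convention is an assumption. The sketch below decodes such a status with Python's `signal` module; `describe_exit_status` is a hypothetical helper name.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (128 + N means 'killed by signal N')."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"exit code {status}"
    return f"exit code {status}"


print(describe_exit_status(137))  # SIGKILL on Linux, where SIGKILL is signal 9
```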

      memcached log:

      2024-06-05T06:18:30.234742-07:00 WARNING (bucket1) Slow runtime for 'Memory defragmenter' on thread NonIoPool0: 243 ms
      2024-06-05T06:18:30.280758-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 403 ms
      2024-06-05T06:18:30.524427-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 284 ms
      2024-06-05T06:18:30.866100-07:00 WARNING UptimeClock::tick is outside of tolerance ±100ms. expected:100ms but 217ms have elapsed. uptime:6494.89s warnings:36
      2024-06-05T06:18:30.881514-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 438 ms
      2024-06-05T06:18:30.980027-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 439 ms
      2024-06-05T06:18:31.515183-07:00 WARNING UptimeClock::tick is outside of tolerance ±100ms. expected:100ms but 292ms have elapsed. uptime:6495.51s warnings:37
      2024-06-05T06:18:31.707642-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 572 ms
      2024-06-05T06:18:31.876938-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 760 ms
      2024-06-05T06:18:32.079126-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 234 ms
      2024-06-05T06:18:32.642854-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 759 ms
      2024-06-05T06:18:32.673046-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 496 ms
      2024-06-05T06:18:43.234240-07:00 INFO ---------- Opening logfile: 
      2024-06-05T06:18:43.239660-07:00 INFO Couchbase version 7.6.2-3697 starting.
      2024-06-05T06:18:43.239687-07:00 INFO Process identifier: 14666
      2024-06-05T06:18:43.239709-07:00 INFO recalculate_max_connections: {"engine_fds":133973,"max_connections":65000,"max_fds":200000,"system_connections":5000}
      2024-06-05T06:18:43.239816-07:00 INFO Breakpad enabled. Minidumps will be written to '/opt/couchbase/var/lib/couchbase/crash'
      2024-06-05T06:18:43.241448-07:00 INFO Fine clock resolution:1549ns, overhead:1607ns
      2024-06-05T06:18:43.242963-07:00 INFO Coarse clock resolution:4000035ns, overhead:10ns
      2024-06-05T06:18:43.242972-07:00 INFO (Clock measurement period: 1ns)
      2024-06-05T06:18:43.243307-07:00 INFO Using SLA configuration: {"COMPACT_DB":{"slow":"1800 s"},"CREATE_BUCKET":{"slow":"5 s"},"DELETE_BUCKET":{"slow":"10 s"},"SELECT_BUCKET":{"slow":"10 ms"},"SEQNO_PERSISTENCE":{"slow":"30 s"},"comment":"Current MCBP SLA configuration","default":{"slow":"500 ms"},"version":1}
      2024-06-05T06:18:43.243317-07:00 INFO Enable standard input listener
      2024-06-05T06:18:43.243441-07:00 INFO NUMA: Set memory allocation policy to 'interleave'
      2024-06-05T06:18:43.243464-07:00 INFO Loading RBAC configuration from [/opt/couchbase/var/lib/couchbase/config/memcached.rbac]
      2024-06-05T06:18:43.243856-07:00 INFO Loading error maps from [/opt/couchbase/etc/couchbase/kv/error_maps]
      2024-06-05T06:18:43.245079-07:00 INFO Starting external authentication manager
      2024-06-05T06:18:43.251193-07:00 INFO Changing logging level to 0
      2024-06-05T06:18:43.251212-07:00 INFO recalculate_max_connections: {"engine_fds":133973,"max_connections":65000,"max_fds":200000,"system_connections":5000}
      2024-06-05T06:18:43.251216-07:00 INFO Initialize bucket manager
      2024-06-05T06:18:43.251238-07:00 INFO Initialize SASL
      2024-06-05T06:18:43.252604-07:00 INFO Starting network interface manager
      2024-06-05T06:18:43.252754-07:00 INFO Enable port(s)
      2024-06-05T06:18:43.252906-07:00 INFO 15 Listen on IPv4: 0.0.0.0:11210 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
      2024-06-05T06:18:43.253006-07:00 INFO 16 Listen on IPv4: 0.0.0.0:11209 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
      2024-06-05T06:18:43.253094-07:00 INFO 17 Listen on IPv4: 0.0.0.0:11207 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
      2024-06-05T06:18:43.253180-07:00 INFO 18 Listen on IPv4: 0.0.0.0:11206 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
      2024-06-05T06:18:43.253277-07:00 INFO 19 Listen on IPv6: [::]:11210 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
      2024-06-05T06:18:43.253361-07:00 INFO 20 Listen on IPv6: [::]:11209 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
      2024-06-05T06:18:43.253446-07:00 INFO 21 Listen on IPv6: [::]:11207 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
      2024-06-05T06:18:43.253529-07:00 INFO 22 Listen on IPv6: [::]:11206 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
      2024-06-05T06:18:43.254506-07:00 INFO Prometheus Exporter started, listening on family:inet port:11280
      2024-06-05T06:18:43.256530-07:00 INFO Starting Phosphor tracing with config: "buffer-mode:ring;buffer-size:20971520;enabled-categories:*"
      2024-06-05T06:18:43.263096-07:00 INFO Taskable No bucket registered with low priority
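      In the seconds before the restart, the log is dominated by slow-runtime warnings from the item pager, the task that evicts items to free memory, on vb:823 and vb:840. One quick way to summarise these warnings is to count them and take the worst duration per task. This is a sketch that assumes the "Slow runtime for '<task>' on thread <name>: <N> ms" wording shown above; `slow_runtime_summary` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumes the wording of the memcached warnings above, e.g.
# "... Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 403 ms"
SLOW_RE = re.compile(r"Slow runtime for '(?P<task>[^']+)' on thread \S+: (?P<ms>\d+) ms")


def slow_runtime_summary(lines):
    """Return {task: (warning_count, worst_ms)} for every 'Slow runtime' warning."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        m = SLOW_RE.search(line)
        if m:
            entry = stats[m["task"]]
            entry[0] += 1
            entry[1] = max(entry[1], int(m["ms"]))
    return {task: tuple(v) for task, v in stats.items()}


sample = [
    "2024-06-05T06:18:30.234742-07:00 WARNING (bucket1) Slow runtime for 'Memory defragmenter' on thread NonIoPool0: 243 ms",
    "2024-06-05T06:18:30.280758-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 403 ms",
    "2024-06-05T06:18:30.866100-07:00 WARNING UptimeClock::tick is outside of tolerance ±100ms. expected:100ms but 217ms have elapsed. uptime:6494.89s warnings:36",
    "2024-06-05T06:18:31.707642-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 572 ms",
    "2024-06-05T06:18:31.876938-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 760 ms",
]
for task, (count, worst) in sorted(slow_runtime_summary(sample).items()):
    print(f"{task}: {count} warning(s), worst {worst} ms")
```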

      This is a pure insert workload with no indexes. At the time of the crash the bucket held ~58M documents, with a resident ratio (RR) of 71.7%.
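      The resident ratio can be turned into approximate item counts. This assumes RR is the percentage of items whose values are held in memory, and that ~58M is the total item count. With those figures, about 41.6M items are resident and about 16.4M are not. `resident_split` is a hypothetical helper name.

```python
from typing import Tuple


def resident_split(total_items: int, resident_ratio_pct: float) -> Tuple[int, int]:
    """Split a total item count into (resident, non_resident) for an RR percentage."""
    resident = round(total_items * resident_ratio_pct / 100)
    return resident, total_items - resident


resident, non_resident = resident_split(58_000_000, 71.7)
print(f"resident ~{resident:,}, non-resident ~{non_resident:,}")
```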

      Documents have roughly the following structure:

      {
        "sno": 1000,
        "sname": "all",
        "id": "vect1000",
        "dim": 1536,
        "vector_data": [1536 dims float array]
      }
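      To reproduce documents of this shape, a generator like the following could be used. This is a sketch: the field names come from the sample above, but the random values, the 6-decimal rounding and the helper name `make_vector_doc` are assumptions. The serialized size depends heavily on how many digits each float carries. With 6 decimal places it comes out in the low-to-mid teens of KB, roughly comparable to the ~13 KB quoted in the steps to reproduce.

```python
import json
import random

DIM = 1536  # matches the "dim" field in the sample document


def make_vector_doc(sno: int, dim: int = DIM, seed: int = 0) -> dict:
    """Build a document shaped like the sample above, with hypothetical random vector values."""
    rng = random.Random(seed + sno)  # deterministic per document
    return {
        "sno": sno,
        "sname": "all",
        "id": f"vect{sno}",
        "dim": dim,
        "vector_data": [round(rng.uniform(-1.0, 1.0), 6) for _ in range(dim)],
    }


doc = make_vector_doc(1000)
body = json.dumps(doc, separators=(",", ":"))  # compact JSON, as a client might send it
print(doc["id"], len(doc["vector_data"]), f"{len(body) / 1000:.1f} KB")
```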

       
      The same behaviour is also visible in Promtimer.

      The cluster is still live: 172.23.97.66:8091

    People: Sarthak Dua (sarthak.dua)