Details
Bug
Resolution: Not a Bug
Major
None
7.6.2
None
Enterprise Edition 7.6.2 build 3697
Untriaged
0
Unknown
Description
Steps to reproduce:
- Create a cluster with 4 KV nodes with 22.6GB memory
- Create 1 magma bucket and 4 collections
- Load 50M vector documents (~13 KB each) into collection 1
- Load 10M vector documents into collection 2. memcached is OOM-killed during this step.
/var/logs:
[Wed Jun 5 06:12:20 2024] Out of memory: Kill process 10302 (memcached) score 951 or sacrifice child
[Wed Jun 5 06:12:20 2024] Killed process 10302 (memcached) total-vm:36464424kB, anon-rss:23687388kB, file-rss:0kB, shmem-rss:0kB
root@sd3806-deb10:~# dmesg | egrep -i 'killed process'
[45681501.936904] Killed process 24300 (memcached) total-vm:36529960kB, anon-rss:23282512kB, file-rss:0kB, shmem-rss:0kB
[46078344.986845] Killed process 10302 (memcached) total-vm:36464424kB, anon-rss:23687388kB, file-rss:0kB, shmem-rss:0kB
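To read the kernel figures above in more familiar units, here is a minimal Python sketch that parses a `Killed process` line and converts the counters to GiB. The helper name `parse_oom_kill` and the regular expression are illustrative assumptions derived only from the two dmesg lines quoted in this ticket; other kernel versions may print the line differently. It treats the kernel's `kB` as 1024 bytes.

```python
import re

# Pattern derived from the dmesg lines quoted in this ticket (an assumption;
# other kernels may format the message differently).
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

KIB_PER_GIB = 1024 * 1024  # the counters are in units of 1024 bytes


def parse_oom_kill(line):
    """Return pid, process name and memory figures in GiB, or None if no match."""
    m = KILL_RE.search(line)
    if m is None:
        return None
    return {
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "total_vm_gib": int(m.group("total_vm")) / KIB_PER_GIB,
        "anon_rss_gib": int(m.group("anon_rss")) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    sample = (
        "[46078344.986845] Killed process 10302 (memcached) "
        "total-vm:36464424kB, anon-rss:23687388kB, file-rss:0kB, shmem-rss:0kB"
    )
    info = parse_oom_kill(sample)
    print(f"{info['name']} pid {info['pid']}: anon-rss {info['anon_rss_gib']:.2f} GiB")
```

For the kill of process 10302 this works out to an anonymous RSS of about 22.59 GiB.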
A crash is seen on node `172.23.97.66`.
ns-server log:
[user:info,2024-06-05T06:18:42.996-07:00,ns_1@172.23.97.66:<0.24965.0>:ns_log:consume_log:76]Service 'memcached' exited with status 137. Restarting. Messages:
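Status 137 in the message above is consistent with the OOM kill: by the common POSIX shell convention, a status above 128 means the process was terminated by signal `status - 128`, and signal 9 is SIGKILL. A small sketch of that decoding, assuming ns_server reports the status in this shell-style form (the helper name `describe_exit_status` is hypothetical):

```python
import signal


def describe_exit_status(status):
    """Decode a shell-style exit status (128 + signal number means killed by signal)."""
    if status > 128:
        signum = status - 128
        try:
            # Look up the platform's name for the signal number.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number unknown on this platform.
            name = "unknown signal"
        return f"terminated by {name} (signal {signum})"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(137))
```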
Memcached Logs:
2024-06-05T06:18:30.234742-07:00 WARNING (bucket1) Slow runtime for 'Memory defragmenter' on thread NonIoPool0: 243 ms
2024-06-05T06:18:30.280758-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 403 ms
2024-06-05T06:18:30.524427-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 284 ms
2024-06-05T06:18:30.866100-07:00 WARNING UptimeClock::tick is outside of tolerance ±100ms. expected:100ms but 217ms have elapsed. uptime:6494.89s warnings:36
2024-06-05T06:18:30.881514-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 438 ms
2024-06-05T06:18:30.980027-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 439 ms
2024-06-05T06:18:31.515183-07:00 WARNING UptimeClock::tick is outside of tolerance ±100ms. expected:100ms but 292ms have elapsed. uptime:6495.51s warnings:37
2024-06-05T06:18:31.707642-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 572 ms
2024-06-05T06:18:31.876938-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 760 ms
2024-06-05T06:18:32.079126-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 234 ms
2024-06-05T06:18:32.642854-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:840' on thread NonIoPool0: 759 ms
2024-06-05T06:18:32.673046-07:00 WARNING (bucket1) Slow runtime for 'Item pager on vb:823' on thread NonIoPool1: 496 ms
2024-06-05T06:18:43.234240-07:00 INFO ---------- Opening logfile:
2024-06-05T06:18:43.239660-07:00 INFO Couchbase version 7.6.2-3697 starting.
2024-06-05T06:18:43.239687-07:00 INFO Process identifier: 14666
2024-06-05T06:18:43.239709-07:00 INFO recalculate_max_connections: {"engine_fds":133973,"max_connections":65000,"max_fds":200000,"system_connections":5000}
2024-06-05T06:18:43.239816-07:00 INFO Breakpad enabled. Minidumps will be written to '/opt/couchbase/var/lib/couchbase/crash'
2024-06-05T06:18:43.241448-07:00 INFO Fine clock resolution:1549ns, overhead:1607ns
2024-06-05T06:18:43.242963-07:00 INFO Coarse clock resolution:4000035ns, overhead:10ns
2024-06-05T06:18:43.242972-07:00 INFO (Clock measurement period: 1ns)
2024-06-05T06:18:43.243307-07:00 INFO Using SLA configuration: {"COMPACT_DB":{"slow":"1800 s"},"CREATE_BUCKET":{"slow":"5 s"},"DELETE_BUCKET":{"slow":"10 s"},"SELECT_BUCKET":{"slow":"10 ms"},"SEQNO_PERSISTENCE":{"slow":"30 s"},"comment":"Current MCBP SLA configuration","default":{"slow":"500 ms"},"version":1}
2024-06-05T06:18:43.243317-07:00 INFO Enable standard input listener
2024-06-05T06:18:43.243441-07:00 INFO NUMA: Set memory allocation policy to 'interleave'
2024-06-05T06:18:43.243464-07:00 INFO Loading RBAC configuration from [/opt/couchbase/var/lib/couchbase/config/memcached.rbac]
2024-06-05T06:18:43.243856-07:00 INFO Loading error maps from [/opt/couchbase/etc/couchbase/kv/error_maps]
2024-06-05T06:18:43.245079-07:00 INFO Starting external authentication manager
2024-06-05T06:18:43.251193-07:00 INFO Changing logging level to 0
2024-06-05T06:18:43.251212-07:00 INFO recalculate_max_connections: {"engine_fds":133973,"max_connections":65000,"max_fds":200000,"system_connections":5000}
2024-06-05T06:18:43.251216-07:00 INFO Initialize bucket manager
2024-06-05T06:18:43.251238-07:00 INFO Initialize SASL
2024-06-05T06:18:43.252604-07:00 INFO Starting network interface manager
2024-06-05T06:18:43.252754-07:00 INFO Enable port(s)
2024-06-05T06:18:43.252906-07:00 INFO 15 Listen on IPv4: 0.0.0.0:11210 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
2024-06-05T06:18:43.253006-07:00 INFO 16 Listen on IPv4: 0.0.0.0:11209 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
2024-06-05T06:18:43.253094-07:00 INFO 17 Listen on IPv4: 0.0.0.0:11207 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
2024-06-05T06:18:43.253180-07:00 INFO 18 Listen on IPv4: 0.0.0.0:11206 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
2024-06-05T06:18:43.253277-07:00 INFO 19 Listen on IPv6: [::]:11210 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
2024-06-05T06:18:43.253361-07:00 INFO 20 Listen on IPv6: [::]:11209 Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
2024-06-05T06:18:43.253446-07:00 INFO 21 Listen on IPv6: [::]:11207 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":5000}
2024-06-05T06:18:43.253529-07:00 INFO 22 Listen on IPv6: [::]:11206 (TLS) Properties: {"so_keepalive":1,"so_linger":"off","so_rcvbuf":131072,"so_sndbuf":16384,"tcp_keepcnt":9,"tcp_keepidle":7200,"tcp_keepintvl":75,"tcp_user_timeout":0}
2024-06-05T06:18:43.254506-07:00 INFO Prometheus Exporter started, listening on family:inet port:11280
2024-06-05T06:18:43.256530-07:00 INFO Starting Phosphor tracing with config: "buffer-mode:ring;buffer-size:20971520;enabled-categories:*"
2024-06-05T06:18:43.263096-07:00 INFO Taskable No bucket registered with low priority
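The `Slow runtime` warnings that precede the restart can be summarized per task with a short script. This is only a sketch: the regular expression is inferred from the warning lines quoted above, and `summarize_slow_tasks` is a hypothetical helper, not part of any Couchbase tooling.

```python
import re
from collections import defaultdict

# Pattern inferred from the memcached warnings quoted in this ticket (an assumption).
SLOW_RE = re.compile(
    r"Slow runtime for '(?P<task>[^']+)' on thread (?P<thread>\S+): (?P<ms>\d+) ms"
)


def summarize_slow_tasks(lines):
    """Map each task name to (number of warnings, longest runtime in ms)."""
    durations = defaultdict(list)
    for line in lines:
        m = SLOW_RE.search(line)
        if m:
            durations[m.group("task")].append(int(m.group("ms")))
    return {task: (len(ms), max(ms)) for task, ms in durations.items()}


if __name__ == "__main__":
    sample = [
        "2024-06-05T06:18:30.280758-07:00 WARNING (bucket1) Slow runtime for "
        "'Item pager on vb:823' on thread NonIoPool1: 403 ms",
        "2024-06-05T06:18:30.881514-07:00 WARNING (bucket1) Slow runtime for "
        "'Item pager on vb:823' on thread NonIoPool1: 438 ms",
    ]
    for task, (count, worst) in summarize_slow_tasks(sample).items():
        print(f"{task}: {count} warnings, worst {worst} ms")
```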
This is a pure insert workload with no indexes. The bucket held ~58M documents at a 71.7% resident ratio (RR).
The document structure is roughly:
{
  "sno": 1000,
  "sname": "all",
  "id": "vect1000",
  "dim": 1536,
  "vector_data": [1536 dims float array]
}
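For anyone trying to approximate the workload, a document of this shape can be generated and measured with the sketch below. Everything beyond the field names is an assumption: the function name `make_vector_doc` is hypothetical, the `id` pattern is guessed from the single example above, and the vector values are uniformly random floats rounded to 6 decimals, so the encoded size will only roughly approximate the real documents.

```python
import json
import random


def make_vector_doc(sno, dim=1536, seed=None):
    """Build one document shaped like the example in this ticket."""
    rng = random.Random(seed)
    return {
        "sno": sno,
        "sname": "all",
        # Assumption: the id is "vect" followed by sno, as in the single example.
        "id": f"vect{sno}",
        "dim": dim,
        # Assumption: uniformly random values rounded to 6 decimals; the real
        # loader's value distribution and precision are not stated in the ticket.
        "vector_data": [round(rng.uniform(-1.0, 1.0), 6) for _ in range(dim)],
    }


if __name__ == "__main__":
    doc = make_vector_doc(1000, seed=0)
    encoded = json.dumps(doc).encode("utf-8")
    print(len(doc["vector_data"]), "dims,", len(encoded), "bytes as JSON")
```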
The same behavior can be seen in promtimer as well.
The cluster is still live at 172.23.97.66:8091