Couchbase Server: MB-38855

[Magma] Memcached is OOM killed by kernel.

Details

    Description

      Steps:
      1. Create a 4-node cluster:

      +----------------+----------+--------------+
      | Nodes          | Services | Status       |
      +----------------+----------+--------------+
      | 172.23.121.76  | kv       | Cluster node |
      | 172.23.121.79  | None     | <--- IN ---  |
      | 172.23.122.208 | None     | <--- IN ---  |
      | 172.23.122.221 | None     | <--- IN ---  |
      +----------------+----------+--------------+
      

      2. Create a default bucket with replicas=0, then load 10M items with doc_size=4096.

      http://172.23.121.76:8091/pools/default/buckets with param: storageBackend=magma&replicaIndex=1&maxTTL=0&flushEnabled=0&compressionMode=off&bucketType=membase&conflictResolutionType=seqno&name=default&replicaNumber=0&ramQuotaMB=3105&threadsNumber=3&evictionPolicy=fullEviction
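
The parameter string in the request above is dense, so a small sketch like the following may help when checking which settings were actually applied. It only parses the string copied from this report and sends nothing to the cluster. The helper name `bucket_settings` is hypothetical.

```python
from urllib.parse import parse_qs

# Parameter string copied verbatim from the bucket-creation request in this report.
BUCKET_PARAMS = (
    "storageBackend=magma&replicaIndex=1&maxTTL=0&flushEnabled=0"
    "&compressionMode=off&bucketType=membase&conflictResolutionType=seqno"
    "&name=default&replicaNumber=0&ramQuotaMB=3105&threadsNumber=3"
    "&evictionPolicy=fullEviction"
)


def bucket_settings(param_string):
    """Return the request parameters as a flat {name: value} dict (hypothetical helper)."""
    return {key: values[0] for key, values in parse_qs(param_string).items()}


settings = bucket_settings(BUCKET_PARAMS)

# Print the settings most relevant to a memory investigation.
for key in ("storageBackend", "evictionPolicy", "replicaNumber", "ramQuotaMB"):
    print(f"{key} = {settings[key]}")
```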
      

      Bucket statistics

      +---------+---------+----------+-----+----------+-------------+------------+-------------+
      | Bucket  | Type    | Replicas | TTL | Items    | RAM Quota   | RAM Used   | Disk Used   |
      +---------+---------+----------+-----+----------+-------------+------------+-------------+
      | default | membase | 0        | 0   | 10000000 | 13023313920 | 8596930056 | 44373086653 |
      +---------+---------+----------+-----+----------+-------------+------------+-------------+
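
The table gives no units, so the following sketch checks one plausible reading: that the columns are in bytes and that "RAM Quota" is the per-node `ramQuotaMB` summed over the four nodes. Both points are assumptions inferred from the arithmetic, not something the report states.

```python
MIB = 1024 * 1024

ram_quota_mb_per_node = 3105      # ramQuotaMB from the bucket-creation request
node_count = 4                    # nodes listed in the cluster table
reported_quota = 13023313920      # "RAM Quota" column (assumed bytes)
reported_used = 8596930056        # "RAM Used" column (assumed bytes)

# Assumption: the cluster-wide quota is the per-node quota times the node count.
expected_quota = ram_quota_mb_per_node * MIB * node_count

print("quota matches per-node quota x nodes:", expected_quota == reported_quota)
print(f"RAM used / RAM quota: {reported_used / reported_quota:.1%}")
```

If the check holds, each node's share of the bucket quota is 3105 MiB, which is a useful reference point when reading per-process memory figures elsewhere in this report.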
      

      3. Upsert all 10M items. While upserting, an OOM kill of memcached is observed.

      Service 'memcached' exited with status 137. Restarting. Messages:
      2020-04-20T04:43:29.574444-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_1: 201 ms
      2020-04-20T04:43:29.972861-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_1: 215 ms
      2020-04-20T04:43:30.372276-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_1: 204 ms
      2020-04-20T04:43:30.603379-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_1: 231 ms
      2020-04-20T04:43:31.166528-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_0: 221 ms
      2020-04-20T04:43:31.866978-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_0: 332 ms
      2020-04-20T04:43:32.121060-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_0: 253 ms
      2020-04-20T04:43:32.345296-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_0: 224 ms
      2020-04-20T04:43:32.741603-07:00 WARNING (No Engine) Slow runtime for 'Item pager on vb:4' on thread nonIO_worker_1: 207 ms
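
The exit status 137 in the message above is consistent with a SIGKILL, assuming the reported status follows the common "128 + signal number" convention (the report does not state how the status is derived). A minimal sketch of that decoding, with a hypothetical helper name:

```python
import signal


def describe_exit_status(status):
    """Decode an exit status, assuming the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"


print(describe_exit_status(137))
```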
      

      The resident ratio is around 20% at this time, so the bucket is not under heavy DGM, yet memcached is still killed by the OS.

      On 172.23.121.76:

      Riteshs-MacBook-Pro:TAF riteshagarwal$ ssh root@172.23.121.76
      root@172.23.121.76's password:
      Last login: Mon Apr 20 03:56:17 2020 from 172.16.20.131
      [root@localhost ~]# grep 'Killed process' /var/log/messages
      Apr 20 03:37:54 localhost kernel: Killed process 71731 (memcached), UID 996, total-vm:5005480kB, anon-rss:3238940kB, file-rss:244kB, shmem-rss:0kB
      Apr 20 03:37:54 localhost kernel: Killed process 71738 (mc:worker_0), UID 996, total-vm:5005480kB, anon-rss:3241664kB, file-rss:1036kB, shmem-rss:0kB
      Apr 20 03:45:47 localhost kernel: Killed process 73197 (memcached), UID 996, total-vm:4921512kB, anon-rss:3302840kB, file-rss:1512kB, shmem-rss:0kB
      Apr 20 03:45:47 localhost kernel: Killed process 73285 (memcached), UID 996, total-vm:4921512kB, anon-rss:3306728kB, file-rss:1804kB, shmem-rss:0kB
      Apr 20 03:48:46 localhost kernel: Killed process 74377 (memcached), UID 996, total-vm:5521580kB, anon-rss:3256616kB, file-rss:2620kB, shmem-rss:0kB
      Apr 20 04:02:01 localhost kernel: Killed process 78149 (memcached), UID 996, total-vm:4939944kB, anon-rss:3219904kB, file-rss:2360kB, shmem-rss:0kB
      Apr 20 04:08:36 localhost kernel: Killed process 79582 (memcached), UID 996, total-vm:5005480kB, anon-rss:3237372kB, file-rss:2416kB, shmem-rss:0kB
      Apr 20 04:08:36 localhost kernel: Killed process 79659 (mc:worker_0), UID 996, total-vm:5005480kB, anon-rss:3241368kB, file-rss:2596kB, shmem-rss:0kB
      Apr 20 04:08:36 localhost kernel: Killed process 79660 (mc:worker_1), UID 996, total-vm:5005480kB, anon-rss:3241544kB, file-rss:2596kB, shmem-rss:0kB
      Apr 20 04:11:23 localhost kernel: Killed process 80682 (memcached), UID 996, total-vm:5478580kB, anon-rss:3181576kB, file-rss:172kB, shmem-rss:0kB
      Apr 20 04:34:29 localhost kernel: Killed process 85249 (memcached), UID 996, total-vm:4939944kB, anon-rss:3226752kB, file-rss:2184kB, shmem-rss:0kB
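
To compare the kills more easily, the `grep` output can be reduced to process ID, name, and anon-rss. The sketch below is one way to do that. The regular expression is written against the line format shown in this report only, the sample holds three of the lines above, and `parse_kills` is a hypothetical helper.

```python
import re

# Matches the "Killed process" lines in the format shown in this report.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?anon-rss:(?P<anon_kb>\d+)kB"
)

# Three lines copied from the grep output above.
SAMPLE = """\
Apr 20 03:37:54 localhost kernel: Killed process 71731 (memcached), UID 996, total-vm:5005480kB, anon-rss:3238940kB, file-rss:244kB, shmem-rss:0kB
Apr 20 04:11:23 localhost kernel: Killed process 80682 (memcached), UID 996, total-vm:5478580kB, anon-rss:3181576kB, file-rss:172kB, shmem-rss:0kB
Apr 20 04:34:29 localhost kernel: Killed process 85249 (memcached), UID 996, total-vm:4939944kB, anon-rss:3226752kB, file-rss:2184kB, shmem-rss:0kB
"""


def parse_kills(text):
    """Return (pid, process name, anon-rss in MiB) for each kill line."""
    kills = []
    for line in text.splitlines():
        match = KILL_RE.search(line)
        if match:
            anon_mib = int(match["anon_kb"]) / 1024
            kills.append((int(match["pid"]), match["name"], anon_mib))
    return kills


kills = parse_kills(SAMPLE)
for pid, name, anon_mib in kills:
    print(f"pid {pid} ({name}): anon-rss {anon_mib:.0f} MiB")
```

In every kill line of this report the anon-rss is slightly above the 3105 MB per-node bucket quota from the creation request, which may be worth checking against the node's physical memory.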
      

      Attachments

        1. Screen Shot 2020-05-20 at 10.11.32 PM.png
          887 kB
          Ritesh Agarwal
        2. Screen Shot 2020-05-20 at 10.11.40 PM.png
          1.18 MB
          Ritesh Agarwal

            People

              ritesh.agarwal Ritesh Agarwal