Couchbase Server / MB-62827

[System Test] java.lang.OutOfMemoryError: Java heap space during ingestion via Kafka links

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Ionic
    • Columnar 1.0.0
    • analytics
    • 1.0.0-2230
    • Untriaged
    • 0
    • Unknown

    Description

      Workload:

      Type          Number of collections   Items per collection (millions)   Total items (millions)
      Remote        80                      75                                6000
      Standalone    50                      8                                 4000*
      Kafka         30                      33.5                              ~1000

      The change from previous runs is an increase in the number of Kafka collections.

      *Some standalone collections have 8 million items and others have multiples of 8 million. The total document count is 4000 million (4 billion).
      Number of links: 6 (2 remote + 2 external + 2 Kafka). One remote link and one Kafka link are active.
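
      As a quick sanity check on the table, the sketch below recomputes each row's nominal total as collections × items per collection. It is illustrative only; the class and method names are hypothetical and not part of Couchbase. Remote (80 × 75 = 6000) and Kafka (30 × 33.5 = 1005, reported as ~1000) match the reported totals. Standalone comes out at a nominal 400 million rather than 4000 million, which is consistent with the footnote: many standalone collections hold multiples of 8 million items, so the 8 million in that row reads as a base size rather than an average.

```java
import java.util.List;

/** Hypothetical helper: recomputes the workload table's totals (all counts in millions). */
public class WorkloadTotals {

    // One row of the workload table.
    record Row(String type, int collections, double itemsPerCollectionMillions) {}

    // Nominal total, assuming every collection in the row holds the same number of items.
    static double nominalTotalMillions(int collections, double itemsPerCollectionMillions) {
        return collections * itemsPerCollectionMillions;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("Remote", 80, 75),
                new Row("Standalone", 50, 8),
                new Row("Kafka", 30, 33.5));
        for (Row r : rows) {
            double total = nominalTotalMillions(r.collections(), r.itemsPerCollectionMillions());
            System.out.printf("%-10s %7.1f million%n", r.type(), total);
        }
    }
}
```

      Because the nominal formula assumes uniform collection sizes, it is only meaningful for the Remote and Kafka rows.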

      OOM seen on nodes 003 and 004:

      Node 003:

      2024-07-21T05:24:12.044+00:00 INFO CBAS.adapter.TopicRecordReader [SAO:JID:0.4085:TAID:TID:ANID:ODID:92:0:9:0:(Default.linkWCvvGTPb.b-3-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-2-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-1-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196(CB))[9]:TO] State for Topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge is  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      2024-07-21T05:24:12.044+00:00 INFO CBAS.adapter.TopicRecordReader [SAO:JID:0.4085:TAID:TID:ANID:ODID:92:0:5:0:(Default.linkWCvvGTPb.b-3-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-2-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-1-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196(CB))[5]:TO] State for Topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge is  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      2024-07-21T05:24:12.044+00:00 INFO CBAS.adapter.TopicRecordReader [SAO:JID:0.4085:TAID:TID:ANID:ODID:92:0:7:0:(Default.linkWCvvGTPb.b-3-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-2-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-1-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196(CB))[7]:TO] State for Topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge is  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      2024-07-21T05:24:12.044+00:00 INFO CBAS.adapter.TopicRecordReader [SAO:JID:0.4085:TAID:TID:ANID:ODID:92:0:19:0:(Default.linkWCvvGTPb.b-3-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-2-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-1-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196(CB))[19]:TO] State for Topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge is  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      2024-07-21T05:24:12.044+00:00 INFO CBAS.adapter.TopicRecordReader [SAO:JID:0.4085:TAID:TID:ANID:ODID:92:0:25:0:(Default.linkWCvvGTPb.b-3-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-2-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-1-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196(CB))[25]:TO] State for Topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge is  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      2024-07-21T05:24:12.045+00:00 INFO CBAS.adapter.TopicRecordReader [SAO:JID:0.4085:TAID:TID:ANID:ODID:92:0:12:0:(Default.linkWCvvGTPb.b-3-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-2-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196,b-1-public.qekafkatestcluster.7b9vtv.c13.kafka.us-east-1.amazonaws.com:9196(CB))[12]:TO] State for Topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge is  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
      2024-07-21T05:24:12.049+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7925:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7940:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7946:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7949:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7965:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7963:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7928:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7964:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7966:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7967:d5162036730bed9965768aa18a9cdbe4] Data Polling started...
      Terminating due to java.lang.OutOfMemoryError: Java heap space
      2024-07-21T05:25:18.382+00:00 WARN CBAS.cbas analytics driver has exited w/ exit status 3
      2024-07-21T05:25:18.393+00:00 INFO CBAS.cbas exiting  
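
      On node 003, the TopicRecordReader lines logged just before the OOM print a bracketed state list for the topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge, and every value in the excerpt above is 0. To compare these lists across executors, nodes, or runs, a small parser is enough. The sketch below is hypothetical triage tooling, not Couchbase code; it assumes only the line format shown above and does not interpret the values (the report does not say what they represent; per-partition positions are one guess).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts topic and state values from "State for Topic ... is [...]" lines. */
public class TopicStateLine {

    // "State for Topic <topic> is  [v0, v1, ...]"; the log puts two spaces before '['.
    private static final Pattern STATE =
            Pattern.compile("State for Topic (\\S+) is\\s+\\[([^\\]]*)\\]");

    record TopicState(String topic, List<Long> values) {}

    /** Returns the parsed state, or null if the line is not a topic-state line. */
    static TopicState parse(String line) {
        Matcher m = STATE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String body = m.group(2).trim();
        List<Long> values = body.isEmpty()
                ? List.of()
                : Arrays.stream(body.split(",\\s*")).map(Long::parseLong).toList();
        return new TopicState(m.group(1), values);
    }

    public static void main(String[] args) {
        // Shortened copy of a node 003 line; the real lines carry a longer list.
        String line = "2024-07-21T05:24:12.044+00:00 INFO CBAS.adapter.TopicRecordReader [...] "
                + "State for Topic testSystemNewesttXXUnITt.systemTest.systemTestCollectionLarge is  [0, 0, 0, 0]";
        TopicState s = parse(line);
        boolean allZero = s.values().stream().allMatch(v -> v == 0L);
        System.out.println(s.topic() + ": " + s.values().size() + " values, all zero = " + allZero);
    }
}
```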

      Node 004:

      2024-07-21T05:25:24.262+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8218:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_4/storage/partition_100/Database11boluRBGY/scope0btgjupyd/LinkedDatasetskXWsAeXnf/0/LinkedDatasetskXWsAeXnf", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_100/Database11boluRBGY/scope0btgjupyd/LinkedDatasetskXWsAeXnf/0/LinkedDatasetskXWsAeXnf_virtual_0"}}, {"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[2,2]", "index":{"class": "BTree", "file": "storage/partition_100/Database11boluRBGY/scope0btgjupyd/LinkedDatasetskXWsAeXnf/0/LinkedDatasetskXWsAeXnf_virtual_1"}}], "disk" : 1, "num-scheduled-flushes":0, "current-memory-component":1} to persist the topic state. reason stopping ingestion
      2024-07-21T05:25:24.262+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8228:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_12/storage/partition_108/Database19EtVbaMXn/scope1pAMKvKXz/LinkedDatasetuUroaejcPK/0/LinkedDatasetuUroaejcPK", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_108/Database19EtVbaMXn/scope1pAMKvKXz/LinkedDatasetuUroaejcPK/0/LinkedDatasetuUroaejcPK_virtual_0"}}, {"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[2,2]", "index":{"class": "BTree", "file": "storage/partition_108/Database19EtVbaMXn/scope1pAMKvKXz/LinkedDatasetuUroaejcPK/0/LinkedDatasetuUroaejcPK_virtual_1"}}], "disk" : 1, "num-scheduled-flushes":0, "current-memory-component":1} to persist the topic state. reason stopping ingestion
      2024-07-21T05:25:24.262+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8305:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_10/storage/partition_122/Database16DbijEPqn/scope0SmnYSzlx/LinkedDatasetGzIKCMTSWg/0/LinkedDatasetGzIKCMTSWg", "memory" : [{"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_122/Database16DbijEPqn/scope0SmnYSzlx/LinkedDatasetGzIKCMTSWg/0/LinkedDatasetGzIKCMTSWg_virtual_0"}}, {"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"null", "index":{"class": "BTree", "file": "storage/partition_122/Database16DbijEPqn/scope0SmnYSzlx/LinkedDatasetGzIKCMTSWg/0/LinkedDatasetGzIKCMTSWg_virtual_1"}}], "disk" : 0, "num-scheduled-flushes":0, "current-memory-component":0} to persist the topic state. reason stopping ingestion
      2024-07-21T05:25:24.262+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8226:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_6/storage/partition_86/Database22UkFgeWkt/scope1XJjfUFVy/LinkedDatasetyzcsmPheLH/0/LinkedDatasetyzcsmPheLH", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_86/Database22UkFgeWkt/scope1XJjfUFVy/LinkedDatasetyzcsmPheLH/0/LinkedDatasetyzcsmPheLH_virtual_0"}}, {"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[2,2]", "index":{"class": "BTree", "file": "storage/partition_86/Database22UkFgeWkt/scope1XJjfUFVy/LinkedDatasetyzcsmPheLH/0/LinkedDatasetyzcsmPheLH_virtual_1"}}], "disk" : 1, "num-scheduled-flushes":0, "current-memory-component":1} to persist the topic state. reason stopping ingestion
      2024-07-21T05:25:24.262+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8319:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_10/storage/partition_106/Database16DbijEPqn/scope0SmnYSzlx/LinkedDatasetGzIKCMTSWg/0/LinkedDatasetGzIKCMTSWg", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_106/Database16DbijEPqn/scope0SmnYSzlx/LinkedDatasetGzIKCMTSWg/0/LinkedDatasetGzIKCMTSWg_virtual_0"}}, {"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[2,2]", "index":{"class": "BTree", "file": "storage/partition_106/Database16DbijEPqn/scope0SmnYSzlx/LinkedDatasetGzIKCMTSWg/0/LinkedDatasetGzIKCMTSWg_virtual_1"}}], "disk" : 1, "num-scheduled-flushes":0, "current-memory-component":1} to persist the topic state. reason stopping ingestion
      2024-07-21T05:25:24.817+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8149:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_10/storage/partition_106/Database19EtVbaMXn/scope1pAMKvKXz/LinkedDatasetuUroaejcPK/0/LinkedDatasetuUroaejcPK", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_106/Database19EtVbaMXn/scope1pAMKvKXz/LinkedDatasetuUroaejcPK/0/LinkedDatasetuUroaejcPK_virtual_0"}}, {"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[2,2]", "index":{"class": "BTree", "file": "storage/partition_106/Database19EtVbaMXn/scope1pAMKvKXz/LinkedDatasetuUroaejcPK/0/LinkedDatasetuUroaejcPK_virtual_1"}}], "disk" : 1, "num-scheduled-flushes":0, "current-memory-component":1} to persist the topic state. reason stopping ingestion
      2024-07-21T05:25:26.972+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8235:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_10/storage/partition_90/Database9TdLzCFWt/scope1CdoPyywe/LinkedDatasetkiwYjiUuzm/0/LinkedDatasetkiwYjiUuzm", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_90/Database9TdLzCFWt/scope1CdoPyywe/LinkedDatasetkiwYjiUuzm/0/LinkedDatasetkiwYjiUuzm_virtual_0"}}, {"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[2,2]", "index":{"class": "BTree", "file": "storage/partition_90/Database9TdLzCFWt/scope1CdoPyywe/LinkedDatasetkiwYjiUuzm/0/LinkedDatasetkiwYjiUuzm_virtual_1"}}], "disk" : 1, "num-scheduled-flushes":0, "current-memory-component":1} to persist the topic state. reason stopping ingestion
      2024-07-21T05:25:26.972+00:00 INFO CBAS.runtime.TopicOffsetUpdateCallback [Executor-8239:c44c76108444df41c0cc5d85111d4b7b] triggering flush if needed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_6/storage/partition_86/Database19EtVbaMXn/scope0yKneyzBN/LinkedDatasetjBTKHjyPLf/0/LinkedDatasetjBTKHjyPLf", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[1,1]", "index":{"class": "BTree", "file": "storage/partition_86/Database19EtVbaMXn/scope0yKneyzBN/LinkedDatasetjBTKHjyPLf/0/LinkedDatasetjBTKHjyPLf_virtual_0"}}, {"state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[2,2]", "index":{"class": "BTree", "file": "storage/partition_86/Database19EtVbaMXn/scope0yKneyzBN/LinkedDatasetjBTKHjyPLf/0/LinkedDatasetjBTKHjyPLf_virtual_1"}}], "disk" : 1, "num-scheduled-flushes":0, "current-memory-component":1} to persist the topic state. reason stopping ingestion
      Terminating due to java.lang.OutOfMemoryError: Java heap space 
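
      The "Terminating due to java.lang.OutOfMemoryError" lines carry no timestamp of their own. Their wording, together with the exit status 3 logged on node 003, matches what the HotSpot JVM prints when it runs with -XX:+ExitOnOutOfMemoryError; whether the analytics driver actually sets that flag is not stated in this report. To bracket when each OOM happened, the hypothetical helper below returns the timestamp of the last timestamped line before each marker. It assumes, as in the excerpts above, that log lines start with an ISO-8601 timestamp with offset followed by a space.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical triage helper: finds the last timestamped log line before each OOM marker. */
public class OomLocator {

    // Marker line seen in both node excerpts when the JVM terminates on heap exhaustion.
    static final String OOM_MARKER = "Terminating due to java.lang.OutOfMemoryError";

    /**
     * For each OOM marker, returns the timestamp of the closest preceding line that
     * starts with an ISO-8601 offset timestamp, or null if no such line precedes it.
     */
    static List<OffsetDateTime> lastTimestampsBeforeOom(List<String> lines) {
        List<OffsetDateTime> result = new ArrayList<>();
        OffsetDateTime last = null;
        for (String raw : lines) {
            String line = raw.strip();
            if (line.startsWith(OOM_MARKER)) {
                result.add(last);
                continue;
            }
            int space = line.indexOf(' ');
            if (space <= 0) {
                continue; // blank or single-token line: no timestamp to read
            }
            try {
                last = OffsetDateTime.parse(line.substring(0, space));
            } catch (DateTimeParseException e) {
                // Untimestamped line; keep the previous timestamp.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> node003Tail = List.of(
                "2024-07-21T05:24:12.050+00:00 INFO CBAS.adapter.TopicRecordReader [Executor-7967:d5162036730bed9965768aa18a9cdbe4] Data Polling started...",
                "Terminating due to java.lang.OutOfMemoryError: Java heap space",
                "2024-07-21T05:25:18.382+00:00 WARN CBAS.cbas analytics driver has exited w/ exit status 3");
        System.out.println(lastTimestampsBeforeOom(node003Tail));
    }
}
```

      Applied to node 003's excerpt, it returns 05:24:12.050, while the driver exit is logged at 05:25:18.382, so the heap was most likely exhausted within that window (log buffering may blur the edges).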

      As far as I can tell, only Kafka ingestion was running at the time of the error. There was no query or mutation workload, and ingestion via remote links had already completed.

      Marking this as Major because the cluster was still usable afterwards. If RCA deems it critical, please raise the priority.

      cbcollect:

      https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC1/collectinfo-2024-07-21T055816-ns_1%40svc-da-node-001.r7pgsp68tlb1w9-t.sandbox.nonprod-project-avengers.com.zip

      https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC1/collectinfo-2024-07-21T055816-ns_1%40svc-da-node-002.r7pgsp68tlb1w9-t.sandbox.nonprod-project-avengers.com.zip

      https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC1/collectinfo-2024-07-21T055816-ns_1%40svc-da-node-003.r7pgsp68tlb1w9-t.sandbox.nonprod-project-avengers.com.zip

      https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC1/collectinfo-2024-07-21T055816-ns_1%40svc-da-node-004.r7pgsp68tlb1w9-t.sandbox.nonprod-project-avengers.com.zip

      Supportal: https://supportal.couchbase.com/snapshot/6221cab353b5dfd0fa019da9948047b8::20

          People

            ritik.raj Ritik Raj
            pavan.pb Pavan PB
            Votes: 0
            Watchers: 4