  1. Couchbase Server
  2. MB-51950

[Magma] Magma bucket not honouring RAM quota allocated when analytics is ingesting data from it.

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 7.1.1
    • 7.1.2
    • couchbase-bucket
    • Enterprise Edition 7.1.1 build 3025
    • Untriaged
    • Centos 64-bit
    • 1
    • No

    Description

      Observation -
      Although only 1 GB of RAM is allocated to the magma bucket, its RAM usage reaches around 3.9 GB. This is observed while analytics is ingesting data from the bucket into its datasets.

      Cluster Info -

      Node            Services  CPU_utilization  Mem_total  Mem_free  Swap_mem_used          Active / Replica  Version
      172.23.108.0    cbas      2.46478873239    3.67 GiB   2.50 GiB  0.0 Byte / 3.50 GiB    0 / 0             7.1.1-3025-enterprise
      172.23.108.1    cbas      2.19143576826    3.67 GiB   2.67 GiB  0.0 Byte / 3.50 GiB    0 / 0             7.1.1-3025-enterprise
      172.23.108.102  kv        1.91194968553    3.67 GiB   3.05 GiB  72.00 MiB / 3.50 GiB   0 / 0             7.1.1-3025-enterprise
      172.23.108.100  kv, n1ql  2.16243399547    3.67 GiB   2.94 GiB  3.75 MiB / 3.50 GiB    0 / 0             7.1.1-3025-enterprise

      Steps to reproduce -
      1. Create the cluster as described above.
      2. Create a magma bucket with 512 MB of RAM allocated per node (since there are 2 KV nodes, the total RAM for the bucket is 1 GB) and 1 replica.
      3. Create 10 scopes (including the default), each with 25 collections.
      4. Load 5300 docs into each collection. Each doc is 1 KB.
      5. Create a dataset on each collection of the KV bucket created above.
      6. Observe that the RAM usage of the bucket exceeds its RAM allocation.
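
      The data volume implied by these steps can be estimated with a quick back-of-the-envelope calculation. The sketch below is illustrative only. It assumes 1 KB means 1024 bytes and 512 MB means 512 MiB, counts only raw document bodies (no keys, metadata, or per-collection overhead), and assumes the single replica doubles the stored data. All variable names are hypothetical.

```python
# Back-of-the-envelope sizing for the reproduction steps above.
# Assumptions (not stated in the ticket): 1 KB = 1024 bytes, 512 MB = 512 MiB,
# only raw document bodies are counted, and one replica doubles stored data.

SCOPES = 10
COLLECTIONS_PER_SCOPE = 25
DOCS_PER_COLLECTION = 5300
DOC_SIZE_BYTES = 1024
REPLICAS = 1
KV_NODES = 2
QUOTA_PER_NODE_MIB = 512

GIB = 1024 ** 3

collections = SCOPES * COLLECTIONS_PER_SCOPE
total_docs = collections * DOCS_PER_COLLECTION
active_bytes = total_docs * DOC_SIZE_BYTES          # active copies only
stored_bytes = active_bytes * (1 + REPLICAS)        # active + replica copies
quota_bytes = KV_NODES * QUOTA_PER_NODE_MIB * 1024 * 1024

print(f"collections:            {collections}")
print(f"documents:              {total_docs}")
print(f"active data (GiB):      {active_bytes / GIB:.2f}")
print(f"active + replica (GiB): {stored_bytes / GIB:.2f}")
print(f"bucket quota (GiB):     {quota_bytes / GIB:.2f}")
```

      Under these assumptions the test loads 250 collections and 1,325,000 documents, about 1.26 GiB of active data before replicas. The raw data set is therefore larger than the bucket quota, so the bucket cannot be fully resident in memory. This estimate does not by itself explain memory use above the quota.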

      RAM Quota :

          Activity

            ben.huddleston Ben Huddleston added a comment - - edited

            Thanks Michael Blow.

            I understand that a stream per collection probably makes code simpler on the analytics side, but it does impact the data nodes in cases like this where we have thousands of streams. We could probably chalk this up as a sizing issue should a customer hit it, and we've seen cases during the testing of magma with 2i where the number of collections and indexes we were testing with was inappropriate for the spec of the cluster. That being said, Varun did a good write up on how 2i manages streams here https://issues.couchbase.com/browse/MB-48532?focusedCommentId=547098&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-547098. They create init streams which (ideally) use OSO backfills for single collections, and then 1 maintenance stream which streams the entire bucket to keep the collections up to date. This would reduce the number of streams required dramatically, and whilst it isn't a perfect solution for all cases (say you want to add analytics to only 1 of 250 collections) it is in keeping with how all DCP services have streamed DCP in the past (i.e. bucket wide). It might be worth chatting with the 2i team about this.

            Umang I can't comment on the CBAS quota, but I suspect that 1GB bucket quota might not be enough. We need to fix MB-51968 first to say for sure though.
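
            To make the stream-count argument above concrete, here is a rough sketch comparing the two approaches. It is a simplified model, not a description of how either service is implemented: it assumes one DCP stream per vBucket per subscription, takes the vBucket count as a parameter (1024 is used in the example as an assumed value, not something stated in this ticket), and ignores the temporary init streams used for backfill. The function names are hypothetical.

```python
# Simplified stream-count model for the two streaming approaches.
# Assumption: each subscription opens one DCP stream per vBucket.

def per_collection_streams(collections: int, vbuckets: int) -> int:
    """Steady-state streams when every collection has its own subscription."""
    return collections * vbuckets


def bucket_wide_streams(vbuckets: int) -> int:
    """Steady-state streams with a single bucket-wide maintenance subscription."""
    return vbuckets


if __name__ == "__main__":
    collections = 250   # 10 scopes x 25 collections, from the reproduction steps
    vbuckets = 1024     # assumed value for illustration

    per_coll = per_collection_streams(collections, vbuckets)
    bucket_wide = bucket_wide_streams(vbuckets)

    print(f"per-collection streams: {per_coll}")
    print(f"bucket-wide streams:    {bucket_wide}")
    print(f"reduction factor:       {per_coll // bucket_wide}x")
```

            In this model the reduction factor equals the number of collections regardless of the vBucket count, which is why the per-collection approach becomes more expensive for the data nodes as collections are added.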

            michael.blow Michael Blow added a comment -

            We won’t use one stream per collection after the initial backfill if/when the Morpheus work is complete; it just did not fit into Cheshire-Cat and Neo. We wouldn’t stream the entire bucket, though. We would opt in only for the collections we are interested in.

            ben.huddleston Ben Huddleston added a comment -

            Perfect, thanks Michael Blow.

            Umang, I'll leave this assigned to me for now. Once we fix MB-51968 we can run the test again.

            ben.huddleston Ben Huddleston added a comment - - edited

            MB-51968 is fixed in build 7.2.0-1106. I'm not sure what the status of the 7.1.1 changes is, or whether the fix makes the bar; I will check next week.

            owend Daniel Owen added a comment -

            Waiting to see whether the issue still exists now that MB-51968 has been resolved.

            People

              umang.agrawal Umang
              umang.agrawal Umang