  1. Couchbase Server
  2. MB-44878

Adjust default decimation levels to reduce stats resource consumption


Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • Cheshire-Cat
    • 7.0.2
    • ns_server
    • None

    Description

      The current stats decimation levels are the following. This ticket tracks either confirming them for the initial release or changing them to a different default.

      decimation_definitions_default() ->
          [%% No decimation for the first 3 days
           {low, 3 * ?SECS_IN_DAY, skip},
           %% Keep 1 per minute for the next 4 days
           {medium, 4 * ?SECS_IN_DAY, 60},
           %% Keep 1 per 6 hours for the next 359 days
           {large, 359 * ?SECS_IN_DAY, 6 * 60 * 60}].
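
      To make the cost of these levels concrete, the following is a minimal Python sketch that counts how many samples per series each level would keep. It assumes `skip` means every scraped sample is kept and that the third tuple element is a "keep one sample per N seconds" interval, as the comments in the definition suggest. The 10-second scrape interval and the helper names `samples_per_series` and `total_retention_days` are hypothetical and not taken from ns_server.

```python
SECS_IN_DAY = 24 * 60 * 60

# Levels as quoted in the ticket: (name, duration in seconds, keep-interval).
# None stands in for the Erlang atom `skip` (no decimation).
LEVELS = [
    ("low", 3 * SECS_IN_DAY, None),
    ("medium", 4 * SECS_IN_DAY, 60),
    ("large", 359 * SECS_IN_DAY, 6 * 60 * 60),
]


def samples_per_series(levels, scrape_interval_secs):
    """Count samples kept per series in each level.

    Assumes one sample per scrape interval where there is no decimation,
    and one sample per keep-interval otherwise.
    """
    counts = {}
    for name, duration_secs, keep_every_secs in levels:
        if keep_every_secs is None:
            spacing = scrape_interval_secs
        else:
            # Decimation cannot produce more samples than were scraped.
            spacing = max(keep_every_secs, scrape_interval_secs)
        counts[name] = duration_secs // spacing
    return counts


def total_retention_days(levels):
    """Total time span covered by all levels, in days."""
    return sum(duration for _, duration, _ in levels) // SECS_IN_DAY


if __name__ == "__main__":
    # 10 seconds is a hypothetical scrape interval, not taken from the ticket.
    print(samples_per_series(LEVELS, scrape_interval_secs=10))
    # {'low': 25920, 'medium': 5760, 'large': 1436}
    print(total_retention_days(LEVELS))
    # 366
```

      Under these assumptions, the undecimated first 3 days account for most of the samples per series, which is why shortening that level or tightening the later ones is the main lever for reducing stats resource consumption.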
      
      

      Attachments

        1. 1hour.png (235 kB)
        2. 24hour.png (162 kB)
        3. 2hour.png (229 kB)
        4. 30hour.png (150 kB)
        5. 4hour.png (187 kB)
        6. 5days.png (155 kB)
        7. image-2021-06-01-10-46-07-118.png (119 kB)
        8. image-2021-06-01-10-48-55-421.png (122 kB)
        9. Prom2Days.png (140 kB)
        10. screenshot-1.png (102 kB)
        11. screenshot-2.png (39 kB)
        12. Screen Shot 2021-05-24 at 11.40.40 AM.png (63 kB)
        13. Screen Shot 2021-05-27 at 10.32.56 AM.png (158 kB)
        14. Screen Shot 2021-07-09 at 11.05.00 AM.png (46 kB)
        15. Screen Shot 2021-07-19 at 10.31.49 AM.png (127 kB)
        16. Screen Shot 2021-07-21 at 10.26.08 AM.png (159 kB)
        17. screenshot-3.png (179 kB)
        18. screenshot-4.png (299 kB)
        19. screenshot-5.png (148 kB)
        20. screenshot-6.png (131 kB)
        21. UIZoomDay.png (94 kB)
        22. UIZoomMonth.png (92 kB)
        23. UIZoomWeek.png (93 kB)


          Activity

            build-team Couchbase Build Team added a comment:

            Build couchbase-server-7.1.0-1063 contains ns_server commit c31ac12 with commit message:
            MB-44878 Change stats decimation levels

            steve.watanabe Steve Watanabe added a comment:

            Here's the latest cbcollect: https://s3.amazonaws.com/cb-engineering/stevewatanabe-19JUL21-AWS/collectinfo-2021-07-19T165101-ns_1%40127.0.0.1.zip

            The Prometheus display on the running system shows fewer than 8 days of stats. Judging from the directory usage below, retention is being limited by --storage.tsdb.retention.size 1024 (MB).

            [root@ip-172-31-27-34 stats_data]# ll
            total 20
            drwxrwx--- 3 couchbase couchbase    68 Jul 13 23:00 01FAHSJXQ54RHRYA677BCDNRD2
            drwxrwx--- 3 couchbase couchbase    68 Jul 14 17:00 01FAKQCG11V5K1Z5VETB59PWAP
            drwxrwx--- 3 couchbase couchbase    68 Jul 15 11:00 01FANN62AW57QA8ZND3EKG6CZ9
            drwxrwx--- 3 couchbase couchbase    68 Jul 16 05:00 01FAQJZMN01D4G8GR9FACWYZXD
            drwxrwx--- 3 couchbase couchbase    68 Jul 16 23:00 01FASGS6YTDJGC4VD4ZHQ15NJD
            drwxrwx--- 3 couchbase couchbase    68 Jul 17 17:00 01FAVEJS8PPFQ4NK9JTB694812
            drwxrwx--- 3 couchbase couchbase    68 Jul 18 11:00 01FAXCCBJEE438NTC2Z168G2CH
            drwxrwx--- 3 couchbase couchbase    68 Jul 19 05:00 01FAZA5XWC1SRX8MRQJS90A6KJ
            drwxrwx--- 3 couchbase couchbase    68 Jul 19 10:00 01FAZVBJMJY71JAWYR38G90NH2
            drwxrwx--- 3 couchbase couchbase    68 Jul 19 10:30 01FAZX40RV5AJSMSJTVEGT13ZQ
            drwxrwx--- 3 couchbase couchbase    68 Jul 19 10:30 01FAZX42AZK50KNF5E51PMZ25T
            drwxrwx--- 3 couchbase couchbase    68 Jul 19 10:30 01FAZX4352FS85DKQ08Q4FY7M5
            drwxrwx--- 2 couchbase couchbase    34 Jul 19 10:00 chunks_head
            -rw-rw---- 1 couchbase couchbase 20001 Jul 19 10:30 queries.active
            drwxrwx--- 3 couchbase couchbase    81 Jul 19 10:00 wal
            [root@ip-172-31-27-34 stats_data]# du -sh
            960M	.
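
            The reasoning in the comment above (960M on disk against a 1024 MB size cap, with fewer than 8 days of stats visible) can be checked with rough arithmetic. The Python sketch below assumes disk usage grows roughly linearly with the time span retained and treats 8 days as an upper bound on the observed span; the function name `days_retained_at_cap` is hypothetical and the result is only an estimate.

```python
def days_retained_at_cap(used_mb, days_observed, cap_mb):
    """Estimate how many days of stats fit under a size cap.

    Assumes storage grows linearly with the number of days retained,
    which is a simplification of how the TSDB actually compacts data.
    """
    if used_mb <= 0 or days_observed <= 0:
        raise ValueError("used_mb and days_observed must be positive")
    mb_per_day = used_mb / days_observed
    return cap_mb / mb_per_day


if __name__ == "__main__":
    # 960 MB and 1024 MB come from the comment above; 8 days is an upper bound.
    estimate = days_retained_at_cap(used_mb=960, days_observed=8, cap_mb=1024)
    print(f"{estimate:.1f} days")  # 8.5 days
```

            With usage already at about 94% of the cap, the size limit rather than the configured decimation durations would plausibly determine how far back the stats go on this system.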

            steve.watanabe Steve Watanabe added a comment:

            With the decimation-level changes made via this ticket, stats are maintained for more days. On the long-running AWS 30-bucket cluster, we see stats going back more than 9 days.

            I don't plan on any additional changes for the 7.0.1 release.

            Balakumaran.Gopal Balakumaran Gopal added a comment:

            Steve Watanabe - Is this a candidate for request-dev-verify? Or shall we set up a 1-node, 30-bucket cluster, as we did for MB-47502, and review it later?

            In fact, I believe we can upgrade the cluster we set up for MB-47502, which is on 7.0.1, to 7.0.2 and review it later. Let me know your thoughts.

            steve.watanabe Steve Watanabe added a comment:

            Balakumaran Gopal - Stats decimation has been disabled in 7.0.2 due to a memory leak in Prometheus, so no testing is needed in this release. Also, I've been running a couple of AWS experiments using 7.0.2 with decimation enabled and see longer durations of stats. Here's an example.

            People

              steve.watanabe Steve Watanabe
              Votes: 0
              Watchers: 6
