Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-34378

Optimize vbucket-details stats call to only return requested keys

    XMLWordPrintable

Details

    Description

      Currently, we have vbucket-details call that returns all the keys for respective vbuckets. However, getting all the keys when only few are needed is not optimal. 

      The vbucket-details stats call should take the keys requested and only return those key value pairs. 

       

      Due to the fact that ns_server uses vbucket-details to determine state and topology on every janitor cleanup run, this will vastly improve janitor cleanup times and also failover time(calls janitor cleanup).  

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            Abhijeeth.Nuthan Abhijeeth Nuthan created issue -
            Abhijeeth.Nuthan Abhijeeth Nuthan made changes -
            Field Original Value New Value
            Epic Link MB-30048 [ 86039 ]
            Abhijeeth.Nuthan Abhijeeth Nuthan made changes -
            Link This issue blocks MB-33875 [ MB-33875 ]
            drigby Dave Rigby added a comment - See also Changes in ns_server for Sync Replication - STAT vbucket-details .
            drigby Dave Rigby made changes -
            CVSS/Severity High [ 11153 ]
            drigby Dave Rigby made changes -
            Component/s couchbase-bucket [ 10173 ]
            drigby Dave Rigby made changes -
            Issue Type Bug [ 1 ] Improvement [ 4 ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            drigby Dave Rigby made changes -
            Labels mad-hatter-committed
            owend Daniel Owen added a comment -

            Hey Dave Finlay
            As this is a request from ns_server, we wanted to know whether we should keep this mad-hatter-commited (or can we convert to a candidate)?

            thanks

            owend Daniel Owen added a comment - Hey Dave Finlay As this is a request from ns_server, we wanted to know whether we should keep this mad-hatter-commited (or can we convert to a candidate)? thanks
            owend Daniel Owen made changes -
            Assignee Dave Rigby [ drigby ] Dave Finlay [ dfinlay ]
            owend Daniel Owen added a comment -

            Hi Dave Finlay In fact as a better action... will make as a candidate that we will try and do post beta if we have time.
            ns_server team can upgrade to committed - if deemed important enough.

            thanks

            owend Daniel Owen added a comment - Hi Dave Finlay In fact as a better action... will make as a candidate that we will try and do post beta if we have time. ns_server team can upgrade to committed - if deemed important enough. thanks
            owend Daniel Owen made changes -
            Assignee Dave Finlay [ dfinlay ] Daniel Owen [ owend ]
            owend Daniel Owen made changes -
            Labels mad-hatter-committed mad-hatter-candidate
            owend Daniel Owen made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]
            Fix Version/s Mad-Hatter [ 15037 ]

            ns_server makes this stat call on each node twice per bucket on each janitor run. In my measurements the size of the payload is over 1MiB for 1024 vbuckets. Each stat call takes over a 100ms and burns a lot of CPU, because out of ~50 stats per vbucket that are returned ns_server uses only two in the common case. With 30 buckets on my laptop ns_server uses about 60-80% of a CPU core all the time just making these stat calls. So I think we should do something about it.

            Note, that one could argue that there should be no need for ns_server to query this information so frequently. And I would agree one hundred per cent. Unfortunately this is one of those areas in ns_server that has a lot of legacy and that is scary to touch for exactly this reason. Certainly not at this point in the release cycle. So the only short-term option that I see is to enable ns_server to get the information it needs as cheaply as possible. And then in future releases we might want to revisit the approach altogether.

            Aliaksey Artamonau Aliaksey Artamonau (Inactive) added a comment - ns_server makes this stat call on each node twice per bucket on each janitor run. In my measurements the size of the payload is over 1MiB for 1024 vbuckets. Each stat call takes over a 100ms and burns a lot of CPU, because out of ~50 stats per vbucket that are returned ns_server uses only two in the common case. With 30 buckets on my laptop ns_server uses about 60-80% of a CPU core all the time just making these stat calls. So I think we should do something about it. Note, that one could argue that there should be no need for ns_server to query this information so frequently. And I would agree one hundred per cent. Unfortunately this is one of those areas in ns_server that has a lot of legacy and that is scary to touch for exactly this reason. Certainly not at this point in the release cycle. So the only short-term option that I see is to enable ns_server to get the information it needs as cheaply as possible. And then in future releases we might want to revisit the approach altogether.
            drigby Dave Rigby added a comment -

            Aliaksey Artamonau Thanks.

            If we want to try to improve the in the Mad-Hatter timeframe, I don't think we have time to implement the fully-flexible proposal outlined in Changes in ns_server for Sync Replication - STAT vbucket-details. I think what is feasible is something along the lines of the following:

            1. Add a new stat group - vbucket-details-durability <VBID> or similar which returns just the 2 stats ns_server typically needs. That would reduce down to 1024 STAT.request packets and 2048 STAT.response packets per call - i.e 25x fewer.
              1. We could actually encode this along the lines of the above proposal - i.e. STAT(key="vbucket-details", value="[\"state\", \"high_seqno\"]" - with the limitation that kv_enigne literally only supports filtering for those two specific stats. That should make it easier to expand to support a greater list of stats which can be requested in future.
            2. If the reduction in stats to 2 per vBucket still isn't sufficient (i.e. the problem is sending 1024x request and decoding 2048x response TCP packets; then we might be able to add a single stat call which returns the necessary stats for all vBuckets in a single JSON payload - i guess it depends on where the cost lies in ns_server.

            In the case of (1) we can definitely get you some PoC code in a day or so to test out and see what difference it makes; for (2) that's a bit more work (need to do a bunch of aggregation across all vBuckets) so I'd rather see the results of (1) first and see if that's sufficient.

            Let me know if you want us to prepare a patch for testing - including exactly which stats you need in the new stat group.

            drigby Dave Rigby added a comment - Aliaksey Artamonau Thanks. If we want to try to improve the in the Mad-Hatter timeframe, I don't think we have time to implement the fully-flexible proposal outlined in Changes in ns_server for Sync Replication - STAT vbucket-details . I think what is feasible is something along the lines of the following: Add a new stat group - vbucket-details-durability <VBID> or similar which returns just the 2 stats ns_server typically needs. That would reduce down to 1024 STAT.request packets and 2048 STAT.response packets per call - i.e 25x fewer. We could actually encode this along the lines of the above proposal - i.e. STAT(key="vbucket-details", value=" [\"state\", \"high_seqno\"] " - with the limitation that kv_enigne literally only supports filtering for those two specific stats. That should make it easier to expand to support a greater list of stats which can be requested in future. If the reduction in stats to 2 per vBucket still isn't sufficient (i.e. the problem is sending 1024x request and decoding 2048x response TCP packets; then we might be able to add a single stat call which returns the necessary stats for all vBuckets in a single JSON payload - i guess it depends on where the cost lies in ns_server. In the case of (1) we can definitely get you some PoC code in a day or so to test out and see what difference it makes; for (2) that's a bit more work (need to do a bunch of aggregation across all vBuckets) so I'd rather see the results of (1) first and see if that's sufficient. Let me know if you want us to prepare a patch for testing - including exactly which stats you need in the new stat group.
            dfinlay Dave Finlay added a comment - - edited

            Dave Rigby, Daniel Owen: Can I ask you to look into creating a PoC ns_server could play with along the lines of what you suggest in (1), Dave. This is with a view to getting it into Mad Hatter. I spoke with Aliaksey - actually ns_server needs the following 4 stats:

            1. state
            2. topology
            3. high_seqno
            4. high_prepared_seqno

            For this I suggest the names: vbucket-durability-state or vbucket-durable-state. I'd like us to do a before and after and see what the savings look like. Based on the measurements so far, my sense is that it will be worth a lot. Let me know what you think.

            dfinlay Dave Finlay added a comment - - edited Dave Rigby , Daniel Owen : Can I ask you to look into creating a PoC ns_server could play with along the lines of what you suggest in (1), Dave. This is with a view to getting it into Mad Hatter. I spoke with Aliaksey - actually ns_server needs the following 4 stats: state topology high_seqno high_prepared_seqno For this I suggest the names: vbucket-durability-state or vbucket-durable-state . I'd like us to do a before and after and see what the savings look like. Based on the measurements so far, my sense is that it will be worth a lot. Let me know what you think.
            drigby Dave Rigby added a comment -

            James, see the last couple of comments on this - could you create a patch for the ns_server team to experiment with along the lines of what I and DaveF described?

            drigby Dave Rigby added a comment - James, see the last couple of comments on this - could you create a patch for the ns_server team to experiment with along the lines of what I and DaveF described?
            drigby Dave Rigby made changes -
            Assignee Daniel Owen [ owend ] James Harrison [ james.harrison ]
            james.harrison James Harrison made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            PoC up at http://review.couchbase.org/#/c/117070/

            provides the stats as below

            $ ~/couchbase/install/bin/cbstats localhost:12000 -u Administrator -p asdasd vbucket-durability-state | head
             vb_0:                        active
             vb_0:high_prepared_seqno:    0
             vb_0:high_seqno:             0
             vb_0:topology:               [["n_0@cb.local",null]]
             vb_1:                        active
             vb_1:high_prepared_seqno:    0
             vb_1:high_seqno:             0
             vb_1:topology:               [["n_0@cb.local",null]]
             vb_2:                        active
             vb_2:high_prepared_seqno:    0
            ........
            $ ~/couchbase/install/bin/cbstats localhost:12000 -u Administrator -p asdasd vbucket-durability-state 12
             vb_12:                     active
             vb_12:high_prepared_seqno: 0
             vb_12:high_seqno:          0
             vb_12:topology:            [["n_0@cb.local",null]]
            $
            

            james.harrison James Harrison added a comment - PoC up at http://review.couchbase.org/#/c/117070/ provides the stats as below $ ~/couchbase/install/bin/cbstats localhost:12000 -u Administrator -p asdasd vbucket-durability-state | head vb_0: active vb_0:high_prepared_seqno: 0 vb_0:high_seqno: 0 vb_0:topology: [["n_0@cb.local",null]] vb_1: active vb_1:high_prepared_seqno: 0 vb_1:high_seqno: 0 vb_1:topology: [["n_0@cb.local",null]] vb_2: active vb_2:high_prepared_seqno: 0 ........ $ ~/couchbase/install/bin/cbstats localhost:12000 -u Administrator -p asdasd vbucket-durability-state 12 vb_12: active vb_12:high_prepared_seqno: 0 vb_12:high_seqno: 0 vb_12:topology: [["n_0@cb.local",null]] $
            james.harrison James Harrison made changes -
            Status In Progress [ 3 ] Open [ 1 ]

            Artem Stemkovski, can you please work on the patch utilizing this new stats call? Thanks.

            Aliaksey Artamonau Aliaksey Artamonau (Inactive) added a comment - Artem Stemkovski , can you please work on the patch utilizing this new stats call? Thanks.
            Aliaksey Artamonau Aliaksey Artamonau (Inactive) made changes -
            Assignee James Harrison [ james.harrison ] Artem Stemkovski [ artem ]

            Dave Rigby Maybe instead of string "vbucket-durability-state" that corresponds to the pre-defined set of keys we can pass into the same API a comma separated list of keys. For example "high_prepared_seqno,high_seqno,topology". That will make the whole thing flexible and will also cover the compaction use case where we need just "db_data_size,db_file_size".

            artem Artem Stemkovski added a comment - Dave Rigby Maybe instead of string "vbucket-durability-state" that corresponds to the pre-defined set of keys we can pass into the same API a comma separated list of keys. For example "high_prepared_seqno,high_seqno,topology". That will make the whole thing flexible and will also cover the compaction use case where we need just "db_data_size,db_file_size".
            dfinlay Dave Finlay added a comment -

            Artem Stemkovski: Yes, indeed, longer term what we want to do is to have ns_server ask for the stats it wants rather than to add new stats calls. However, there isn't time to do it this way and the CPU we expend requesting and parsing the vbucket-details payload is great enough that I wanted to see how if we could quickly put together a new stats call and get some benefit in Mad Hatter.

            dfinlay Dave Finlay added a comment - Artem Stemkovski : Yes, indeed, longer term what we want to do is to have ns_server ask for the stats it wants rather than to add new stats calls. However, there isn't time to do it this way and the CPU we expend requesting and parsing the vbucket-details payload is great enough that I wanted to see how if we could quickly put together a new stats call and get some benefit in Mad Hatter.
            drigby Dave Rigby added a comment - - edited

            Artem Stemkovski / Dave Finlay. Agreed. To clarify; I'm happy with (and we can provide for Mad-Hatter) an API where:

            • ns_server sends a STATS command with the key vbucket-details and the JSON value ["high_prepared_seqno","high_seqno","topology"] (or similar - wasn't clear if you also need "state".
            • KV-Engine will send back just those 3 (or 4) keys.
            • However, for Mad-Hatter we will literally only support that specific 4 keys; probably in a fixed order (i.e. we'll just be doing dumb strcmp of the value).

            In future we can actually implement a proper parser and individual calculations for each stat; it's not feasible to do that for MH.

            drigby Dave Rigby added a comment - - edited Artem Stemkovski / Dave Finlay . Agreed. To clarify; I'm happy with (and we can provide for Mad-Hatter) an API where: ns_server sends a STATS command with the key vbucket-details and the JSON value ["high_prepared_seqno","high_seqno","topology"] (or similar - wasn't clear if you also need "state". KV-Engine will send back just those 3 (or 4) keys. However , for Mad-Hatter we will literally only support that specific 4 keys; probably in a fixed order (i.e. we'll just be doing dumb strcmp of the value). In future we can actually implement a proper parser and individual calculations for each stat; it's not feasible to do that for MH.

            Dave Rigby I think in this case we should leave it as it is currently implemented here: http://review.couchbase.org/#/c/117070/ for Mad Hatter and do the proper version of this API in Cheshire Cat

            artem Artem Stemkovski added a comment - Dave Rigby I think in this case we should leave it as it is currently implemented here: http://review.couchbase.org/#/c/117070/ for Mad Hatter and do the proper version of this API in Cheshire Cat

            Artem Stemkovski - I've gone ahead and re-raised that patch against mad-hatter at http://review.couchbase.org/#/c/117421/ . Dave Rigby Happy to implement it as you described or in the manner of the PoC, whatever is preferred.

            james.harrison James Harrison added a comment - Artem Stemkovski - I've gone ahead and re-raised that patch against mad-hatter at http://review.couchbase.org/#/c/117421/ . Dave Rigby Happy to implement it as you described or in the manner of the PoC, whatever is preferred.
            drigby Dave Rigby added a comment -

            Let's keep it simple for MH - the existing API (vbucket-durability-state) is fine.

            drigby Dave Rigby added a comment - Let's keep it simple for MH - the existing API ( vbucket-durability-state ) is fine.
            owend Daniel Owen made changes -
            Fix Version/s Mad-Hatter [ 15037 ]
            Fix Version/s Cheshire-Cat [ 15915 ]
            owend Daniel Owen made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            owend Daniel Owen made changes -
            Assignee Artem Stemkovski [ artem ] James Harrison [ james.harrison ]
            james.harrison James Harrison made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            drigby Dave Rigby made changes -
            Rank Ranked higher
            drigby Dave Rigby made changes -
            Sprint KV-Engine Mad-Hatter GA [ 910 ]
            drigby Dave Rigby made changes -
            Rank Ranked lower
            drigby Dave Rigby added a comment -

            KV-Engine patch merged. Assigning over to ns_server for their half of the change.

            drigby Dave Rigby added a comment - KV-Engine patch merged. Assigning over to ns_server for their half of the change.
            drigby Dave Rigby made changes -
            Assignee James Harrison [ james.harrison ] Artem Stemkovski [ artem ]
            drigby Dave Rigby made changes -
            Component/s ns_server [ 10019 ]
            Component/s couchbase-bucket [ 10173 ]

            Build couchbase-server-6.5.0-4769 contains kv_engine commit e85a9cd with commit message:
            MB-34378: Add vbucket-durability-state stats

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-4769 contains kv_engine commit e85a9cd with commit message: MB-34378 : Add vbucket-durability-state stats
            Aliaksey Artamonau Aliaksey Artamonau (Inactive) made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]
            Fix Version/s Mad-Hatter [ 15037 ]
            Aliaksey Artamonau Aliaksey Artamonau (Inactive) made changes -
            Component/s couchbase-bucket [ 10173 ]
            Aliaksey Artamonau Aliaksey Artamonau (Inactive) made changes -
            Labels mad-hatter-candidate

            ns_server patch is merged too. Retargeted the ticket to cheshire-cat for a more generic fix.

            Aliaksey Artamonau Aliaksey Artamonau (Inactive) added a comment - ns_server patch is merged too. Retargeted the ticket to cheshire-cat for a more generic fix.

            Build couchbase-server-6.5.0-4777 contains ns_server commit 5f0710d with commit message:
            MB-34378 adopt new vbucket-durability-state stats call

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-4777 contains ns_server commit 5f0710d with commit message: MB-34378 adopt new vbucket-durability-state stats call

            Build couchbase-server-7.0.0-1037 contains ns_server commit 5f0710d with commit message:
            MB-34378 adopt new vbucket-durability-state stats call

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-1037 contains ns_server commit 5f0710d with commit message: MB-34378 adopt new vbucket-durability-state stats call
            dfinlay Dave Finlay made changes -
            Labels approved-for-mad-hatter

            Build couchbase-server-7.0.0-1040 contains kv_engine commit e85a9cd with commit message:
            MB-34378: Add vbucket-durability-state stats

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-1040 contains kv_engine commit e85a9cd with commit message: MB-34378 : Add vbucket-durability-state stats
            drigby Dave Rigby made changes -
            Fix Version/s Mad-Hatter [ 15037 ]
            Fix Version/s Cheshire-Cat [ 15915 ]
            drigby Dave Rigby added a comment -

            Artem Stemkovski Can this MB be resolved now?

            drigby Dave Rigby added a comment - Artem Stemkovski Can this MB be resolved now?
            dfinlay Dave Finlay made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            ritam.sharma Ritam Sharma made changes -
            Labels approved-for-mad-hatter approved-for-mad-hatter request-dev-verify
            owend Daniel Owen made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            People

              artem Artem Stemkovski
              Abhijeeth.Nuthan Abhijeeth Nuthan
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty