Thanks so much, Junyie.
[pk] - Will this result in the current "mutations to replicate" and "XDCR docs to replicate" being merged into one stat called "Outbound XDCR mutations" in the UI?
[jx] - Yes, this will unify these two stats. They are actually the same stat under different names: one is in the Main Section and the other is in the Outbound XDCR stats. Per your comments, I will use the same name in both places to remove the confusion.
[pk] - Perfect, thank you.
[pk] - Can it be made clear that this is measured in KB/MB/GB? As per Ketaki's note, this is a memory size, not a number of items or mutations in the queue. It would also be good for the "hover over" description of the statistic to explain that this memory will be reflected in the beam.smp/erl.exe memory usage.
[jx] - This is defined in bytes. If you move your mouse over the stat in the UI, you will see the text "Size of bytes of XDC replication queue". If the value is on a KB, MB, or GB scale, the UI will show KB, MB, or GB accordingly. There should not be any confusion.
[pk] - Yes, that will be great, thanks.
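To illustrate the unit scaling described above, here is a minimal sketch of how a raw byte count could be rendered as B/KB/MB/GB. It is not the actual UI code: the function name `format_bytes`, the 1024-based units, and the one-decimal formatting are all assumptions made for the example.

```python
def format_bytes(num_bytes):
    """Scale a raw byte count to a human-readable unit.

    Assumption: 1 KB = 1024 bytes; the real UI may round or scale differently.
    """
    value = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    # Anything that is still >= 1024 MB is reported in GB.
    return f"{value:.1f} GB"


# Example: a queue of about 12.8 MB, the figure mentioned in this thread.
print(format_bytes(13421773))
```

The point of the sketch is only that the stat itself stays in bytes and the unit label is a display concern.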
[pk] - Digging in further, it was my understanding that we need nearly 2GB of "extra" RAM to support XDCR, yet it appears from Ketaki's description that the maximum memory usage is 12.8MB. Can you explain the rest?
[jx] - This 12.8MB is just the user data (docs, mutations) queued to be replicated. It covers only the queue created by XDCR and does not include any other overhead. XDCR lives in the ns_server Erlang process. On each node it will create 32 replicators, each replicator will create several worker processes, and other Erlang processes are created at run time. All of these carry some memory overhead, which could be big, but I do not have a number at this time.
Where do you get the 2GB of "extra memory" from? Is it per node or per cluster?
[pk] - This was the recommendation from QE based upon some analysis we did at Concur. It would be extremely helpful to get accurate and specific sizing information, including what takes up that memory, in whatever form is available.
[pk] - Just wanted to clarify that these are requested to be displayed per-replication stream in the XDCR configuration section...not the graphed stats.
[jx] - Oh, I thought these new stats were in the Outbound XDCR section, which is graphed and on a per-replication basis. Why do we need a separate stat in a different place?
[pk] - This has to do with how and why these stats are being consumed. When a user is looking at their cluster to determine the replication status, it is much easier to look at all the streams together. This is much harder to do when you have to click into each bucket and look at each individual stream. It's along the same lines as why we have item counts on the manage servers screen.
[pk] - Can you explain further the difference between the "secs in checkpointing" measurement and the "secs in replicating" measurement? Will those be renamed/removed?
[jx] - First, both are elapsed times aggregated across the vb replicators.
"secs in checkpointing" means how much time the XDCR vb replicators have spent on checkpointing.
"secs in replicating" means how much time the XDCR vb replicators have spent on replicating the mutations.
By monitoring these two stats, we can get some idea of where XDCR spends its time and what XDCR is busy working on.
I understand these two stats may create some confusion on the customer side. As Ketaki said, they are still useful for QE and the performance team. If customers really dislike these stats, we can remove them. Personally I am OK with either.
[pk] - Thanks for the explanation. I would still advocate for removing them. The main reason being that they do not materially help identify any issue or behavior after the cluster has been running for an extended period of time. The up-to-the-second monitoring of these stats will show an extremely high number for both after just a few days or a week of a replication stream running...let alone multiple weeks or months. I can definitely see that they would be useful when debugging the initial stream or trying to identify an issue, but I would ask that they be moved to the log or other stat area outside of the UI.
Which leads me to another question: do we already have documentation (or can you help with that) on where and how to get these "other" stats regarding XDCR? Is there only a REST API to query? Are they printed into some log periodically? Could we get that detailed and written up?
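As a side note on the "secs in checkpointing" / "secs in replicating" discussion above: a counter that only ever grows can still be made readable by looking at the change between consecutive samples rather than the running total. The sketch below assumes the stat is a monotonically increasing cumulative value sampled at regular intervals. The function name `interval_deltas` and the sample numbers are hypothetical and are not taken from XDCR.

```python
def interval_deltas(cumulative):
    """Return the increase between each pair of consecutive samples.

    Assumption: `cumulative` is a list of readings of a counter that only
    grows (no resets), taken at evenly spaced sampling times.
    """
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical readings after roughly a week of running: the totals are huge,
# but the per-interval deltas show how much time was spent recently.
secs_in_checkpointing = [604800.0, 604802.5, 604803.0]
print(interval_deltas(secs_in_checkpointing))
```

Whether such a derived view belongs in the UI, in a log, or elsewhere is the open question in this thread. The sketch only shows why the raw totals become hard to interpret over time.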