Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-53252

[Backport MB-53057 to 7.0.5] Snapshot invariant "snapStart > previous completed snapEnd" broken during autofailover

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Unknown

    Description

      Summary

      During a Data node failure due to OOM-kill, GSI indexer received a snapshot marker where the snapStart was less than the snapEnd of the previous, complete Snapshot.

      This breaks invariants assumed by DCP consumers (indexer here), if the consumer is disconnected and has to reconnect + streamRequest to resume then they may be told to rollback unnecessarily.

      Steps To Reproduce

      <TBD>

      Details

      1. Data service node-d5 failed over around 2022-07-17T01:07.
      2. Index nodes started receiving StreamEnd messages around 2022-07-17T01:01:12 (around the time memcached process got OOM killed on node-d5)

        2022-07-17T01:01:12.492+07:00 [Info] TK StreamEnd MAINT_STREAM bucket-product 1012 146870309391523 1810315
        

      3. These StreamEnd messages were generated by projector, once TCP connection to memcached got closed:

        2022-07-17T01:01:12.570+07:00 [Error] DCPT[secidx:proj-bucket-product-MAINT_STREAM_TOPIC_96b58eb2e82a7cf57d51a9ae0637625d-12990437825450808209/1] doReceive(): read tcp 10.26.5.17:16919->10.26.5.17:11210: read: connection reset by peer
        

      4. Indexer attempted to re-establish DCP connection by following its DCP repair mechanisms. Initial attempts failed as memcached on node-d5 was not available.

        2022-07-17T01:06:15.078+07:00 [Warn] Slow/Hung Operation: KVSender::sendRestartVbuckets did not respond for 5m0.701959279s for projector node-d5:9999 topic MAINT_STREAM_TOPIC_2f75a165518bf54fd561760749a827a1 bucket bucket-productorder
        

      5. After node-d5 node failover, Indexer was able to re-negotiate the DCP connection for disconnected vbuckets. But this time, DCP responded that on some of the vbuckets, a rollback to a lower seqno was required.

        2022-07-17T01:11:33.746+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 687. Rollback
        (1803022,     d77d8839c0a0).
        

      6. Indexer then checked if the 2 available disk snapshots had seqno lower than/equal to what DCP had requested.

        2022-07-17T01:24:44.816+07:00 [Info] SnapshotContainer::GetOlderThanTS Returning nil as no matching snapshot found
        

      7. As both the disk snapshots didn’t satisfy this requirement, indexer decided to rollback to zero.

        2022-07-17T01:25:02.391+07:00 [Info] StorageMgr::rollbackAllToZero MAINT_STREAM bucket-product
        

      Detailed analysis of specific vBucket

      Taking vb 1012 for bucket bucket-product as example on index node node-i1:

      1. Indexer snapshots have seqno=1810315 for vb 1012 in both on-disk snapshots:

        2022-07-17T01:24:44.762+07:00 [Info] SnapshotContainer::GetOlderThanTS Comparing with snapshot SnapshotInfo: count:71747364 committed:false, snapTs = [687=1803028, 694=1797300, 754=1810066, 760=1803299, 789=1800879, 797=1804008, 798=1802954, 829=1804326,1012=1810315}
        2022-07-17T01:24:44.807+07:00 [Info] SnapshotContainer::GetOlderThanTS Comparing with snapshot SnapshotInfo: count:71349292 committed:false, snapTs =[687=1803028, 694=1797300, 754=1810066, 760=1803299, 789=1800879, 797=1804008, 798=1802954, 829=1804326,1012=1810315}
        

      2. Also indexer in-memory bookkeeping has the following DCP snapshot information for vb 1012

        {seqno:1810315 snapStart:1810045 snapEnd:1815372}
        

        2022-07-17T01:06:32.994+07:00 [Error] Timekeeper::sendRestartMsg Error Response from KV feed.feeder For Request
                Message: MsgRestartVbuckets
                StreamId: MAINT_STREAM
                Bucket: bucket-product
                SessionId: 1
                RestartTS: bucket: bucket-product, vbuckets: 113 Crc64: 0 snapType NO_SNAP -
            {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
        ...
            { 1012     8593e89e8ca3    1810315    1810045    1815372}
        

      3. One important thing to note is that indexer can only create storage snapshots if it has receivd complete DCP snapshots for all vbuckets ie. seqno==snapEnd.
        So this combination cannot be present in storage snapshots(which are used for rollback):

        {seqno:1810315 snapstart:1810045 snapend:1815372}
        

        And it is the in-memory bookkeeping state.

      4. This can be confirmed from the indexer stats. "num_nonalign_ts" tracks how many non snapshot aligned(i.e. seqno != snapEnd) timestamps were generated. It is 0 for all the buckets.

        2022-07-17T00:59:30.990+07:00 [Info] PeriodicStats = "bucket-product:num_nonalign_ts":0
        2022-07-17T01:23:31.087+07:00 [Info] PeriodicStats = "bucket-product:num_nonalign_ts":0
        

      5. One possible explanation here is that the index storage snapshot has something like this:

        { 1012 8593e89e8ca3 1810315 1810315 1810315}
        

        And then later DCP sent snapshot with:

        {snapStart:1810045 snapEnd:1815372}
        

        but no more mutations.
        This leaves indexer in-memory bookkeeping state as(which is what it is asking DCP to restart with):

        { 1012 8593e89e8ca3 1810315 1810045 1815372}
        

        This kind of snapshot generation where snapStart of the next snapshot is lower than the snapEnd of previous snapshot can lead to such a behavior.

      6. Indexer in-memory book-keeping shows that MAINT_STREAM is caught up with DCP(before rollback) and DCP snapshots are either 1 or 2 in size for all vbuckets (except 9)

        2022-07-17T01:24:44.753+07:00 [Info] Timekeeper::initiateRecovery StreamId MAINT_STREAM Bucket bucket-product SessionId 1 RestartTs bucket: bucket-product, vbuckets: 1024 Crc64: 5159790531118676693 snapType INMEM_SNAP -
            {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
            {    0     b46f8737a9ac    1815031    1815031    1815031}
            {    1     e93fec117803    1806674    1806674    1806674}
            {    2     7a12c129ca3a    1811929    1811929    1811929}
            {    3     efa0c55a52b3    1812471    1812471    1812471}
            {    4     d4307a6486a6    1808308    1808308    1808308}
            {    5     85b7d45a8923    1815052    1815052    1815052}
            {    6     14282371d3ff    1815735    1815735    1815735}
            {    7     6de2b44ffabf    1815739    1815739    1815739}
        

        And for all the 9 vbuckets which have large size snapshots, all of those did get rollback from DCP(all other vbuckets worked fine)

            {  687     d77d8839c0a0    1803028    1803022    1804857}
            {  694     e5916446b3e9    1797300    1796844    1797777}
            {  754     94678efbcc4a    1810066    1809611    1810196}
            {  760     4372be2b28da    1803299    1803297    1803595}
            {  789     b5614fa86ea1    1800879    1800853    1801123}
            {  797      ee1cd1e056e    1804008    1803946    1804893}
            {  798     f6798a8a6f9e    1802954    1802684    1806442}
            {  829     75b47a9e6803    1804326    1804245    1804705}
            { 1012     8593e89e8ca3    1810315    1810045    1815372}
        

        2022-07-17T01:11:33.746+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 687. Rollback (1803022,     d77d8839c0a0).
        2022-07-17T01:11:34.221+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 694. Rollback (1796843,     e5916446b3e9).
        2022-07-17T01:14:52.227+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 1012. Rollback (1810045,     8593e89e8ca3).
        2022-07-17T01:15:19.652+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 760. Rollback (1803296,     4372be2b28da).
        2022-07-17T01:15:19.655+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 754. Rollback (1809611,     94678efbcc4a).
        2022-07-17T01:15:29.239+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 789. Rollback (1800853,     b5614fa86ea1).
        2022-07-17T01:15:29.260+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 797. Rollback (1803945,      ee1cd1e056e).
        2022-07-17T01:15:29.261+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 798. Rollback (1802684,     f6798a8a6f9e).
        2022-07-17T01:15:37.939+07:00 [Warn] Timekeeper::handleStreamBegin StreamBegin rollback for StreamId MAINT_STREAM Bucket bucket-product vbucket 829. Rollback (1804244,     75b47a9e6803).
        

      7. After failover, indexer requested the new vb master to start the DCP stream with the following information

        {seqno:1810315 snapstart:1810045 snapend:1815372}
        

        However, the new vb master (node-d5, earlier vb-replica) was behind the indexer and asked the indexer to rollback.

        2022-07-17T01:14:52.197896+07:00 WARNING 1817: (bucket-product) DCP (Producer) eq_dcpq:secidx:proj-bucket-product-MAINT_STREAM_TOPIC_2f75a165518bf54fd561760749a827a1-1990393963162632936/0 - (vb:1012) Stream request requires rollback to seqno:1810045 because consumer ahead of producer - producer upper at 1815371. Client requested seqnos:{1810315,18446744073709551615} snapshot:{1810045,1815372} uuid:146870309391523
        

        The seqno of rollback seqno:1810045 is lower than both index disk snapshots (seqno:1810315). This lead to indexer deciding to rollback to zero.

      Attachments

        Issue Links

          Activity

            People

              hemant.rajput Hemant Rajput
              amit.kulkarni Amit Kulkarni
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                PagerDuty