Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-43055

ItemPager available flag not reset if run() early exits

    XMLWordPrintable

    Details

    • Triage:
      Triaged
    • Story Points:
      1
    • Is this a Regression?:
      Yes
    • Sprint:
      KV-Engine 2021-Jan

      Description

      There is a bug which, if triggered, will stop a bucket from successfully evicting ever again (until restart/bucket deletion).


      Bug introduced in http://review.couchbase.org/c/kv_engine/+/133197 (6.6.1 pre-GA).

      item_pager.cc,ItemPager::run

      112
          if (((current > upper) || doEvict || wasNotified) &&
      113
              (*available).compare_exchange_strong(inverse, false)) {
      114
              if (kvBucket->getItemEvictionPolicy() == EvictionPolicy::Value) {
      115
                  doEvict = true;
      116
              }
      117
       
      118
              ++stats.pagerRuns;
      119
       
      120
              if (current <= lower) {
      121
                  // early exit - no need to run a paging visitor
      122
                  return true;
      123
              }
      

      The above change introduced the highlighted early exit; in the event that the item pager is notified to run, but the memory usage has dropped below the low watermark, don't evict. However, this is not safe to do -

      item_pager.cc,ItemPager::run

      112
          if (((current > upper) || doEvict || wasNotified) &&
      113
              (*available).compare_exchange_strong(inverse, false)) {
      

      Available is an atomic bool used to track if an item pager is currently running. It has been set to false here, but is only reset once the created PagingVisitor is completed. If the visitor creation is skipped, the flag is never reset, so no further item pager runs will occur.

        Attachments

          Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

            Activity

            james.harrison James Harrison created issue -
            james.harrison James Harrison made changes -
            Field Original Value New Value
            Link This issue is triggering CBSE-9293 [ CBSE-9293 ]
            james.harrison James Harrison made changes -
            Description A bug can potentially stop the {{ItemPager}} from running, halting eviction for a bucket until restart/bucket deletion.

            ----

            Bug introduced in http://review.couchbase.org/c/kv_engine/+/133197

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=122}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
                    if (kvBucket->getItemEvictionPolicy() == EvictionPolicy::Value) {
                        doEvict = true;
                    }

                    ++stats.pagerRuns;

                    if (current <= lower) {
                        // early exit - no need to run a paging visitor
                        return true;
                    }
            {noformat}

            The above change introduced the highlighted early exit; in the event that the item pager is notified to run, but the memory usage has dropped below the low watermark, don't evict. However, this is not safe to do -

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=113}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
            {noformat}

            Available is an atomic bool used to track if an item pager is currently running. It has been set to false _here_, but is only reset once the created {{PagingVisitor}} is completed. If the visitor creation is skipped, the flag is never reset, so no further item pager runs will occur.

            There is a bug which, if triggered, will stop a bucket from successfully evicting ever again (until restart/bucket deletion).

            ----

            Bug introduced in http://review.couchbase.org/c/kv_engine/+/133197

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=122}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
                    if (kvBucket->getItemEvictionPolicy() == EvictionPolicy::Value) {
                        doEvict = true;
                    }

                    ++stats.pagerRuns;

                    if (current <= lower) {
                        // early exit - no need to run a paging visitor
                        return true;
                    }
            {noformat}

            The above change introduced the highlighted early exit; in the event that the item pager is notified to run, but the memory usage has dropped below the low watermark, don't evict. However, this is not safe to do -

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=113}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
            {noformat}

            Available is an atomic bool used to track if an item pager is currently running. It has been set to false _here_, but is only reset once the created {{PagingVisitor}} is completed. If the visitor creation is skipped, the flag is never reset, so no further item pager runs will occur.

            owend Daniel Owen made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]
            owend Daniel Owen made changes -
            Fix Version/s 6.6.1 [ 17002 ]
            owend Daniel Owen made changes -
            Labels approved-for-6.6.1
            owend Daniel Owen made changes -
            Link This issue blocks MB-40528 [ MB-40528 ]
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9203 contains kv_engine commit ac69da7 with commit message:
            MB-43055: Ensure ItemPager available is not left set to false

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.1-9203 contains kv_engine commit ac69da7 with commit message: MB-43055 : Ensure ItemPager available is not left set to false
            drigby Dave Rigby made changes -
            Affects Version/s Cheshire-Cat [ 15915 ]
            drigby Dave Rigby made changes -
            Is this a Regression? Unknown [ 10452 ] Yes [ 10450 ]
            drigby Dave Rigby made changes -
            Description There is a bug which, if triggered, will stop a bucket from successfully evicting ever again (until restart/bucket deletion).

            ----

            Bug introduced in http://review.couchbase.org/c/kv_engine/+/133197

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=122}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
                    if (kvBucket->getItemEvictionPolicy() == EvictionPolicy::Value) {
                        doEvict = true;
                    }

                    ++stats.pagerRuns;

                    if (current <= lower) {
                        // early exit - no need to run a paging visitor
                        return true;
                    }
            {noformat}

            The above change introduced the highlighted early exit; in the event that the item pager is notified to run, but the memory usage has dropped below the low watermark, don't evict. However, this is not safe to do -

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=113}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
            {noformat}

            Available is an atomic bool used to track if an item pager is currently running. It has been set to false _here_, but is only reset once the created {{PagingVisitor}} is completed. If the visitor creation is skipped, the flag is never reset, so no further item pager runs will occur.

            There is a bug which, if triggered, will stop a bucket from successfully evicting ever again (until restart/bucket deletion).

            ----

            Bug introduced in http://review.couchbase.org/c/kv_engine/+/133197 (6.6.1 pre-GA).

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=122}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
                    if (kvBucket->getItemEvictionPolicy() == EvictionPolicy::Value) {
                        doEvict = true;
                    }

                    ++stats.pagerRuns;

                    if (current <= lower) {
                        // early exit - no need to run a paging visitor
                        return true;
                    }
            {noformat}

            The above change introduced the highlighted early exit; in the event that the item pager is notified to run, but the memory usage has dropped below the low watermark, don't evict. However, this is not safe to do -

            {noformat:title=item_pager.cc,ItemPager::run|linenumbers|firstline=112|highlight=113}
                if (((current > upper) || doEvict || wasNotified) &&
                    (*available).compare_exchange_strong(inverse, false)) {
            {noformat}

            Available is an atomic bool used to track if an item pager is currently running. It has been set to false _here_, but is only reset once the created {{PagingVisitor}} is completed. If the visitor creation is skipped, the flag is never reset, so no further item pager runs will occur.

            drigby Dave Rigby made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            owend Daniel Owen made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            owend Daniel Owen made changes -
            Assignee James Harrison [ james.harrison ] Daniel Owen [ owend ]
            Hide
            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Validated the fix using 6.6.1-9203 build.

            Functional steps followed to validate the fix:

            1. Load till low_wm
            2. Make non_io_threads=0
            3. Load few more docs and so that we do exceed the high_wm,
               this schedules the item pager
            4. Delete few docs to go below low_wm
            5. Make non_io_threads=default. Now the item pager tries to run,
               but finds mem_used < low_wat so exits without paging anything,
               triggering the bug
            6. Load docs to cross high_wm
            7. Confirm that the item pager never runs successfully,
               even though the memory usage is back above the high watermark

            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Validated the fix using 6.6.1-9203 build. Functional steps followed to validate the fix: 1. Load till low_wm 2. Make non_io_threads=0 3. Load few more docs and so that we do exceed the high_wm, this schedules the item pager 4. Delete few docs to go below low_wm 5. Make non_io_threads=default. Now the item pager tries to run, but finds mem_used < low_wat so exits without paging anything, triggering the bug 6. Load docs to cross high_wm 7. Confirm that the item pager never runs successfully, even though the memory usage is back above the high watermark
            drigby Dave Rigby made changes -
            Link This issue causes CBSE-9345 [ CBSE-9345 ]
            drigby Dave Rigby made changes -
            Affects Version/s 7.0.0-Beta1 [ 17286 ]
            drigby Dave Rigby made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]
            shivani.gupta Shivani Gupta made changes -
            Link This issue blocks CBSE-9345 [ CBSE-9345 ]
            drigby Dave Rigby made changes -
            Affects Version/s 6.6.1 [ 17002 ]
            drigby Dave Rigby made changes -
            Link This issue blocks CBSE-9345 [ CBSE-9345 ]
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4084 contains kv_engine commit ac69da7 with commit message:
            MB-43055: Ensure ItemPager available is not left set to false

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4084 contains kv_engine commit ac69da7 with commit message: MB-43055 : Ensure ItemPager available is not left set to false
            james.harrison James Harrison made changes -
            Link This issue is caused by MB-40531 [ MB-40531 ]
            Hide
            james.harrison James Harrison added a comment -

            With MB-40531 being backported to 6.5.2 and 6.0.5, this MB now also affects those branches

            Show
            james.harrison James Harrison added a comment - With MB-40531 being backported to 6.5.2 and 6.0.5, this MB now also affects those branches
            james.harrison James Harrison made changes -
            Assignee Daniel Owen [ owend ] James Harrison [ james.harrison ]
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            owend Daniel Owen made changes -
            Fix Version/s 6.5.2 [ 17223 ]
            Fix Version/s 6.0.5 [ 16729 ]
            james.harrison James Harrison made changes -
            Affects Version/s 6.5.2 [ 17223 ]
            Affects Version/s 6.0.5 [ 16729 ]
            drigby Dave Rigby made changes -
            Affects Version/s 6.5.2 [ 17223 ]
            Affects Version/s 6.0.5 [ 16729 ]
            Hide
            drigby Dave Rigby added a comment -

            Removing 6.0.5 and 6.5.2 from the affectsVersions - the GA of 6.0.5 & 6.5.2 won't be affected by this problem (it's only the intermediate pre-release builds of 6.0.5); as such it would be misleading for people to search for bugs which 6.0.5 suffers from and see this MB.

            Show
            drigby Dave Rigby added a comment - Removing 6.0.5 and 6.5.2 from the affectsVersions - the GA of 6.0.5 & 6.5.2 won't be affected by this problem (it's only the intermediate pre-release builds of 6.0.5); as such it would be misleading for people to search for bugs which 6.0.5 suffers from and see this MB.
            wayne Wayne Siu made changes -
            Link This issue blocks MB-40511 [ MB-40511 ]
            wayne Wayne Siu made changes -
            Labels approved-for-6.6.1 approved-for-6.0.5 approved-for-6.5.2 approved-for-6.6.1
            Hide
            james.harrison James Harrison added a comment -

            Good point Dave Rigby - thanks.

            Show
            james.harrison James Harrison added a comment - Good point Dave Rigby - thanks.
            wayne Wayne Siu made changes -
            Link This issue blocks MB-42583 [ MB-42583 ]
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.2-6612 contains kv_engine commit ec21fca with commit message:
            MB-43055: [BP] Ensure ItemPager available is not left set to false

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.2-6612 contains kv_engine commit ec21fca with commit message: MB-43055 : [BP] Ensure ItemPager available is not left set to false
            owend Daniel Owen made changes -
            Sprint KV-Engine 2021-Jan [ 1383 ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.0.5-3340 contains kv_engine commit 0df2087 with commit message:
            MB-43055: [BP] Ensure ItemPager available is not left set to false

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.5-3340 contains kv_engine commit 0df2087 with commit message: MB-43055 : [BP] Ensure ItemPager available is not left set to false
            drigby Dave Rigby made changes -
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            Hide
            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Validated the fix against 6.5.2 build:6614.

            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Validated the fix against 6.5.2 build:6614.
            wayne Wayne Siu made changes -
            Assignee James Harrison [ james.harrison ] Ashwin Govindarajulu [ ashwin.govindarajulu ]
            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Test fails on 6.0.5-3340 build due to reason "ItemPager not run with lower_wm levels" Test run link: http://qa.sc.couchbase.com/job/temp_centos7_4node_jython/1556/console cbcollect: https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB_43055_6.0.5_3340/collectinfo-2021-01-09T101055-ns_1%40127.0.0.1.zip   Same test passes on 6.5.2-6614. Cbcollect: https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB_43055_6.5.2_6614/collectinfo-2021-01-09T102153-ns_1%40cb.local.zip Test logs: http://qa.sc.couchbase.com/job/temp_centos7_4node_jython/1557/console
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Assignee Ashwin Govindarajulu [ ashwin.govindarajulu ] James Harrison [ james.harrison ]
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            Hide
            james.harrison James Harrison added a comment -

            The 6.0.5 behaviour is correct despite failing the test, ep_num_pager_runs should not increase while mem_used < low_wm, confirmed locally and in unit tests.

            It appears 6.5.2 passes the test with the incorrect expectation on ep_num_pager_runs due to unrelated allocations raising mem_used above low_wm again at the time the item pager runs - unfortunately this situation does not test the fix in this MB.

            A test change has been discussed with Ashwin Govindarajulu that should lead to consistent results across both versions. No further KV code changes should be necessary.

            Show
            james.harrison James Harrison added a comment - The 6.0.5 behaviour is correct despite failing the test, ep_num_pager_runs should not increase while mem_used < low_wm , confirmed locally and in unit tests. It appears 6.5.2 passes the test with the incorrect expectation on ep_num_pager_runs due to unrelated allocations raising mem_used above low_wm again at the time the item pager runs - unfortunately this situation does not test the fix in this MB. A test change has been discussed with Ashwin Govindarajulu that should lead to consistent results across both versions. No further KV code changes should be necessary.
            Hide
            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            As discussed with James Harrison updated the test case.

            Validated the test patch - http://review.couchbase.org/c/TAF/+/143266 and below are the test results,

            Test failed on "6.6.1-9202-enterprise" as expected

            Test passed on following build:

            6.0.5-3340-enterprise

            6.6.1-9203-enterprise

            6.5.2-6614-enterprise

            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - As discussed with James Harrison updated the test case. Validated the test patch - http://review.couchbase.org/c/TAF/+/143266 and below are the test results, Test failed on "6.6.1-9202-enterprise" as expected Test passed on following build: 6.0.5-3340-enterprise 6.6.1-9203-enterprise 6.5.2-6614-enterprise
            Hide
            ashwin.govindarajulu Ashwin Govindarajulu added a comment -

            Updated test passes on latest builds.

            Closing the ticket.

            Show
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Updated test passes on latest builds. Closing the ticket.
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Assignee James Harrison [ james.harrison ] Ashwin Govindarajulu [ ashwin.govindarajulu ]
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Closed [ 6 ]
            arunkumar Arunkumar Senthilnathan made changes -
            Labels approved-for-6.0.5 approved-for-6.5.2 approved-for-6.6.1 approved-for-6.0.5 approved-for-6.5.2 approved-for-6.6.1 releasenote
            drigby Dave Rigby made changes -
            Labels approved-for-6.0.5 approved-for-6.5.2 approved-for-6.6.1 releasenote approved-for-6.0.5 approved-for-6.5.2 approved-for-6.6.1
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.2-9429 contains kv_engine commit ec21fca with commit message:
            MB-43055: [BP] Ensure ItemPager available is not left set to false

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.2-9429 contains kv_engine commit ec21fca with commit message: MB-43055 : [BP] Ensure ItemPager available is not left set to false
            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4258 contains kv_engine commit ec21fca with commit message:
            MB-43055: [BP] Ensure ItemPager available is not left set to false

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4258 contains kv_engine commit ec21fca with commit message: MB-43055 : [BP] Ensure ItemPager available is not left set to false
            arunkumar Arunkumar Senthilnathan made changes -
            Labels approved-for-6.0.5 approved-for-6.5.2 approved-for-6.6.1 approved-for-6.0.5 approved-for-6.5.2 approved-for-6.6.1 releasenote
            lynn.straus Lynn Straus made changes -
            Fix Version/s 7.0.0 [ 17233 ]
            lynn.straus Lynn Straus made changes -
            Fix Version/s Cheshire-Cat [ 15915 ]

              People

              Assignee:
              ashwin.govindarajulu Ashwin Govindarajulu
              Reporter:
              james.harrison James Harrison
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  PagerDuty