Couchbase Server: MB-7597

stats.numRemainingBgJobs in Ep-engine code is not updated correctly

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
    • Environment: Linux

      Description

      With 5 buckets on the same server node, and without any client workload, memcached uses 50% of the CPU, which means it keeps 2 CPUs busy on a 4-CPU box.
      Inspecting the process with gdb shows that stats.numRemainingBgJobs is not updated correctly.

      Here is the snapshot of the perf output:
      19.88% memcached ep.so [.] VBucketMap::getBucket(unsigned short) const
      12.31% memcached ep.so [.] BgFetcher::run(SingleThreadedRCPtr<Task>&)
      11.81% memcached libpthread-2.12.so [.] pthread_mutex_lock
      9.44% memcached ep.so [.] SpinLock::acquire()
      6.60% memcached libpthread-2.12.so [.] pthread_mutex_unlock
      6.39% memcached ep.so [.] VBucket::getBGFetchItems(std::tr1::unordered_map<unsigned long, std::list<VBucketBGFetchItem*, std::allocator<VBucketBGFetchItem*> >, std::tr1::hash<u
      4.59% memcached ep.so [.] Mutex::release()
      2.15% memcached ep.so [.] SpinLock::release()
      1.80% memcached ep.so [.] Dispatcher::moveReadyTasks(timeval const&)
      1.73% memcached ep.so [.] Mutex::acquire()
      1.49% memcached ep.so [.] SpinLock::~SpinLock()


        Activity

        xiaoqin Xiaoqin Ma (Inactive) added a comment -

        I found two bugs:
        The first concerns the use of items2fetch.size(). When we delete an element from items2fetch, the size changes, so the logic in clearItems() is wrong:
        #1 0x00007f9e35eb6e0b in BgFetcher::clearItems (this=0x7f9e2003ae50, vbId=35) at src/bgfetcher.cc:71
        71 vb_bgfetch_queue_t::iterator itr = items2fetch.begin();
        (gdb) p items2fetch.size()
        $1 = 1
        (gdb) c
        Continuing.

        Breakpoint 3, BgFetcher::clearItems (this=0x7f9e2003ae50, vbId=35) at src/bgfetcher.cc:85
        85 delete *dItr;
        (gdb) n
        86 assert(items2fetch_size == items2fetch.size());
        (gdb) p items2fetch.size()
        $2 = 1
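
The transcript does not show the code around the assert, so the following is only a generic sketch of the hazard the comment describes: a size captured before an erase no longer matches the live container size afterwards. `FetchQueue` and `cachedSizeStillValid` are hypothetical names chosen for illustration, and the map's value type is simplified; neither is taken from ep-engine.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical queue type, loosely modelled on the names in the comment;
// the real vb_bgfetch_queue_t holds lists of VBucketBGFetchItem pointers.
using FetchQueue = std::unordered_map<unsigned long, std::list<int>>;

// Erases one key and reports whether a size cached before the erase
// still matches the live size afterwards.
bool cachedSizeStillValid(FetchQueue& q, unsigned long key) {
    const std::size_t cached = q.size();  // snapshot taken before mutation
    q.erase(key);                         // removes the entry if the key exists
    return cached == q.size();            // false whenever an entry was removed
}
```

Any check or loop bound that relies on the snapshot has to be refreshed after each erase, or it will disagree with the container.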

        The second is in the step that updates numRemainingBgJobs. There is a race: the bgfetcher can run after an item has been put on the bgfetcher queue but before numRemainingBgJobs has been updated. The value can therefore reach -1, which shows up as the maximum value on the Linux box I tested. On the Linux version I am testing, a later ++ operation brings it back, but this is not guaranteed on every OS.
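
The ordering problem in this comment can be replayed in a single thread. This is a minimal sketch, assuming the counter behaves like an unsigned atomic; `Stats`, `completeJob`, `accountJob`, and `counterWrapsThenRecovers` are hypothetical stand-ins, not the ep-engine definitions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the stats object; the real type may differ.
struct Stats {
    std::atomic<std::size_t> numRemainingBgJobs{0};
};

// Fetcher side: finishes a job and decrements the counter.
std::size_t completeJob(Stats& s) { return --s.numRemainingBgJobs; }

// Producer side: accounts for the job it has already queued.
std::size_t accountJob(Stats& s) { return ++s.numRemainingBgJobs; }

// Replays the problematic interleaving in one thread: the fetcher's
// decrement runs before the producer's increment.
bool counterWrapsThenRecovers() {
    Stats s;
    assert(s.numRemainingBgJobs == 0);
    const std::size_t afterDecrement = completeJob(s);  // wraps below zero
    const std::size_t afterIncrement = accountJob(s);   // wraps back to zero
    return afterDecrement == std::numeric_limits<std::size_t>::max() &&
           afterIncrement == 0;
}
```

With an unsigned atomic the wraparound is well defined, so the later increment restores zero. In between, though, any reader sees a huge job count, which is consistent with a fetcher that keeps running with nothing to do.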

        jin Jin Lim added a comment -

        The fix provided by Xiaoqin, http://review.couchbase.org/#/c/24430/, is under review before merging to 2.0.1. Not a hot fix candidate for 2.0.0.

        jin Jin Lim added a comment -

        A regression was identified in the fix that was previously merged for this bug (MB-7729 - 100% CPU consumption). Reopening this until the regression is addressed.

        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #477 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/477/)
        MB-7597: correct the logic of updating bgfetcher global variable (Revision f6b583f3760cc1e7df85b5bf3abbc2e016a270fc)
        MB-7597: adjust the changes for high CPU usage. (Revision eee2e9564ef844cf8cc435911d9e80af0dece244)

        Result = SUCCESS
        xiaoqin :
        Files :

        • src/bgfetcher.cc
        • src/vbucket.cc
        • src/bgfetcher.hh
        • src/vbucket.hh

        xiaoqin :
        Files :

        • src/bgfetcher.cc
        farshid Farshid Ghods (Inactive) added a comment -

        Backed out from the 2.0.1 release.

        mikew Mike Wiederhold added a comment -

        Duplicate of MB-7729, which is the 100% CPU issue.

        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #481 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/481/)
        Revert "MB-7597: adjust the changes for high CPU usage." (Revision ae946eeefa2b111d35b21ac93ed14a2a17732be8)
        Revert "MB-7597: correct the logic of updating bgfetcher global variable" (Revision dff392d62f86a70b19807a110754d0b7a7d4b62f)

        Result = SUCCESS
        Mike Wiederhold :
        Files :

        • src/bgfetcher.cc

        Mike Wiederhold :
        Files :

        • src/vbucket.hh
        • src/vbucket.cc
        • src/bgfetcher.hh
        • src/bgfetcher.cc
        maria Maria McDuff (Inactive) added a comment -

        Closing as a duplicate.


          People

          • Assignee: Jin Lim
          • Reporter: Xiaoqin Ma (Inactive)
          • Votes: 0
          • Watchers: 6

              Time Tracking

              • Estimated: 16h
              • Remaining: 16h
              • Logged: Not Specified
