Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-6714

ep-engine needs around 1 seconds to mark empty and closed checkpoint as persisted

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None

      Description

      I'm pretty sure this was not a case up until very recently. Suspect this is caused by recent XDCR changes where we enabled closing of empty checkpoints. (BTW there's no need IMHO)

      Steps to reproduce:

      • grab everything latest and enable consistent views
      • rebalance 1-2 or 1->4
      • observe how slow it is
      • in logs of ns_server you would see a lot of this:

      [rebalance:info,2012-09-25T6:54:32.463,n_0@192.168.1.2:<0.18358.0>:janitor_agent:get_replication_persistence_checkpoint_id:470]default: Doing get_replication_persistence_checkpoint_id call for vbucket 127 on n_0@192.168.1.2
      [ns_server:debug,2012-09-25T6:54:32.463,n_0@192.168.1.2:'janitor_agent-default':janitor_agent:handle_call:661]After creating new checkpoint here's what we have:

      {2,3,4}

      [ns_server:debug,2012-09-25T6:54:32.464,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.464,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.475,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.475,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.486,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.486,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.497,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.497,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.508,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.508,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.519,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.519,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.550,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.550,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.581,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.581,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.612,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.612,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.643,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.643,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.674,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.674,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.775,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.775,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.876,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.876,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.977,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:32.977,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.078,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.078,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.179,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.179,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.280,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.280,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.381,n_2@192.168.1.2:<0.12504.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.381,n_3@192.168.1.2:<0.12508.0>:janitor_agent:do_wait_checkpoint_persisted:746]Waiting persisted checkpoint 3 on 127, now 2
      [ns_server:debug,2012-09-25T6:54:33.482,n_3@192.168.1.2:'janitor_agent-default':janitor_agent:handle_info:677]Got done message from subprocess: <0.12508.0> (ok)

      P.S.: given that it'll greatly slow down testing I'd raise this to blocker, but not sure people would understand me so set this to just critical.

      No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

        alkondratenko Aleksey Kondratenko (Inactive) created issue -
        Hide
        chiyoung Chiyoung Seo added a comment -

        I understood what you mentioned above.

        I pushed a couple of changes to address this issue:

        http://review.couchbase.org/#/c/21067/
        http://review.couchbase.org/#/c/21068/

        Please reopen the bug if you still have the same issue.

        Show
        chiyoung Chiyoung Seo added a comment - I understood what you mentioned above. I pushed a couple of changes to address this issue: http://review.couchbase.org/#/c/21067/ http://review.couchbase.org/#/c/21068/ Please reopen the bug if you still have the same issue.
        chiyoung Chiyoung Seo made changes -
        Field Original Value New Value
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #434 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/434/)
        MB-6714 Wake up the flusher after creating a checkpoint by force (Revision 63062710c9790aad0dabc9896f952de6f33cae3e)
        MB-6714 Create a new checkpoint for the non-empty open checkpoint (Revision 31c007f6d5c73d96db242425b6fc299abc3ae9cf)

        Result = SUCCESS
        Chiyoung Seo :
        Files :

        • src/ep_engine.cc

        Chiyoung Seo :
        Files :

        • src/checkpoint.cc
        Show
        thuan Thuan Nguyen added a comment - Integrated in github-ep-engine-2-0 #434 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/434/ ) MB-6714 Wake up the flusher after creating a checkpoint by force (Revision 63062710c9790aad0dabc9896f952de6f33cae3e) MB-6714 Create a new checkpoint for the non-empty open checkpoint (Revision 31c007f6d5c73d96db242425b6fc299abc3ae9cf) Result = SUCCESS Chiyoung Seo : Files : src/ep_engine.cc Chiyoung Seo : Files : src/checkpoint.cc
        Show
        junyi Junyi Xie (Inactive) added a comment - http://review.couchbase.org/#/c/21086/
        Hide
        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #435 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/435/)
        MB-6714 Persisted chk id is included in checkpoint_creation reply (Revision 97ef3af84057703c2be262aa17aaa4704964dba0)

        Result = SUCCESS
        Chiyoung Seo :
        Files :

        • src/ep_engine.cc
        Show
        thuan Thuan Nguyen added a comment - Integrated in github-ep-engine-2-0 #435 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/435/ ) MB-6714 Persisted chk id is included in checkpoint_creation reply (Revision 97ef3af84057703c2be262aa17aaa4704964dba0) Result = SUCCESS Chiyoung Seo : Files : src/ep_engine.cc
        Hide
        kzeller kzeller added a comment -

        RN: Couchbase Server had experienced a one second delay to mark empty
        and closed checkpoings on a persistence queue. Performance improvements
        made to eliminate delay.

        Show
        kzeller kzeller added a comment - RN: Couchbase Server had experienced a one second delay to mark empty and closed checkpoings on a persistence queue. Performance improvements made to eliminate delay.

          People

          • Assignee:
            chiyoung Chiyoung Seo
            Reporter:
            alkondratenko Aleksey Kondratenko (Inactive)
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes