Couchbase Server / MB-36973

Few nodes stay in warmup state in high bucket density test

Details

    • Triaged
    • Yes
    • KV-Engine Mad-Hatter GA

      Activity

        mahesh.mandhare Mahesh Mandhare (Inactive) created issue -
        owend Daniel Owen made changes -
        Field Original Value New Value
        Assignee Daniel Owen [ owend ] Ben Huddleston [ ben.huddleston ]
        owend Daniel Owen added a comment -

        Seeing this on 172.23.97.12:

        WARNING 328 - Failed to create bucket [bucket-27]: Failed to create a thread-specific key: Resource temporarily unavailable: Resource temporarily unavailable
        

        owend Daniel Owen made changes -
        Fix Version/s Mad-Hatter [ 15037 ]
        owend Daniel Owen made changes -
        Assignee Ben Huddleston [ ben.huddleston ] Daniel Owen [ owend ]
        drigby Dave Rigby made changes -
        Assignee Daniel Owen [ owend ] Dave Rigby [ drigby ]
        owend Daniel Owen made changes -
        Due Date 22/Nov/19
        owend Daniel Owen made changes -
        Sprint KV-Engine Mad-Hatter GA [ 910 ]
        owend Daniel Owen made changes -
        Rank Ranked higher
        drigby Dave Rigby added a comment -

        The problem is that we have hit the limit on the number of pthread keys (pthread_key, used for thread-local objects) that we can create, as defined by PTHREAD_KEYS_MAX.
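
        To make the failure mode concrete, here is a minimal sketch (not KV-Engine code) that creates pthread keys until pthread_key_create() refuses, then releases them again. The names KeyProbe and probeKeyLimit are hypothetical. POSIX specifies EAGAIN for this case, and with glibc the message text for EAGAIN is the "Resource temporarily unavailable" seen in the log above.

```cpp
#include <pthread.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>

// Result of the probe: how many keys could be created, and the error code
// returned by the first pthread_key_create() call that failed.
struct KeyProbe {
    std::size_t created;
    int error;
};

// Creates pthread keys until the process runs out, then deletes every key
// it created so the process is left as it was found.
KeyProbe probeKeyLimit() {
    std::vector<pthread_key_t> keys;
    pthread_key_t key;
    int rc = 0;
    while ((rc = pthread_key_create(&key, nullptr)) == 0) {
        keys.push_back(key);
    }
    for (pthread_key_t k : keys) {
        int deleted = pthread_key_delete(k);
        assert(deleted == 0);
        (void)deleted;
    }
    return KeyProbe{keys.size(), rc};
}
```

        The count returned is usually a little below PTHREAD_KEYS_MAX, because the runtime and any linked libraries may already hold some keys. Passing the returned error to std::strerror() gives the human-readable message.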

        On my local Ubuntu 18.04 box this is 1024. Inside KV-Engine we use pthread keys in a few places, but most significantly in the implementation of AtomicQueue. There is one object of this class per CouchKVStore object, of which there is one instance per shard.

        Each bucket has N shards, where N is the number of CPU cores, and this test runs 32 buckets on a 24-core machine, so that is 32 × 24 = 768 pthread keys straight away.

        Add in the other uses of pthread keys (one per bucket for tracking the currently running engine, one per bucket for ConnMap, uses within the Folly libraries, etc.) and I can see us hitting the 1024 limit.
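
        The arithmetic in this comment can be written down as a small sketch. It counts only the uses named above (one key per shard, plus two per bucket); the Folly and other uses are not quantified in the ticket, so they appear only as remaining headroom. KeyBudget, knownKeys and headroom are hypothetical names, and the 1024 limit is the value from the local Ubuntu 18.04 box, assumed here to apply to the test node as well.

```cpp
#include <cassert>
#include <cstddef>

// Inputs taken from the comment above.
struct KeyBudget {
    std::size_t buckets;       // buckets on the node
    std::size_t coresPerNode;  // one shard per CPU core in each bucket
    std::size_t limit;         // PTHREAD_KEYS_MAX (assumed value)
};

// Keys attributable to the uses named in the ticket: one per shard
// (AtomicQueue inside CouchKVStore) plus two per bucket (current engine
// tracking and ConnMap).
std::size_t knownKeys(const KeyBudget& b) {
    return b.buckets * b.coresPerNode + 2 * b.buckets;
}

// Keys left for every other user (Folly and so on) before key creation
// starts to fail.
std::size_t headroom(const KeyBudget& b) {
    assert(b.limit > 0);
    const std::size_t used = knownKeys(b);
    return used >= b.limit ? 0 : b.limit - used;
}
```

        With 32 buckets, 24 cores and a limit of 1024, the named uses account for 768 + 64 = 832 keys, which leaves 192 for all other users in the process.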

        drigby Dave Rigby made changes -
        Status Open [ 1 ] In Progress [ 3 ]
        drigby Dave Rigby made changes -
        Due Date 22/Nov/19 21/Nov/19
        drigby Dave Rigby made changes -
        Assignee Dave Rigby [ drigby ] Mahesh Mandhare [ mahesh.mandhare ]
        Resolution Fixed [ 1 ]
        Status In Progress [ 3 ] Resolved [ 5 ]

        build-team Couchbase Build Team added a comment -

        Build couchbase-server-6.5.0-4860 contains kv_engine commit 117d7a9 with commit message:
        MB-36973: Don't use ThreadLocalPtr for CouchKVStore::pendingFileDeletions
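
        The commit message says only that pendingFileDeletions no longer uses ThreadLocalPtr; the ticket does not show the replacement. As an illustration of the general idea, and not the actual commit, here is a minimal sketch of a per-instance list guarded by a mutex, which needs no thread-local storage and therefore consumes no pthread key per shard. The class name PendingFileDeletions and its methods are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-CouchKVStore list of files awaiting
// deletion. All state lives in the object itself, so creating one instance
// per shard does not allocate any pthread keys.
class PendingFileDeletions {
public:
    // Queues a file path for later deletion.
    void push(std::string path) {
        assert(!path.empty());
        std::lock_guard<std::mutex> guard(lock);
        files.push(std::move(path));
    }

    // Hands all queued paths to the caller and leaves the list empty.
    std::queue<std::string> drain() {
        std::lock_guard<std::mutex> guard(lock);
        std::queue<std::string> out;
        out.swap(files);
        return out;
    }

private:
    std::mutex lock;
    std::queue<std::string> files;
};
```

        The trade-off in this sketch is a short critical section on every push and drain in exchange for not using thread-local storage at all.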
        owend Daniel Owen added a comment -

        Hi DF,

        It looks like Dave Rigby was a bit eager and the fix has already been committed.

        I agree it's certainly a critical fix that needs to go into MH; however, the MB is currently lacking the "approved-for-mad-hatter" label.

        owend Daniel Owen made changes -
        Triage Untriaged [ 10351 ] Triaged [ 10350 ]
        dfinlay Dave Finlay added a comment -

        At this point in the release, let's try and get approval first.

        Yes, this certainly needs to go in. Approved.

        dfinlay Dave Finlay made changes -
        Labels high-bucket-density approved-for-mad-hatter high-bucket-density

        build-team Couchbase Build Team added a comment -

        Build couchbase-server-7.0.0-1065 contains kv_engine commit 117d7a9 with commit message:
        MB-36973: Don't use ThreadLocalPtr for CouchKVStore::pendingFileDeletions

        mahesh.mandhare Mahesh Mandhare (Inactive) added a comment -

        Build 6.5.0-4926

        Verified that after creating 32 buckets, the KV nodes no longer stay in the warmup state.

        Job: http://perf.jenkins.couchbase.com/job/arke-multi-bucket/339
        mahesh.mandhare Mahesh Mandhare (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

        People

          Assignee: mahesh.mandhare Mahesh Mandhare (Inactive)
          Reporter: mahesh.mandhare Mahesh Mandhare (Inactive)
          Votes: 0
          Watchers: 4
