Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.1.2
Affects Version/s: 4.1.0
Component/s: secondary-index
Labels:
None

Triage:
Untriaged
Operating System:
Ubuntu 64-bit
Is this a Regression?:
Unknown

Description

There are GSI's combinations that cause following situation: after lots of documents upsertions, high CPU utilization remains for nodes with GSI indexes for indefinite time (observed and confirmed: ~1h 30 min) even if there are no more upsertions.

Related forum topic: https://forums.couchbase.com/t/issue-reproducible-gsi-problem-high-after-load-cpu-utilization/8374/2

How to reproduce:
1.4.1.0-EE GA (5005), 3 nodes (4CPU / 4G RAM) [4.1.2-6033 also affected]
2.All nodes have all services enabled (data,index,query)
3.Establish a cluster with "default" bucket (256Mb, full eviction, no password, 1 replica, view index replicas, flush=enable)
4. Now, establish ssh connection to all nodes, run htop, and just watch for a while, to ensure, that total observed CPU utilization load is ~ [0..5]% for each node
5. Run the following code (this will take a while ~ 12 min) or write your own:

package highcpuafterload;

import com.couchbase.client.java.Bucket;

import com.couchbase.client.java.Cluster;

import com.couchbase.client.java.CouchbaseCluster;

import com.couchbase.client.java.document.JsonDocument;

import com.couchbase.client.java.document.json.JsonObject;

import com.couchbase.client.java.env.CouchbaseEnvironment;

import com.couchbase.client.java.env.DefaultCouchbaseEnvironment;

import com.couchbase.client.java.query.N1qlQuery;

import java.util.LinkedList;

import java.util.concurrent.Phaser;

public class BombardaMaxima extends Thread {

    private final int tid;

    // configure here

    private static final int threads = 20;

    private static final int docsPerThread = 50000;

    private static final int docTTLms = 86400 * 1000;

    private static final int dumpToConsoleEachNDocs = 1000;

    private static final Phaser phaser = new Phaser(threads + 1);

    private static final CouchbaseEnvironment ce;

    private static final Cluster cluster;

    private static final String bucket = "default";

    static {

        ce = DefaultCouchbaseEnvironment.create();

        final LinkedList<String> nodes = new LinkedList();

        nodes.add("A.node");

        nodes.add("B.node");

        nodes.add("C.node");

        cluster = CouchbaseCluster.create(ce, nodes);

        final Bucket b = cluster.openBucket(bucket);

        // three indexes for query

        final String iQA = "CREATE INDEX iQA ON `default`(a, b) WHERE a is valued USING GSI";

        final String iQB = "CREATE INDEX iQB ON `default`(a, b) WHERE a is valued USING GSI";

        final String iQC = "CREATE INDEX iQC ON `default`(a, b) WHERE a is valued USING GSI";

        final String iQX = "CREATE INDEX iQX ON `default`(a, c) WHERE a is valued USING GSI";

        final String iQY = "CREATE INDEX iQY ON `default`(a, c) WHERE a is valued USING GSI";

        final String iQZ = "CREATE INDEX iQZ ON `default`(a, c) WHERE a is valued USING GSI";

        b.query(N1qlQuery.simple(iQA));

        b.query(N1qlQuery.simple(iQB));

        b.query(N1qlQuery.simple(iQC));

        b.query(N1qlQuery.simple(iQX));

        b.query(N1qlQuery.simple(iQY));

        b.query(N1qlQuery.simple(iQZ));

    public BombardaMaxima(final int tid) {

        this.tid = tid;

    public final void run() {

        try {

            Bucket b = null;

            synchronized(cluster) { b = cluster.openBucket(bucket); }

            final long stm = System.currentTimeMillis();

            final JsonObject jo = JsonObject

                    .empty()

                    .put("a", stm)

                    .put("b", stm)

                    .put("c", stm);

            for(int i = 0; i< docsPerThread; i++) {

                b.upsert(JsonDocument.create(

                        tid + ":" + System.currentTimeMillis(),

                        (int)((System.currentTimeMillis() + docTTLms) / 1000),

jo)

);

                if (i % dumpToConsoleEachNDocs == 0) System.out.println("T[" + tid + "] = " + i);

        } catch(final Exception e) {

            e.printStackTrace();

        } finally {

            phaser.arriveAndAwaitAdvance();

    public static void main(String[] args) {

        for(int i = 0; i< threads; i++) new BombardaMaxima(i).start();

        phaser.arriveAndAwaitAdvance();

        System.out.println("DONE");

6. Wait code run to complete (you can also wait 10, 15, 20 minutes more after compete, if you wish)
7. Now watch htop again: you'll see, that CPU utilization is ~ [15...40]% for all nodes.
8. Now restart one of nodes, wait for a while for "node initialization to pass", and, watch for htop once again (this node load is [0...5]% now)
9. restart second node and watch for htop once again (this node load is [0...5]% now)
10. restart last node and watch for htop once again (this node load is [0...5]% now)

Additional info (between runs below bucket is deleted and recreated + all servers are restarted):
1. No GSI = No problem. With removed (from code) GSI creation, there is no problem at all. So, the problem is definitely with GSI's.
2. There is no (at least for 1 try) problem with only one index (iQA, for example)
3. There is a strange effect with 3 indexes (iQA, iQB, iQC):
1) first run: only one node is affected with "post-load-high-cpu-utilization".
2) second run: all three nodes are affected
3) third run: no nodes affected at all
4. 6 confirmed all-nodes-affected runs with all 6 GSI (iQA, iQB, iQC, iQX, iQY, iQZ)

Also, atop reports about ~80% CPU for "indexer" process on each affected node.
Seems like "something stuck within indexer", but indexer.log has no explicit errors (log level is too low ?)

P.S. And please, think about possible port of this fix (if any) for 4.1.X branch

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

140.pprof.svg
95 kB
23/May/16 1:07 AM
73.pprof.svg
91 kB
23/May/16 1:07 AM
load-graph.png
29 kB
25/May/16 12:33 AM

Issue Links

relates to

MB-20099 Ability to gracefully shutdown/restart the indexer process/service on its own

Open

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

For Gerrit Dashboard: MB-19714
#	Subject	Branch	Project	Status	CR	V
64393,2	MB-19714: Explicitly stop all tickers in indexer/manager.	sherlock	indexing	Status: MERGED	+2	+1

Activity

People

Assignee:: Pratap Chakravarthy (Inactive)

Reporter:: Georgy Repin

Votes:: 0 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 22/May/16 11:48 AM

Updated:: 06/Oct/16 10:01 AM

Resolved:: 18/Jul/16 11:10 PM

Gerrit Reviews

There are no open Gerrit changes

Show There is 1 closed Gerrit change

Hide There is 1 closed Gerrit change

MB-19714: Explicitly stop all tickers in indexer/manager.: Gerrit Review:

GSI indexes: high CPU utilization after load-cycle

Details

Description

Attachments

Attachments

Issue Links

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty