Details

    • Technical task
    • Resolution: Fixed
    • Major
    • 7.0.2
    • Cheshire-Cat
    • analytics
    • 7.0.0-155879 (Toy build provided by Murtadha)
    • 1
    • CX Sprint 251, CX Sprint 252

    Description

      Test Setup

       

      Number of dataverses – 5
      Number of Remote links – 8
      Number of datasets – 32 (4 datasets per link)
      

      Observations –

      1. RAM utilisation on the servers was close to 100%. This caused one node to run out of memory, and on another node the following log entry was seen:

      2021-06-16T06:25:48.433-07:00 WARN CBAS.cbas analytics driver has exited w/ error signal: killed

      2. After the node that ran out of memory was brought back up and added back to the cluster, the analytics service remained unavailable even after a successful rebalance, and the following errors were seen in the logs:

      2021-06-16T06:51:40.032-07:00 ERRO CBAS.rebalance.Rebalance [Executor-7:ClusterController] Rebalance a00f919d6547c33bc1bf913224013132 failed
      java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [873bd3b913d3596c061e550339fce454], state: UNUSABLE)
          at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:614) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) [cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) [cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) [cbas-connector.jar:7.0.0-155879]
          at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
          at java.lang.Thread.run(Unknown Source) [?:?]
      2021-06-16T06:51:40.033-07:00 WARN CBAS.rebalance.Rebalance [Executor-7:ClusterController] exit Rebalance a00f919d6547c33bc1bf913224013132
      2021-06-16T06:51:40.033-07:00 INFO CBAS.rebalance.RebalanceProgress [Executor-6:ClusterController] dataset size fetcher interrupted
      2021-06-16T06:51:40.168-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-5] Rebalance a00f919d6547c33bc1bf913224013132 failed
      java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [873bd3b913d3596c061e550339fce454], state: UNUSABLE)
          at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:614) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) ~[cbas-server.jar:7.0.0-155879]
          at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector.jar:7.0.0-155879]
          at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
          at java.lang.Thread.run(Unknown Source) [?:?]
      2021-06-16T06:51:40.217-07:00 INFO CBAS.cbas requesting isBalanced for a00f919d6547c33bc1bf913224013132 from driver
      2021-06-16T06:51:40.218-07:00 INFO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-7] +post request:
      {"nodes":[{"nodeId":"138adf0c7056f404f6f1a4a02ec2153e","priority":1970324836974848,"opaque":{"cbas-version":"7.0.0-155879","cc-http-port":"9111","controller-id":"0","host":"172.23.110.67","ns-server-port":"8091","num-iodevices":"16","starting-partition-id":"0","svc-http-port":"8095"}},{"nodeId":"4d5cd0518014b4200d78ebd136965587","priority":1970324836974592,"opaque":{"cbas-version":"7.0.0-155879","cc-http-port":"9111","controller-id":"1","host":"172.23.110.68","ns-server-port":"8091","num-iodevices":"16","starting-partition-id":"16","svc-http-port":"8095"}},{"nodeId":"873bd3b913d3596c061e550339fce454","priority":1970324836974592,"opaque":{"cbas-version":"7.0.0-155879","cc-http-port":"9111","controller-id":"2","host":"172.23.110.69","ns-server-port":"8091","num-iodevices":"16","starting-partition-id":"32","svc-http-port":"8095"}},{"nodeId":"b484922f0e152f1e10e678be92b6b118","priority":1970324836974592,"opaque":{"cbas-version":"7.0.0-155879","cc-http-port":"9111","controller-id":"3","host":"172.23.110.70","ns-server-port":"8091","num-iodevices":"16","starting-partition-id":"48","svc-http-port":"8095"}}],"id":"a00f919d6547c33bc1bf913224013132","type":"topology-change-rebalance","ccNodeId":"138adf0c7056f404f6f1a4a02ec2153e","metadataNodeId":"138adf0c7056f404f6f1a4a02ec2153e","metadataPartition":0,"rev":8,"configVersion":1,"balanceState":"unknown","keepNodesUpdated":false,"keepNodes":["138adf0c7056f404f6f1a4a02ec2153e","4d5cd0518014b4200d78ebd136965587","873bd3b913d3596c061e550339fce454","b484922f0e152f1e10e678be92b6b118"]}
      2021-06-16T06:51:40.218-07:00 INFO CBAS.rebalance.Rebalance [HttpExecutor(port:9111)-7] Topology a00f919d6547c33bc1bf913224013132 is unbalanced due to node mismatch: ours: [4d5cd0518014b4200d78ebd136965587, b484922f0e152f1e10e678be92b6b118, 138adf0c7056f404f6f1a4a02ec2153e], theirs: [873bd3b913d3596c061e550339fce454, 4d5cd0518014b4200d78ebd136965587, b484922f0e152f1e10e678be92b6b118, 138adf0c7056f404f6f1a4a02ec2153e]
      2021-06-16T06:51:40.219-07:00 INFO CBAS.cbas setting balanced state to unbalanced for a00f919d6547c33bc1bf913224013132
      2021-06-16T06:51:40.222-07:00 INFO CBAS.cbas updating balance state unbalanced for a00f919d6547c33bc1bf913224013132
      2021-06-16T06:52:02.674-07:00 INFO CBAS.servlet.LinkServlet [HttpExecutor(port:9111)-9] returning 503 on cluster state UNUSABLE
      2021-06-16T06:52:02.674-07:00 INFO CBAS.server.AbstractServlet [HttpExecutor(port:9111)-9] sendError: status=503 Service Unavailable, message=cluster state is not ACTIVE
      2021-06-16T06:52:02.947-07:00 INFO CBAS.servlet.LinkServlet [HttpExecutor(port:9111)-8] returning 503 on cluster state UNUSABLE
      2021-06-16T06:52:02.947-07:00 INFO CBAS.server.AbstractServlet [HttpExecutor(port:9111)-8] sendError: status=503 Service Unavailable, message=cluster state is not ACTIVE
      2021-06-16T06:52:02.962-07:00 INFO CBAS.server.QueryServiceServlet [HighPriorityHttpExecutor(port:8095)-9] handleRequest: <ud>{"host":"172.23.110.67:8091","path":"/query/service","statement":"select meta.* from (SELECT   ds.DataverseName,   ds.DataverseName || '.' || ds.DatasetName AS datasetFullyQualifiedName,   ds.DatasetName AS id,   TRUE AS isDataset,   ds.BucketName AS bucketName,   ds.ScopeName AS scopeName,   ds.CollectionName AS collectionName,   ds.BucketDataverseName as linkDataverseName,   ds.`Filter` AS `filter`,   ds.LinkName,    ds.DatasetType,    concat2(', ', (select value FieldName || ' ' || (          CASE WHEN lower(FieldType) = 'int64' THEN 'BIGINT'               ELSE upper(FieldType) END) || (          CASE WHEN NOT IsNullable AND NOT IsMissable THEN ' IS NOT UNKNOWN'               ELSE '' END)        from t.Derived.Record.Fields)) AS TypeString,  ( SELECT       idx.IndexName,       idx.SearchKey,       idx.SearchKeyType     FROM       Metadata.`Index` AS idx     WHERE idx.IsPrimary = false       AND idx.DatasetName = ds.DatasetName      AND idx.DataverseName = ds.DataverseName) AS indexes,    ds.ExternalDetails.Properties AS externalDetails FROM   Metadata.`Dataset` AS ds left join Metadata.Datatype t on  ds.DataverseName = t.DataverseName and t.DatatypeName = ds.DatatypeName WHERE   (ds.BucketName IS NOT missing OR  ds.DatasetType = 'EXTERNAL')UNION ALL SELECT   dv.DataverseName,   TRUE AS isDataverse,   ( SELECT       l.Name     FROM       Metadata.`Link` AS l     WHERE       l.DataverseName = dv.DataverseName) AS links FROM   Metadata.`Dataverse` AS dv WHERE   dv.DataverseName != 'Metadata' UNION ALL SELECT   DataverseName,   Name,   IsActive,   `Type` as LinkType,   TRUE as isLink FROM   Metadata.`Link`) meta order by meta.isDataverse desc, meta.isLink desc;","pretty":false,"mode":"immediate","clientContextID":null,"format":"CLEAN_JSON","timeout":9223372036854775807,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"profile":"counts","signature":true,"multiStatement":true,"parseOnly":false,"readOnly":false,"maxWarnings":0,"scanConsistency":null,"scanWait":null}</ud>
      2021-06-16T06:52:02.965-07:00 INFO CBAS.messaging.CCMessageBroker [Executor-6:ClusterController] Received message: ExecuteStatementRequestMessage(id=27, from=138adf0c7056f404f6f1a4a02ec2153e): <ud>select meta.* from (SELECT   ds.DataverseName,   ds.DataverseName || '.' || ds.DatasetName AS datasetFullyQualifiedName,   ds.DatasetName AS id,   TRUE AS isDataset,   ds.BucketName AS bucketName,   ds.ScopeName AS scopeName,   ds.CollectionName AS collectionName,   ds.BucketDataverseName as linkDataverseName,   ds.`Filter` AS `filter`,   ds.LinkName,    ds.DatasetType,    concat2(', ', (select value FieldName || ' ' || (          CASE WHEN lower(FieldType) = 'int64' THEN 'BIGINT'               ELSE upper(FieldType) END) || (          CASE WHEN NOT IsNullable AND NOT IsMissable THEN ' IS NOT UNKNOWN'               ELSE '' END)        from t.Derived.Record.Fields)) AS TypeString,  ( SELECT       idx.IndexName,       idx.SearchKey,       idx.SearchKeyType     FROM       Metadata.`Index` AS idx     WHERE idx.IsPrimary = false       AND idx.DatasetName = ds.DatasetName      AND idx.DataverseName = ds.DataverseName) AS indexes,    ds.ExternalDetails.Properties AS externalDetails FROM   Metadata.`Dataset` AS ds left join Metadata.Datatype t on  ds.DataverseName = t.DataverseName and t.DatatypeName = ds.DatatypeName WHERE   (ds.BucketName IS NOT missing OR  ds.DatasetType = 'EXTERNAL')UNION ALL SELECT   dv.DataverseName,   TRUE AS isDataverse,   ( SELECT       l.Name     FROM       Metadata.`Link` AS l     WHERE       l.DataverseName = dv.DataverseName) AS links FROM   Metadata.`Dataverse` AS dv WHERE   dv.DataverseName != 'Metadata' UNION ALL SELECT   DataverseName,   Name,   IsActive,   `Type` as LinkType,   TRUE as isLink FROM   Metadata.`Link`) meta order by meta.isDataverse desc, meta.isLink desc;</ud>
      2021-06-16T06:52:02.966-07:00 INFO CBAS.messaging.NCMessageBroker [Worker:138adf0c7056f404f6f1a4a02ec2153e] Received message: ExecuteStatementResponseMessage(id=27): 0 characters
      2021-06-16T06:52:02.967-07:00 WARN CBAS.server.QueryServiceServlet [HighPriorityHttpExecutor(port:8095)-9] handleException: ASX0032: Cannot execute request, cluster is UNUSABLE: <ud>{"host":"172.23.110.67:8091","path":"/query/service","statement":"select meta.* from (SELECT   ds.DataverseName,   ds.DataverseName || '.' || ds.DatasetName AS datasetFullyQualifiedName,   ds.DatasetName AS id,   TRUE AS isDataset,   ds.BucketName AS bucketName,   ds.ScopeName AS scopeName,   ds.CollectionName AS collectionName,   ds.BucketDataverseName as linkDataverseName,   ds.`Filter` AS `filter`,   ds.LinkName,    ds.DatasetType,    concat2(', ', (select value FieldName || ' ' || (          CASE WHEN lower(FieldType) = 'int64' THEN 'BIGINT'               ELSE upper(FieldType) END) || (          CASE WHEN NOT IsNullable AND NOT IsMissable THEN ' IS NOT UNKNOWN'               ELSE '' END)        from t.Derived.Record.Fields)) AS TypeString,  ( SELECT       idx.IndexName,       idx.SearchKey,       idx.SearchKeyType     FROM       Metadata.`Index` AS idx     WHERE idx.IsPrimary = false       AND idx.DatasetName = ds.DatasetName      AND idx.DataverseName = ds.DataverseName) AS indexes,    ds.ExternalDetails.Properties AS externalDetails FROM   Metadata.`Dataset` AS ds left join Metadata.Datatype t on  ds.DataverseName = t.DataverseName and t.DatatypeName = ds.DatatypeName WHERE   (ds.BucketName IS NOT missing OR  ds.DatasetType = 'EXTERNAL')UNION ALL SELECT   dv.DataverseName,   TRUE AS isDataverse,   ( SELECT       l.Name     FROM       Metadata.`Link` AS l     WHERE       l.DataverseName = dv.DataverseName) AS links FROM   Metadata.`Dataverse` AS dv WHERE   dv.DataverseName != 'Metadata' UNION ALL SELECT   DataverseName,   Name,   IsActive,   `Type` as LinkType,   TRUE as isLink FROM   Metadata.`Link`) meta order by meta.isDataverse desc, meta.isLink desc;","pretty":false,"mode":"immediate","clientContextID":null,"format":"CLEAN_JSON","timeout":9223372036854775807,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"profile":"counts","signature":true,"multiStatement":false,"parseOnly":false,"readOnly":false,"maxWarnings":0,"scanConsistency":null,"scanWait":null}</ud>
      2021-06-16T06:52:02.968-07:00 WARN CBAS.server.QueryServiceServlet [HighPriorityHttpExecutor(port:8095)-9] handleException: unexpected exception: <ud>{"host":"172.23.110.67:8091","path":"/query/service","statement":"select meta.* from (SELECT   ds.DataverseName,   ds.DataverseName || '.' || ds.DatasetName AS datasetFullyQualifiedName,   ds.DatasetName AS id,   TRUE AS isDataset,   ds.BucketName AS bucketName,   ds.ScopeName AS scopeName,   ds.CollectionName AS collectionName,   ds.BucketDataverseName as linkDataverseName,   ds.`Filter` AS `filter`,   ds.LinkName,    ds.DatasetType,    concat2(', ', (select value FieldName || ' ' || (          CASE WHEN lower(FieldType) = 'int64' THEN 'BIGINT'               ELSE upper(FieldType) END) || (          CASE WHEN NOT IsNullable AND NOT IsMissable THEN ' IS NOT UNKNOWN'               ELSE '' END)        from t.Derived.Record.Fields)) AS TypeString,  ( SELECT       idx.IndexName,       idx.SearchKey,       idx.SearchKeyType     FROM       Metadata.`Index` AS idx     WHERE idx.IsPrimary = false       AND idx.DatasetName = ds.DatasetName      AND idx.DataverseName = ds.DataverseName) AS indexes,    ds.ExternalDetails.Properties AS externalDetails FROM   Metadata.`Dataset` AS ds left join Metadata.Datatype t on  ds.DataverseName = t.DataverseName and t.DatatypeName = ds.DatatypeName WHERE   (ds.BucketName IS NOT missing OR  ds.DatasetType = 'EXTERNAL')UNION ALL SELECT   dv.DataverseName,   TRUE AS isDataverse,   ( SELECT       l.Name     FROM       Metadata.`Link` AS l     WHERE       l.DataverseName = dv.DataverseName) AS links FROM   Metadata.`Dataverse` AS dv WHERE   dv.DataverseName != 'Metadata' UNION ALL SELECT   DataverseName,   Name,   IsActive,   `Type` as LinkType,   TRUE as isLink FROM   Metadata.`Link`) meta order by meta.isDataverse desc, meta.isLink desc;","pretty":false,"mode":"immediate","clientContextID":null,"format":"CLEAN_JSON","timeout":9223372036854775807,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"profile":"counts","signature":true,"multiStatement":false,"parseOnly":false,"readOnly":false,"maxWarnings":0,"scanConsistency":null,"scanWait":null}</ud>
      org.apache.asterix.common.exceptions.RuntimeDataException: ASX0032: Cannot execute request, cluster is UNUSABLE
          at org.apache.asterix.app.message.ExecuteStatementRequestMessage.getRejectionReason(ExecuteStatementRequestMessage.java:208) ~[asterix-app.jar:7.0.0-155879]
          at org.apache.asterix.app.message.ExecuteStatementRequestMessage.handle(ExecuteStatementRequestMessage.java:132) ~[asterix-app.jar:7.0.0-155879]
          at org.apache.asterix.messaging.CCMessageBroker.receivedMessage(CCMessageBroker.java:64) ~[asterix-app.jar:7.0.0-155879]
          at org.apache.hyracks.control.cc.work.ApplicationMessageWork.lambda$notifyMessageBroker$0(ApplicationMessageWork.java:68) ~[hyracks-control-cc.jar:7.0.0-155879]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
          at java.lang.Thread.run(Unknown Source) [?:?]
      2021-06-16T06:52:03.219-07:00 INFO CBAS.servlet.LinkServlet [HttpExecutor(port:9111)-10] returning 503 on cluster state UNUSABLE
      2021-06-16T06:52:03.219-07:00 INFO CBAS.server.AbstractServlet [HttpExecutor(port:9111)-10] sendError: status=503 Service Unavailable, message=cluster state is not ACTIVE
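
      The "unbalanced due to node mismatch" entry in the log lists the nodes the analytics cluster controller currently sees ("ours") and the nodes the requested topology expects ("theirs"). As a rough triage aid, the sketch below parses those two lists out of the message text and reports the difference. The function names and the regular expressions are illustrative assumptions, not part of any Couchbase tooling; only the message text itself is taken from the log.

```python
import re

# Message text copied from the "node mismatch" log entry in this ticket.
LOG_MESSAGE = (
    "Topology a00f919d6547c33bc1bf913224013132 is unbalanced due to node mismatch: "
    "ours: [4d5cd0518014b4200d78ebd136965587, b484922f0e152f1e10e678be92b6b118, "
    "138adf0c7056f404f6f1a4a02ec2153e], "
    "theirs: [873bd3b913d3596c061e550339fce454, 4d5cd0518014b4200d78ebd136965587, "
    "b484922f0e152f1e10e678be92b6b118, 138adf0c7056f404f6f1a4a02ec2153e]"
)


def parse_node_lists(message):
    """Return (ours, theirs) as sets of node IDs found in the message.

    Hypothetical helper: assumes the 'ours: [...], theirs: [...]' layout
    and 32-character lowercase hex node IDs, as seen in this log.
    """
    match = re.search(r"ours: \[(.*?)\], theirs: \[(.*?)\]", message)
    if match is None:
        raise ValueError("message does not contain ours/theirs node lists")
    ours, theirs = (
        set(re.findall(r"[0-9a-f]{32}", group)) for group in match.groups()
    )
    return ours, theirs


def missing_nodes(message):
    """Node IDs expected by the topology but absent from the local view."""
    ours, theirs = parse_node_lists(message)
    return sorted(theirs - ours)


if __name__ == "__main__":
    print(missing_nodes(LOG_MESSAGE))
```

      For the entry above, the only ID in "theirs" but not in "ours" is 873bd3b913d3596c061e550339fce454, the same node the IllegalStateException reports under "missing nodes".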

          People

            michael.blow Michael Blow
            umang.agrawal Umang