Details
Bug
Resolution: Fixed
Major
CBAS DP3
None
Untriaged
Unknown
CX Sprint 68
Description
With ingestion active, if a node is restarted and a rebalance-in follows, an ArrayIndexOutOfBoundsException is intermittently observed on the new node, because the recovery of the primary node is erroneously run on it:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.asterix.external.util.FeedUtils.getFeedLogManager(FeedUtils.java:110) ~[asterix-external-data-0.9.3-SNAPSHOT.jar:0.9.3-SNAPSHOT]
at org.apache.asterix.external.adapter.factory.GenericAdapterFactory.createAdapter(GenericAdapterFactory.java:105) ~[asterix-external-data-0.9.3-SNAPSHOT.jar:0.9.3-SNAPSHOT]
at com.couchbase.analytics.runtime.BucketOperatorNodePushable.<init>(BucketOperatorNodePushable.java:31) ~[cbas-connector-1.0.0-SNAPSHOT.jar:1.0.0-SNAPSHOT]
The recovery task does not seem to find out about the failed connect, and hangs:
"name": "RecoveryTask (Default.beersample(CouchbaseMetadataExtension))","stack": ["java.lang.Object.wait(Native Method)","java.lang.Object.wait(Object.java:502)","org.apache.asterix.external.feed.watch.AbstractSubscriber.sync(AbstractSubscriber.java:57)","com.couchbase.analytics.lang.ConnectBucketStatement.doConnect(ConnectBucketStatement.java:491)","com.couchbase.analytics.metadata.BucketEventsListener.doConnect(BucketEventsListener.java:242)","com.couchbase.analytics.metadata.BucketEventsListener.doStart(BucketEventsListener.java:220)","org.apache.asterix.app.active.RecoveryTask.doRecover(RecoveryTask.java:146)"...
This blocks any rebalance, because the recovery task still holds the lock that the rebalance needs:
"org.apache.asterix.metadata.lock.MetadataLockManager.acquireActiveEntityWriteLock(MetadataLockManager.java:139)","org.apache.asterix.app.active.ActiveNotificationHandler.suspend(ActiveNotificationHandler.java:245)","com.couchbase.analytics.control.rebalance.Rebalance.call(Rebalance.java:113)",
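The interaction between the two stacks above can be reproduced in isolation. The following is only a minimal sketch, not the AsterixDB or CBAS code: RecoveryHangSketch, entityLock, subscriber and rebalanceCanSuspend are hypothetical names, and a plain ReentrantReadWriteLock stands in for the active entity lock. It assumes the recovery thread waits on a monitor that is never notified while holding the write lock, so the rebalance side cannot acquire it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RecoveryHangSketch {
    // Hypothetical stand-in for the active entity lock; not the real MetadataLockManager.
    static final ReentrantReadWriteLock entityLock = new ReentrantReadWriteLock();

    /** Returns true if the "rebalance" side obtains the write lock within the timeout. */
    static boolean rebalanceCanSuspend(long timeoutMillis) throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        Object subscriber = new Object(); // stand-in for the subscriber being synced on

        Thread recovery = new Thread(() -> {
            entityLock.writeLock().lock();
            try {
                lockHeld.countDown();
                synchronized (subscriber) {
                    // Assumption: the failed connect never notifies the waiter,
                    // so this wait only ends when the thread is interrupted.
                    while (true) {
                        subscriber.wait();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                entityLock.writeLock().unlock();
            }
        }, "RecoveryTask (sketch)");
        recovery.setDaemon(true);
        recovery.start();
        lockHeld.await(); // make sure the recovery thread owns the lock first

        // The rebalance side needs the same write lock to suspend the active entity.
        boolean acquired = entityLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            entityLock.writeLock().unlock();
        }

        // Clean up the sketch: release the stuck waiter.
        recovery.interrupt();
        recovery.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rebalance acquired lock: " + rebalanceCanSuspend(200));
    }
}
```

In this sketch the rebalance side uses a timed tryLock only so that the demonstration terminates; with an untimed acquire, as in the stack above, it would block for as long as the recovery thread keeps waiting.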