Details
-
Bug
-
Resolution: Duplicate
-
Major
-
5.0.0
-
Triaged
-
Centos 64-bit
-
-
Yes
Description
Setup:
- 1 -> 1 uni-directional replication
- 1 Couchbase bucket
- 25M documents
First time this issue was noticed in our daily regression test (build 2990, see the linked log files):
2017-05-31T18:27:19.761-07:00 INFO GOXDCR.ReplMgr: ReplicationManager is running
|
2017-05-31T18:27:19.761-07:00 INFO GOXDCR.HttpServer: [xdcr:127.0.0.1:9998] new http server xdcr 127.0.0.1:9998 /
|
2017-05-31T18:27:19.761-07:00 INFO GOXDCR.AdminPort: http server started 127.0.0.1:9998 !
|
2017-05-31T18:27:19.761-07:00 INFO GOXDCR.HttpServer: [xdcr:127.0.0.1:9998] starting ...
|
2017-05-31T18:27:50.592-07:00 INFO GOXDCR.ReplMgr: GOMAXPROCS=4
|
2017-05-31T18:27:50.628-07:00 INFO GOXDCR.AuditSvc: Created audit service.
|
2017-05-31T18:27:50.628-07:00 INFO GOXDCR.UILogSvc: Created ui log service.
|
2017-05-31T18:27:50.630-07:00 INFO GOXDCR.ReplSpecSvc: Init cache for ReplicationSpecService...
|
2017-05-31T18:27:50.632-07:00 INFO GOXDCR.ReplSpecSvc: Cache has been initialized for ReplicationSpecService
|
2017-05-31T18:27:50.634-07:00 INFO GOXDCR.IntSettSvc: Internal settings spec not found. Using default values
|
2017-05-31T18:27:50.634-07:00 INFO GOXDCR.IntSettSvc: Internal settings for the current XCR process: &{10 30 30 5 180 1 15 6 180 10 60 <nil>}
|
2017-05-31T18:27:50.634-07:00 INFO GOXDCR.ReplMgr: pollEOF: About to start stdin polling
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.IntSettSvc: Internal settings spec not found. Using default values
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: XDCR internal settings: map[RefreshRemoteClusterRefInterval:15 CapiMaxRetryBatchUpdateDocs:6 TopologyChangeCheckInterval:10 MaxWorkersForCheckpointing:5 TimeoutCheckpointBeforeStop:180 CapiDataChanSizeMultiplier:1 CapiBatchTimeout:180 CapiWriteTimeout:10 CapiReadTimeout:60 MaxTopologyChangeCountBeforeRestart:30 MaxTopologyStableCountBeforeRestart:30]
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.GenericSupervisor: Adding child PipelineMasterSupervisor to supervisor ReplicationManagerSupervisor
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.PipelineMgr: Pipeline Manager is constucted
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: Replication manager is initialized
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.GenericSupervisor: Starting supervisor PipelineMasterSupervisor.
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.GenericSupervisor: Started supervisor PipelineMasterSupervisor.
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: Master supervisor has started
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.GenericSupervisor: Starting supervisor ReplicationManagerSupervisor.
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.GenericSupervisor: Started supervisor ReplicationManagerSupervisor.
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: initPausedReplications succeeded
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: upgradeRemoteClusterRefs started.
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: upgradeRemoteClusterRefs exited
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: globalSettingChangeListener successfully started
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.ReplicationSpecChangeListener: Started MetakvChangeListener ReplicationSpecChangeListener
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.RemoteClusterChangeListener: Started MetakvChangeListener RemoteClusterChangeListener
|
2017-05-31T18:27:50.636-07:00 INFO GOXDCR.ReplMgr: checkReplicationStatus started.
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.GlobalSettingChangeListener: Started MetakvChangeListener GlobalSettingChangeListener
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.InternalSettingChangeListener: Started MetakvChangeListener InternalSettingsChangeListener
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.AdminPort: Constructed adminport
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.ReplMgr: Admin port has been launched
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.GenericSupervisor: Adding child AdminportSupervisor to supervisor ReplicationManagerSupervisor
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.ReplMgr: ReplicationManager is running
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.ReplMgr: logMemStats started.
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.HttpServer: [xdcr:127.0.0.1:9998] new http server xdcr 127.0.0.1:9998 /
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.ReplMgr: refreshRemoteClusterRef started.
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.AdminPort: http server started 127.0.0.1:9998 !
|
2017-05-31T18:27:50.637-07:00 INFO GOXDCR.HttpServer: [xdcr:127.0.0.1:9998] starting ...
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: metakv.RunObserveChildren failed, err=unexpected EOF
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: metakv.RunObserveChildren failed, err=unexpected EOF
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.GlobalSettingChangeListener: metakv.RunObserveChildren failed, err=unexpected EOF
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: Started MetakvChangeListener RemoteClusterChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: Started MetakvChangeListener ReplicationSpecChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.GlobalSettingChangeListener: Started MetakvChangeListener GlobalSettingChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: metakv.RunObserveChildren failed, err=unexpected EOF
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: Started MetakvChangeListener InternalSettingsChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/replicationSpec/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/InternalSettings/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: Started MetakvChangeListener ReplicationSpecChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: Started MetakvChangeListener InternalSettingsChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.GlobalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/GlobalConfiguration/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: Started MetakvChangeListener RemoteClusterChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.GlobalSettingChangeListener: Started MetakvChangeListener GlobalSettingChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/replicationSpec/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/InternalSettings/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: Started MetakvChangeListener ReplicationSpecChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: Started MetakvChangeListener InternalSettingsChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.GlobalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/GlobalConfiguration/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.GlobalSettingChangeListener: Started MetakvChangeListener GlobalSettingChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: Started MetakvChangeListener RemoteClusterChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/InternalSettings/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/replicationSpec/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.InternalSettingChangeListener: Started MetakvChangeListener InternalSettingsChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.ReplicationSpecChangeListener: Started MetakvChangeListener ReplicationSpecChangeListener
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.GlobalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/GlobalConfiguration/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.389-07:00 INFO GOXDCR.RemoteClusterChangeListener: Started MetakvChangeListener RemoteClusterChangeListener
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.GlobalSettingChangeListener: Started MetakvChangeListener GlobalSettingChangeListener
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.InternalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/InternalSettings/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.InternalSettingChangeListener: Started MetakvChangeListener InternalSettingsChangeListener
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.ReplicationSpecChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/replicationSpec/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.ReplicationSpecChangeListener: Started MetakvChangeListener ReplicationSpecChangeListener
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.RemoteClusterChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.GlobalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/GlobalConfiguration/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.GlobalSettingChangeListener: Started MetakvChangeListener GlobalSettingChangeListener
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.RemoteClusterChangeListener: Started MetakvChangeListener RemoteClusterChangeListener
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.InternalSettingChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/InternalSettings/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.ReplMgr: Replication manager is exiting...
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.ReplicationSpecChangeListener: metakv.RunObserveChildren failed, err=Get http://127.0.0.1:8091/_metakv/replicationSpec/?feed=continuous: dial tcp 127.0.0.1:8091: getsockopt: connection refused
|
2017-05-31T18:28:08.390-07:00 INFO GOXDCR.ReplMgr: Replication manager is already in the processof stopping, no-op on this stop request
|
2017/05/31 18:28:08 revrpc: Got error (dial tcp 127.0.0.1:8091: getsockopt: connection refused) and will retry in 1s
|
2017-05-31T18:28:08.493-07:00 INFO GOXDCR.ReplMgr: GOMAXPROCS=4
|
2017-05-31T18:28:08.493-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=0
|
2017-05-31T18:28:08.493-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=1
|
2017-05-31T18:28:08.493-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=2
|
2017-05-31T18:28:08.493-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=3
|
2017-05-31T18:28:08.493-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=4
|
Metadata service not available. Retrying after 1s.
|
2017/05/31 18:28:09 revrpc: Got error (dial tcp 127.0.0.1:8091: getsockopt: connection refused) and will retry in 1s
|
2017-05-31T18:28:09.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=0
|
2017-05-31T18:28:09.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=1
|
2017-05-31T18:28:09.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=2
|
2017-05-31T18:28:09.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=3
|
2017-05-31T18:28:09.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=4
|
Metadata service not available. Retrying after 1s.
|
2017-05-31T18:28:10.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=0
|
2017-05-31T18:28:10.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=1
|
2017-05-31T18:28:10.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=2
|
2017-05-31T18:28:10.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=3
|
2017-05-31T18:28:10.494-07:00 ERRO GOXDCR.MetadataSvc: metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: getsockopt: connection refused, num_of_retry=4
|
Metadata service not available. Retrying after 1s.
|
Test output:
http://perf.jenkins.couchbase.com/job/triton/4240/console
|
Later the same issue occurred in our weekly tests (5 -> 4 replication, 1B items, build 3008). Test output and logs:
http://perf.jenkins.couchbase.com/job/titan/1059/console
There seems to be a trouble with 172.23.96.107.
Attachments
Issue Links
- is duplicated by
-
MB-24717 panic on updater
- Closed