Details
-
Bug
-
Resolution: Fixed
-
Critical
-
4.0.0
-
Security Level: Public
-
Untriaged
-
Centos 64-bit
-
-
Unknown
Description
Build
4.0.0-4044 , RC3 build
Not seen in previous builds. May not be easily reproducible. Do not see network errors in logs. Is probably timing related.
Testcase
--------
./testrunner -i INI_FILE get-cbcollect-info=True,get-logs=False,stop-on-failure=False,get-coredumps=True,demand_encryption=1,fail_on_errors=1,GROUP=ALL -t xdcr.rebalanceXDCR.Rebalance.swap_rebalance_out_master,items=100000,rdirection=bidirection,ctopology=chain,update=C1-C2,delete=C1-C2,rebalance=C2,GROUP=P1
Steps
1. C1 [.157,.158] <--> C2[.159,.160] , replication with ssl
2. Add 100K keys on C1.default and C2.default
3. Swap rebalance .159 in C2 with .164.
4. Update and delete few keys on C1 and C2.
Updates/deletes made on .157 after step 3 are missing on C2.
The testcase is similar to MB-16061 but in this case, I see that .157 was able to detect .159 is out in step 3. Later it connects to .164 but is not able to replicate keys.
However, after the "Failed to get starting seqno for pipeline " error seen at 10:06, pipeline is not reconstructed for the next 30 mins.
On .157, I see -
|
XmemNozzle 2015-08-15T10:06:07.483-07:00 [ERROR] Received error response from memcached in target cluster.. err=NOT_MY_VBUCKET
|
XmemNozzle 2015-08-15T10:06:07.483-07:00 [INFO] xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.159:11210_1 is not running, no need to repairConn
|
XmemNozzle 2015-08-15T10:06:07.483-07:00 [INFO] Part is in 5 state, Recieve did no-op
|
GenericSupervisor 2015-08-15T10:06:07.484-07:00 [INFO] Received error report : map[CheckpointMgr:Failed to get starting seqno for pipeline 25caabaee87545501e47a589b6c7ab42/default/default-179339428], but error is ignored. pipeline_state=5
|
DcpNozzle 2015-08-15T10:06:07.484-07:00 [ERROR] Raise error condition Part is stopping or already stopped, exit
|
DcpNozzle 2015-08-15T10:06:07.484-07:00 [INFO] dcp_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.157:11210_0 processData exits
|
XmemNozzle 2015-08-15T10:06:07.483-07:00 [INFO] xmem xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.159:11210_1 has stopped, exit from getMeta receiver
|
XmemNozzle 2015-08-15T10:06:07.484-07:00 [INFO] xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.159:11210_1 processData_batch exits
|
:
|
:
|
XmemNozzle 2015-08-15T10:06:31.480-07:00 [INFO] xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.164:11210_0 done with initializeConnection.
|
XmemNozzle 2015-08-15T10:06:31.480-07:00 [INFO] xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.164:11210_0 took 0.000679144s to start
|
XmemNozzle 2015-08-15T10:06:31.480-07:00 [INFO] xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.164:11210_0 processData_sendbatch starts..........
|
XmemNozzle 2015-08-15T10:06:32.298-07:00 [ERROR] xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.164:11210_0 writeToClient error: use of closed network connection
|
XmemNozzle 2015-08-15T10:06:32.298-07:00 [INFO] xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.164:11210_0 release pool 25caabaee87545501e47a589b6c7ab42/default/default/Couch_Xmem_172.23.121.164:11210/default
|
:
|
StatisticsManager 2015-08-15T10:36:03.443-07:00 [INFO] Stats for pipeline 25caabaee87545501e47a589b6c7ab42/default/default-929328462 {"CkptMgr": {"num_checkpoints": 28, "num_failedckpts": 1, "time_committing": {"count": 28, "max": 0, "mean": 0, "min": 0}}, "Errors": "[{\"time\":\"2015-08-15T10:06:32.620639695-07:00\",\"errMsg\":\"CheckpointMgr:Failed to get starting seqno for pipeline 25caabaee87545501e47a589b6c7ab42/default/default-932242482\"}]", "Overview": {"bandwidth_usage": 0, "changes_left": 160000, "data_replicated": 0, "dcp_datach_length": 0, "dcp_dispatch_time": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_filtered": 0, "deletion_received_from_dcp": 0, "docs_checked": 0, "docs_failed_cr_source": 0, "docs_filtered": 0, "docs_opt_repd": 0, "docs_processed": 0, "docs_received_from_dcp": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_filtered": 0, "expiry_received_from_dcp": 0, "num_checkpoints": 28, "num_failedckpts": 1, "rate_doc_checks": 0, "rate_doc_opt_repd": 0, "rate_received_from_dcp": 0, "rate_replicated": 0, "resp_wait_time": 0, "set_docs_written": 0, "set_failed_cr_source": 0, "set_filtered": 0, "set_received_from_dcp": 0, "size_rep_queue": 0, "time_committing": 0, "wtavg_docs_latency": 0, "wtavg_meta_latency": 0}, "Progress": "Pipeline is running", "Router_dcp_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.157:11210_0": {"deletion_filtered": 0, "docs_filtered": 0, "expiry_filtered": 0, "set_filtered": 0}, "Router_dcp_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.157:11210_1": {"deletion_filtered": 0, "docs_filtered": 0, "expiry_filtered": 0, "set_filtered": 0}, "Status": "Replicating", "dcp_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.157:11210_0": {"dcp_datach_length": 0, "dcp_dispatch_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "deletion_received_from_dcp": 0, "docs_received_from_dcp": 0, "expiry_received_from_dcp": 0, "set_received_from_dcp": 0}, "dcp_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.157:11210_1": {"dcp_datach_length": 0, "dcp_dispatch_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "deletion_received_from_dcp": 0, "docs_received_from_dcp": 0, "expiry_received_from_dcp": 0, "set_received_from_dcp": 0}, "xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.164:11210_0": {"data_replicated": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 0, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "resp_wait_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "set_docs_written": 0, "set_failed_cr_source": 0, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "xmem_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.164:11210_1": {"data_replicated": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 0, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "resp_wait_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "set_docs_written": 0, "set_failed_cr_source": 0, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}}
|
No replication is seen for 30 mins -
|
StatisticsManager 2015-08-15T10:07:34.419-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=127500, docs_processed=0, changes_left=127500
|
StatisticsManager 2015-08-15T10:07:35.418-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=128000, docs_processed=0, changes_left=128000
|
StatisticsManager 2015-08-15T10:07:36.448-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=129000, docs_processed=0, changes_left=129000
|
StatisticsManager 2015-08-15T10:07:37.438-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=129500, docs_processed=0, changes_left=129500
|
StatisticsManager 2015-08-15T10:07:38.454-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=130000, docs_processed=0, changes_left=130000
|
StatisticsManager 2015-08-15T10:07:39.444-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=130273, docs_processed=0, changes_left=130273
|
StatisticsManager 2015-08-15T10:07:40.433-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=130734, docs_processed=0, changes_left=130734
|
StatisticsManager 2015-08-15T10:07:41.426-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=131225, docs_processed=0, changes_left=131225
|
StatisticsManager 2015-08-15T10:07:42.434-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=131738, docs_processed=0, changes_left=131738
|
:
|
StatisticsManager 2015-08-15T10:08:37.414-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=160000, docs_processed=0, changes_left=160000
|
StatisticsManager 2015-08-15T10:08:38.452-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=160000, docs_processed=0, changes_left=160000
|
StatisticsManager 2015-08-15T10:08:39.410-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=160000, docs_processed=0, changes_left=160000
|
:
|
:
|
|
StatisticsManager 2015-08-15T10:36:06.417-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=160000, docs_processed=0, changes_left=160000
|
StatisticsManager 2015-08-15T10:36:07.414-07:00 [INFO] 25caabaee87545501e47a589b6c7ab42/default/default total_docs=160000, docs_processed=0, changes_left=160000
|
DCP streams do not seem to be active since the last checkpoint error in failing to get start sequence number.
StatisticsManager 2015-08-15T10:35:33.424-07:00 [INFO] Dcp dcp_25caabaee87545501e47a589b6c7ab42/default/default_172.23.121.157:11210_0 received 0 items, sent 0 items. streams inactive: [34 224 47 120 173 203 159 209 75 79 240 9 11 107 133 161 114 130 141 195 198 244 13 25 147 191 210 221 7 164 188 10 99 230 92 96 100 124 180 227 78 122 237 50 58 238 33 40 91 189 143 162 167 207 126 148 181 41 49 60 74 151 177 208 59 113 217 43 48 76 131 178 183 62 88 127 213 19 31 106 223 250 197 56 72 160 4 6 53 97 29 252 239 112 121 179 255 2 61 187 200 201 253 87 90 98 152 168 232 235 73 137 71 83 94 156 199 245 1 109 111 205 233 63 66 157 216 219 215 248 110 175 186 5 129 132 202 103 105 182 140 163 218 225 226 234 254 52 104 172 243 15 20 69 125 144 149 185 231 82 176 190 193 242 3 55 57 117 142 44 46 51 85 153 247 145 196 236 249 28 30 192 22 39 228 171 0 17 23 35 93 102 165 27 37 18 42 80 139 204 222 211 12 24 54 70 77 128 174 246 32 68 229 14 45 158 194 118 119 134 136 170 206 89 95 101 135 8 65 150 241 251 86 154 155 169 16 38 115 123 166 214 64 67 220 21 26 36 84 116 184 212 81 108 138 146]
|
Attachments
For Gerrit Dashboard: MB-16078 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
54495,5 | MB-16078 register error handler on ckmgr earlier | master | goxdcr | Status: MERGED | +2 | +1 |
54793,2 | MB-16061, MB-16078: Update goxdcr SHA for RC5 | sherlock-4.0.0 | manifest | Status: MERGED | +2 | +1 |