1. Scan timeouts only happen on the replica index sg_access_1 (Replica 1) on 172.23.100.192. All scans work fine on 172.23.100.193.
2. After a timeout, the scan is retried on 172.23.100.193 and succeeds.
3. Comparing the last PeriodicStats of the index (where the retry succeeds) and its replica (where the scan times out): the last stats before the timeout are at 2019-12-23T13:27:27.
| stat | sg_access_1 (172.23.100.193) | sg_access_1 (Replica 1) (172.23.100.192) |
|---|---|---|
| num_requests | 988 | 1013 |
| num_completed_requests | 988 | 1012 |
| num_scan_timeouts | 0 | 0 |
| num_open_snapshots | 1 | 1 |
| num_snapshots | 1 | 1 |
| num_snapshot_waiters | 0 | 1 |
| num_docs_pending | 0 | 0 |
| num_docs_queued | 0 | 0 |
| items_count | 0 | 0 |
| mutation_queue_size | 0 | 0 |
| flush_queue_size | 0 | 0 |
| num_docs_indexed | 2102870 | 2102878 |
| num_docs_processed | 2103107 | 2103107 |
| num_flush_queued | 2102870 | 2102878 |
| num_items_flushed | 0 | 0 |
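a. items_count is 0 on both nodes.
b. Nothing is pending or queued to be indexed (num_docs_pending, num_docs_queued, mutation_queue_size and flush_queue_size are all 0).
c. The stats are very similar. The difference of 8 in num_docs_indexed/num_flush_queued can be attributed to dedup.
d. On the replica, num_requests exceeds num_completed_requests by 1 and num_snapshot_waiters is 1, i.e. one scan is waiting for a snapshot.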
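You can check the arithmetic behind note c, and the single outstanding request on the replica, with a short Python sketch. The values are copied from the 13:27:27 table above. The helpers `in_flight` and `diff` are hypothetical and not part of the indexer's stats API.

```python
# Subset of the PeriodicStats entry at 2019-12-23T13:27:27, copied from the table above.
primary = {  # sg_access_1 on 172.23.100.193
    "num_requests": 988,
    "num_completed_requests": 988,
    "num_docs_indexed": 2102870,
    "num_flush_queued": 2102870,
    "num_docs_processed": 2103107,
}
replica = {  # sg_access_1 (Replica 1) on 172.23.100.192
    "num_requests": 1013,
    "num_completed_requests": 1012,
    "num_docs_indexed": 2102878,
    "num_flush_queued": 2102878,
    "num_docs_processed": 2103107,
}


def in_flight(stats):
    """Hypothetical helper: requests received but not yet completed."""
    return stats["num_requests"] - stats["num_completed_requests"]


def diff(a, b, keys):
    """Hypothetical helper: per-stat difference b - a for the given keys."""
    return {k: b[k] - a[k] for k in keys}


print(diff(primary, replica, ["num_docs_indexed", "num_flush_queued", "num_docs_processed"]))
# {'num_docs_indexed': 8, 'num_flush_queued': 8, 'num_docs_processed': 0}
print(in_flight(primary), in_flight(replica))
# 0 1
```

Both nodes processed the same number of docs. Only the indexed/flushed counts differ by 8, and the replica has exactly one request that has not completed.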
The next stats, at 2019-12-23T13:28:27, come after the timeout on .192 and the successful retry on .193:
| stat | sg_access_1 (172.23.100.193) | sg_access_1 (Replica 1) (172.23.100.192) |
|---|---|---|
| num_requests | 989 | 1013 |
| num_completed_requests | 989 | 1013 |
| num_scan_timeouts | 0 | 2 |
| num_open_snapshots | 1 | 1 |
| num_snapshots | 1 | 1 |
| num_snapshot_waiters | 0 | 1 |
| num_docs_pending | 0 | 0 |
| num_docs_queued | 0 | 0 |
| items_count | 0 | 0 |
| mutation_queue_size | 0 | 0 |
| flush_queue_size | 0 | 0 |
| num_docs_indexed | 2102870 | 2102878 |
| num_docs_processed | 2103107 | 2103107 |
| num_flush_queued | 2102870 | 2102878 |
| num_items_flushed | 0 | 0 |
a. Nothing more has been indexed (no new snapshot was created) by the time the scan succeeds on 172.23.100.193.
b. num_requests and num_completed_requests each increment by 1 on 172.23.100.193, recording the successful retry.
c. On 172.23.100.192, num_scan_timeouts increases from 0 to 2.
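The per-node change between the 13:27:27 and 13:28:27 entries can be computed with a small Python sketch. Only a subset of the stats is copied from the two tables, and `deltas` is a hypothetical helper.

```python
# Subset of the two PeriodicStats entries, copied from the tables above.
before = {  # 2019-12-23T13:27:27
    "172.23.100.193": {"num_requests": 988, "num_completed_requests": 988,
                       "num_scan_timeouts": 0, "num_docs_indexed": 2102870},
    "172.23.100.192": {"num_requests": 1013, "num_completed_requests": 1012,
                       "num_scan_timeouts": 0, "num_docs_indexed": 2102878},
}
after = {  # 2019-12-23T13:28:27
    "172.23.100.193": {"num_requests": 989, "num_completed_requests": 989,
                       "num_scan_timeouts": 0, "num_docs_indexed": 2102870},
    "172.23.100.192": {"num_requests": 1013, "num_completed_requests": 1013,
                       "num_scan_timeouts": 2, "num_docs_indexed": 2102878},
}


def deltas(before, after):
    """Hypothetical helper: per-node, per-stat change (after - before)."""
    return {node: {k: after[node][k] - before[node][k] for k in stats}
            for node, stats in before.items()}


for node, change in deltas(before, after).items():
    print(node, change)
```

On both nodes num_docs_indexed does not move. The only activity is the extra request on .193 and the timeouts recorded on .192.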
4. The indexer is using session_consistency_strict for the scan. This needs further investigation, as there doesn't seem to have been any rollback on KV. There may be an issue here.
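A scan can wait on a snapshot even when nothing is pending, and one way to reason about that is a deliberately simplified, hypothetical model. It is not the indexer's actual logic. `Timestamp`, `can_serve` and the `strict` flag are illustrative names only. The model assumes that a consistent scan needs a snapshot at least as recent as the requested per-vbucket seqnos, and that a strict mode additionally requires matching vbucket UUIDs, where a mismatch would normally point to a rollback or failover on KV.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical per-vbucket timestamp: sequence numbers plus vbucket UUIDs."""
    seqnos: Tuple[int, ...]
    vbuuids: Tuple[int, ...]


def can_serve(snapshot: Timestamp, requested: Timestamp, strict: bool) -> bool:
    """Hypothetical check: can a scan requesting `requested` use `snapshot`?

    The snapshot must be at least as recent as the request in every vbucket.
    In strict mode (an assumption of this model) the vbucket UUIDs must match too.
    """
    if any(have < want for have, want in zip(snapshot.seqnos, requested.seqnos)):
        return False
    if strict and snapshot.vbuuids != requested.vbuuids:
        return False
    return True


# The snapshot is caught up on seqnos but carries a different vbuuid for vbucket 1.
snap = Timestamp(seqnos=(100, 200), vbuuids=(11, 22))
req = Timestamp(seqnos=(100, 150), vbuuids=(11, 99))
print(can_serve(snap, req, strict=False))  # True
# False: in this model, with no new snapshot arriving, the waiter blocks until the scan times out.
print(can_serve(snap, req, strict=True))
```

Under these assumptions, a strict check that never matches would leave num_snapshot_waiters at 1 with nothing pending. That is one hypothesis to verify against the indexer code, not a confirmed cause.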
Sharath Sulochana, I didn't find any crash or panic in the indexer logs on any of the four nodes. Please provide more details about what is failing.