Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-25690

[FTS] Sometimes partial rollback doesn't happen

    XMLWordPrintable

Details

    • Triaged
    • Unknown

    Description

      Build
      5.0.0-3490

      Testcase
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,level_compaction=False,GROUP=P1 -t fts.moving_topology_fts.MovingTopFTS.partial_rollback,cluster=D,D,F,items=5000,delete=True,del=10,GROUP=P1

      Seen sporadically.

      Steps:
      1. Create a [D(.96),D(.190),F(.66)] cluster
      2. Create a bucket and a default index on it.
      3. Load 5000 docs into it, wait for it to get indexed. Index count=5000
      4. Now stop persistence on both data nodes, say A(.96) and B(.190).
      5. Then delete 500 docs across A and B.
      6. At this point, there are only 4500 docs in memcached but 5000 in couchstore. The curr_items stat comes from couchstore so still shows 5000 but fts picks up the deletes from memc so the index count is now 4500 as you can see below.
      7. Now, kill memc on A. When it comes back immediately, persistence is on by default. More importantly, now the deletes sitting in memc are lost and dcp-backfill takes place.
      8. Then start persistence on B. Immediately hard-failover B. At this point all the replicas in A get activated and understandably there are no deletes.
      9. Now the bucket count is 5000, but index count remains 4500 for 2 mins after which the test times out.

      [2017-08-11 19:54:09,285] - [rest_client:2550] INFO - Index default_index created
      [2017-08-11 19:54:09,286] - [fts_base:3447] INFO - Validating index distribution for default_index ...
      [2017-08-11 19:54:09,414] - [fts_base:3465] INFO - Validated: Number of PIndexes = 6
      [2017-08-11 19:54:09,435] - [fts_base:3477] INFO - Validated: Every pIndex serves 171 partitions or lesser
      [2017-08-11 19:54:09,436] - [fts_base:3501] INFO - Expecting num of partitions in each node in range 853-1024
      [2017-08-11 19:54:09,437] - [fts_base:3516] INFO - Validated: Node da730f23f671bc62eaba214528bea99e houses 6 pindexes which serve 1024 partitions
      [2017-08-11 19:54:11,358] - [data_helper:295] INFO - creating direct client 172.23.105.190:11210 default
      [2017-08-11 19:54:11,542] - [data_helper:295] INFO - creating direct client 172.23.105.96:11210 default
      [2017-08-11 19:54:18,651] - [fts_base:3753] INFO - Loading phase complete!
      [2017-08-11 19:54:18,789] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 2066
      [2017-08-11 19:54:24,911] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 5000
      [2017-08-11 19:54:24,913] - [moving_topology_fts:1042] INFO - Stopping persistence on [ip:172.23.105.96 port:8091 ssh_username:root, ip:172.23.105.190 port:8091 ssh_username:root]
      [2017-08-11 19:54:24,935] - [data_helper:295] INFO - creating direct client 172.23.105.96:11210 default
      [2017-08-11 19:54:25,086] - [data_helper:295] INFO - creating direct client 172.23.105.190:11210 default
      [2017-08-11 19:54:25,244] - [fts_base:3280] INFO - Deleting keys @ C1
      [2017-08-11 19:54:25,372] - [data_helper:295] INFO - creating direct client 172.23.105.190:11210 default
      [2017-08-11 19:54:25,546] - [data_helper:295] INFO - creating direct client 172.23.105.96:11210 default
      [2017-08-11 19:54:27,532] - [fts_base:3300] INFO - Batched deletes sent to cluster(s)
      [2017-08-11 19:54:27,675] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4545
      [2017-08-11 19:54:33,802] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:54:39,946] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:54:46,084] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:54:52,220] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:54:58,371] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:04,524] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:10,663] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:16,814] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:22,953] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:29,103] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:35,243] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:41,387] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:47,532] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:53,676] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:55:59,820] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:05,965] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:12,106] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:18,266] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:24,412] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:30,566] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:36,706] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:42,746] - [fts_base:1961] INFO - Running query {"indexName": "default_index", "from": 0, "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "mutated:>0"}, "size": 10000000} on node: 172.23.106.66:
      [2017-08-11 19:56:42,765] - [moving_topology_fts:1061] INFO - Hits before rollback: 0
      [2017-08-11 19:56:42,887] - [moving_topology_fts:1068] INFO - Docs in Bucket : 5000, Docs in Index : 4500
      [2017-08-11 19:56:42,888] - [moving_topology_fts:1071] INFO - Killing memcached on 172.23.105.96
      [2017-08-11 19:56:42,893] - [remote_util:197] INFO - connecting to 172.23.105.96 with username:root password:couchbase ssh_key:
      [2017-08-11 19:56:43,265] - [remote_util:231] INFO - Connected to 172.23.105.96
      [2017-08-11 19:56:43,843] - [remote_util:2792] INFO - running command.raw on 172.23.105.96: kill -9 $(ps aux | grep 'memcached'  | awk '{print $2}')
      [2017-08-11 19:56:43,954] - [remote_util:2829] INFO - command executed successfully
      [2017-08-11 19:56:43,955] - [moving_topology_fts:1077] INFO - Starting persistence on 172.23.105.190
      [2017-08-11 19:56:43,981] - [data_helper:295] INFO - creating direct client 172.23.105.190:11210 default
      [2017-08-11 19:56:44,125] - [fts_base:2630] INFO - Starting failover for nodes:[ip:172.23.105.190 port:8091 ssh_username:root] at C1 cluster 172.23.105.96
      [2017-08-11 19:56:44,367] - [task:3364] INFO - Failing over 172.23.105.190:8091 with graceful=False
      [2017-08-11 19:56:45,188] - [rest_client:1336] INFO - fail_over node ns_1@172.23.105.190 successful
      [2017-08-11 19:56:45,189] - [task:3345] INFO - 0 seconds sleep after failover, for nodes to go pending....
      [2017-08-11 19:56:45,218] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:56:45,300] - [fts_base:3377] INFO - Docs in bucket = 2499, docs in FTS index 'default_index': 2243
      [2017-08-11 19:56:51,333] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:56:51,415] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:56:57,447] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:56:57,532] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:03,565] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:03,669] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:09,705] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:09,795] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:15,840] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:15,922] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:21,959] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:22,056] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:28,092] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:28,181] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:34,219] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:34,300] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:40,335] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:40,428] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:46,462] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:46,551] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:52,591] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:52,677] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:57:58,714] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:57:58,803] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:04,839] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:04,927] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:10,964] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:11,058] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:17,093] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:17,178] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:23,217] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:23,309] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:29,343] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:29,436] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:35,469] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:35,564] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:41,600] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:41,682] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:47,760] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:47,879] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:58:53,915] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:58:54,008] - [fts_base:3377] INFO - Docs in bucket = 5000, docs in FTS index 'default_index': 4500
      [2017-08-11 19:59:00,042] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:59:00,043] - [fts_base:1961] INFO - Running query {"indexName": "default_index", "from": 0, "fields": [], "explain": false, "ctl": {"timeout": 60000, "consistency": {"vectors": {}, "level": ""}}, "query": {"query": "mutated:>0"}, "size": 10000000} on node: 172.23.106.66:
      [2017-08-11 19:59:00,063] - [moving_topology_fts:1093] INFO - Hits after rollback: 0
      [2017-08-11 19:59:00,085] - [rest_client:1903] INFO - Node 172.23.105.190 not part of cluster inactiveFailed
      [2017-08-11 19:59:00,171] - [moving_topology_fts:1100] INFO - Docs in Bucket : 5000, Docs in Index : 4500
      FAIL
      

      Attaching cbcollect from all 3 nodes

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            Sreekanth Sivasankaran Sreekanth Sivasankaran (Inactive)
            apiravi Aruna Piravi (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty