Couchbase Server / MB-34505

System Test: HYR0081: File is Already Mapped error seen after restarting cbas process


Details

    Description

      Build : 6.0.2-2407
      Test : -test tests/integration/test_allFeatures_alice_timers.yml -scope tests/integration/scope_Xattrs_Alice.yml
      Scale : 3
      Iteration : 4th (day 3)

      In the longevity test, there is a step that kills the cbas service and verifies how it recovers. In the 3rd iteration, we see errors like the following in the analytics logs on the node (172.23.99.20) where the cbas process was killed:

      2019-06-06T03:24:14.638-07:00 INFO CBAS.bootstrap.AnalyticsNCApplication [main] Starting Couchbase Analytics driver, version: {"build.date":"Fri May 31 11:25:21 UTC 2019","build.number":"2407","build.version":"6.0.2-2407"}
      2019-06-06T03:24:14.784-07:00 WARN CBAS.recovery.AbstractCheckpointManager [main] Reading checkpoint file: /data/@analytics/v_iodevice_0/txn-log/checkpoint_0
      2019-06-06T03:24:14.801-07:00 WARN CBAS.recovery.AbstractCheckpointManager [main] Reading checkpoint file: /data/@analytics/v_iodevice_0/txn-log/checkpoint_0
      2019-06-06T03:24:14.837-07:00 WARN CBAS.recovery.AbstractCheckpointManager [main] Reading checkpoint file: /data/@analytics/v_iodevice_0/txn-log/checkpoint_0
      2019-06-06T03:24:14.839-07:00 WARN CBAS.recovery.AbstractCheckpointManager [main] Reading checkpoint file: /data/@analytics/v_iodevice_0/txn-log/checkpoint_0
      2019-06-06T03:24:15.025-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_0/storage/partition_96/Default/ds1/3/ds1/110_110_b based on valid sequence 109
      2019-06-06T03:24:15.030-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_0/storage/partition_96/Default/ds1/3/ds1/110_110_f based on valid sequence 109
      2019-06-06T03:24:15.061-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_1/storage/partition_97/Default/ds1/3/ds1/110_110_b based on valid sequence 109
      2019-06-06T03:24:15.066-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_1/storage/partition_97/Default/ds1/3/ds1/110_110_f based on valid sequence 109
      2019-06-06T03:24:15.099-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_2/storage/partition_98/Default/ds1/3/ds1/109_109_b based on valid sequence 108
      2019-06-06T03:24:15.103-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_2/storage/partition_98/Default/ds1/3/ds1/109_109_f based on valid sequence 108
      2019-06-06T03:24:15.140-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_3/storage/partition_99/Default/ds1/3/ds1/110_110_b based on valid sequence 109
      2019-06-06T03:24:15.146-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_3/storage/partition_99/Default/ds1/3/ds1/110_110_f based on valid sequence 109
      2019-06-06T03:24:15.169-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_4/storage/partition_100/Default/ds1/3/idx_result1/88_88_b based on valid sequence 87
      2019-06-06T03:24:15.171-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_4/storage/partition_100/Default/ds1/3/ds1/111_111_b based on valid sequence 110
      2019-06-06T03:24:15.187-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_5/storage/partition_101/Default/ds1/3/idx_result1/88_88_b based on valid sequence 87
      2019-06-06T03:24:15.189-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_5/storage/partition_101/Default/ds1/3/ds1/111_111_b based on valid sequence 110
      2019-06-06T03:24:15.195-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_5/storage/partition_101/Default/ds1/3/ds1/111_111_f based on valid sequence 110
      2019-06-06T03:24:15.216-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_6/storage/partition_102/Default/ds1/3/ds1/110_110_b based on valid sequence 109
      2019-06-06T03:24:15.221-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_6/storage/partition_102/Default/ds1/3/ds1/110_110_f based on valid sequence 109
      2019-06-06T03:24:15.239-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_7/storage/partition_103/Default/ds1/3/ds1/109_109_b based on valid sequence 108
      2019-06-06T03:24:15.243-07:00 WARN CBAS.resource.PersistentLocalResourceRepository [main] Deleting invalid component file /data/@analytics/v_iodevice_7/storage/partition_103/Default/ds1/3/ds1/109_109_f based on valid sequence 108
      2019-06-06T03:24:16.073-07:00 WARN CBAS.work.StartTasksWork [Worker:5bbdcafc89966fefe35ed20a1c21ffc2] Failure starting a task
      org.apache.hyracks.api.exceptions.HyracksException: HYR0114: Node (5bbdcafc89966fefe35ed20a1c21ffc2) is not active
      2019-06-06T03:24:17.608-07:00 WARN CBAS.bootstrap.AnalyticsLocalRecoveryManager [Executor-9:5bbdcafc89966fefe35ed20a1c21ffc2] Removing component {"class":LSMBTreeWithBloomFilterDiskComponent", "index":{"class":"DiskBTree","file":"storage/partition_103/Default/ds1/3/ds1/100_108_b"}}
      2019-06-06T03:24:17.638-07:00 WARN CBAS.bootstrap.AnalyticsLocalRecoveryManager [Executor-9:5bbdcafc89966fefe35ed20a1c21ffc2] Removing component {"class":LSMBTreeWithBloomFilterDiskComponent", "index":{"class":"DiskBTree","file":"storage/partition_103/Default/ds1/3/ds1/0_99_b"}}
      2019-06-06T03:24:17.725-07:00 WARN CBAS.server.QueryServiceServlet [HttpExecutor(port:8095)-2] handleException: ASX0032: Cannot execute request, cluster is UNUSABLE: {"host":"172.23.99.20:8095","path":"/query/service","statement":"select sum(friends.num_friends) from (select array_length(profile.friends) as num_friends from ds3) as friends;","pretty":true,"mode":null,"clientContextID":"query_thread_124111","format":null,"timeout":"300s","maxResultReads":null,"planFormat":null,"expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"signature":true,"multiStatement":false}
      2019-06-06T03:24:17.726-07:00 WARN CBAS.server.QueryServiceServlet [HttpExecutor(port:8095)-3] handleException: ASX0032: Cannot execute request, cluster is UNUSABLE: {"host":"172.23.99.20:8095","path":"/query/service","statement":"SELECT name as id, result as bucketName, `type` as `Type`, array_length(profile.friends) as num_friends FROM  ds1 where duration between 3009 and 3010 and profile is not missing and array_length(profile.friends) > 5 limit 100;","pretty":true,"mode":null,"clientContextID":"query_thread_124112","format":null,"timeout":"300s","maxResultReads":null,"planFormat":null,"expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"signature":true,"multiStatement":false}
      2019-06-06T03:24:24.969-07:00 WARN CBAS.metadata.MaintainDcpCallbackFactory [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:8.199856:TAID:TID:ANID:ODID:2:0:9:0:0] Previous flush didn't produce a dcp state
      2019-06-06T03:24:24.979-07:00 WARN CBAS.metadata.MaintainDcpCallbackFactory [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:8.199856:TAID:TID:ANID:ODID:2:0:9:0:0] Previous flush didn't produce a dcp state
      2019-06-06T03:24:24.980-07:00 WARN CBAS.metadata.MaintainDcpCallbackFactory [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:8.199856:TAID:TID:ANID:ODID:2:0:9:0:0] The DCP state is stored in the callback is null when a component is being recycled
      2019-06-06T03:49:23.517-07:00 ERRO CBAS.impls.LSMHarness [Executor-365:5bbdcafc89966fefe35ed20a1c21ffc2] FLUSH operation failed on {"class" : "LSMBTree", "dir" : "/data/@analytics/v_iodevice_1/storage/partition_97/Default/ds1/3/idx_result1", "memory" : [{"class":"LSMBTreeMemoryComponent", "state":"READABLE_UNWRITABLE_FLUSHING", "writers":0, "readers":1, "pendingFlushes":0, "id":"[195,195]"}, {"class":"LSMBTreeMemoryComponent", "state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[196,196]"}], "disk" : 4, "num-scheduled-flushes":1, "current-memory-component":1}
      org.apache.hyracks.api.exceptions.HyracksDataException: HYR0081: File /data/@analytics/v_iodevice_1/storage/partition_97/Default/ds1/3/idx_result1/84_84_b is already mapped as /data/@analytics/v_iodevice_1/storage/partition_97/Default/ds1/3/idx_result1/84_84_b registered at Thu Jun 06 03:24:17 PDT 2019
      	at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:60) ~[hyracks-api.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.common.file.FileMapManager.registerFile(FileMapManager.java:82) ~[hyracks-storage-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.common.buffercache.BufferCache.createFile(BufferCache.java:817) ~[hyracks-storage-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.common.impls.AbstractTreeIndex.create(AbstractTreeIndex.java:83) ~[hyracks-storage-am-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMDiskComponent.activate(AbstractLSMDiskComponent.java:169) ~[hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.createDiskComponent(AbstractLSMIndex.java:536) ~[hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.doFlush(LSMBTree.java:277) ~[hyracks-storage-am-lsm-btree.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.flush(AbstractLSMIndex.java:850) ~[hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.doIo(LSMHarness.java:530) [hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.flush(LSMHarness.java:513) [hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.flush(LSMTreeIndexAccessor.java:122) [hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.FlushOperation.call(FlushOperation.java:38) [hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at org.apache.hyracks.storage.am.lsm.common.impls.FlushOperation.call(FlushOperation.java:29) [hyracks-storage-am-lsm-common.jar:6.0.2-2407]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
      	at java.lang.Thread.run(Thread.java:834) [?:?]
      

      It is not clear whether this is a regression: it wasn't seen in the previous system-test runs on 6.0.0 or 6.0.1, but it could also be a pre-existing race condition that simply wasn't triggered before.
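      For context, the stack trace shows the flush failing inside FileMapManager.registerFile, which refuses to register a file path that already has a mapping in the buffer cache. The following is an editor's illustration, not the actual Hyracks code: a hypothetical registry with the same register/unregister contract, showing how a stale mapping left behind (e.g. by recovery) makes a later re-registration of the same path fail the way HYR0081 does here:

      ```java
      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical sketch of a file-map registry that throws on duplicate
      // registration, analogous to how HYR0081 surfaces when a flush re-creates
      // a disk component whose old mapping was never unregistered.
      class FileMapSketch {
          static class DuplicateMappingException extends RuntimeException {
              DuplicateMappingException(String msg) { super(msg); }
          }

          private final Map<String, Integer> pathToFileId = new HashMap<>();
          private int nextFileId = 0;

          synchronized int registerFile(String path) {
              if (pathToFileId.containsKey(path)) {
                  // The pre-existing mapping was not removed before the same
                  // path was registered again.
                  throw new DuplicateMappingException("File " + path + " is already mapped");
              }
              pathToFileId.put(path, nextFileId);
              return nextFileId++;
          }

          synchronized void unregisterFile(String path) {
              pathToFileId.remove(path);
          }

          public static void main(String[] args) {
              FileMapSketch fm = new FileMapSketch();
              String path = "/data/v_iodevice_1/partition_97/idx_result1/84_84_b"; // made-up path
              fm.registerFile(path);          // first registration succeeds
              try {
                  fm.registerFile(path);      // same path registered again -> duplicate
              } catch (DuplicateMappingException e) {
                  System.out.println("duplicate: " + e.getMessage());
              }
              fm.unregisterFile(path);        // dropping the stale mapping first...
              fm.registerFile(path);          // ...lets the new component register
              System.out.println("re-register ok");
          }
      }
      ```

      Under this reading, the mapping for 84_84_b registered at 03:24:17 (during recovery) was still present when the 03:49:23 flush tried to create a component under the same name.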

      Attachments

        Issue Links


          Activity

            Till Westmann: Not critical, as the system will recover from the exception (at the cost of a process restart), so we should definitely fix it.

            Murtadha Hubail: The issue has been identified and the fix is ready. We will merge the fix when 6.0.3 is open.

            Mihir Kamdar: Verified in the system test run for 6.0.3-2888. Didn't encounter this issue.

            People

              Assignee: Mihir Kamdar (Inactive)
              Reporter: Mihir Kamdar (Inactive)
              Votes: 0
              Watchers: 3


                Gerrit Reviews

                  There are no open Gerrit changes
