  1. Couchbase Server
  2. MB-62456

Failure while trying to read a page from disk. Cannot read closed file

Details

    Description

      1. Create a 3-node provisioned cluster with a magma bucket and 5 collections, and load 1B items into each collection (5B items in total).
      2. Create columnar instance with 2 nodes.
      3. Create 5 remote collections, each ingesting 1B items.
      4. Ingestion did not complete within the 8-hour timeout. The test proceeded.
      5. Start an upsert KV workload on the provisioned cluster.
      6. Start query workload.
      7. Create 5 new remote collections on the same KV collections.

        2024-06-24 09:37:29,013 | test  | INFO    | MainThread | [datasources:create_cbas_collections:212] creating remote collections on couchbase: remote_EEsA1_volCollection_0_kisct
        2024-06-24 09:39:27,647 | test  | INFO    | MainThread | [datasources:create_cbas_collections:212] creating remote collections on couchbase: remote_EEsA1_volCollection_1_fmgte
        2024-06-24 09:39:32,144 | test  | INFO    | MainThread | [datasources:create_cbas_collections:212] creating remote collections on couchbase: remote_EEsA1_volCollection_2_tytmw
        2024-06-24 09:39:39,546 | test  | INFO    | MainThread | [datasources:create_cbas_collections:212] creating remote collections on couchbase: remote_EEsA1_volCollection_3_bvvyr
        2024-06-24 09:39:48,996 | test  | INFO    | MainThread | [datasources:create_cbas_collections:212] creating remote collections on couchbase: remote_EEsA1_volCollection_4_emhbi
        

      8. Scale Columnar from 2 -> 4 -> 8 nodes. All successful.
      9. Scale down from 8 to 4 nodes.
      10. Ingestion hung during the above step, although the scale-down from 8 to 4 nodes was successful.
      11. The test failed due to AV-80865 and went into teardown, where it deleted the last 5 remote collections successfully.

        Test Logs

        2024-06-24 19:05:11,130 | test  | INFO    | MainThread | [CbasUtil:disconnect_link:178] Disconnect link remote_EEsA1 is SUCCESS
        2024-06-24 19:05:32,519 | test  | INFO    | MainThread | [CbasUtil:drop_collections:172] Dropping Collection remote_EEsA1_volCollection_0_kisct is SUCCESS
        2024-06-24 19:05:49,928 | test  | INFO    | MainThread | [CbasUtil:drop_collections:172] Dropping Collection remote_EEsA1_volCollection_1_fmgte is SUCCESS
        2024-06-24 19:06:07,884 | test  | INFO    | MainThread | [CbasUtil:drop_collections:172] Dropping Collection remote_EEsA1_volCollection_2_tytmw is SUCCESS
        2024-06-24 19:06:25,539 | test  | INFO    | MainThread | [CbasUtil:drop_collections:172] Dropping Collection remote_EEsA1_volCollection_3_bvvyr is SUCCESS
        2024-06-24 19:06:44,790 | test  | INFO    | MainThread | [CbasUtil:drop_collections:172] Dropping Collection remote_EEsA1_volCollection_4_emhbi is SUCCESS
        

      12. The 5 collections created at the beginning of the test now remain. Checking the ingestion progress on them returned:
        Failure while trying to read a page from disk org.apache.hyracks.api.exceptions.HyracksDataException: HYR0094: Cannot read closed file (/var/cb-cache/@analytics/v_iodevice_2/storage/partition_66/Default/Default/remote_EEsA1_volCollection_0_gitbq/0/remote_EEsA1_volCollection_0_gitbq/0_152_b)
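
      For triage, the error line above can be split into its error code, message, file path, and storage partition. The following is a minimal sketch, not part of the test framework: the `parse_hyracks_error` helper and the regular expression are hypothetical, and they assume the message always has the layout `<CODE>: <message> (<path>)` seen in this report.

```python
import re

# Error text copied verbatim from the report above.
ERROR = (
    "Failure while trying to read a page from disk "
    "org.apache.hyracks.api.exceptions.HyracksDataException: "
    "HYR0094: Cannot read closed file "
    "(/var/cb-cache/@analytics/v_iodevice_2/storage/partition_66/Default/Default/"
    "remote_EEsA1_volCollection_0_gitbq/0/remote_EEsA1_volCollection_0_gitbq/0_152_b)"
)

# Hypothetical pattern: assumes the layout "<CODE>: <message> (<path>)".
PATTERN = re.compile(r"(?P<code>HYR\d+): (?P<message>[^(]+?) \((?P<path>/[^)]+)\)")


def parse_hyracks_error(text):
    """Return the error code, message, file path and partition, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # The partition number is taken from the ".../partition_<N>/..." path segment.
    partition = re.search(r"/partition_(\d+)/", fields["path"])
    fields["partition"] = int(partition.group(1)) if partition else None
    return fields


info = parse_hyracks_error(ERROR)
print(info["code"], info["partition"], info["path"].rsplit("/", 1)[-1])
```

      Grouping such failures by partition or by file name may help show whether the closed-file reads are confined to one storage partition or spread across the instance.
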

      All the queries are failing due to:

      [
        {
          "code": 25000,
          "msg": "Internal error",
          "retriable": false,
          "query_from_user": "select count(*) from remote_EEsA1_volCollection_0_gitbq;"
        }
      ]
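
      Because the payload marks the error as non-retriable, a test harness could separate such failures from retriable ones before deciding whether to re-run a query. The sketch below assumes the error list is available as a JSON string with the same fields as above; `summarize_errors` is a hypothetical helper, not an existing utility.

```python
import json

# Error payload copied from the report above.
RESPONSE_ERRORS = """
[
  {
    "code": 25000,
    "msg": "Internal error",
    "retriable": false,
    "query_from_user": "select count(*) from remote_EEsA1_volCollection_0_gitbq;"
  }
]
"""


def summarize_errors(raw):
    """Split errors into retriable and non-retriable (code, msg, query) tuples."""
    retriable, fatal = [], []
    for err in json.loads(raw):
        entry = (err.get("code"), err.get("msg"), err.get("query_from_user"))
        # Assumption: a missing "retriable" flag is treated as non-retriable.
        (retriable if err.get("retriable", False) else fatal).append(entry)
    return retriable, fatal


retriable, fatal = summarize_errors(RESPONSE_ERRORS)
for code, msg, query in fatal:
    print(f"non-retriable {code}: {msg} <- {query}")
```
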
      

            People

              wail.alkowaileet Wail Alkowaileet (Inactive)
              ritesh.agarwal Ritesh Agarwal