Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-53665

index creation time can be very long after creating many collections + indexes

    XMLWordPrintable

Details

    • Task
    • Resolution: Unresolved
    • Major
    • None
    • 7.0.4, 7.1.1
    • secondary-index
    • None
    • 1

    Description

      Sync Gateway runs integration testing against Couchbase Server. Test setup requires an empty collection with Sync Gateway's 6 indexes created and ready. The test suite runs ~1k tests.

      We’re not able to run the full test suite without encountering index creation problems - after running [n] tests, we find that creation of indexes no longer completes in <1 minute. We consider index creation to be BUILD INDEX + time to wait for index availability. After this point in time, we can’t create any new indexes or run queries.

      The lack of availability errors look like and can last a long time (1+ minutes):

      planning failure | {"statement":"SELECT sgQueryKeyspaceAlias._sync.access.foo as val FROM `testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` AS sgQueryKeyspaceAlias USE INDEX (sg_access_1) WHERE ANY op in OBJECT_PAIRS(sgQueryKeyspaceAlias._sync.access) SATISFIES op.name = 'foo' end LIMIT 1","client_context_id":"7d47266e-a5aa-4ec4-ba98-70388b9620bd","errors":[{"code":4000,"message":"No index available on keyspace `default`:`testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` that matches your query. Use CREATE PRIMARY INDEX ON `default`:`testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` to create a primary index, or check that your expected index is online."}],"http_status_code":404}
      

      I’ve created a standalone go application that reproduces this issue. It’s creating the Sync Gateway indexes, but without the rest of SG complexity
      https://github.com/torcolvin/indexcreator contains this replication. Running on:

      8GB KV/Data quota
      8GB Query/Index quota
      8GB tmpfs/ramdisk for all storage
      repro running via localhost
      repro running default (6 buckets)

      Gives us the following output: https://supportal.couchbase.com/snapshot/2d822dce68680e699f2ddec24e7c73d2%3A%3A0 + iTerm2 Session 8 Sep 2022 at 1_58_30 pm.txt

      This reproduction creates an index in defer_build, then calls build index, then waits for index to come online in a backoff loop like:

      2022/09/08 12:24:09 Executing N1QL query: SELECT sgQueryKeyspaceAlias._sync.role_access.foo as val FROM `testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` AS sgQueryKeyspaceAlias USE INDEX (sg_roleAccess_1) WHERE ANY op in OBJECT_PAIRS(sgQueryKeyspaceAlias._sync.role_access) SATISFIES op.name = 'foo' end LIMIT 1 - map[]
      2022/09/08 12:24:09 Error when querying index using statement: [SELECT sgQueryKeyspaceAlias._sync.role_access.foo as val FROM `testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` AS sgQueryKeyspaceAlias USE INDEX (sg_roleAccess_1) WHERE ANY op in OBJECT_PAIRS(sgQueryKeyspaceAlias._sync.role_access) SATISFIES op.name = 'foo' end LIMIT 1] parameters: [map[]] error:%!w(*gocb.QueryError=&{{0xc000025100} SELECT sgQueryKeyspaceAlias._sync.role_access.foo as val FROM `testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` AS sgQueryKeyspaceAlias USE INDEX (sg_roleAccess_1) WHERE ANY op in OBJECT_PAIRS(sgQueryKeyspaceAlias._sync.role_access) SATISFIES op.name = 'foo' end LIMIT 1 0354e3b8-3acd-42cf-a04e-95510ed57fce [{4000 No index available on keyspace `default`:`testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` that matches your query. Use CREATE PRIMARY INDEX ON `default`:`testBucket_0`.`sg_test_scope_2ee10888d8dff3c0cb97ade88ca85a62`.`sg_test_collection_7ef937c2a8e807bbf7b803badf267264` to create a primary index, or check that your expected index is online. false map[]}]  [] 0  404})
      

      At the beginning, these queries to validate that the indexes are online come back quickly (3s) but occasionally we see them slow down to 25-45s. At the end of the output, the query for the index availability doesn’t return in 1 minute.

      Additionally, within the web UI, we can see that there are 404k remaining mutations on a given index.

      Can you help identify what is causing the performance degradation we’re seeing? We’re open to changing the way we prepare collections for tests, but need to understand what the issue is that we’re hitting.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            amit.kulkarni Amit Kulkarni
            tor.colvin Tor Colvin
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty