Couchbase Server / MB-26151

503 errors when upgrading from CBS 3.1.6 to 5.0


Details

    • Bug
    • Resolution: Won't Fix
    • Critical
    • 5.1.0
    • 5.0.0
    • ns_server
    • None

    Description

      I ran upgrade tests in channel cache mode (only Sync Gateways talking to CBS) with the server going from 3.1.6 to 5.0.0. I ran the test with 10, 100, 1000, 10000, 100000 and 1000000 docs. The latest test run is at http://uberjenkins.sc.couchbase.com/job/cen7-sync-gateway-upgrade/177/
      The upgrade scenario is detailed at the end.

      With just 10 docs, the test passes. With more than 10 docs, the upgrade itself completes but the test fails because doc updates are missing.
      I see lots of 503 errors in the SG logs. These errors differ from those in "(MB-26144) Server sporadically fails to select bucket on auth".

      I showed the error messages to Adam, who said that this is a "server not responding" error. This could mean that the server is not responsive throughout the upgrade. I can reproduce it every time with 100 docs or more. I have attached the cbcollect logs, client logs and packet captures.

      Test log:

      14:41:05 Verifying that doc ls_db_upgrade_doc_7_2 has rev 4-2911d8055071383c2289f0f5d2b13674
      14:41:05 FAILEDGET http://localhost:59840/_all_dbs 200
      14:41:12 E               assert '3-4fdfb03699...95a8cac21a02c' == '4-2911d805507...9f0f5d2b13674'
      14:41:12 E                 - 3-4fdfb036992d645aa7595a8cac21a02c
      14:41:12 E                 + 4-2911d8055071383c2289f0f5d2b13674
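
The assertion above compares two revision IDs of the form generation-digest: the test expected generation 4 but found generation 3, which is consistent with one update never arriving. A minimal sketch of that comparison follows. The helper names `parse_rev` and `describe_rev_mismatch` are hypothetical and are not taken from the test suite.

```python
def parse_rev(rev):
    """Split a revision ID such as '4-2911d8...' into (generation, digest)."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def describe_rev_mismatch(actual, expected):
    """Return None when the revisions match, else a short description of the gap."""
    if actual == expected:
        return None
    actual_gen, _ = parse_rev(actual)
    expected_gen, _ = parse_rev(expected)
    if actual_gen < expected_gen:
        return f"behind by {expected_gen - actual_gen} generation(s)"
    if actual_gen > expected_gen:
        return f"ahead by {actual_gen - expected_gen} generation(s)"
    return "same generation, different digest"
```

Applied to the two revisions in the log, this reports that the checked copy of ls_db_upgrade_doc_7_2 is one generation behind.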
      

      CBL log:

      14:39:46.834‖ RemoteRequest: CBLRemoteJSONRequest[POST http://s61706cnt72.sc.couchbase.com:4984/db/_bulk_docs]: Finished loading
      14:39:46.834‖ WARNING: CBLRestPusher[http://s61706cnt72.sc.couchbase.com:4984/db]: _bulk_docs got an error: {
          error = 503;
          id = "ls_db_upgrade_doc_7_2";
          reason = "Database timeout error (gocb.ErrTimeout)";
          status = 503;
      } {at __40-[CBLRestPusher uploadBulkDocs:changes:]_block_invoke:402}
      14:39:46.834‖ Sync: CBLRestPusher[http://s61706cnt72.sc.couchbase.com:4984/db]: Sent (
          "{ls_db_upgrade_doc_7_2 #4-2911d8055071383c2289f0f5d2b13674}"
      )
      

      SG log:

      2017-09-24T14:39:46.826-07:00 WARNING: RetryLoop for Get ls_db_upgrade_doc_7_2 giving up after 11 attempts -- base.RetryLoop() at util.go:298
      2017-09-24T14:39:46.827-07:00   BulkDocs: Doc "ls_db_upgrade_doc_7_2" --> 503 Database timeout error (gocb.ErrTimeout) (operation has timed out)
      2017-09-24T14:39:46.827-07:00 HTTP+: #572:     --> 201   (32620.6 ms)
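
The SG log shows a retry loop that gives up after 11 attempts and then reports the timeout as a 503. The following is a generic sketch of a bounded retry loop with a doubling delay, written in Python for illustration. It is not Sync Gateway's implementation: only the attempt count of 11 comes from the log, while the initial delay, the doubling policy and the names `retry_loop` and `RetriesExhausted` are assumptions.

```python
import time


class RetriesExhausted(Exception):
    """Raised when every attempt has failed."""


def retry_loop(operation, max_attempts=11, initial_delay=0.01, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The delay between attempts doubles each time (assumed policy).
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError as err:
            if attempt == max_attempts:
                # No attempts left: surface the failure to the caller.
                raise RetriesExhausted(
                    f"giving up after {max_attempts} attempts"
                ) from err
            sleep(delay)
            delay *= 2
```

If the server stays unresponsive for longer than the total time the loop is willing to wait, every attempt times out and the caller sees the failure, which matches the pattern in the log.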
      

      Upgrade scenario:

      • Install 2 nodes with 3.1.6 CBS, 2 nodes with 1.4.1 SG and 1 load balancer for SGs.
      • Start a CBL client that talks to the load balancer and start push/pull replication between CBL and SG.
      • Add docs through CBL.
      • In a separate thread, start updating docs.
      • In the main thread, upgrade the SGs:
        o Stop SG service
        o rpm -U couchbase-sync-gateway
        o Start SG service.
      • Upgrade CBS
        o Rebalance out the node
        o rpm -e couchbase-server
        o Install new 5.0 rpm
        o Add node back to the cluster
        o Rebalance in
      • Stop doc updates
      • Verify doc ids, revisions, revision history and doc body.
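
The threading structure of the scenario above (updates running in a background thread while the main thread performs the upgrades, then stopping updates before verification) can be sketched as follows. This is an illustrative outline, not the actual test code: `run_upgrade_scenario` and its three callable parameters are hypothetical placeholders for the real update, upgrade and verification logic.

```python
import threading


def run_upgrade_scenario(update_docs_once, upgrade_steps, verify):
    """Run upgrade_steps in the calling thread while docs are updated in the background.

    update_docs_once: callable performing one round of doc updates (placeholder).
    upgrade_steps:    ordered callables, e.g. upgrade the SGs, then upgrade CBS.
    verify:           callable run after updates have stopped; its result is returned.
    """
    stop_updates = threading.Event()

    def updater():
        # Keep updating docs until the main thread signals a stop.
        while not stop_updates.is_set():
            update_docs_once()

    worker = threading.Thread(target=updater, daemon=True)
    worker.start()
    try:
        for step in upgrade_steps:
            step()
    finally:
        # Stop doc updates even if an upgrade step raised.
        stop_updates.set()
        worker.join()
    return verify()
```

Joining the updater thread before calling `verify` ensures that no update is still in flight when doc ids, revisions and bodies are checked.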

            People

              Aliaksey Artamonau (Inactive)
              raghu.sarangapani Raghu Sarangapani (Inactive)