Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-54201

[BP 7.0.x] Reclaim memory during recovery log replay

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.0.5
    • 6.6.5, 7.0.2, 7.1.0
    • storage-engine
    • RC1 - Enterprise Edition 7.1.0 build 2475

    Description

      1. Load 1 billion 5k(body field contains the major chunk of data) items in a default magma bucket.
      2. Create 50% fragmentation in KV
      3. Create an secondary index idx0 and wait for it to build.
      4. Start a data load as below:

      Read Start: 0
      Read End: 500000000
      Update Start: 0
      Update End: 500000000
      Expiry Start: 0
      Expiry End: 0
      Delete Start: 500000000
      Delete End: 1000000000
      Create Start: 1000000000
      Create End: 1500000000
      

      5. Rebalance IN 1 KV node. Rebalance finished in 11019.869 sec
      6. Create another index idx1 on body field while the above rebalance was at 35%. Index was building while KV rebalances were running. Total plasma disk size was at ~2TB.
      7. Rebalance OUT 1 KV node. Rebalance finished in 10870.6289999 sec
      8. Rebalance IN 2 and out 1 KV node. Rebalance failed at 80%.

      {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.110.64', u'tstamp': 1647477583397L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:39:43.397Z', u'text': u"Rebalance exited with reason {service_rebalance_failed,index,\n                              {agent_died,<26112.18949.96>,\n                               {lost_connection,\n                                {'ns_1@172.23.110.66',shutdown}}}}.\nRebalance Operation Id = 6e737e537f7cc57e8962bec1a189e45e"}
      2022-03-16 17:39:52,405 | test  | ERROR   | pool-3-thread-6 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.110.66', u'tstamp': 1647477583388L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:39:43.388Z', u'text': u"Service 'index' exited with status 2. Restarting. Messages:\nruntime.gopark(0x14603b8, 0x0, 0x1809, 0x1)\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/proc.go:336 +0xe5 fp=0xc018ea99e0 sp=0xc018ea99c0 pc=0x441ae5\nruntime.selectgo(0xc018ea9c50, 0xc018ea9ba4, 0x0, 0x0, 0x3, 0x1459601, 0x0, 0x1)\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/select.go:327 +0xef7 fp=0xc018ea9b18 sp=0xc018ea99e0 pc=0x452a97\ngithub.com/couchbase/indexing/secondary/indexer.(*Rebalancer).processDropIndexQueue(0xc018e96000)\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:1383 +0x1b2 fp=0xc018ea9fd8 sp=0xc018ea9b18 pc=0xfe9cb2\nruntime.goexit()\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc018ea9fe0 sp=0xc018ea9fd8 pc=0x4784c1\ncreated by github.com/couchbase/indexing/secondary/indexer.NewRebalancer\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:171 +0x545\n"}
      2022-03-16 17:39:52,407 | test  | ERROR   | pool-3-thread-6 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.110.65', u'tstamp': 1647477562188L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:39:22.188Z', u'text': u'Shutting down bucket "GleamBookUsers0" on \'ns_1@172.23.110.65\' for deletion'}
      2022-03-16 17:39:52,407 | test  | ERROR   | pool-3-thread-6 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.110.66', u'tstamp': 1647476292642L, u'shortText': u'message', u'serverTime': u'2022-03-16T17:18:12.642Z', u'text': u"Service 'index' exited with status 2. Restarting. Messages:\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/mvcc.go:824 +0x3bd fp=0xc05021feb8 sp=0xc05021fc70 pc=0xe2c33d\ngithub.com/couchbase/plasma.(*Plasma).VisitPartition(0xc005ae2000, 0x46, 0xc014640000, 0xc014641300, 0xc0136419c0, 0x0, 0x0)\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page_visitor.go:87 +0x24a fp=0xc05021ff38 sp=0xc05021feb8 pc=0xddbdca\ngithub.com/couchbase/plasma.(*Plasma).PageVisitor.func1(0xc048f98530, 0xc005ae2000, 0xc0136419c0, 0xc0007c4900, 0x48, 0x48, 0x46, 0xc014640000, 0xc014641300)\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page_visitor.go:41 +0x88 fp=0xc05021ff98 sp=0xc05021ff38 pc=0xe2d1c8\nruntime.goexit()\n\t/home/couchbase/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc05021ffa0 sp=0xc05021ff98 pc=0x4784c1\ncreated by github.com/couchbase/plasma.(*Plasma).PageVisitor\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/page_visitor.go:39 +0x18b\n"}
      

      Note: This is found while trying out a recently reported customer use case(CBSE-11583). The env is much better in this case. These are physical machines with 256GB RAM and 72 cores.

      Attachments

        Issue Links

          For Gerrit Dashboard: MB-54201
          # Subject Branch Project Status CR V

          Activity

            People

              sanjit.chauhan Sanjit Chauhan (Inactive)
              srinath.duvuru Srinath Duvuru
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty