  1. Couchbase Server
  2. MB-48827

magma backend on indexer causes whole cluster to crash


Details

    • Untriaged
    • 1
    • Unknown

    Description

      This is the basic q2 n1ql plasma dgm test, but now running with the magma bucket backend. The q1 n1ql runs work; they do not use the indexer. The tests that use the indexer fail at the cluster phase. After the nodes are joined and rebalanced into a cluster, we update the indexer and query settings. At this point the whole cluster crashes, and the UI is unreachable on all 6 nodes. I have grabbed the logs for the first kv node and the indexer node.

      job: http://perf.jenkins.couchbase.com/job/iris/26136/console

      kv logs: logs_kv.zip

      indexer logs: logs.zip

      Attachments

        1. logs_kv.zip
          37.17 MB
        2. logs.zip
          22.97 MB

        Activity

          jeelan.poola Jeelan Poola added a comment -

          Removing secondary-index from components to avoid double counting in Jira dashboards. Please add back and assign the ticket to indexing team if GSI needs to analyse something. Thank you!

          owend Daniel Owen added a comment - edited

          I took a look at memcached.log in logs_kv.zip and it contains nothing of interest.
          It does not even contain any WARN messages.

          I also took a look at the memcached.log in logs.zip, just in case, but again found nothing of interest.

          I think we need to move focus to the indexer issue, so I am assigning this to the 2i team.

          amit.kulkarni Amit Kulkarni added a comment -

          Looks like ns_server may have terminated.

          2021-10-08T10:41:47.858-07:00 [Warn] serviceChangeNotifier: Connection terminated for pool notifier instance of http://%40index-cbauth@127.0.0.1:8091, default (invalid byte in chunk length)
          2021-10-08T10:41:47.957-07:00 [Warn] serviceChangeNotifier: Connection terminated for collection manifest notifier instance of http://%40index-cbauth@127.0.0.1:8091, default, bucket: bucket-1, (unexpected EOF)
          2021-10-08T10:41:47.957-07:00 [Warn] serviceChangeNotifier: Connection terminated for services notifier instance of http://%40index-cbauth@127.0.0.1:8091, default (unexpected EOF)
          2021-10-08T10:41:47.957-07:00 [Info] ServiceMgr::listenMoveIndex metakv err unexpected EOF. Retrying...
          2021-10-08T10:41:48.958-07:00 [Error] CommandListener: metakv notifier failed (unexpected EOF)..Restarting 1
          2021-10-08T10:41:48.958-07:00 [Error] CommandListener: metakv notifier failed (unexpected EOF)..Restarting 1
          2021-10-08T10:41:48.958-07:00 [Error] CommandListener: metakv notifier failed (unexpected EOF)..Restarting 1
          2021-10-08T10:41:48.958-07:00 [Error] IndexerSettingsManager: metakv notifier failed (unexpected EOF)..Restarting 1
          2021-10-08T10:41:48.958-07:00 [Error] CommandListener: metakv notifier failed (unexpected EOF)..Restarting 1
          2021-10-08T10:41:49.958-07:00 [Info] ServiceMgr::listenMoveIndex metakv err Get "http://127.0.0.1:8091/_metakv/indexing/rebalance/?feed=continuous": dial tcp 127.0.0.1:8091: connect: connection refused. Retrying...
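As a rough aid for reading excerpts like the one above, the following Python sketch tallies lines by level and reports the first timestamp at which a connection was refused. The regular expression and the `summarize` helper are assumptions for illustration, based only on the line shape shown in this ticket; they are not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts in this ticket:
#   "<timestamp> [<Level>] <message>"
LINE_RE = re.compile(r"^(\S+)\s+\[(\w+)\]\s+(.*)$")


def summarize(lines):
    """Return (per-level counts, timestamp of the first 'connection refused' line or None)."""
    levels = Counter()
    first_refused = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip lines that do not match the assumed shape
        ts, level, msg = m.groups()
        levels[level] += 1
        if first_refused is None and "connection refused" in msg:
            first_refused = ts
    return levels, first_refused


sample = [
    '2021-10-08T10:41:47.858-07:00 [Warn] serviceChangeNotifier: Connection terminated for pool notifier instance of http://%40index-cbauth@127.0.0.1:8091, default (invalid byte in chunk length)',
    '2021-10-08T10:41:48.958-07:00 [Error] CommandListener: metakv notifier failed (unexpected EOF)..Restarting 1',
    '2021-10-08T10:41:49.958-07:00 [Info] ServiceMgr::listenMoveIndex metakv err Get "http://127.0.0.1:8091/_metakv/indexing/rebalance/?feed=continuous": dial tcp 127.0.0.1:8091: connect: connection refused. Retrying...',
]

levels, first_refused = summarize(sample)
print(dict(levels), first_refused)
```

Run against the full indexer log, a helper like this can make it easier to see when the EOF errors give way to connection refusals.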
          

          Once this happens, the indexer tries to connect to 8091 for /pools and /_metakv. Both keep failing with a connection refused error.

          2021-10-08T10:41:50.860-07:00 [Info] Error occurred during cluster info update (Get "http://127.0.0.1:8091/pools": dial tcp 127.0.0.1:8091: connect: connection refused) .. Retrying(1)
          2021-10-08T10:41:50.860-07:00 [Info] Error occurred during cluster info update (Get "http://127.0.0.1:8091/pools": dial tcp 127.0.0.1:8091: connect: connection refused) .. Retrying(1)
          2021-10-08T10:41:50.860-07:00 [Info] Error occurred during cluster info update (Get "http://127.0.0.1:8091/pools": dial tcp 127.0.0.1:8091: connect: connection refused) .. Retrying(1)
          2021-10-08T10:41:50.860-07:00 [Info] Error occurred during cluster info update (Get "http://127.0.0.1:8091/pools": dial tcp 127.0.0.1:8091: connect: connection refused) .. Retrying(1)
          2021-10-08T10:41:50.860-07:00 [Info] Error occurred during cluster info update (Get "http://127.0.0.1:8091/pools": dial tcp 127.0.0.1:8091: connect: connection refused) .. Retrying(1)
          2021-10-08T10:41:50.860-07:00 [Info] Error occurred during cluster info update (Get "http://127.0.0.1:8091/pools": dial tcp 127.0.0.1:8091: connect: connection refused) .. Retrying(1)
          2021-10-08T10:41:50.869-07:00 [Info] Error occurred during cluster info update (Get "http://127.0.0.1:8091/pools": dial tcp 127.0.0.1:8091: connect: connection refused) .. Retrying(1)
          2021-10-08T10:41:50.959-07:00 [Error] CommandListener: metakv notifier failed (Get "http://127.0.0.1:8091/_metakv/indexing/ddl/commandToken/?feed=continuous": dial tcp 127.0.0.1:8091: connect: connection refused)..Restarting 2
          2021-10-08T10:41:50.959-07:00 [Error] CommandListener: metakv notifier failed (Get "http://127.0.0.1:8091/_metakv/indexing/ddl/commandToken/?feed=continuous": dial tcp 127.0.0.1:8091: connect: connection refused)..Restarting 2
          2021-10-08T10:41:50.959-07:00 [Error] CommandListener: metakv notifier failed (Get "http://127.0.0.1:8091/_metakv/indexing/ddl/commandToken/?feed=continuous": dial tcp 127.0.0.1:8091: connect: connection refused)..Restarting 2
          2021-10-08T10:41:50.959-07:00 [Error] CommandListener: metakv notifier failed (Get "http://127.0.0.1:8091/_metakv/indexing/ddl/commandToken/?feed=continuous": dial tcp 127.0.0.1:8091: connect: connection refused)..Restarting 2
          2021-10-08T10:41:50.959-07:00 [Error] IndexerSettingsManager: metakv notifier failed (Get "http://127.0.0.1:8091/_metakv/?feed=continuous": dial tcp 127.0.0.1:8091: connect: connection refused)..Restarting 2
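The "connection refused" errors above mean nothing was accepting TCP connections on 127.0.0.1:8091 at that moment. A minimal Python sketch for checking this on a node is shown below; `port_open` is a hypothetical helper, and it only tests whether the port accepts a TCP connection, not whether ns_server is healthy.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8091 is the port the indexer is dialing in the log lines above.
    print("8091 accepting connections:", port_open("127.0.0.1", 8091))
```

The same check can be pointed at each of the other nodes to see whether the UI port is down everywhere or only on some of them.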
          

          There are also errors due to an invalid config param.

          2021-10-08T10:41:40.535-07:00 [Error] invalid config param "indexer.settings.enable_page_bloom_filter"
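Note that this error is timestamped 10:41:40.535, a few seconds before the first connection drop at 10:41:47.858. To list every setting the indexer rejected, a small Python sketch such as the following could scan the log; the pattern is an assumption derived from the single line quoted here, and `invalid_params` is a hypothetical helper.

```python
import re

# Assumed message shape, taken from the one log line quoted in this ticket.
PARAM_RE = re.compile(r'invalid config param "([^"]+)"')


def invalid_params(lines):
    """Return the distinct rejected parameter names, in order of first appearance."""
    seen = []
    for line in lines:
        m = PARAM_RE.search(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


sample = [
    '2021-10-08T10:41:40.535-07:00 [Error] invalid config param "indexer.settings.enable_page_bloom_filter"',
]
print(invalid_params(sample))
```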
          

          amit.kulkarni Amit Kulkarni added a comment -

          Korrigan Clark,

          It looks like this was fixed once the changes from MB-47195 and MB-48679 were merged.

          Resolving this as a duplicate of MB-48679 for now. Please reopen if it is seen again.

          korrigan.clark Korrigan Clark added a comment - Amit Kulkarni, queued a run on 1650 to verify: http://perf.jenkins.couchbase.com/job/iris/26274/

          People

            amit.kulkarni Amit Kulkarni
            korrigan.clark Korrigan Clark