  1. Couchbase Server
  2. MB-40835

Seeing indexer and FTS process crashing with exit status 2 and 1 respectively


Details

    Description

       

      Build: 7.0.0-2792

      Scenario:

      1. Two node cluster

        +----------------+----------------------+-----------------+------------+------------+----------------------+
        | Node           | Services             | CPU_utilization | Mem_total  | Mem_free   | Swap_mem_used        |
        +----------------+----------------------+-----------------+------------+------------+----------------------+
        | 172.23.123.162 | cbas                 | 16.5407854985   | 4201672704 | 3151945728 | 3145728 / 3758092288 |
        | 172.23.123.164 | fts, index, kv, n1ql | 1.02963335008   | 4201672704 | 3381792768 | 6815744 / 3758092288 |
        +----------------+----------------------+-----------------+------------+------------+----------------------+

      2. Couchbase bucket with replica=0
      3. Create CBAS dataset and connect to the Couchbase bucket
      4. Restart KV node

      Observation:

      After the node restart at step 4, continuous indexer and FTS service crash messages are seen from node-1 (172.23.123.164).

      Cbcollect logs:

      https://cb-jira.s3.us-east-2.amazonaws.com/logs/fts_indexer_crash/collectinfo-2020-08-10T143824-ns_1%40172.23.123.162.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/fts_indexer_crash/collectinfo-2020-08-10T143824-ns_1%40172.23.123.164.zip

      TAF test case:

      ./testrunner -i /tmp/testexec.26703.ini replicas=0,durability=MAJORITY,GROUP=P1;durability,upgrade_version=7.0.0-2792 -t cbas.cbas_bucket_operations.CBASBucketOperations.test_restart_kv_server_impact_on_bucket,num_items=100000,create_secondary_indexes=False,cb_bucket_name=default,cbas_bucket_name=default_bucket,cbas_dataset_name=default_ds,GROUP=P1;durability

       

      Test log link: http://qa.sc.couchbase.com/job/test_suite_executor-TAF/49619/consoleText

      [<<"Service 'indexer' exited with status 2. Restarting. Messages:\n2020-08-10T07:25:56.775-07:00 [Info] Indexer started with command line: [/opt/couchbase/bin/indexer -adminPort=9100 -scanPort=9101 -httpPort=9102 -streamInitPort=9103 -streamCatchupPort=9104 -streamMaintPort=9105 --httpsPort=19102 --certFile=/opt/couchbase/var/lib/couchbase/config/memcached-cert.pem --keyFile=/opt/couchbase/var/lib/couchbase/config/memcached-key.pem -vbuckets=1024 -cluster=127.0.0.1:8091 -storageDir=/data/idx/@2i -diagDir=/opt/couchbase/var/lib/couchbase/crash -logDir=/opt/couchbase/var/lib/couchbase/logs -nodeUUID=fe088e028f2f4b27bf0efe524b9611c3 -ipv6=false -isEnterprise=true]\npanic: mkdir /data/idx: permission denied\n\ngoroutine 1 [running]:\ngithub.com/couchbase/indexing/secondary/common.CrashOnError(...)\n\t/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/common/util.go:413\nmain.main()\n\tgoproj/src/github.com/couchbase/indexing/secondary/cmd/indexer/main.go:111 +0x1658\n">>],
                  [],info,
                  {{2020,8,10},{7,25,56}}},
       {log_entry,{1597,69556,972091},
                  'ns_1@172.23.123.164',ns_log,0,
                  [<<"Service 'fts' exited with status 1. Restarting. Messages:\n2020-08-10T07:25:56.964-07:00 [INFO]   -staticDir=\"static\"\n2020-08-10T07:25:56.964-07:00 [INFO]   -staticETag=\"\"\n2020-08-10T07:25:56.964-07:00 [INFO]   -tags=\"feed,janitor,pindex,queryer,cbauth_service\"\n2020-08-10T07:25:56.964-07:00 [INFO]   -tlsCertFile=\"/opt/couchbase/var/lib/couchbase/config/memcached-cert.pem\"\n2020-08-10T07:25:56.964-07:00 [INFO]   -tlsKeyFile=\"/opt/couchbase/var/lib/couchbase/config/memcached-key.pem\"\n2020-08-10T07:25:56.964-07:00 [INFO]   -uuid=\"fe088e028f2f4b27bf0efe524b9611c3\"\n2020-08-10T07:25:56.964-07:00 [INFO]   -version=\"false\"\n2020-08-10T07:25:56.964-07:00 [INFO]   -weight=\"1\"\n2020-08-10T07:25:56.964-07:00 [INFO]   GOMAXPROCS=4\n2020-08-10T07:25:56.964-07:00 [INFO] main: registered bleve stores\n2020-08-10T07:25:56.964-07:00 [INFO]   goleveldb\n2020-08-10T07:25:56.964-07:00 [INFO]   moss\n2020-08-10T07:25:56.964-07:00 [INFO]   gtreap\n2020-08-10T07:25:56.964-07:00 [INFO]   boltdb\n2020-08-10T07:25:56.964-07:00 [INFO]   metrics\n2020-08-10T07:25:56.964-07:00 [FATA] main: data directory does not exist, dataDir: /data/idx/@fts -- main.main() at main.go:116\n">>],
                  [],info,
                  {{2020,8,10},{7,25,56}}},
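
      The two failures in the excerpt above can be pulled out of the message text with a short script. This is only a sketch: it assumes the `\n` escapes in the ns_log binaries have already been unescaped, and `summarize_crash`, `INDEXER_MSG` and `FTS_MSG` are hypothetical names used for illustration, not part of cbcollect or TAF. The sample strings are shortened copies of the messages shown above.

```python
import re

# Shortened copies of the two ns_log messages from the excerpt above,
# with the "\n" escapes already turned into real newlines.
INDEXER_MSG = (
    "Service 'indexer' exited with status 2. Restarting. Messages:\n"
    "panic: mkdir /data/idx: permission denied\n"
)
FTS_MSG = (
    "Service 'fts' exited with status 1. Restarting. Messages:\n"
    "2020-08-10T07:25:56.964-07:00 [INFO]   metrics\n"
    "2020-08-10T07:25:56.964-07:00 [FATA] main: data directory does not exist, "
    "dataDir: /data/idx/@fts -- main.main() at main.go:116\n"
)

# Matches the "Service '<name>' exited with status <n>" header line.
HEADER = re.compile(r"Service '(?P<service>[^']+)' exited with status (?P<status>\d+)")


def summarize_crash(message):
    """Return the service name, exit status and fatal lines, or None if no header is found."""
    header = HEADER.search(message)
    if header is None:
        return None
    # Keep only Go panics and FATA-level log lines.
    fatal_lines = [
        line for line in message.splitlines()
        if line.startswith("panic:") or "[FATA]" in line
    ]
    return {
        "service": header.group("service"),
        "status": int(header.group("status")),
        "fatal_lines": fatal_lines,
    }


if __name__ == "__main__":
    for msg in (INDEXER_MSG, FTS_MSG):
        print(summarize_crash(msg))
```

      Both fatal lines point at the same directory tree (`/data/idx`), one as a permission error on `mkdir` and the other as a missing data directory.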

       


        Activity

          Sreekanth Sivasankaran added a comment - edited

          Ashwin Govindarajulu, I had a quick look at the logs, and they show:

          2020-08-09T23:24:48.580-07:00 [FATA] main: data directory does not exist, dataDir: /data/idx/@fts -- main.main() at main.go:116

          This is expected behaviour for FTS, since the data directory it needs to boot from does not exist.

          I am not sure how this happened in your test environment.

          Can you please cross-check this and ensure that the data directory paths are available for FTS?

          Most likely the same issue is also causing the indexer restarts, since FTS shares the same parent directory tree with the indexer.

          Also, please note that GSI and FTS are totally unrelated products, so two separate tickets may be needed the next time issues like this occur.
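
          A minimal sketch of the cross-check requested above, assuming it is run on the affected node as the same OS user that runs the Couchbase services (`os.access` checks permissions for the calling user only). The helper names `dir_status` and `report` are hypothetical; the paths are the ones that appear in the indexer and FTS messages.

```python
import os


def dir_status(path):
    """Classify a directory as 'missing', 'not writable' or 'ok' for the current user."""
    if not os.path.isdir(path):
        return "missing"
    # Creating entries inside a directory needs both write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        return "not writable"
    return "ok"


def report(paths):
    """Map each path to its status."""
    return {path: dir_status(path) for path in paths}


if __name__ == "__main__":
    # Paths taken from the log messages in this ticket.
    for path, status in report(["/data/idx", "/data/idx/@2i", "/data/idx/@fts"]).items():
        print(path, status)
```

          A "missing" result for `/data/idx` or `/data/idx/@fts`, or a "not writable" result for their parent, would be consistent with the `mkdir /data/idx: permission denied` panic and the FTS `data directory does not exist` error.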

           

           


          Sreekanth Sivasankaran added a comment

          Also, is this consistently reproducible?

          Ashwin Govindarajulu added a comment

          Sreekanth Sivasankaran, I tried this scenario multiple times on 4 GB boxes but was unable to hit the issue.

          Build used: 7.0.0-3624-enterprise

          Sreekanth Sivasankaran added a comment

          Thanks, Ashwin Govindarajulu. Let me close this as non-reproducible then, and let's reopen it if this surfaces again.

          Ashwin Govindarajulu added a comment

          Sure, Sreekanth Sivasankaran. Thanks.

          Ashwin Govindarajulu added a comment

          Closing since this is not reproducible.

          People

            ashwin.govindarajulu Ashwin Govindarajulu
            ashwin.govindarajulu Ashwin Govindarajulu