Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-33420

Indexer crashes during create index

    XMLWordPrintable

Details

    • Untriaged
    • Yes

    Description

      Build : 6.5.0-2673
      Last known good build : 6.5.0-2671

      The sanity test creates a bucket on a single node that runs kv, index and n1ql service, loads some documents, creates a view index (primary), and then creates a secondary index. While the secondary index is being built, the indexer crashes with a panic as mentioned below.

      Test log :

      [2019-03-19 10:44:33,200] - [tuq_helper:101] INFO - TOTAL ELAPSED TIME: 76.743338ms
      [2019-03-19 10:44:33,234] - [basetestcase:2386] INFO - list of index nodes in cluster: [ip:172.23.107.160 port:8091 ssh_username:root]
      [2019-03-19 10:44:34,223] - [tuq_helper:77] INFO - RUN QUERY CREATE INDEX employee58d96924f16643f7aeef7f55e6d1ea27job_title ON default(job_title) USING GSI  WITH {'nodes': ['172.23.107.160:8091'], 'defer_build': True}
      [2019-03-19 10:44:34,231] - [rest_client:3349] INFO - query params : statement=CREATE+INDEX+employee58d96924f16643f7aeef7f55e6d1ea27job_title+ON+default%28job_title%29+USING+GSI++WITH+%7B%27nodes%27%3A+%5B%27172.23.107.160%3A8091%27%5D%2C+%27defer_build%27%3A+True%7D
      [2019-03-19 10:44:56,609] - [rest_client:839] ERROR - POST http://172.23.107.160:8093/query?statement=CREATE+INDEX+employee58d96924f16643f7aeef7f55e6d1ea27job_title+ON+default%28job_title%29+USING+GSI++WITH+%7B%27nodes%27%3A+%5B%27172.23.107.160%3A8091%27%5D%2C+%27defer_build%27%3A+True%7D body:  headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Accept': '*/*', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n'} error: 500 reason: unknown {
      "requestID": "38582d32-3d41-4e7e-bcb1-3688f25b545b",
      "signature": null,
      "results": [
      ],
      "errors": [{"code":5000,"msg":"GSI CreateIndex() - cause: Create index or Alter replica cannot proceed due to rebalance in progress, another concurrent create index request, network partition, node failover, or indexer failure."}],
      "status": "errors",
      "metrics": {"elapsedTime": "22.373430632s","executionTime": "22.373306185s","resultCount": 0,"resultSize": 0,"errorCount": 1}
      }
      

      Query log during the same time duration:

      2019-03-19T10:44:34.236-07:00 [Info] send prepare create request to watcher 172.23.107.160:9100
      2019-03-19T10:44:37.055-07:00 [Info] serviceChangeNotifier: received PoolChangeNotification
      2019-03-19T10:44:37.074-07:00 [Info] Refreshing indexer list due to cluster changes or auto-refresh.
      2019-03-19T10:44:37.075-07:00 [Info] Refreshed Indexer List: [172.23.107.160:9100]
      2019-03-19T10:44:37.076-07:00 [Info] switched currmeta from 1 -> 1 force true 
      2019-03-19T10:44:56.606-07:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.107.160:9100.  Error = EOF. Kill Pipe.
      2019-03-19T10:44:56.606-07:00 [Error] WatcherServer.runOnce() : Watcher terminated unexpectedly.
      2019-03-19T10:44:56.606-07:00 [Error] Fail to process prepare request on 172.23.107.160:9100. Error: Terminate Request due to server termination
      2019-03-19T10:44:56.606-07:00 [Error] Fail to create index: Create index or Alter replica cannot proceed due to rebalance in progress, another concurrent create index request, network partition, node failover, or indexer failure.
      2019-03-19T10:44:56.606-07:00 [Info] CreateIndex 0 default/employee58d96924f16643f7aeef7f55e6d1ea27job_title using:GSI exprType:N1QL whereExpr:<ud>()</ud> secExprs:<ud>([`job_title`])</ud> desc:[false] isPrimary:false scheme:SINGLE  partitionKeys:<ud>([])</ud> with:{"defer_build":true,"nodes":["172.23.107.160:8091"]} - elapsed(22.371244299s) err(Create index or Alter replica cannot proceed due to rebalance in progress, another concurrent create index request, network partition, node failover, or indexer failure.)
      2019-03-19T10:44:56.606-07:00 [Info] send cancel create request to watcher 172.23.107.160:9100
      2019-03-19T10:44:57.607-07:00 [Error] WatcherServer.runOnce() error : dial tcp 172.23.107.160:9100: connect: connection refused
      _time=2019-03-19T10:44:57.667-07:00 _level=INFO _msg=n1fty: no index definitions available 
      _time=2019-03-19T10:44:57.687-07:00 _level=INFO _msg=n1fty: no index definitions available 
      _time=2019-03-19T10:44:57.759-07:00 _level=INFO _msg=n1fty: no index definitions available 
      _time=2019-03-19T10:44:57.788-07:00 _level=INFO _msg=n1fty: no index definitions available 
      

      Indexer log during the same time period:

      2019-03-19T10:44:29.294-07:00 [Info] [Queryport ":9101"] connection "172.23.107.160:49034" doReceive() ...
      2019-03-19T10:44:29.791-07:00 [Info] Fetch new stats upon request by life cycle manager
      2019-03-19T10:44:32.056-07:00 [Info] serviceChangeNotifier: received PoolChangeNotification
      2019-03-19T10:44:37.055-07:00 [Info] serviceChangeNotifier: received PoolChangeNotification
      2019-03-19T10:44:55.556-07:00 [Info] PeriodicStats = {"cpu_utilization":0.4,"index_not_found_errcount":0,"indexer_state":"Active","memory_free":4704972800,"memory_quota":268435456,"memory_rss":24506368,"memory_total":7934996480,"memory_total_storage":4239360,"memory_used":4390912,"memory_used_queue":0,"memory_used_storage":151552,"needs_restart":false,"num_connections":0,"num_cpu_core":4,"storage_mode":"plasma","timings/stats_response":"143 17828374 2398954540930","uptime":"2m0.078557176s"}
      2019-03-19T10:44:55.582-07:00 [Info] memstats {"Alloc":52595264, "TotalAlloc":54068208, "Sys":60315896, "Lookups":52, "Mallocs":25725,"Frees":14090, "HeapAlloc":52595264, "HeapSys":54362112, "HeapIdle":770048, "HeapInuse":53592064,"HeapReleased":0, "HeapObjects":11635,"GCSys":1957888, "LastGC":1553017375596794921,"PauseTotalNs":10967806, "PauseNs":[], "NumGC":4}
      2019-03-19T10:44:55.587-07:00 [Info] ServiceMgr::rebalanceJanitor Running Periodic Cleanup
      2019-03-19T10:44:56.605-07:00 [Info] janitor: running cleanup.
      2019-03-19T10:44:56.605-07:00 [Info] janitor: go-routine terminates.
      2019-03-19T10:44:56.605-07:00 [Info] builder: recovering scheduled index
      2019-03-19T10:44:56.605-07:00 [Info] builder: go-routine terminates.
      2019-03-19T10:44:56.605-07:00 [Info] manager ResetConnection: closing metadata repo
      2019-03-19T10:44:56.606-07:00 [INFO][FDB] Forestdb closed database file /opt/couchbase/var/lib/couchbase/data/@2i/MetadataStore
      2019-03-19T10:44:56.606-07:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.107.160:34392.  Error = read tcp 172.23.107.160:9100->172.23.107.160:34392: use of closed network connection. Kill Pipe.
      2019-03-19T10:44:56.606-07:00 [Error] PeerListener.listen(): Error in accepting new connection.  Error = accept tcp [::]:9100: use of closed network connection. Terminate.
      2019/03/19 10:44:57 http: panic serving 172.23.107.160:59369: runtime error: invalid memory address or nil pointer dereference
      goroutine 186 [running]:
      net/http.(*conn).serve.func1(0xc42001d780)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:1491 +0x12a
      panic(0xf0eee0, 0xc420016070)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:458 +0x243
      github.com/couchbase/indexing/secondary/fdb.(*KVStore).GetKV(0x0, 0xc4232e2b90, 0x9, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/fdb/kv.go:23 +0x73
      github.com/couchbase/gometa/repository.(*Repository).Get(0xc423747c00, 0x2, 0x107f3fb, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/gometa/repository/repo.go:255 +0x13a
      github.com/couchbase/gometa/repository.(*ServerConfig).GetStr(0xc4231af6f8, 0x107f3fb, 0x9, 0x116512f, 0xc420301858, 0x431302, 0xc420301828)
      	goproj/src/github.com/couchbase/gometa/repository/server_config.go:121 +0x58
      github.com/couchbase/gometa/server.(*EmbeddedServer).GetConfigValue(0xc4237640a0, 0x107f3fb, 0x9, 0xc420301870, 0x9dbf53, 0xc420301868, 0xc420301878)
      	goproj/src/github.com/couchbase/gometa/server/embeddedServer.go:299 +0x43
      github.com/couchbase/indexing/secondary/manager.(*LocalRepoRef).getLocalValue(0xc423747b20, 0x107f3fb, 0x9, 0xc4212b8000, 0x1082176, 0x7f273dbd8960, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/meta_repo.go:735 +0x42
      github.com/couchbase/indexing/secondary/manager.(*MetadataRepo).GetLocalValue(0xc423756640, 0x107f3fb, 0x9, 0x0, 0x0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/meta_repo.go:185 +0xb1
      github.com/couchbase/indexing/secondary/manager.(*MetadataRepo).GetLocalIndexerId(0xc423756640, 0xc4231a3c00, 0x0, 0xc420301978, 0x40f0cb)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/meta_repo.go:151 +0x40
      github.com/couchbase/indexing/secondary/manager.(*requestHandlerContext).getLocalIndexMetadata(0x1a4ffc0, 0x19eaa40, 0xc420130b00, 0x0, 0x0, 0xc4231a3c00, 0x0, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/request_handler.go:876 +0x119
      github.com/couchbase/indexing/secondary/manager.(*requestHandlerContext).handleLocalIndexMetadataRequest(0x1a4ffc0, 0x19ec240, 0xc4232c2820, 0xc4232192c0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/request_handler.go:862 +0xbb
      github.com/couchbase/indexing/secondary/manager.(*requestHandlerContext).(github.com/couchbase/indexing/secondary/manager.handleLocalIndexMetadataRequest)-fm(0x19ec240, 0xc4232c2820, 0xc4232192c0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/request_handler.go:171 +0x48
      net/http.HandlerFunc.ServeHTTP(0xc420342700, 0x19ec240, 0xc4232c2820, 0xc4232192c0)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:1726 +0x44
      net/http.(*ServeMux).ServeHTTP(0xc420129050, 0x19ec240, 0xc4232c2820, 0xc4232192c0)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:2022 +0x7f
      net/http.serverHandler.ServeHTTP(0xc420132380, 0x19ec240, 0xc4232c2820, 0xc4232192c0)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:2202 +0x7d
      net/http.(*conn).serve(0xc42001d780, 0x19edd00, 0xc422223000)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:1579 +0x4b7
      created by net/http.(*Server).Serve
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:2293 +0x44d
      2019/03/19 10:44:57 http: panic serving 172.23.107.160:59781: runtime error: invalid memory address or nil pointer dereference
      goroutine 354 [running]:
      net/http.(*conn).serve.func1(0xc423733880)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:1491 +0x12a
      panic(0xf0eee0, 0xc420016070)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:458 +0x243
      github.com/couchbase/indexing/secondary/fdb.(*KVStore).GetKV(0x0, 0xc4233aa8f0, 0x9, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/fdb/kv.go:23 +0x73
      github.com/couchbase/gometa/repository.(*Repository).Get(0xc423747c00, 0x2, 0x107f3fb, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/gometa/repository/repo.go:255 +0x13a
      github.com/couchbase/gometa/repository.(*ServerConfig).GetStr(0xc4231af6f8, 0x107f3fb, 0x9, 0x116512f, 0xc423441868, 0x431302, 0xc423441838)
      	goproj/src/github.com/couchbase/gometa/repository/server_config.go:121 +0x58
      github.com/couchbase/gometa/server.(*EmbeddedServer).GetConfigValue(0xc4237640a0, 0x107f3fb, 0x9, 0xc423441880, 0x9dbf53, 0xc423441878, 0xc423441888)
      	goproj/src/github.com/couchbase/gometa/server/embeddedServer.go:299 +0x43
      github.com/couchbase/indexing/secondary/manager.(*LocalRepoRef).getLocalValue(0xc423747b20, 0x107f3fb, 0x9, 0xc4212b8000, 0x1082176, 0x7f273dbd8000, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/meta_repo.go:735 +0x42
      github.com/couchbase/indexing/secondary/manager.(*MetadataRepo).GetLocalValue(0xc423756640, 0x107f3fb, 0x9, 0x0, 0x0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/meta_repo.go:185 +0xb1
      github.com/couchbase/indexing/secondary/manager.(*MetadataRepo).GetLocalIndexerId(0xc423756640, 0xc423611810, 0x0, 0xc423441988, 0x40f0cb)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/meta_repo.go:151 +0x40
      github.com/couchbase/indexing/secondary/manager.(*requestHandlerContext).getLocalIndexMetadata(0x1a4ffc0, 0x19eaa40, 0xc42365d740, 0x0, 0x0, 0xc423611810, 0x0, 0x0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/request_handler.go:876 +0x119
      github.com/couchbase/indexing/secondary/manager.(*requestHandlerContext).handleLocalIndexMetadataRequest(0x1a4ffc0, 0x19ec240, 0xc420273380, 0xc4232e05a0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/request_handler.go:862 +0xbb
      github.com/couchbase/indexing/secondary/manager.(*requestHandlerContext).(github.com/couchbase/indexing/secondary/manager.handleLocalIndexMetadataRequest)-fm(0x19ec240, 0xc420273380, 0xc4232e05a0)
      	goproj/src/github.com/couchbase/indexing/secondary/manager/request_handler.go:171 +0x48
      net/http.HandlerFunc.ServeHTTP(0xc420342700, 0x19ec240, 0xc420273380, 0xc4232e05a0)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:1726 +0x44
      net/http.(*ServeMux).ServeHTTP(0xc420129050, 0x19ec240, 0xc420273380, 0xc4232e05a0)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:2022 +0x7f
      net/http.serverHandler.ServeHTTP(0xc420132380, 0x19ec240, 0xc420273380, 0xc4232e05a0)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:2202 +0x7d
      net/http.(*conn).serve(0xc423733880, 0x19edd00, 0xc420130c40)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:1579 +0x4b7
      created by net/http.(*Server).Serve
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/net/http/server.go:2293 +0x44d
      2019-03-19T10:44:57.513-07:00 [Info] ServiceMgr::GetTaskList []
      2019-03-19T10:44:57.513-07:00 [Info] ServiceMgr::GetTaskList returns &{[0 0 0 0 0 0 0 4] []}
      2019-03-19T10:44:57.513-07:00 [Info] ServiceMgr::GetCurrentTopology []
      2019-03-19T10:44:57.513-07:00 [Info] ServiceMgr::GetCurrentTopology returns &{[0 0 0 0 0 0 0 4] [add1a2b41e6faff84e970c0cb9ff5c77] true []}
      2019-03-19T10:44:57.513-07:00 [Info] ServiceMgr::GetTaskList [0 0 0 0 0 0 0 4]
      2019-03-19T10:44:57.514-07:00 [Info] ServiceMgr::GetCurrentTopology [0 0 0 0 0 0 0 4]
      2019-03-19T10:44:57.606-07:00 [Info] manager ResetConnection: creating new lifecycle mgr
      2019-03-19T10:44:57.630-07:00 [INFO][FDB] Forestdb opened database file /opt/couchbase/var/lib/couchbase/data/@2i/MetadataStore
      2019-03-19T10:44:57.630-07:00 [Info] manager ResetConnection: restarting metadadta repo
      2019-03-19T10:44:57.631-07:00 [Info] manager ResetConnection: restarting lifecycle mgr
      2019-03-19T10:44:57.631-07:00 [Info] handleSecurityChange: restarting mutation manager
      2019-03-19T10:44:57.631-07:00 [Info] handleSecurityChange: restarting scan coordinator
      2019-03-19T10:44:57.631-07:00 [Info] [Queryport ":9101"] ... stopped
      2019-03-19T10:44:57.631-07:00 [Info] [Queryport ":9101"] ... restarting listener
      2019-03-19T10:44:57.632-07:00 [Error] [Queryport ":9101"] failed starting listener :9101 listen tcp :9101: bind: address already in use !!
      2019-03-19T10:44:57.632-07:00 [Error] receiving packet: read tcp 172.23.107.160:9101->172.23.107.160:49034: use of closed network connection
      2019-03-19T10:44:57.632-07:00 [Error] [Queryport ":9101"] connection "172.23.107.160:49034" exited read tcp 172.23.107.160:9101->172.23.107.160:49034: use of closed network connection
      2019-03-19T10:44:57.632-07:00 [Info] [Queryport ":9101"] connection 172.23.107.160:49034 closed
      2019-03-19T10:44:57.632-07:00 [Error] [Queryport ":9101"] Accept() Error: accept tcp [::]:9101: use of closed network connection
      2019-03-19T10:44:57.652-07:00 [Info] updator: updating service map.  server group=Group 1, indexerVersion=4 nodeAddr 172.23.107.160:8091 clusterVersion 4 excludeNode %!(EXTRA uint64=2)
      2019-03-19T10:44:58.632-07:00 [Error] HTTPS Server terminates on error: accept tcp [::]:19102: use of closed network connection
      2019-03-19T10:44:59.609-07:00 [Info] lifecycleMgr.dispatchRequest: op OPCODE_PREPARE_CREATE_INDEX elapsed 134.76µs len(expediates) 0 len(incomings) 0
      2019-03-19T10:44:59.632-07:00 [Info] handleSecurityChange: restarting http/https server
      2019-03-19T10:44:59.632-07:00 [Info] indexer:: SSL server started: :19102
      2019-03-19T10:44:59.632-07:00 [Info] handleSecurityChange: restarting bucket sequence cache
      2019-03-19T10:44:59.632-07:00 [Info] handleSecurityChange: done
      2019-03-19T10:44:59.632-07:00 [Info] handleSecurityChange: refresh security context
      2019-03-19T10:44:59.656-07:00 [Info] security context initialized
      2019-03-19T10:44:59.656-07:00 [Info] security port mapping updated : map[9101:9101 9105:9105 9999:9999 11210:11207 9100:9100 9103:9103 9104:9104 8093:18093 9102:19102]
      2019-03-19T10:44:59.656-07:00 [Info] handleSecurityChange: restarting index manager
      2019-03-19T10:44:59.656-07:00 [Info] manager ResetConnection: terminating lifecycle mgr
      2019-03-19T10:44:59.656-07:00 [Info] LifecycleMgr.broadcastStats(): received kill signal. Shutting down broadcastStats routine.
      2019-03-19T10:44:59.656-07:00 [Info] LifecycleMgr.processRequest(): receive kill signal. Stop Client request processing.
      2019-03-19T10:44:59.656-07:00 [Info] updator: go-routine terminates.
      2019-03-19T10:45:00.076-07:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.107.160:34538.  Error = EOF. Kill Pipe.
      2019-03-19T10:45:00.076-07:00 [Info] messageListener.start(): message channel closed. Remove peer 8e:d5:9c:d4:d7:39:ad:3 as follower.
      2019-03-19T10:45:00.430-07:00 [Info] Indexer started with command line: [/opt/couchbase/bin/indexer -adminPort=9100 -scanPort=9101 -httpPort=9102 -streamInitPort=9103 -streamCatchupPort=9104 -streamMaintPort=9105 --httpsPort=19102 --certFile=/opt/couchbase/var/lib/couchbase/config/memcached-cert.pem --keyFile=/opt/couchbase/var/lib/couchbase/config/memcached-key.pem -vbuckets=1024 -cluster=127.0.0.1:8091 -storageDir=/opt/couchbase/var/lib/couchbase/data/@2i -diagDir=/opt/couchbase/var/lib/couchbase/crash -nodeUUID=add1a2b41e6faff84e970c0cb9ff5c77 -ipv6=false -isEnterprise=true]
      2019-03-19T10:45:00.447-07:00 [Info] Indexer::NewIndexer Status Warmup
      

      This issue was not seen before 6.5.0-2673, and 6.5.0-2673 onwards, it is seen in multiple platforms, and consistently.

      Attachments

        For Gerrit Dashboard: MB-33420
        # Subject Branch Project Status CR V

        Activity

          People

            jliang John Liang
            mihir.kamdar Mihir Kamdar (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty