  1. Couchbase Server
  2. MB-56443

Analytics node goes down causing queries to fail

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • 7.2.0
    • 7.2.0
    • analytics
    • 7.2.0-5298 on debian 10 box

    Description

      An automated test fails because an analytics node goes down while queries are running.

      Scenario 1 -

      Cluster config -

      Node            Services  CPU_utilization  Mem_total  Mem_free  Swap_mem_used         Active / Replica  Version
      172.23.109.170  cbas      8.08058045292    3.83 GiB   2.77 GiB  55.50 MiB / 3.99 GiB  0 / 0             7.2.0-5298-enterprise
      172.23.109.184  kv        50.0118621845    3.83 GiB   2.94 GiB  20.75 MiB / 3.99 GiB  0 / 0             7.2.0-5298-enterprise
      172.23.109.168  cbas      8.32243924735    3.83 GiB   2.69 GiB  30.25 MiB / 3.99 GiB  0 / 0             7.2.0-5298-enterprise
      172.23.109.171  cbas      8.11232591226    3.83 GiB   2.51 GiB  0.0 Byte / 0.0 Byte   0 / 0             7.2.0-5298-enterprise
      172.23.109.174  cbas      8.13564864678    3.83 GiB   2.73 GiB  30.25 MiB / 3.99 GiB  0 / 0             7.2.0-5298-enterprise
      172.23.109.185  kv, n1ql  58.8147584449    3.83 GiB   2.81 GiB  36.25 MiB / 3.99 GiB  0 / 0             7.2.0-5298-enterprise

      Test Steps -

      1. Enable N2N encryption on the cluster with the encryption level set to control.
      2. Create a KV bucket named `bucket-5`.
      3. Create 100 collections in the bucket, each holding 500 docs.
      4. Start 3 tasks in parallel -
        1. data load on KV (this task ends once the specified amount of data is loaded)
        2. creation of 1 dataset on each KV collection
        3. running "select count from <dataset>" for all the datasets created (this task runs until explicitly stopped)
      5. Wait for the dataset creation task to finish.
      6. The test fails here while creating one of the datasets.
      7. The following errors are observed in the analytics logs for node 172.23.109.168:

      2023-04-13T05:15:04.268-07:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0:0:0:0:0:0:0:0:9116]] Unexpected tcp io error in connection TCPConnection[Remote Address: /172.23.109.171:9116 Local Address: /0:0:0:0:0:0:0:0:9116]
      org.apache.hyracks.api.exceptions.NetException: Socket Closed
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:360) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:119) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:199) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
      2023-04-13T05:15:04.268-07:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0:0:0:0:0:0:0:0:9117]] Unexpected tcp io error in connection TCPConnection[Remote Address: /172.23.109.171:56308 Local Address: /0:0:0:0:0:0:0:0:9117]
      org.apache.hyracks.api.exceptions.NetException: Socket Closed
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:360) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:119) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:199) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
      2023-04-13T05:15:04.274-07:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0:0:0:0:0:0:0:0:9116]] Unexpected tcp io error in connection TCPConnection[Remote Address: /172.23.109.171:44238 Local Address: /0:0:0:0:0:0:0:0:9116]
      org.apache.hyracks.api.exceptions.NetException: Socket Closed
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:360) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:119) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:199) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
      2023-04-13T05:15:04.274-07:00 ERRO CBAS.impl.IPCConnectionManager [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:9115]] TCP read error from /172.23.109.171:9112
      java.io.IOException: Connection reset by peer
              at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:?]
              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:?]
              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[?:?]
              at sun.nio.ch.IOUtil.read(IOUtil.java:245) ~[?:?]
              at sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[?:?]
              at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356) ~[?:?]
              at org.apache.hyracks.ipc.sockets.PlainSocketChannel.read(PlainSocketChannel.java:47) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.read(IPCConnectionManager.java:434) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.processSelectedKeys(IPCConnectionManager.java:261) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.doRun(IPCConnectionManager.java:231) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.run(IPCConnectionManager.java:213) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
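
The three parallel tasks in step 4 of this scenario can be sketched roughly as below. This is not the actual test framework code: `run_workload`, `execute`, `load_data`, the `ds_<i>` and `collection_<i>` names, and the `_default` scope are all assumptions made for illustration, and the query is assumed to be `COUNT(*)` since the steps only abbreviate it as "select count from <dataset>".

```python
import threading
from typing import Callable, List

BUCKET = "bucket-5"      # from the test steps
NUM_COLLECTIONS = 100    # from the test steps
SCOPE = "_default"       # assumption: the report does not name the scope


def dataset_ddl(i: int) -> str:
    # Hypothetical names: "ds_<i>" and "collection_<i>" are not from the report.
    return f"CREATE DATASET `ds_{i}` ON `{BUCKET}`.`{SCOPE}`.`collection_{i}`;"


def count_query(i: int) -> str:
    # Assumed form of the abbreviated "select count from <dataset>" query.
    return f"SELECT COUNT(*) FROM `ds_{i}`;"


def run_workload(execute: Callable[[str], object],
                 load_data: Callable[[], object]) -> List[str]:
    """Run the three tasks concurrently and return the failure messages."""
    stop = threading.Event()
    lock = threading.Lock()
    created: List[int] = []
    failures: List[str] = []

    def record(stmt: str, exc: Exception) -> None:
        with lock:
            failures.append(f"{stmt} -> {exc!r}")

    def create_datasets() -> None:
        for i in range(NUM_COLLECTIONS):
            stmt = dataset_ddl(i)
            try:
                execute(stmt)
            except Exception as exc:
                record(stmt, exc)
                return  # stop at the first dataset that cannot be created
            with lock:
                created.append(i)

    def query_loop() -> None:
        # Keeps querying every dataset created so far until told to stop.
        while not stop.is_set():
            with lock:
                snapshot = list(created)
            for i in snapshot:
                if stop.is_set():
                    break
                stmt = count_query(i)
                try:
                    execute(stmt)
                except Exception as exc:
                    record(stmt, exc)
            stop.wait(0.01)  # brief pause before the next pass

    loader = threading.Thread(target=load_data)
    creator = threading.Thread(target=create_datasets)
    querier = threading.Thread(target=query_loop)
    for t in (loader, creator, querier):
        t.start()
    loader.join()
    creator.join()   # step 5: wait for dataset creation to finish
    stop.set()       # explicitly stop the otherwise endless query task
    querier.join()
    return failures
```

Here `execute` stands in for whatever client call submits a statement to the analytics service and `load_data` for the KV loader. A non-empty return value would correspond to the failure described in step 6.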

      Scenario 2 -

      Cluster config - same as scenario 1

      Test Steps -

      1. Enable N2N encryption on the cluster with the encryption level set to control.
      2. Create a KV bucket named `bucket-7`.
      3. Create 500 collections in the bucket, each holding 10 docs.
      4. Run a task that creates 1 dataset on each of the 500 KV collections.
      5. The test fails here while creating one of the datasets.
      6. The following errors are observed in the analytics logs for node 172.23.109.168:

      2023-04-13T05:15:04.268-07:00 WARN CBAS.management.ReplicationChannel [Replication Worker-2(/172.23.109.171:45136)] unexpected error during replication.
      org.apache.asterix.common.exceptions.ReplicationException: java.io.EOFException: could not read all data from source; remaining bytes: 4
              at org.apache.asterix.replication.messaging.ReplicateLogsTask.perform(ReplicateLogsTask.java:73) ~[asterix-replication-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.asterix.replication.management.ReplicationChannel$ReplicationWorker.handle(ReplicationChannel.java:174) ~[asterix-replication-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.asterix.replication.management.ReplicationChannel$ReplicationWorker.run(ReplicationChannel.java:140) ~[asterix-replication-7.2.0-5298.jar:7.2.0-5298]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
              at java.lang.Thread.run(Thread.java:829) ~[?:?]
      Caused by: java.io.EOFException: could not read all data from source; remaining bytes: 4
              at org.apache.asterix.replication.management.NetworkingUtil.readBytes(NetworkingUtil.java:55) ~[asterix-replication-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.asterix.replication.messaging.ReplicationProtocol.readRequest(ReplicationProtocol.java:76) ~[asterix-replication-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.asterix.replication.messaging.ReplicateLogsTask.perform(ReplicateLogsTask.java:64) ~[asterix-replication-7.2.0-5298.jar:7.2.0-5298]
              ... 5 more
      2023-04-13T05:15:04.268-07:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0:0:0:0:0:0:0:0:9116]] Unexpected tcp io error in connection TCPConnection[Remote Address: /172.23.109.171:9116 Local Address: /0:0:0:0:0:0:0:0:9116]
      org.apache.hyracks.api.exceptions.NetException: Socket Closed
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:360) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:119) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:199) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
      2023-04-13T05:15:04.268-07:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0:0:0:0:0:0:0:0:9117]] Unexpected tcp io error in connection TCPConnection[Remote Address: /172.23.109.171:56308 Local Address: /0:0:0:0:0:0:0:0:9117]
      org.apache.hyracks.api.exceptions.NetException: Socket Closed
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:360) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:119) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:199) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
      2023-04-13T05:15:04.274-07:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0:0:0:0:0:0:0:0:9116]] Unexpected tcp io error in connection TCPConnection[Remote Address: /172.23.109.171:44238 Local Address: /0:0:0:0:0:0:0:0:9116]
      org.apache.hyracks.api.exceptions.NetException: Socket Closed
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.driveReaderStateMachine(MultiplexedConnection.java:360) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.muxdemux.MultiplexedConnection.notifyIOReady(MultiplexedConnection.java:119) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.net.protocols.tcp.TCPEndpoint$IOThread.run(TCPEndpoint.java:199) ~[hyracks-net-7.2.0-5298.jar:7.2.0-5298]
      2023-04-13T05:15:04.274-07:00 ERRO CBAS.impl.IPCConnectionManager [IPC Network Listener Thread [/0:0:0:0:0:0:0:0:9115]] TCP read error from /172.23.109.171:9112
      java.io.IOException: Connection reset by peer
              at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:?]
              at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:?]
              at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:276) ~[?:?]
              at sun.nio.ch.IOUtil.read(IOUtil.java:245) ~[?:?]
              at sun.nio.ch.IOUtil.read(IOUtil.java:223) ~[?:?]
              at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:356) ~[?:?] 
              at org.apache.hyracks.ipc.sockets.PlainSocketChannel.read(PlainSocketChannel.java:47) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.read(IPCConnectionManager.java:434) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.processSelectedKeys(IPCConnectionManager.java:261) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.doRun(IPCConnectionManager.java:231) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
              at org.apache.hyracks.ipc.impl.IPCConnectionManager$NetworkThread.run(IPCConnectionManager.java:213) ~[hyracks-ipc-7.2.0-5298.jar:7.2.0-5298]
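
When triaging logs like the ones quoted above, it can help to tally which remote peers appear in the ERRO/WARN entries. The following is a minimal sketch that assumes the header-line format shown in this report; `peers_in_errors` and both regular expressions are illustrative helpers, not part of any Couchbase tooling.

```python
import re
from collections import Counter
from typing import Iterable

# Header line of a log entry: "<timestamp> <ERRO|WARN> <logger> <rest>".
ENTRY = re.compile(r"^\s*(\S+)\s+(ERRO|WARN)\s+(\S+)\s+(.*)$")
# IPv4 peers such as "/172.23.109.171:9116"; wildcard local addresses
# like "/0:0:0:0:0:0:0:0:9116" contain no dots and are not matched.
PEER = re.compile(r"/(\d{1,3}(?:\.\d{1,3}){3}):\d+")


def peers_in_errors(lines: Iterable[str]) -> Counter:
    """Count, per remote IPv4 address, the ERRO/WARN entries that mention it."""
    counts: Counter = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match is None:
            continue  # stack-trace frames and exception lines are skipped
        for ip in set(PEER.findall(match.group(4))):
            counts[ip] += 1
    return counts
```

Applied to the entries quoted for this scenario, each of the five ERRO/WARN headers names 172.23.109.171 as the remote peer. That would be consistent with 172.23.109.171 being the node that went down, although the report does not state which node failed.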

          People

            ali.alsuliman Ali Alsuliman
            umang.agrawal Umang