  1. Couchbase Server
  2. MB-47891

[TLS] select fails: Error performing bulk get operation


Details

    • Untriaged
    • 1
    • Unknown

    Description

      To reproduce:

      • Set up a 2-node cluster (kv:n1ql:index-kv)
      • Load the travel-sample bucket
      • Enable N2N encryption: /opt/couchbase/bin/couchbase-cli node-to-node-encryption -c http://localhost:8091 -u Administrator -p password --enable
      • Set TLS strict mode: /opt/couchbase/bin/couchbase-cli setting-security -c http://localhost:8091 -u Administrator -p password --set --cluster-encryption-level strict
      • Run a simple select from `travel-sample`

      Error seen:

      cbq> select * from `travel-sample` use keys 'airline_10';
      {
          "requestID": "d11eb024-93bb-44c5-a473-fe13233171d3",
          "signature": {
              "*": "*"
          },
          "results": [
          ],
          "errors": [
              {
                  "code": 12008,
                  "msg": "Error performing bulk get operation  - cause: unable to complete action after 4 attempts: \u003cnil\u003e",
                  "retry": true
              }
          ],
          "status": "errors",
          "metrics": {
              "elapsedTime": "676.709455ms",
              "executionTime": "676.535487ms",
              "resultCount": 0,
              "resultSize": 0,
              "serviceLoad": 6,
              "errorCount": 1
          }
      }
      cbq> select * from `travel-sample` limit 2;
      {
          "requestID": "6c77e81f-2731-4d48-9b49-43ad503fe6d8",
          "signature": {
              "*": "*"
          },
          "results": [
          ],
          "errors": [
              {
                  "code": 12008,
                  "msg": "Error performing bulk get operation  - cause: {1 errors, starting with dial tcp 172.23.104.91:11210: connect: connection refused}",
                  "retry": true
              }
          ],
          "status": "errors",
          "metrics": {
              "elapsedTime": "30.443142402s",
              "executionTime": "30.442984089s",
              "resultCount": 0,
              "resultSize": 0,
              "serviceLoad": 6,
              "errorCount": 1
          }
      } 
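The two responses above share error code 12008 and `"retry": true`. As a rough illustration of how a client could pick these failures out of a response body, here is a minimal Go sketch. The type and function names (`queryError`, `queryResponse`, `bulkGetFailures`) are hypothetical and only model the fields shown in the output above; this is not code from the query service or any SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// queryError mirrors one entry of the "errors" array shown above.
// Hypothetical type, defined only for this sketch.
type queryError struct {
	Code  int    `json:"code"`
	Msg   string `json:"msg"`
	Retry bool   `json:"retry"`
}

// queryResponse holds only the fields this sketch inspects.
type queryResponse struct {
	Status string       `json:"status"`
	Errors []queryError `json:"errors"`
}

// bulkGetFailures returns the errors whose code is 12008
// ("Error performing bulk get operation" in the output above).
func bulkGetFailures(body []byte) ([]queryError, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var out []queryError
	for _, e := range resp.Errors {
		if e.Code == 12008 {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	// Trimmed copy of the second response above.
	body := []byte(`{"errors":[{"code":12008,"msg":"Error performing bulk get operation  - cause: {1 errors, starting with dial tcp 172.23.104.91:11210: connect: connection refused}","retry":true}],"status":"errors"}`)
	failures, err := bulkGetFailures(body)
	if err != nil {
		panic(err)
	}
	for _, f := range failures {
		fmt.Printf("code=%d retry=%t msg=%s\n", f.Code, f.Retry, f.Msg)
	}
}
```

Note that `"retry": true` does not help here: retrying keeps failing until the client stops dialing the non-TLS port.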

      Attachments


        Activity

          BTW, this could be related to MB-47887.

          Pierre Regazzoni (pierre.regazzoni) added the comment above.

          For strict TLS, all services disable access to non-TLS ports. memcached and ns_server should still allow access on localhost, but I assume the client then has to connect as localhost. It looks like we connect using the IP from node services for a 2-node cluster, so all connections, even to 8091, are rejected.
          It looks like refresh() in ns_server.go should have triggered MapKVtoSSL:

          var encrypted bool
          if client.tlsConfig != nil {
          	hostport, encrypted, err = MapKVtoSSL(hostport, &poolServices)
          	if err != nil {
          		b.Unlock()
          		return err
          	}
          }
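To make the intended mapping concrete, here is a minimal, self-contained Go sketch of rewriting a plain KV `host:port` to its TLS counterpart. It is an illustrative stand-in, not the go-couchbase `MapKVtoSSL` implementation: the `nodeService` type, the `mapKVToTLS` function, the `"kv"`/`"kvSSL"` keys, and the 11207 port value are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// nodeService is a hypothetical, simplified view of one node's advertised
// services: a hostname plus a map of service name to port.
type nodeService struct {
	Hostname string
	Services map[string]int // assumed keys: "kv" (plain), "kvSSL" (TLS)
}

// mapKVToTLS returns the TLS address for a plain KV host:port and reports
// whether a mapping was applied. Addresses that match no known node are
// returned unchanged.
func mapKVToTLS(hostport string, nodes []nodeService) (string, bool, error) {
	host, portStr, err := net.SplitHostPort(hostport)
	if err != nil {
		return "", false, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", false, err
	}
	for _, n := range nodes {
		if n.Hostname != host || n.Services["kv"] != port {
			continue
		}
		tlsPort, ok := n.Services["kvSSL"]
		if !ok {
			return "", false, fmt.Errorf("no kvSSL port advertised for %s", host)
		}
		return net.JoinHostPort(host, strconv.Itoa(tlsPort)), true, nil
	}
	return hostport, false, nil
}

func main() {
	// Port values are assumptions for the example.
	nodes := []nodeService{
		{Hostname: "172.23.104.91", Services: map[string]int{"kv": 11210, "kvSSL": 11207}},
	}
	addr, encrypted, err := mapKVToTLS("172.23.104.91:11210", nodes)
	if err != nil {
		panic(err)
	}
	fmt.Println(addr, encrypted)
}
```

The point of the sketch is only that the rewrite has to happen before dialing; if it is skipped, the client keeps using 11210 on the node IP and the connection is refused.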
          

           

          Isha Kandaswamy (isha) added the comment above.

          When I set the encryption level to strict (disablenonsslports set to true), I see these failures in the query log:

          _time=2021-08-12T23:42:16.360-07:00 _level=INFO _msg=Unable to retrieve collections info for bucket travel-sample: Unable to get connection to retrieve collections manifest: dial tcp 172.23.99.49:11210: connect: connection refused. No collections access to bucket travel-sample.

          This is because, for a multi-node cluster, query uses the hostnames given in nodeServices to connect to kv.

          • Since kv only allows localhost / 127.0.0.1 / ::1 connections to 11210 (its non-SSL port), nearly all our internal requests fail.

          • My question is: should query be using 127.0.0.1 or ::1 here instead of the IP? Or can kv see that the request comes from the local IP and allow such requests?

          • What is the general expected behavior from services? (I assume other services might do the same thing.)
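The rule described in the bullets above (the non-SSL port accepts only localhost / 127.0.0.1 / ::1 once non-SSL ports are disabled) can be modelled in a few lines of Go. This is a simplified sketch under that assumption, not kv or query code; `isLoopbackHost` and `plainPortAllowed` are hypothetical helpers, and they look only at the address being dialed.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackHost reports whether host is "localhost" or a loopback IP
// literal such as 127.0.0.1 or ::1.
func isLoopbackHost(host string) bool {
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

// plainPortAllowed models the behavior described above: when non-SSL ports
// are disabled, a plain-port connection is accepted only on a loopback
// address.
func plainPortAllowed(hostport string, nonSSLDisabled bool) (bool, error) {
	host, _, err := net.SplitHostPort(hostport)
	if err != nil {
		return false, err
	}
	return !nonSSLDisabled || isLoopbackHost(host), nil
}

func main() {
	for _, addr := range []string{"172.23.99.49:11210", "127.0.0.1:11210", "[::1]:11210"} {
		ok, err := plainPortAllowed(addr, true)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s allowed=%t\n", addr, ok)
	}
}
```

Under this model, dialing the node's own IP on 11210 fails even when the client runs on the same machine, which matches the "connection refused" line in the log above.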
          Isha Kandaswamy (isha) added the comment above.

          Also verify: MB-48004, MB-47887, MB-47824, update statistics, and MB-47955.

           

          Isha Kandaswamy (isha) added the comment above (edited).

          I see this in the logs from the indexer. If you see these issues during testing, open an indexer issue.

          [root@bucketuser logs]# tail -f query.log 
          2021-08-18T11:03:49.123-07:00 [Info] GsiClient::UpdateUsecjson: using collatejson as data format between indexer and GsiClient
          2021-08-18T11:03:49.137-07:00 [Error] transport error between 172.23.99.49:45626->172.23.99.50:9101: write tcp 172.23.99.49:45626->172.23.99.50:9101: write: broken pipe
          2021-08-18T11:03:49.137-07:00 [Error] [GsiScanClient:"172.23.99.50:9101"] d1843df6-aac7-406b-8ce9-a25100a3aa93 request transport failed `write tcp 172.23.99.49:45626->172.23.99.50:9101: write: broken pipe`
          2021-08-18T11:03:49.137-07:00 [Warn] scan failed: requestId d1843df6-aac7-406b-8ce9-a25100a3aa93 queryport 172.23.99.50:9101 inst 13452035488688964811 partition [0]
          2021-08-18T11:03:49.137-07:00 [Warn] Scan failed with error for index 4699942680913860305. Trying scan again with replica, reqId:d1843df6-aac7-406b-8ce9-a25100a3aa93 : write tcp 172.23.99.49:45626->172.23.99.50:9101: write: broken pipe from [172.23.99.50:9101] ...
          2021-08-18T11:03:49.137-07:00 [Error] PickRandom: Fail to find indexer for all index partitions. Num partition 1. Partition with instances 0 
          2021-08-18T11:03:49.137-07:00 [Warn] Fail to find indexers to satisfy query request. Trying scan again for index 4699942680913860305, reqId:d1843df6-aac7-406b-8ce9-a25100a3
          

          Isha Kandaswamy (isha) added the comment above.

          Build couchbase-server-7.0.2-6534 contains go-couchbase commit c5373cc with commit message:
          MB-47891: Use kv ssl port for n2n encryption

          Couchbase Build Team (build-team) added the comment above.

          Build couchbase-server-7.0.2-6535 contains go-couchbase commit 992c4ca with commit message:
          Revert "MB-47891: Use kv ssl port for n2n encryption"

          Couchbase Build Team (build-team) added the comment above.

          Build couchbase-server-7.1.0-1165 contains query commit 3e9bf84 with commit message:
          MB-47891: When non ssl ports are disabled use SSL ports for Localnode as well.

          Couchbase Build Team (build-team) added the comment above.

          Build couchbase-server-7.1.0-1165 contains query commit 0126475 with commit message:
          MB-47891: Use kv ssl port for n2n encryption

          Couchbase Build Team (build-team) added the comment above.

          For 7.0.2 testing of the same fix, please refer to the build from MB-48025.

          Isha Kandaswamy (isha) added the comment above.

          Opened a bug for indexer - MB-48030

          Isha Kandaswamy (isha) added the comment above.

          Still seeing issue for select (and insert) with 7.0.2-6535. 

          Pierre Regazzoni (pierre.regazzoni) added the comment above.

          See the above comment. This only addresses the fix for 7.1. 

          In 7.1 we moved our code from go-couchbase to the internal primitives/ code and so my changes didn't cause a cyclic dependency with cbauth and go-couchbase. 

          7.0 still uses go-couchbase, so the changes need to be different there. I've opened MB-48025 to track that.

          Isha Kandaswamy (isha) added the comment above.

          The 7.0.2 fix is in couchbase-server-7.0.2-6544.

          couchbase-server-7.1.0-1170 (can you use this build to verify on 7.1?)

           

          Isha Kandaswamy (isha) added the comment above (edited).

          Verified on 7.1.0-1170

          Pierre Regazzoni (pierre.regazzoni) added the comment above.

          People

            pierre.regazzoni Pierre Regazzoni
            pierre.regazzoni Pierre Regazzoni