Couchbase Server / MB-47970

[enforce-tls] Connection::isPacketAvailable(): Invalid packet header detected. Cookies: []


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 7.1.0
    • Fix Version: 7.1.0
    • Component: eventing
    • Environment: Centos 7 64 bit; CB 7.1.0-1156

    Description

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/durability_volume.ini rerun=False,GROUP=specific -t ns_server.enforce_tls.EnforceTls.test_all_encrypted_and_non_encrypted_ports,nodes_init=1,bucket_size=100,services_init=kv:n1ql:index:fts:cbas:eventing:backup,GROUP=specific'

      Steps
      1. Create a one-node cluster with all services

      2021-08-17 02:57:28,677 | test  | INFO    | MainThread | [table_view:display:72] Cluster statistics
      +----------------+----------------------------------------------+-----------------+-----------+----------+-----------------------+-------------------+-----------------------+
      | Node           | Services                                     | CPU_utilization | Mem_total | Mem_free | Swap_mem_used         | Active / Replica  | Version               |
      +----------------+----------------------------------------------+-----------------+-----------+----------+-----------------------+-------------------+-----------------------+
      | 172.23.105.215 | backup, cbas, eventing, fts, index, kv, n1ql | 25.9959141982   | 3.91 GiB  | 3.12 GiB | 141.00 MiB / 3.50 GiB | 63288 / 0         | 7.1.0-1156-enterprise |
      +----------------+----------------------------------------------+-----------------+-----------+----------+-----------------------+-------------------+-----------------------+

      2. Create one couchstore bucket

      +----------------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----+-------+------------+----------+-----------+-----+
      | Bucket                                                                           | Type      | Storage Backend | Replicas | Durability | TTL | Items | RAM Quota  | RAM Used | Disk Used | ARR |
      +----------------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----+-------+------------+----------+-----------+-----+
      | 2T52V7h-RSC1Vn%p6yEJLRaZFEqMi7OnCdBQkzy051BVSJLA0DcYIvS_Nu5vG2XzHRfxjP-28-969000 | couchbase | couchstore      | 1        | none       | 0   | 0     | 100.00 MiB | 1.47 MiB | 8.07 MiB  | 100 |
      +----------------------------------------------------------------------------------+-----------+-----------------+----------+------------+-----+-------+------------+----------+-----------+-----+

      3. Enforce TLS

      2021-08-17 02:57:43,345 | test  | INFO    | MainThread | [enforce_tls:set_n2n_encryption_level_on_nodes:38] Enabling n2n encryption and setting level to strict on node ip:172.23.105.215 port:8091 ssh_username:root

      4. Validate that services obey TLS by grepping listening addresses for non-SSL and TLS ports
      5. Validate memcached logs for any traces of errors

      02:57:47, Test's start timestamp is 2021-08-17 02:55:42.933000
      2021-08-17 02:58:16,667 | test  | INFO    | MainThread | [basetestcase:check_coredump_exist:888] unwanted messages in /opt/couchbase/var/lib/couchbase/logs/memcached.log.000011.txt
      2021-08-17 02:58:16,667 | test  | CRITICAL | MainThread | [basetestcase:check_coredump_exist:890] 172.23.105.215: Found ' ERROR ' logs - ['2021-08-17T02:57:47.924997-07:00 ERROR 144: Exception occurred during packet execution. Closing connection: Connection::isPacketAvailable(): Invalid packet header detected. Cookies: []\n']

      (this is typically a 7.0.2 test but also run on Neo)

      Attachments

        Issue Links


          Activity

            Another repro:

            {"description":"Rejected an invalid packet","id":20483,"local":{"ip":"127.0.0.1","port":11210},"name":"invalid packet","packet":"0x32 0x30 0x32 0x31 0x2d 0x30 0x38 0x2d 0x31 0x38 0x54 0x31 0x33 0x3a 0x34 0x31 0x3a 0x33 0x30 0x2e 0x38 0x36 0x33 0x2b 0x30 0x31 0x30 0x30 0x20 0x5b 0x57 0x41 0x52 0x4e 0x5d 0x20 0x53 0x74 0x6f 0x72 0x65 0x3a 0x3a 0x4f 0x62 0x73 0x65 0x72 0x76 0x65 0x43 0x68 0x61 0x6e 0x67 0x65 0x73 0x20 0x3a 0x20 0x55 0x6e 0x61 0x62 0x6c 0x65 0x20 0x74 0x6f 0x20 0x6f 0x62 0x73 0x65 0x72 0x76 0x65 0x20 0x6d 0x65 0x74 0x61 0x6b 0x76 0x20 0x75 0x6e 0x65 0x78 0x70 0x65 0x63 0x74 0x65 0x64 0x20 0x45 0x4f 0x46 0x20 0x2e 0x2e 0x52 0x65 0x74 0x72 0x79 0x69 0x6e 0x67 0x0a","real_userid":{"domain":"local","user":"@cbq-engine"},"remote":{"ip":"127.0.0.1","port":57744},"timestamp":"2021-08-18T13:41:30.863316+01:00"}

            -> "2021-08-18T13:41:30.863+0100 [WARN] Store::ObserveChanges : Unable to observe metakv unexpected EOF ..Retrying"

            Eventing log:

            2021-08-18T13:41:29.668+01:00 [Info] ServiceMgr::GetTaskList rev: service.Revision{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
            2021-08-18T13:41:30.862+0100 [WARN] Store::ObserveChanges : Unable to observe metakv unexpected EOF ..Retrying 
            2021-08-18T13:41:30.862+01:00 [Error] ServiceMgr::initService metakv observe error for setting store, err: unexpected EOF. Retrying...
            2021-08-18T13:41:30.863+01:00 [Warn] servicesChangeNotifier: Connection terminated for services notifier instance of http://%40eventing-cbauth@127.0.0.1:8091, default (unexpected EOF). Retrying...
            2021-08-18T13:41:30.864+01:00 [Warn] servicesChangeNotifier: Connection terminated for pool notifier instance of http://%40eventing-cbauth@127.0.0.1:8091, default (unexpected EOF). Retrying... 
            2021-08-18T13:41:30.864+01:00 [Error] ServiceMgr::initService metakv observe error for primary store, err: unexpected EOF. Retrying...
            2021-08-18T13:41:30.864+01:00 [Error] Eventing::main metakv observe error for debugger, err: unexpected EOF. Retrying.
            2021-08-18T13:41:30.864+01:00 [Error] Eventing::main metakv observe error for global config, err: unexpected EOF. Retrying...
            2021-08-18T13:41:30.864+01:00 [Error] Eventing::main metakv observe error for rebalance token, err: unexpected EOF. Retrying...
            2021-08-18T13:41:30.864+01:00 [Error] ServiceMgr::initService metakv observe error for temp store, err: unexpected EOF. Retrying...
            2021-08-18T13:41:30.864+01:00 [Error] Eventing::main metakv observe error for settings, err: unexpected EOF. Retrying...
            2021-08-18T13:41:30.864+01:00 [Error] Eventing::main metakv observe error for event handler code, err: unexpected EOF. Retrying...
            2021-08-18T13:41:30.864+01:00 [Error] Eventing::main metakv observe error for apps retry, err: unexpected EOF. Retrying.
            2021-08-18T13:41:30.868+01:00 [Info] Cluster Encryption Settings have been changed by ns server.
            2021-08-18T13:41:30.868+01:00 [Info] Updating node-to-node encryption level: {EncryptData:false DisableNonSSLPorts:false}

            Interesting: that copy in the Eventing log has the level in capitals ([WARN] rather than [Warn]) and is timestamped .862 rather than .863, which suggests two writes of the message.

            Query log:

             

            2021-08-18T13:41:31.022+01:00 [ERROR] Transmit of NOOP failed write tcp 127.0.0.1:57744->127.0.0.1:11210: write: broken pipe 
            2021-08-18T13:41:31.022+01:00 [ERROR] Connection Error: write tcp 127.0.0.1:57744->127.0.0.1:11210: write: broken pipe. Refreshing bucket travel-sample (vbid:970,keys:<ud>[route_10003]</ud>)
            ...
            2021-08-18T13:41:35.617+01:00 [INFO] Updating node-to-node encryption level: {EncryptData:true DisableNonSSLPorts:true}
            

            memcached.log:

             

            2021-08-18T13:41:19.935113+01:00 INFO 114: HELO [GoMemcached] XATTR, XERROR, JSON, SyncReplication, Collections, PreserveTtl [ {"ip":"127.0.0.1","port":57744} - {"ip":"127.0.0.1","port":11210} (not authenticated) ]
            2021-08-18T13:41:19.936250+01:00 INFO 114: Client {"ip":"127.0.0.1","port":57744} authenticated as <ud>@cbq-engine</ud>
            2021-08-18T13:41:30.863414+01:00 ERROR 114: Exception occurred during packet execution. Closing connection: Connection::isPacketAvailable(): Invalid packet header detected. Cookies: []
            

            So this suggests that it was indeed a query connection open on local port 57744.

            However, I notice that Eventing's logging.Log appears to go via a C routine that writes to a specific FD that has been cached. I'd have thought best practice would be to open-write-close log files, preventing the possibility of stale FDs...
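
            For illustration, a minimal self-contained Go sketch of that stale-FD hazard (the names and flow here are hypothetical, not Eventing's actual code). Per the os.File.Fd documentation, the returned descriptor is only valid until the File is closed or garbage collected:

            package main

            import (
                "fmt"
                "os"
                "runtime"
                "time"
            )

            // cacheFd mimics the hazard: it hands back the raw descriptor number while
            // letting the owning *os.File become unreachable, so its finalizer will
            // eventually close the descriptor behind the caller's back.
            func cacheFd() int {
                f, err := os.OpenFile(os.DevNull, os.O_WRONLY, 0)
                if err != nil {
                    panic(err)
                }
                return int(f.Fd()) // f is dropped here; GC may close the fd at any time
            }

            func main() {
                cached := cacheFd()

                // Encourage the finalizer to run (not guaranteed, just likely in this toy).
                runtime.GC()
                time.Sleep(100 * time.Millisecond)

                // Any descriptor opened now (a file here, a memcached socket in the real
                // case) can be handed the same number, so writes through `cached` would
                // land on an unrelated connection.
                g, _ := os.Open(os.DevNull)
                defer g.Close()
                fmt.Printf("cached fd %d, newly opened fd %d\n", cached, int(g.Fd()))
            }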

            Donald.haggart Donald Haggart added a comment
            Donald.haggart Donald Haggart added a comment - edited

            If we look at eventing-ee/evaluator/impl/engine.go:

             func (e *engine) Initialize(cfg map[api.Config]interface{}) api.Fault {
                 e.engineLock.Lock()
                 defer e.engineLock.Unlock()
                 if e.initialized {
                     initializederr := fmt.Errorf("Engine already Initialized")
                     return utils.NewFault(api.EngineAlreadyInitializedErr, initializederr)
                 }
                 c := utils.ConvertConfig(cfg)
                 if initfault := v8wrapper.Init(c.Applog, c.Syslog); initfault != nil {
                     return initfault
                 }
                 e.initialized = true
                 return nil
             }
            

            The Go os.File members of "c", returned from utils.ConvertConfig(), fall out of scope at the end of Initialize(), so once they are garbage collected their FDs are freed for reuse. (The cfg passed in, which is the return of convertAndInjectConfig(), also falls out of scope. It of course doesn't matter that these could be open handles to /dev/null to begin with.)

            A minimal change which ought to stop this stale-FD use would be to give c non-local scope (just to keep the os.File objects alive).

            (But better to not log via a cached FD still, I'd think.)
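
            For reference, a minimal sketch of what "keep c alive" could look like, retaining the converted config on the engine itself (the config field is hypothetical; this is not the actual eventing-ee change):

            // Sketch only: store the converted config on the engine so the os.File
            // handles behind the cached FDs stay reachable for the engine's lifetime.
            func (e *engine) Initialize(cfg map[api.Config]interface{}) api.Fault {
                e.engineLock.Lock()
                defer e.engineLock.Unlock()
                if e.initialized {
                    return utils.NewFault(api.EngineAlreadyInitializedErr,
                        fmt.Errorf("Engine already Initialized"))
                }
                e.config = utils.ConvertConfig(cfg) // hypothetical field, replaces the local c
                if initfault := v8wrapper.Init(e.config.Applog, e.config.Syslog); initfault != nil {
                    return initfault
                }
                e.initialized = true
                return nil
            }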

            A very shoddily instrumented Toy build:

            --- a/evaluator/worker/src/utils.cc
            +++ b/evaluator/worker/src/utils.cc
            @@ -27,6 +27,7 @@ struct Global {
               int syslog;
               std::mutex applog_lock;
               std::mutex syslog_lock;
            +  std::string settime;
             };
             
             Global global = Global{Info, 0, 1, std::mutex{}, std::mutex{}};
            @@ -50,9 +51,12 @@ std::string level_str(LogLevel lvl) {
             
             void set_loglevel(enum LogLevel level) { global.loglevel = level; }
             
            +std::string timestamp();
            +
             void set_logfd(int applog, int syslog) {
               global.applog = applog;
               global.syslog = syslog;
            +  global.settime = timestamp();
             }
             
             std::string timestamp() {
            @@ -83,6 +87,7 @@ void syslog(enum LogLevel level, const char *format, ...) {
               va_list args;
               va_start(args, format);
               vsnprintf(buf, sizeof(buf)-1, format, args);
            +  os << " [[FD:" << global.syslog << " / set at: " << global.settime << "]] ";
               os << timestamp();
               std::string log_level = level_str(level);
               os << " [" << log_level.c_str() << "] ";
            

            confirms the problem:

             

            {"description":"Rejected an invalid packet","id":20483,"local":{"ip":"127.0.0.1","port":11210},"name":"invalid packet","packet":"0x20 0x5b 0x5b 0x46 0x44 0x3a 0x32 0x34 0x20 0x2f 0x20 0x73 0x65 0x74 0x20 0x61 0x74 0x3a 0x20 0x32 0x30 0x32 0x31 0x2d 0x30 0x38 0x2d 0x31 0x38 0x54 0x31 0x35 0x3a 0x34 0x32 0x3a 0x34 0x36 0x2e 0x33 0x38 0x30 0x2b 0x30 0x31 0x30 0x30 0x5d 0x5d 0x20 0x32 0x30 0x32 0x31 0x2d 0x30 0x38 0x2d 0x31 0x38 0x54 0x31 0x35 0x3a 0x34 0x35 0x3a 0x31 0x35 0x2e 0x37 0x34 0x37 0x2b 0x30 0x31 0x30 0x30 0x20 0x5b 0x57 0x41 0x52 0x4e 0x5d 0x20 0x53 0x74 0x6f 0x72 0x65 0x3a 0x3a 0x4f 0x62 0x73 0x65 0x72 0x76 0x65 0x43 0x68 0x61 0x6e 0x67 0x65 0x73 0x20 0x3a 0x20 0x55 0x6e 0x61 0x62 0x6c 0x65 0x20 0x74 0x6f 0x20 0x6f 0x62 0x73 0x65 0x72 0x76 0x65 0x20 0x6d 0x65 0x74 0x61 0x6b 0x76 0x20 0x75 0x6e 0x65 0x78 0x70 0x65 0x63 0x74 0x65 0x64 0x20 0x45 0x4f 0x46 0x20 0x2e 0x2e 0x52 0x65 0x74 0x72 0x79 0x69 0x6e 0x67 0x0a","real_userid":{"domain":"local","user":"@cbq-engine"},"remote":{"ip":"127.0.0.1","port":49144},"timestamp":"2021-08-18T15:45:15.747986+01:00"}

            -> "[[FD:24 / set at: 2021-08-18T15:42:46.380+0100]] 2021-08-18T15:45:15.747+0100 [WARN]Store::ObserveChanges : Unable to observe metakv unexpected EOF ..Retrying"

            memcached.log:

             

            2021-08-18T15:45:10.762535+01:00 INFO 111: HELO [GoMemcached] XATTR, XERROR, JSON, SyncReplication, Collections, PreserveTtl [ {"ip":"127.0.0.1","port":49144} - {"ip":"127.0.0.1","port":11210} (not authenticated) ]
            2021-08-18T15:45:10.762804+01:00 INFO 111: Client {"ip":"127.0.0.1","port":49144} authenticated as <ud>@cbq-engine</ud>
            2021-08-18T15:45:15.748233+01:00 ERROR 111: Exception occurred during packet execution. Closing connection: Connection::isPacketAvailable(): Invalid packet header detected. Cookies: []
            

             

            So that shows the Query connection was only established at 45:10.762535, but the FD in use by Eventing was set some minutes earlier, at 42:46.380; Query must have been re-assigned that FD.

             

            $ ls -l /proc/`pgrep cbq-engine`/fd
            total 0
            ...
            lrwx------ 1 couchbase couchbase 64 Aug 18 15:50 24 -> 'socket:[177379016]'
            ...

            confirms that FD 24 doesn't point to a file but rather to a socket.

             

            Another way of looking at it: just after restart we have:

            $ ls -l /proc/`pgrep cbq-engine`/fd
            total 0
            lr-x------ 1 couchbase couchbase 64 Aug 18 16:01 0 -> 'pipe:[177525995]'
            l-wx------ 1 couchbase couchbase 64 Aug 18 16:01 1 -> 'pipe:[177525996]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 10 -> 'socket:[177528861]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 11 -> 'socket:[177528862]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 12 -> '/tmp/scan-results499837778 (deleted)'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 13 -> 'socket:[177527048]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 14 -> '/tmp/search-results700054153 (deleted)'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 15 -> 'socket:[177523631]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 16 -> 'socket:[177527049]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 17 -> 'socket:[177523630]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 18 -> 'socket:[177528866]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 19 -> 'socket:[177524173]'
            l-wx------ 1 couchbase couchbase 64 Aug 18 16:01 2 -> 'pipe:[177525997]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 20 -> 'socket:[177527977]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 21 -> 'socket:[177527978]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 22 -> 'socket:[177527981]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 23 -> 'socket:[177527982]'
            l-wx------ 1 couchbase couchbase 64 Aug 18 16:01 24 -> /dev/null
            l-wx------ 1 couchbase couchbase 64 Aug 18 16:01 25 -> /dev/null
            ...

            Once we've run some queries:

            $ ls -l /proc/`pgrep cbq-engine`/fd
            total 0
            lr-x------ 1 couchbase couchbase 64 Aug 18 16:01 0 -> 'pipe:[177525995]'
            l-wx------ 1 couchbase couchbase 64 Aug 18 16:01 1 -> 'pipe:[177525996]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 10 -> 'socket:[177571687]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 11 -> 'socket:[177567682]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 12 -> 'socket:[177572891]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 13 -> 'socket:[177568617]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 14 -> 'socket:[177569163]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 15 -> 'socket:[177523631]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 16 -> 'socket:[177527049]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 17 -> 'socket:[177523630]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 18 -> 'socket:[177528866]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 19 -> 'socket:[177524173]'
            l-wx------ 1 couchbase couchbase 64 Aug 18 16:01 2 -> 'pipe:[177525997]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 20 -> 'socket:[177560652]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 21 -> 'socket:[177561630]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 22 -> 'socket:[177527981]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 23 -> 'socket:[177527982]'
            lrwx------ 1 couchbase couchbase 64 Aug 18 16:01 24 -> 'socket:[177568618]'
            ...

            So the reuse of the FD is clear, and it could only happen once the tracked Go objects had close()d their underlying descriptors.

             

            Given this analysis, I'm removing Query from the picture and leaving this for Eventing to resolve.


            Indeed, there was an issue with the scope of the config causing GC to close the handle. Sumedh Basarkod, would you request a rerun with the latest change picked up?

            Thanks!

            abhishek.jindal Abhishek Jindal added a comment

            Build couchbase-server-7.1.0-1278 contains eventing-ee commit df0eb44 with commit message:
            MB-47970 : Push config to engine scope

            build-team Couchbase Build Team added a comment

            Did not see this again upon reruns after the fix. Closing. 

            sumedh.basarkod Sumedh Basarkod added a comment
