Couchbase Server / MB-46003

Cbcollect fails to gather stats

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • master
    • 7.0.0
    • ns_server
    • None
    • couchbase-server-7.0.0-5062 Centos7
    • Untriaged
    • 1
    • Unknown

    Description

      10.112.211.1

      Steps to reproduce:

      • Install the couchbase-server-7.0.0-5062 RPM on a clean Vagrant VM (or another clean host)
      • Set up a single-node cluster
      • Create a bucket
      • Collect logs
      • Inspect the collected logs

      stats.log shows that cbstats was passed a host string with no port ('127.0.0.1:'), which it rejects as invalid:

      stats.log

      ==============================================================================
      memcached stats all
      cbstats -a 127.0.0.1: all -u
      ==============================================================================
      Invalid format for host string: '127.0.0.1:'
      ==============================================================================
      memcached stats checkpoint
      cbstats -a 127.0.0.1: checkpoint -u
      ==============================================================================
      Invalid format for host string: '127.0.0.1:'
      ==============================================================================
      memcached stats collections
      cbstats -a 127.0.0.1: collections -u
      ==============================================================================
      Invalid format for host string: '127.0.0.1:'
      ==============================================================================
      memcached stats config
      cbstats -a 127.0.0.1: config -u
      ==============================================================================
      Invalid format for host string: '127.0.0.1:'
      ==============================================================================
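The failure mode in the stats.log excerpt above is a host string that ends in a colon with no port after it. A minimal sketch of the kind of check that rejects such a value is shown here. The helper name `split_host_port` is hypothetical and this is not the parser cbstats itself uses; IPv6 literals are deliberately out of scope, and 11210 appears only as an example port.

```python
from typing import Tuple


def split_host_port(value: str) -> Tuple[str, int]:
    """Split 'host:port' and insist on a numeric port in 1-65535.

    Hypothetical helper for illustration only; not cbstats's own parser.
    """
    # rpartition splits on the last colon, so '127.0.0.1:' yields an empty port.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"Invalid format for host string: {value!r}")
    number = int(port)
    if not 1 <= number <= 65535:
        raise ValueError(f"Port out of range in host string: {value!r}")
    return host, number


if __name__ == "__main__":
    print(split_host_port("127.0.0.1:11210"))
    try:
        split_host_port("127.0.0.1:")
    except ValueError as err:
        print(err)
```

A caller that builds the host string from a looked-up port could run a check like this before launching each stats command, so that a missing port is reported once rather than once per command.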
      

      Prometheus metrics have also not been collected:

      cbcollect_info.log

      [2021-04-29T14:49:03.556541+00:00] Failed to create prometheus snapshot: 7
      [2021-04-29T14:49:03.556599+00:00] Error: unable to retrieve statistics
      

          Activity

            dfinlay Dave Finlay added a comment -

            James Harrison: could you zip up the logs directory and attach?

            james.harrison James Harrison added a comment - - edited

            Here are the collected logs from a fresh repro: [^collectinfo-2021-04-29T155923-ns_1@127.0.0.1.zip]. I've kept the node up in case there's anything else that would be useful to gather.

            dfinlay Dave Finlay added a comment -

            Just a guess, but this could be connected to MB-45867.

            bryan.mccoid Bryan McCoid added a comment -

            Here's what happened when I ran this using the steps described above. Notable sections:

            CPU throttling info (echo /sys/devices/system/cpu/cpu*/thermal_throttle/* | xargs -n1 -- sh -c 'echo $1; cat $1' --) - Exit code 123
            Raw PID 1 scheduler /proc/1/sched (cat /proc/1/sched | head -n 1) - OK
            Raw PID 1 control groups /proc/1/cgroup (cat /proc/1/cgroup) - OK
            Found dump-guts: /opt/couchbase/bin/dump-guts
            Checking for server guts in /opt/couchbase/var/lib/couchbase/initargs...
            Error occurred getting server guts: Got exception: {error,function_clause}
            [{'dump-guts__escript__1619__737778__405448__6',
                 '-main_with_everything/4-lc$^0/1-0-',
                 ['_deleted'],
                 [{file,"/opt/couchbase/bin/dump-guts"},{line,572}]},
             {'dump-guts__escript__1619__737778__405448__6',main_with_everything,4,
                 [{file,"/opt/couchbase/bin/dump-guts"},{line,572}]},
             {'dump-guts__escript__1619__737778__405448__6',main,1,
                 [{file,"/opt/couchbase/bin/dump-guts"},{line,136}]},
             {escript,run,2,[{file,"escript.erl"},{line,758}]},
             {escript,start,1,[{file,"escript.erl"},{line,277}]},
             {init,start_em,1,[]},
             {init,do_boot,3,[]}]
            Found dump-guts: /opt/couchbase/bin/dump-guts
            Checking for server guts in /opt/couchbase/var/lib/couchbase/initargs...
            Error occurred getting server guts: Got exception: {error,function_clause}
            [{'dump-guts__escript__1619__737778__784948__5',
                 '-main_with_everything/4-lc$^0/1-0-',
                 ['_deleted'],
                 [{file,"/opt/couchbase/bin/dump-guts"},{line,572}]},
             {'dump-guts__escript__1619__737778__784948__5',main_with_everything,4,
                 [{file,"/opt/couchbase/bin/dump-guts"},{line,572}]},
             {'dump-guts__escript__1619__737778__784948__5',main,1,
                 [{file,"/opt/couchbase/bin/dump-guts"},{line,136}]},
             {escript,run,2,[{file,"escript.erl"},{line,758}]},
             {escript,start,1,[{file,"escript.erl"},{line,277}]},
             {init,start_em,1,[]},
             {init,do_boot,3,[]}]
            Found dump-guts: /opt/couchbase/bin/dump-guts
            initargs file '/root/Library/Application Support/Couchbase/var/lib/couchbase/initargs' does not exist
            Couldn't read server guts. Using some default values.
            Adding persistent buckets '['new-fake-bucket-1']' to server guts
            I/O error(2): No such file or directory
            curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
            product diag header () - OK
            Directory structure (['ls', '-lRai', '/']) - Exit code 1
            Database directory structure (['ls', '-lRai', '/opt/couchbase/var/lib/couchbase/data']) - OK
            

            And also:

            mdocs for new-fake-bucket-1 (/opt/couchbase/var/lib/couchbase/data/new-fake-bucket-1) (['magma_dump', '/opt/couchbase/var/lib/couchbase/data/new-fake-bucket-1', '--cbcollect']) - OK
            mctimings [] (['mctimings', '-u', '', '-h', '127.0.0.1:', '-a', '-v']) - Exit code 1
            mctimings ['subdoc_execute', 'snappy_decompress', 'json_validate'] (['mctimings', '-u', '', '-h', '127.0.0.1:', '-a', '-v', 'subdoc_execute', 'snappy_decompress', 'json_validate']) - Exit code 1
            Users storage () - Failed to collect file '': [Errno 2] No such file or directory: ''
            OK
            Dist configuration (dist_cfg) () - Failed to collect file '': [Errno 2] No such file or directory: ''
            OK
            Memcached cert (memcached-cert.pem) () - Failed to collect file '': [Errno 2] No such file or directory: ''
            OK
            Local SSL cert (local-ssl-cert.pem) () - Failed to collect file '': [Errno 2] No such file or directory: ''
            OK
            NS Log () - OK
            Phosphor Trace (['kv_trace_dump', '-H', '127.0.0.1:', '-u', '', 'kv_trace.json']) - Exit code 1
            I/O error(2): No such file or directory
            Failed to create prometheus snapshot: 7
            Error: unable to retrieve statistics
            cbcollect_info log () - OK
            

            I'm not sure what the cause is yet, but I wanted to confirm that I can reproduce this issue. Logs: collection-2021-04-29T190825-0400.zip
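The console output quoted in this comment interleaves successful and failed tasks, which makes the failures easy to miss. As a rough triage aid, the sketch below pulls out only the lines that end in `- Exit code N`. The line format is inferred from the excerpts in this ticket, and `failed_tasks` is a hypothetical helper name, not part of cbcollect_info.

```python
import re

# Matches lines such as "Task name (command) - Exit code 1" or "... - OK".
# The format is inferred from the output quoted in this ticket.
STATUS = re.compile(r"^(?P<task>.+) - (?:Exit code (?P<code>\d+)|OK)\s*$")


def failed_tasks(lines):
    """Yield (task, exit_code) for each task line that reports an exit code."""
    for line in lines:
        match = STATUS.match(line.strip())
        if match and match.group("code") is not None:
            yield match.group("task"), int(match.group("code"))


if __name__ == "__main__":
    sample = [
        "Raw PID 1 control groups /proc/1/cgroup (cat /proc/1/cgroup) - OK",
        "Directory structure (['ls', '-lRai', '/']) - Exit code 1",
        "Failed to create prometheus snapshot: 7",
    ]
    for task, code in failed_tasks(sample):
        print(code, task)
```

Lines that are not task status lines, such as the prometheus snapshot error, are skipped rather than reported, so they still need to be checked by eye.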


            carlos.gonzalez Carlos Gonzalez Betancort (Inactive) added a comment -

            I'm seeing this on Windows in MB-46013 as well, where the requests are made with the port missing from the URL. I assume it is related.

            ==============================================================================
            Chronicle config
            curl -sS -X POST --proxy  -K- http://127.0.0.1:/diag/eval
            ==============================================================================
            curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
            ==============================================================================
            ale configuration
            curl -sS --proxy  -K- http://127.0.0.1:/diag/ale
            ==============================================================================
            curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
            ==============================================================================
            
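The curl errors in the comment above mention port 80 even though no port was requested. A URL whose authority ends in a bare colon has an empty port, and HTTP clients then fall back to the scheme default. The sketch below illustrates that fallback with Python's standard `urllib.parse`. The helper `effective_port` is hypothetical, and 8091 is used only as an example of an explicit port.

```python
from urllib.parse import urlsplit

# Scheme defaults used when a URL carries no port (or an empty one).
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for an http(s) URL."""
    parts = urlsplit(url)
    # urlsplit reports an empty port ("host:") as None, same as no port at all.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    print(effective_port("http://127.0.0.1:/diag/eval"))      # 80
    print(effective_port("http://127.0.0.1:8091/diag/eval"))  # 8091
```

This is why an unset port surfaces as "Connection refused" on port 80 rather than as a URL syntax error.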
            bryan.mccoid Bryan McCoid added a comment -

            This is fixed by commit af8c485b9c2f18a1bf067ae250aa3a05083b74e3. I've verified the fix in build 5076.


            sumedh.basarkod Sumedh Basarkod (Inactive) added a comment -

            Verified on 7.0.0-5076 by examining the stats.log and cbcollect_info.log. Closing.

            People

              sumedh.basarkod Sumedh Basarkod (Inactive)
              james.harrison James Harrison