Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-4824

memcached silently closes connection in auth request [was: ns_server: throws exception in ebucketmigrator_srv during rebalance test]

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0-developer-preview-4
    • Fix Version/s: 2.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Environment:
      cluster-run 6 nodes.

      Description

      Rebalance-in while inserting spatial docs causes rebalance to fail. During the time of failure, I see the following in the diags:

      [error_logger:error] [2012-02-21 10:49:04] [n_0@10.17.4.51:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <0.25874.6>
      registered_name: []
      exception error: no match of right hand side value

      {error,closed}

      in function mc_client_binary:cmd_binary_vocal_recv/5 (src/mc_client_binary.erl, line 103)
      in call from mc_client_binary:auth/2 (src/mc_client_binary.erl, line 145)
      in call from ebucketmigrator_srv:connect/4 (src/ebucketmigrator_srv.erl, line 281)
      in call from ebucketmigrator_srv:init/1 (src/ebucketmigrator_srv.erl, line 133)
      ancestors: [<0.25873.6>]
      messages: []
      links: Port<0.454477>,<0.25873.6>,#Port<0.454476>
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 987
      stack_size: 24
      reductions: 960
      neighbours:

      SpatialRebalanceTests.test_insert_x_docs_during_rebalance

      No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

        Hide
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        this error means that source node closed connection when we tried to establish tap connection. I see no traces of ns_server-side issues that could lead to this. We honestly did what we're normally doing.

        BTW, thumbs up for using R15 which provides line numbers in backtrace.

        So it closes connection when SASL auth is performed. Just after establishing tcp connection and prior to selecting bucket. And much prior to sending TAP_CONNECT command.

        This could be something OS-related (like exhausting file descriptors in memcached for example).

        Show
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - this error means that source node closed connection when we tried to establish tap connection. I see no traces of ns_server-side issues that could lead to this. We honestly did what we're normally doing. BTW, thumbs up for using R15 which provides line numbers in backtrace. So it closes connection when SASL auth is performed. Just after establishing tcp connection and prior to selecting bucket. And much prior to sending TAP_CONNECT command. This could be something OS-related (like exhausting file descriptors in memcached for example).
        Hide
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Assigning to our memcached & bucket engine guy.

        Trond, ns_server is using bucket name and password in AUTH request. But that's bucket default with forcibly empty password. So it should not be AUTH failure. Even if it is, it should not be intermittent and silent (i.e. no error is returned but socket just closed).

        Show
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - Assigning to our memcached & bucket engine guy. Trond, ns_server is using bucket name and password in AUTH request. But that's bucket default with forcibly empty password. So it should not be AUTH failure. Even if it is, it should not be intermittent and silent (i.e. no error is returned but socket just closed).
        Hide
        trond Trond Norbye added a comment -

        I'll look into this today

        Show
        trond Trond Norbye added a comment - I'll look into this today
        Hide
        trond Trond Norbye added a comment -

        Are you absolutely sure it's not sending any data before the socket is closed? I just added plaintext sasl support for mcstat and ran memcached like:

        export ISASL_PWFILE=/Users/trondnorbye/compile/MB-4824/memcached/isasl.pw
        (isasl.pw contains:
        default
        )

        When I connect ./mcstat -u default -h localhost:11211 -P ' '
        It receives an error package before the socket is closed (as memcached prints out
        ./memcached -vvv -S
        Loaded isasl db from /Users/trondnorbye/compile/MB-4824/memcached/isasl.pw
        Initialized SASL.
        isasl checking DB every 60s
        Loaded engine: Default engine v0.1
        Supplying the following features: LRU, compare and swap
        <22 server listening (binary)
        <23 server listening (binary)
        <24 send buffer was 9216, now 3728270
        <24 server listening (udp)
        <25 send buffer was 9216, now 3728270
        <24 server listening (udp)
        <25 server listening (udp)
        <25 server listening (udp)
        <24 server listening (udp)
        <24 server listening (udp)
        <25 server listening (udp)
        <25 server listening (udp)
        23 - Running task: (conn_listening)
        <26 new binary client connection.
        26 - Running task: (conn_new_cmd)
        26: going from conn_new_cmd to conn_waiting
        26 - Running task: (conn_waiting)
        26: going from conn_waiting to conn_read
        26 - Running task: (conn_read)
        26: going from conn_read to conn_parse_cmd
        26 - Running task: (conn_parse_cmd)
        >26 Read binary protocol data:
        >26 0x80 0x21 0x00 0x05
        >26 0x00 0x00 0x00 0x00
        >26 0x00 0x00 0x00 0x10
        >26 0x00 0x00 0x00 0x00
        >26 0x00 0x00 0x00 0x00
        >26 0x00 0x00 0x00 0x00

        26: authenticated() in cmd 0x21 is true
        26: going from conn_parse_cmd to conn_nread
        26 - Running task: (conn_nread)
        26 - Running task: (conn_nread)
        26 - Running task: (conn_nread)
        26 - Running task: (conn_nread)
        26: mech: ``PLAIN'' with 11 bytes of data
        26: sasl result code: -1
        26: Unknown sasl response: -1
        >26 Writing an error: Auth failure
        <26 Writing bin response:
        <26 0x81 0x21 0x00 0x00
        <26 0x00 0x00 0x00 0x20
        <26 0x00 0x00 0x00 0x0c
        <26 0x00 0x00 0x00 0x00
        <26 0x00 0x00 0x00 0x00
        <26 0x00 0x00 0x00 0x00

        26: going from conn_nread to conn_mwrite
        26 - Running task: (conn_mwrite)
        26 - Running task: (conn_mwrite)
        26: going from conn_mwrite to conn_new_cmd
        26 - Running task: (conn_new_cmd)
        26: going from conn_new_cmd to conn_waiting
        26 - Running task: (conn_waiting)
        26: going from conn_waiting to conn_read
        26 - Running task: (conn_read)
        26: going from conn_read to conn_closing
        26 - Running task: (conn_closing)
        -1: going from conn_closing to conn_immediate_close
        -1 - Running task: (conn_immediate_close)
        Immediate close of 0x7ff80053d6c0

        Could it be that the code detects that the socket is closed and doesn't read out the error message? If no, how can I reproduce this issue?

        Show
        trond Trond Norbye added a comment - Are you absolutely sure it's not sending any data before the socket is closed? I just added plaintext sasl support for mcstat and ran memcached like: export ISASL_PWFILE=/Users/trondnorbye/compile/ MB-4824 /memcached/isasl.pw (isasl.pw contains: default ) When I connect ./mcstat -u default -h localhost:11211 -P ' ' It receives an error package before the socket is closed (as memcached prints out ./memcached -vvv -S Loaded isasl db from /Users/trondnorbye/compile/ MB-4824 /memcached/isasl.pw Initialized SASL. isasl checking DB every 60s Loaded engine: Default engine v0.1 Supplying the following features: LRU, compare and swap <22 server listening (binary) <23 server listening (binary) <24 send buffer was 9216, now 3728270 <24 server listening (udp) <25 send buffer was 9216, now 3728270 <24 server listening (udp) <25 server listening (udp) <25 server listening (udp) <24 server listening (udp) <24 server listening (udp) <25 server listening (udp) <25 server listening (udp) 23 - Running task: (conn_listening) <26 new binary client connection. 26 - Running task: (conn_new_cmd) 26: going from conn_new_cmd to conn_waiting 26 - Running task: (conn_waiting) 26: going from conn_waiting to conn_read 26 - Running task: (conn_read) 26: going from conn_read to conn_parse_cmd 26 - Running task: (conn_parse_cmd) >26 Read binary protocol data: >26 0x80 0x21 0x00 0x05 >26 0x00 0x00 0x00 0x00 >26 0x00 0x00 0x00 0x10 >26 0x00 0x00 0x00 0x00 >26 0x00 0x00 0x00 0x00 >26 0x00 0x00 0x00 0x00 26: authenticated() in cmd 0x21 is true 26: going from conn_parse_cmd to conn_nread 26 - Running task: (conn_nread) 26 - Running task: (conn_nread) 26 - Running task: (conn_nread) 26 - Running task: (conn_nread) 26: mech: ``PLAIN'' with 11 bytes of data 26: sasl result code: -1 26: Unknown sasl response: -1 >26 Writing an error: Auth failure <26 Writing bin response: <26 0x81 0x21 0x00 0x00 <26 0x00 0x00 0x00 0x20 <26 0x00 0x00 0x00 0x0c <26 0x00 0x00 0x00 0x00 <26 0x00 0x00 0x00 0x00 <26 0x00 0x00 0x00 0x00 26: going from conn_nread to conn_mwrite 26 - Running task: (conn_mwrite) 26 - Running task: (conn_mwrite) 26: going from conn_mwrite to conn_new_cmd 26 - Running task: (conn_new_cmd) 26: going from conn_new_cmd to conn_waiting 26 - Running task: (conn_waiting) 26: going from conn_waiting to conn_read 26 - Running task: (conn_read) 26: going from conn_read to conn_closing 26 - Running task: (conn_closing) -1: going from conn_closing to conn_immediate_close -1 - Running task: (conn_immediate_close) Immediate close of 0x7ff80053d6c0 Could it be that the code detects that the socket is closed and doesn't read out the error message? If no, how can I reproduce this issue?
        Hide
        trond Trond Norbye added a comment -

        Please let me know how to reproduce if you're absolutely sure that you're not getting an error message

        Show
        trond Trond Norbye added a comment - Please let me know how to reproduce if you're absolutely sure that you're not getting an error message
        Hide
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Yes I am absolutely sure it is not being sent auth reply. It could be unrelated to AUTH, because error is seemingly intermittent.

        And apparently the way to reproduce it is by running testrunner test that was mentioned as part of bug report.

        Show
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - Yes I am absolutely sure it is not being sent auth reply. It could be unrelated to AUTH, because error is seemingly intermittent. And apparently the way to reproduce it is by running testrunner test that was mentioned as part of bug report.
        Hide
        trond Trond Norbye added a comment -

        If you spawn memcached with more logging you should see when it disconnects you and you can map that with the error you're seeing.. try add -vvv (but you probably need to extend the logger to not drop lines...)

        Show
        trond Trond Norbye added a comment - If you spawn memcached with more logging you should see when it disconnects you and you can map that with the error you're seeing.. try add -vvv (but you probably need to extend the logger to not drop lines...)
        Hide
        farshid Farshid Ghods (Inactive) added a comment -

        results in a rebalance hung

        Show
        farshid Farshid Ghods (Inactive) added a comment - results in a rebalance hung
        Hide
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        no it does not. If rebalance hungs anywhere it's a different bug.

        Show
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - no it does not. If rebalance hungs anywhere it's a different bug.
        Hide
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        We've seen very similar issue in 1.8.1 recently. I've added better diagnostics and hopefully will soon find out whats wrong

        Show
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - We've seen very similar issue in 1.8.1 recently. I've added better diagnostics and hopefully will soon find out whats wrong
        Hide
        farshid Farshid Ghods (Inactive) added a comment -

        there are couple of other 1.8.1 bugs related to this issue.
        please close if this is a dupe

        Show
        farshid Farshid Ghods (Inactive) added a comment - there are couple of other 1.8.1 bugs related to this issue. please close if this is a dupe
        Hide
        chiyoung Chiyoung Seo added a comment -

        These issues are basically caused by the same reason.

        http://www.couchbase.com/issues/browse/MB-5343

        Show
        chiyoung Chiyoung Seo added a comment - These issues are basically caused by the same reason. http://www.couchbase.com/issues/browse/MB-5343

          People

          • Assignee:
            chiyoung Chiyoung Seo
            Reporter:
            tommie Tommie McAfee
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Due:
              Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes