Couchbase Server / MB-9331

Ops are very high for a key that does not exist in the bucket

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.1
    • Fix Version/s: bug-backlog
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Environment:
    • Triage:
      Untriaged
    • Operating System:
      Centos 64-bit
    • Flagged:
      Impediment

      Description

      My cluster has three buckets and 8 nodes. In one bucket, one node shows strange behavior:
      1. The ops for a key that does not exist in the bucket are very high, up to 1.1k/s; see attach1.jpg.
      2. Only one node shows the problem; the other nodes are fine; see attach2.jpg.
      3. For the status of the affected node, see attach3.jpg.
      4. For the VBUCKET RESOURCES of the affected node, see attach4.jpg.
      5. When I manually set the key in the bucket, the ops stay at 1.1k/s; when I then delete it from the bucket, the ops still do not drop.

      1. cbpkg_exception.log
        14 kB
        hupantingxue
      2. memcached.log.tar.gz
        2.08 MB
        hupantingxue
      1. attach1.jpg
        34 kB
      2. attach2.jpg
        47 kB
      3. attach3.jpg
        139 kB
      4. attach4.jpg
        112 kB

        Activity

        hupantingxue hupantingxue added a comment - - edited

        I failed over the affected node (192.168.250.174) from the cluster, and the problem then moved to another node, 192.168.250.154. The affected key also changed, to "c216db14ced919595f4d8469_5cbc91ea963213de9852a82a5eebf32d8433e953b185456a24783bdbaca67746", which exists in the bucket.
        What can I do about this issue?

        hupantingxue hupantingxue added a comment -

        some couchbase logs

        perry Perry Krug added a comment -

        This monitoring of the key is coming directly from your application requesting it. Whether it exists or not, your application is reading or writing to it that many times and therefore the server is reporting as such. I would say to track down this key in your application if you feel it is an inappropriate amount of traffic to it.

        hupantingxue hupantingxue added a comment - - edited

        @Perry Krug, thank you in advance.
        I had already investigated what you describe before posting the question.
        1. The application has no log of reads or writes for this key, while other keys can be found.
        2. I logged in to the affected node (192.168.250.174) and checked the network connections on ports 11209~11213; nothing other than my application is connected to those ports.
        As I said in my last comment, I failed over the node (192.168.250.174), and the problem moved to another node (192.168.250.154), while the affected key also changed.

        perry Perry Krug added a comment -

        Sorry for the delay in getting back to you, I thought I had put an update in and it appears that I did not.

        Keep in mind that this Jira interface is for our internal bug tracking of the project and is not meant to be used for troubleshooting or Q&A. I would strongly suggest you post this question to our community forums: http://www.couchbase.com/communities/

        I am 99% positive that Couchbase is correctly reporting that it is receiving requests for this key. When you failed over the node, the active location of that key changed to the other server, and therefore that server began reporting the presence of these requests...which further strengthens my belief that it is the application sending it.

        Can you take a packet capture on this server? Wireshark should be able to decode the Couchbase packet and you can search for the key name. I'm very confident that you will see some process in your application making these requests.

        Perry
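
The point about a key's active location moving on failover can be sketched as follows. This is a minimal illustration, not the server's code: it assumes the CRC32-based key hashing that Couchbase smart clients are understood to use and the 1024 vbuckets that are the usual default on Linux. The `fail_over` helper and the single-owner map are hypothetical simplifications.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption: default vbucket count on Linux builds


def vbucket_for_key(key: bytes, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a key to a vbucket id using CRC32 (num_vbuckets must be a power of two)."""
    digest = (zlib.crc32(key) >> 16) & 0x7FFF
    return digest & (num_vbuckets - 1)


def fail_over(active_map, failed_node, promoted_node):
    """Hypothetical helper: vbuckets active on failed_node become active on promoted_node."""
    return [promoted_node if node == failed_node else node for node in active_map]


key = b"c216db14ced919595f4d8469_5cbc91ea963213de9852a82a5eebf32d8433e953b185456a24783bdbaca67746"
vb = vbucket_for_key(key)

# Hypothetical map: every vbucket is active on .174 before the failover.
before = ["192.168.250.174"] * NUM_VBUCKETS
after = fail_over(before, "192.168.250.174", "192.168.250.154")

print(f"vbucket {vb}: {before[vb]} -> {after[vb]}")
```

The key always hashes to the same vbucket; only the node that owns that vbucket changes, so the per-node statistics for the key follow the vbucket to its new owner.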

        javen Javen Fang added a comment -

        @Perry:
        Thank you for your reply. As you suggested, we will post the issue to the forum soon.

        As for your explanation, it seems you missed part of the earlier comment:
        "As I said in my last comment, I failed over the node (192.168.250.174), and the problem moved to another node (192.168.250.154), while the affected key also changed."

        @hupantingxue already pointed out that when the node was failed over, the load moved to another node, BUT the key changed; it was NOT the same key.
        So we think this is a Couchbase internal bug.

        hupantingxue hupantingxue added a comment -

        @Perry:
        Thank you for your reply.

        As you suggested, I have posted the question to your community forums: http://www.couchbase.com/communities/q-and-a/key-not-existed-bucket-whose-ops-very-high

        I will also take a packet capture on the affected server.

        hupantingxue hupantingxue added a comment - - edited

        @Perry:
        Thank you!
        I took a packet capture just now and found an unknown process on 192.168.250.164 (source port 61263) writing the affected key "c216db14ced919595f4d8469_5cbc91ea963213de9852a82a5eebf32d8433e953b185456a24783bdbaca67746". After I killed the process, the ops dropped to the normal level.
        After I restarted the unknown process, the ops stayed normal.

        The packets are shown below. I wonder whether there is a bug that caused an infinite loop. What does the response from the server (192.168.250.154), "I'm.not.responsible.for.this.vbucket", mean?

        16:32:12.051868 IP (tos 0x0, ttl 64, id 8127, offset 0, flags [DF], proto TCP (6), length 182)
        192.168.250.164.61263 > 192.168.250.154.11210: Flags [P.], cksum 0x20d7 (correct), seq 650:780, ack 301, win 32, options [nop,nop,TS val 1648945843 ecr 993200916], length 130
        0x0000: 4500 00b6 1fbf 4000 4006 a3f2 c0a8 faa4 E.....@.@.......
        0x0010: c0a8 fa9a ef4f 2bca 2320 bd9c aeae 5a5d .....O+.#.....Z]
        0x0020: 8018 0020 20d7 0000 0101 080a 6248 eab3 ............bH..
        0x0030: 3b33 0b14 8001 0059 0800 033a 0000 006a ;3.....Y...:...j
        0x0040: c72c c765 0000 0000 0000 0000 0000 0000 .,.e............
        0x0050: 0000 0000 6332 3136 6462 3134 6365 6439 ....c216db14ced9
        0x0060: 3139 3539 3566 3464 3834 3639 5f35 6362 19595f4d8469_5cb
        0x0070: 6339 3165 6139 3633 3231 3364 6539 3835 c91ea963213de985
        0x0080: 3261 3832 6135 6565 6266 3332 6438 3433 2a82a5eebf32d843
        0x0090: 3365 3935 3362 3138 3534 3536 6132 3437 3e953b185456a247
        0x00a0: 3833 6264 6261 6361 3637 3734 3631 3630 83bdbaca67746160
        0x00b0: 3238 3532 3838 285288

        16:32:12.051960 IP (tos 0x0, ttl 64, id 55278, offset 0, flags [DF], proto TCP (6), length 112)
        192.168.250.154.11210 > 192.168.250.164.61263: Flags [P.], cksum 0x76f3 (incorrect -> 0x2035), seq 301:361, ack 780, win 3074, options [nop,nop,TS val 993200917 ecr 1648945843], length 60
        0x0000: 4500 0070 d7ee 4000 4006 ec08 c0a8 fa9a E..p..@.@.......
        0x0010: c0a8 faa4 2bca ef4f aeae 5a5d 2320 be1e ....+..O..Z]#...
        0x0020: 8018 0c02 76f3 0000 0101 080a 3b33 0b15 ....v.......;3..
        0x0030: 6248 eab3 8101 0000 0000 0007 0000 0024 bH.............$
        0x0040: c72c c765 0000 0000 0000 0000 4927 6d20 .,.e........I'm.
        0x0050: 6e6f 7420 7265 7370 6f6e 7369 626c 6520 not.responsible.
        0x0060: 666f 7220 7468 6973 2076 6275 636b 6574 for.this.vbucket
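
The two packets above can be read as memcached binary protocol messages. The following is a minimal sketch that decodes the 24-byte headers copied from the capture (the TCP payload starts at offset 0x34 in each dump). The opcode and status tables list only a few well-known values, and the dictionary field names are chosen here for illustration.

```python
import struct

# Memcached binary protocol header: 24 bytes, network byte order.
HEADER = struct.Struct(">BBHBBHIIQ")

OPCODES = {0x00: "GET", 0x01: "SET", 0x04: "DELETE"}
STATUSES = {0x0000: "SUCCESS", 0x0001: "KEY_NOT_FOUND", 0x0007: "NOT_MY_VBUCKET"}

# Header bytes copied from the capture above.
REQUEST_HEX = "8001 0059 0800 033a 0000 006a c72c c765 0000 0000 0000 0000"
RESPONSE_HEX = "8101 0000 0000 0007 0000 0024 c72c c765 0000 0000 0000 0000"


def decode_header(hex_text: str) -> dict:
    """Decode one 24-byte request (magic 0x80) or response (magic 0x81) header."""
    raw = bytes.fromhex(hex_text.replace(" ", ""))
    (magic, opcode, key_len, extras_len, _datatype,
     vbucket_or_status, body_len, opaque, _cas) = HEADER.unpack(raw[:24])
    info = {
        "kind": "request" if magic == 0x80 else "response",
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "key_len": key_len,
        "extras_len": extras_len,
        "body_len": body_len,
        "opaque": opaque,
    }
    if magic == 0x80:
        # In a request, bytes 6-7 carry the vbucket id chosen by the client.
        info["vbucket"] = vbucket_or_status
        info["value_len"] = body_len - extras_len - key_len
    else:
        # In a response, the same two bytes carry the status code.
        info["status"] = STATUSES.get(vbucket_or_status, hex(vbucket_or_status))
    return info


req = decode_header(REQUEST_HEX)
resp = decode_header(RESPONSE_HEX)
print(req)
print(resp)
```

Read this way, the request is a SET of the 89-byte key addressed to vbucket 826, and the response with the same opaque value carries status 0x0007 (not my vbucket), meaning 192.168.250.154 does not consider itself the active owner of that vbucket. One plausible reading, not confirmed in this thread, is that the sending process held an outdated vbucket map and kept retrying against the wrong node.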

        hupantingxue hupantingxue added a comment - - edited

        For the packet capture, see also the attachment cbpkg_exception.log (http://www.couchbase.com/issues/secure/attachment/18685/cbpkg_exception.log)


          People

          • Assignee:
            chiyoung Chiyoung Seo
            Reporter:
            hupantingxue hupantingxue
          • Votes:
            0
            Watchers:
            2


              Time Tracking

              Estimated:
              24h
              Remaining:
              24h
              Logged:
              Not Specified
