  1. Couchbase Kubernetes
  2. K8S-531

Couchbase operator event order is sometimes inconsistent


Details

    Description

      The order of Couchbase cluster events changes between successive cluster descriptions.

      In the scenario below, pods 0001 and 0005 are killed, and the description initially lists them in that order. Later, once pod 0001 has recovered, the order of the "MemberDown" events is swapped. The node order in the Available condition message is swapped as well.

      Cluster description with the MemberDown events ordered 0001 followed by 0005:

      Ashwins-MacBook-Pro:couchbase-operator]$ kubectl describe cbc
      Name:         test-couchbase-l477x
      Namespace:    ashwin
      Labels:       <none>
      Annotations:  <none>
      API Version:  couchbase.com/v1
      Kind:         CouchbaseCluster
      Metadata:
        Cluster Name:
        Creation Timestamp:  2018-08-11T15:41:30Z
        Generate Name:       test-couchbase-
        Generation:          1
        Resource Version:    3645172
        Self Link:           /apis/couchbase.com/v1/namespaces/ashwin/couchbaseclusters/test-couchbase-l477x
        UID:                 045e0cd7-9d7d-11e8-8cfd-080027ee3776
      Spec:
        Auth Secret:  basic-test-secret
        Base Image:   couchbase/server
        Buckets:
          Conflict Resolution:  seqno
          Enable Flush:         true
          Eviction Policy:      fullEviction
          Io Priority:          high
          Memory Quota:         100
          Name:                 PVBucket
          Replicas:             2
          Type:                 couchbase
        Cluster:
          Analytics Service Memory Quota:                 1024
          Auto Failover Max Count:                        3
          Auto Failover On Data Disk Issues:              false
          Auto Failover On Data Disk Issues Time Period:  120
          Auto Failover Server Group:                     false
          Auto Failover Timeout:                          30
          Cluster Name:                                   test-couchbase-l477x
          Data Service Memory Quota:                      256
          Eventing Service Memory Quota:                  256
          Index Service Memory Quota:                     256
          Index Storage Setting:                          memory_optimized
          Search Service Memory Quota:                    256
        Expose Admin Console:                             false
        Security Context:
          Fs Group:  1000
        Servers:
          Name:  test_config_1
          Pod:
            Resources:
            Volume Mounts:
              Data:     couchbase
              Default:  couchbase
          Services:
            data
            query
            index
          Size:                         6
        Software Update Notifications:  false
        Version:                        enterprise-5.5.0
        Volume Claim Templates:
          Metadata:
            Creation Timestamp:  <nil>
            Name:                couchbase
          Spec:
            Resources:
              Requests:
                Storage:         2Gi
            Storage Class Name:  standard
          Status:
      Status:
        Buckets:
          PV Bucket:
            Conflict Resolution:  seqno
            Enable Flush:         true
            Eviction Policy:      fullEviction
            Io Priority:          high
            Memory Quota:         100
            Name:                 PVBucket
            Replicas:             2
            Type:                 couchbase
        Cluster Id:               3b7d4d629beaf9454dd3f55ffd2f218b
        Conditions:
          Available:
            Last Transition Time:  2018-08-11T15:44:22Z
            Last Update Time:      2018-08-11T15:44:22Z
            Message:               The following nodes are down and not serving requests: http://test-couchbase-l477x-0001.test-couchbase-l477x.ashwin.svc:8091, http://test-couchbase-l477x-0005.test-couchbase-l477x.ashwin.svc:8091
            Reason:                Cluster partially available
            Status:                False
          Balanced:
            Last Transition Time:  2018-08-11T15:43:43Z
            Last Update Time:      2018-08-11T15:43:43Z
            Message:               Data is equally distributed across all nodes in the cluster
            Reason:                Cluster is balanced
            Status:                True
        Control Paused:            false
        Current Version:           enterprise-5.5.0
        Members:
          Index:  6
          Ready:
            Name:  test-couchbase-l477x-0000
            Name:  test-couchbase-l477x-0002
            Name:  test-couchbase-l477x-0003
            Name:  test-couchbase-l477x-0004
          Unready:
            Name:  test-couchbase-l477x-0001
            Name:  test-couchbase-l477x-0005
        Phase:     Running
        Reason:
        Size:      6
      Events:
        Type     Reason              Age               From                                 Message
        ----     ------              ----              ----                                 -------
        Normal   NewMemberAdded      3m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0000 added to cluster
        Normal   NewMemberAdded      2m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0001 added to cluster
        Normal   NewMemberAdded      2m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0002 added to cluster
        Normal   NewMemberAdded      2m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0003 added to cluster
        Normal   NewMemberAdded      1m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0004 added to cluster
        Normal   NewMemberAdded      1m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0005 added to cluster
        Normal   RebalanceStarted    1m                couchbase-operator-7f558f9849-8dj5h  A rebalance has been started to balance data across the cluster
        Normal   RebalanceCompleted  1m                couchbase-operator-7f558f9849-8dj5h  A rebalance has completed
        Normal   BucketCreated       1m                couchbase-operator-7f558f9849-8dj5h  A new bucket `PVBucket` was created
        Warning  MemberDown          1s (x6 over 46s)  couchbase-operator-7f558f9849-8dj5h  Existing member test-couchbase-l477x-0001 down
        Warning  MemberDown          1s (x6 over 46s)  couchbase-operator-7f558f9849-8dj5h  Existing member test-couchbase-l477x-0005 down
      Ashwins-MacBook-Pro:couchbase-operator]$
       
       
      Cluster description after the MemberDown event order changed (0005 is now listed before 0001):

      Ashwins-MacBook-Pro:couchbase-operator]$ kubectl describe cbc
      Name:         test-couchbase-l477x
      Namespace:    ashwin
      Labels:       <none>
      Annotations:  <none>
      API Version:  couchbase.com/v1
      Kind:         CouchbaseCluster
      Metadata:
        Cluster Name:
        Creation Timestamp:  2018-08-11T15:41:30Z
        Generate Name:       test-couchbase-
        Generation:          1
        Resource Version:    3645263
        Self Link:           /apis/couchbase.com/v1/namespaces/ashwin/couchbaseclusters/test-couchbase-l477x
        UID:                 045e0cd7-9d7d-11e8-8cfd-080027ee3776
      Spec:
        Auth Secret:  basic-test-secret
        Base Image:   couchbase/server
        Buckets:
          Conflict Resolution:  seqno
          Enable Flush:         true
          Eviction Policy:      fullEviction
          Io Priority:          high
          Memory Quota:         100
          Name:                 PVBucket
          Replicas:             2
          Type:                 couchbase
        Cluster:
          Analytics Service Memory Quota:                 1024
          Auto Failover Max Count:                        3
          Auto Failover On Data Disk Issues:              false
          Auto Failover On Data Disk Issues Time Period:  120
          Auto Failover Server Group:                     false
          Auto Failover Timeout:                          30
          Cluster Name:                                   test-couchbase-l477x
          Data Service Memory Quota:                      256
          Eventing Service Memory Quota:                  256
          Index Service Memory Quota:                     256
          Index Storage Setting:                          memory_optimized
          Search Service Memory Quota:                    256
        Expose Admin Console:                             false
        Security Context:
          Fs Group:  1000
        Servers:
          Name:  test_config_1
          Pod:
            Resources:
            Volume Mounts:
              Data:     couchbase
              Default:  couchbase
          Services:
            data
            query
            index
          Size:                         6
        Software Update Notifications:  false
        Version:                        enterprise-5.5.0
        Volume Claim Templates:
          Metadata:
            Creation Timestamp:  <nil>
            Name:                couchbase
          Spec:
            Resources:
              Requests:
                Storage:         2Gi
            Storage Class Name:  standard
          Status:
      Status:
        Buckets:
          PV Bucket:
            Conflict Resolution:  seqno
            Enable Flush:         true
            Eviction Policy:      fullEviction
            Io Priority:          high
            Memory Quota:         100
            Name:                 PVBucket
            Replicas:             2
            Type:                 couchbase
        Cluster Id:               3b7d4d629beaf9454dd3f55ffd2f218b
        Conditions:
          Available:
            Last Transition Time:  2018-08-11T15:45:24Z
            Last Update Time:      2018-08-11T15:45:24Z
            Message:               The following nodes are down and not serving requests: http://test-couchbase-l477x-0005.test-couchbase-l477x.ashwin.svc:8091, http://test-couchbase-l477x-0001.test-couchbase-l477x.ashwin.svc:8091
            Reason:                Cluster partially available
            Status:                False
          Balanced:
            Last Transition Time:  2018-08-11T15:43:43Z
            Last Update Time:      2018-08-11T15:43:43Z
            Message:               Data is equally distributed across all nodes in the cluster
            Reason:                Cluster is balanced
            Status:                True
        Control Paused:            false
        Current Version:           enterprise-5.5.0
        Members:
          Index:  6
          Ready:
            Name:  test-couchbase-l477x-0000
            Name:  test-couchbase-l477x-0002
            Name:  test-couchbase-l477x-0003
            Name:  test-couchbase-l477x-0004
          Unready:
            Name:  test-couchbase-l477x-0001
            Name:  test-couchbase-l477x-0005
        Phase:     Running
        Reason:
        Size:      6
      Events:
        Type     Reason              Age               From                                 Message
        ----     ------              ----              ----                                 -------
        Normal   NewMemberAdded      3m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0000 added to cluster
        Normal   NewMemberAdded      3m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0001 added to cluster
        Normal   NewMemberAdded      3m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0002 added to cluster
        Normal   NewMemberAdded      2m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0003 added to cluster
        Normal   NewMemberAdded      2m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0004 added to cluster
        Normal   NewMemberAdded      2m                couchbase-operator-7f558f9849-8dj5h  New member test-couchbase-l477x-0005 added to cluster
        Normal   RebalanceStarted    2m                couchbase-operator-7f558f9849-8dj5h  A rebalance has been started to balance data across the cluster
        Normal   RebalanceCompleted  1m                couchbase-operator-7f558f9849-8dj5h  A rebalance has completed
        Normal   BucketCreated       1m                couchbase-operator-7f558f9849-8dj5h  A new bucket `PVBucket` was created
        Warning  MemberDown          22s (x7 over 1m)  couchbase-operator-7f558f9849-8dj5h  Existing member test-couchbase-l477x-0005 down
        Warning  MemberDown          13s (x8 over 1m)  couchbase-operator-7f558f9849-8dj5h  Existing member test-couchbase-l477x-0001 down
        Normal   MemberRecovered     3s                couchbase-operator-7f558f9849-8dj5h  Existing member test-couchbase-l477x-0001 recovered
      Ashwins-MacBook-Pro:couchbase-operator]$
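
One possible reading of the two listings, offered as an assumption and not a confirmed diagnosis, is that the Events table is ordered by each event's last-seen time (the leading value in the Age column), oldest first. Under that reading, 0005 (last seen 22s ago) would sort ahead of 0001 (last seen 13s ago) in the second listing, while the two events tied at 1s in the first listing would keep their original order. The sketch below checks that reading against rows copied from the output above; the helper names `last_seen_seconds` and `order_by_last_seen` are hypothetical, and this does not account for the swapped node order in the Available condition message.

```python
import re

# Rows copied from the second `kubectl describe cbc` listing: (Age cell, member name).
# They are deliberately given here in numeric member order.
rows = [
    ("13s (x8 over 1m)", "test-couchbase-l477x-0001"),
    ("22s (x7 over 1m)", "test-couchbase-l477x-0005"),
]

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def last_seen_seconds(age: str) -> int:
    """Convert the leading token of an Age cell (e.g. '22s', '3m') to seconds."""
    match = re.match(r"(\d+)([smhd])", age)
    if match is None:
        raise ValueError(f"unrecognised age: {age!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def order_by_last_seen(event_rows):
    """Assumed ordering: oldest last occurrence first; ties keep their input order."""
    return sorted(event_rows, key=lambda row: last_seen_seconds(row[0]), reverse=True)


for age, member in order_by_last_seen(rows):
    print(age, member)
```

If this reading holds, the swap in the Events table would follow from the two MemberDown events being re-recorded at different times (x7 versus x8), and would not by itself indicate that the operator emitted them in a different order.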

            People

              ashwin.govindarajulu Ashwin Govindarajulu