Couchbase Kubernetes / K8S-537

PVC: Pod recovery/VolumeUnhealthy events are missing in the generated event list


Details

    Description

       

      Testcase: TestPersistentVolumeKillAllPods

      Scenario:

      1. Created a 4-node Couchbase cluster with a PVC defined for all nodes.
      2. Killed all couchbase-server pods.
      3. All pods recovered and the cluster was rebalanced.
      4. However, a recovery (MemberRecovered) event was received for only 2 pods instead of the expected 3.

      Events:
        Type     Reason              Age              From                                 Message
        ----     ------              ----             ----                                 -------
        Normal   NewMemberAdded      10m              couchbase-operator-585d4b675d-j74s4  New member test-couchbase-xrlsc-0000 added to cluster
        Normal   NewMemberAdded      10m              couchbase-operator-585d4b675d-j74s4  New member test-couchbase-xrlsc-0001 added to cluster
        Normal   NewMemberAdded      9m               couchbase-operator-585d4b675d-j74s4  New member test-couchbase-xrlsc-0002 added to cluster
        Normal   NewMemberAdded      8m               couchbase-operator-585d4b675d-j74s4  New member test-couchbase-xrlsc-0003 added to cluster
        Normal   RebalanceStarted    8m               couchbase-operator-585d4b675d-j74s4  A rebalance has been started to balance data across the cluster
        Normal   RebalanceCompleted  8m               couchbase-operator-585d4b675d-j74s4  A rebalance has completed
        Normal   BucketCreated       8m               couchbase-operator-585d4b675d-j74s4  A new bucket `PVBucket` was created
        Warning  MemberDown          6m (x7 over 7m)  couchbase-operator-585d4b675d-j74s4  Existing member test-couchbase-xrlsc-0002 down
        Warning  MemberDown          6m (x8 over 7m)  couchbase-operator-585d4b675d-j74s4  Existing member test-couchbase-xrlsc-0000 down
        Normal   MemberRecovered     5m               couchbase-operator-585d4b675d-j74s4  Existing member test-couchbase-xrlsc-0000 recovered
        Warning  MemberDown          5m (x8 over 7m)  couchbase-operator-585d4b675d-j74s4  Existing member test-couchbase-xrlsc-0003 down
        Normal   MemberRecovered     5m               couchbase-operator-585d4b675d-j74s4  Existing member test-couchbase-xrlsc-0003 recovered
        Normal   RebalanceStarted    3m               couchbase-operator-585d4b675d-j74s4  A rebalance has been started to balance data across the cluster
        Normal   RebalanceCompleted  3m               couchbase-operator-585d4b675d-j74s4  A rebalance has completed
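      Step 4 amounts to checking that every member with a MemberDown event also has a MemberRecovered event. A minimal sketch, assuming the events are available as (reason, message) pairs and that the member name can be parsed from the message text; `missing_recoveries` and `members_with_reason` are hypothetical helpers, not part of the operator's test framework. Applied to the events listed above, it flags test-couchbase-xrlsc-0002.

```python
import re

# Hypothetical helper for this report: events are (reason, message) pairs as
# shown by `kubectl describe`, and the member name is parsed from the message.
MEMBER_RE = re.compile(r"member (\S+)")


def members_with_reason(events, reason):
    """Return the set of member names mentioned in events with `reason`."""
    found = set()
    for event_reason, message in events:
        if event_reason != reason:
            continue
        match = MEMBER_RE.search(message)
        if match:
            found.add(match.group(1))
    return found


def missing_recoveries(events):
    """Members that have a MemberDown event but no MemberRecovered event."""
    events = list(events)
    down = members_with_reason(events, "MemberDown")
    recovered = members_with_reason(events, "MemberRecovered")
    return down - recovered


# The MemberDown/MemberRecovered events from the list above.
events = [
    ("MemberDown", "Existing member test-couchbase-xrlsc-0002 down"),
    ("MemberDown", "Existing member test-couchbase-xrlsc-0000 down"),
    ("MemberRecovered", "Existing member test-couchbase-xrlsc-0000 recovered"),
    ("MemberDown", "Existing member test-couchbase-xrlsc-0003 down"),
    ("MemberRecovered", "Existing member test-couchbase-xrlsc-0003 recovered"),
]
print(sorted(missing_recoveries(events)))  # ['test-couchbase-xrlsc-0002']
```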

       


          Activity

            Mike Wiederhold [X] (Inactive) added a comment:

            Description for release notes:

            Known Issue: In rare cases, Kubernetes appears to drop events logged by the operator.

            Workaround: None.

            Tommie McAfee added a comment:

            I found one place where the recovery event is raised, but I noticed that events were mysteriously being dropped when I ran the scenario.

            Also note that the VolumeUnhealthy event is cached. If the test deletes the same one, it's possible that we don't record duplicates.
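            The caching behaviour described above can be illustrated with a small sketch. It assumes the cache is keyed on the event's reason and message; the operator's real cache key and lifetime are not described in this thread, and `CachedEventRecorder` is a hypothetical class. Once a VolumeUnhealthy event for a member has been recorded, an identical second one is suppressed.

```python
class CachedEventRecorder:
    """Hypothetical recorder that suppresses events it has already emitted.

    Assumes the cache key is (reason, message), so a second identical
    VolumeUnhealthy event for the same member is never recorded.
    """

    def __init__(self):
        self._seen = set()
        self.recorded = []

    def record(self, reason, message):
        """Record the event unless an identical one was recorded before.

        Returns True if the event was recorded, False if it was suppressed.
        """
        key = (reason, message)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.recorded.append(key)
        return True


rec = CachedEventRecorder()
msg = "Member test-couchbase-wc92f-0005 volumes are unhealthy"
print(rec.record("VolumeUnhealthy", msg))  # True: first occurrence recorded
print(rec.record("VolumeUnhealthy", msg))  # False: cached duplicate dropped
print(len(rec.recorded))                   # 1
```

            If a test expects one VolumeUnhealthy event per failure, a cache like this would make repeated failures of the same member look like missing events.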

            Ashwin Govindarajulu added a comment:

            Mike Wiederhold [X], the "VolumeUnhealthy" event is also sometimes missed in test case TestPersistentVolumeWithSingleNodeService. Expected:

              Type: Normal | Reason: ServiceCreated | Message: Service for admin console `test-couchbase-wc92f-ui` was created
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0000 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0001 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0002 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0003 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0004 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0005 added to cluster
              Type: Normal | Reason: RebalanceStarted | Message: A rebalance has been started to balance data across the cluster
              Type: Normal | Reason: RebalanceCompleted | Message: A rebalance has completed
              Type: Normal | Reason: BucketCreated | Message: A new bucket `PVBucket` was created
              Type: Warning | Reason: MemberDown | Message: Existing member test-couchbase-wc92f-0005 down
              Type: Normal | Reason: MemberRecovered | Message: Existing member test-couchbase-wc92f-0005 recovered
              Type: Normal | Reason: VolumeUnhealthy | Message: Member test-couchbase-wc92f-0005 volumes are unhealthy.  Failover is recommended: Missing PersistentVolumeClaim for path /opt/couchbase/var/lib/couchbase
              Type: Warning | Reason: MemberFailedOver | Message: Existing member test-couchbase-wc92f-0005 failed over
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0006 added to cluster
              Type: Normal | Reason: RebalanceStarted | Message: A rebalance has been started to balance data across the cluster
              Type: Normal | Reason: MemberRemoved | Message: Existing member test-couchbase-wc92f-0005 removed from the cluster
              Type: Normal | Reason: RebalanceCompleted | Message: A rebalance has completed

            But got:

              Type: Normal | Reason: ServiceCreated | Message: Service for admin console `test-couchbase-wc92f-ui` was created
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0000 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0001 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0002 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0003 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0004 added to cluster
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0005 added to cluster
              Type: Normal | Reason: RebalanceStarted | Message: A rebalance has been started to balance data across the cluster
              Type: Normal | Reason: RebalanceCompleted | Message: A rebalance has completed
              Type: Normal | Reason: BucketCreated | Message: A new bucket `PVBucket` was created
              Type: Warning | Reason: MemberDown | Message: Existing member test-couchbase-wc92f-0005 down
              Type: Normal | Reason: MemberRecovered | Message: Existing member test-couchbase-wc92f-0005 recovered
              Type: Warning | Reason: MemberFailedOver | Message: Existing member test-couchbase-wc92f-0005 failed over
              Type: Normal | Reason: NewMemberAdded | Message: New member test-couchbase-wc92f-0006 added to cluster
              Type: Normal | Reason: RebalanceStarted | Message: A rebalance has been started to balance data across the cluster
              Type: Normal | Reason: MemberRemoved | Message: Existing member test-couchbase-wc92f-0005 removed from the cluster
              Type: Normal | Reason: RebalanceCompleted | Message: A rebalance has completed
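            The expected-versus-actual comparison above can be reduced to a multiset difference over event reasons. A minimal sketch, assuming only the reasons are compared (a stricter check might key on reason and message together); `missing_events` is a hypothetical helper. It compares counts rather than exact positions, so it names the missing event even if the surrounding events are listed in a different order.

```python
from collections import Counter


def missing_events(expected, got):
    """Return events that occur fewer times in `got` than in `expected`.

    Compares counts rather than positions, and reports the missing events
    in the order they appear in `expected`.
    """
    shortfall = Counter(expected) - Counter(got)  # keeps only positive counts
    missing = []
    for event in expected:
        if shortfall[event] > 0:
            missing.append(event)
            shortfall[event] -= 1
    return missing


# Event reasons from the expected list above (messages omitted for brevity).
expected = (
    ["ServiceCreated"]
    + ["NewMemberAdded"] * 6
    + ["RebalanceStarted", "RebalanceCompleted", "BucketCreated",
       "MemberDown", "MemberRecovered", "VolumeUnhealthy", "MemberFailedOver",
       "NewMemberAdded", "RebalanceStarted", "MemberRemoved", "RebalanceCompleted"]
)
# The "got" list is identical except that VolumeUnhealthy is absent.
got = [reason for reason in expected if reason != "VolumeUnhealthy"]

print(missing_events(expected, got))  # ['VolumeUnhealthy']
```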


            Mike Wiederhold [X] (Inactive) added a comment:

            Moving this to 1.1.0. We are definitely raising the recovered event, but it seems to be getting lost somewhere in the Kubernetes code. This affects only two of the tests, and the operator is doing the right thing and logging the recovery. We will look into this again for the 1.1.0 release.
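            One mechanism that could make events appear to vanish on the Kubernetes side, and which this thread does not confirm, is client-side rate limiting in the event recorder. Our understanding, which is an assumption to verify against the client-go version the operator vendors, is that client-go's event correlator applies a spam filter with one token bucket per event source and involved object (by default a burst of 25 and a refill of one token every 300 seconds) and skips events once the bucket is empty. The sketch below models only that idea; `TokenBucket` and `SpamFilter` are hypothetical names, not client-go types. If each repeat of the MemberDown events (shown as "x7" and "x8" above) counts against the same bucket, a later MemberRecovered event could plausibly be skipped.

```python
class TokenBucket:
    """Token bucket holding up to `burst` tokens, refilled at `rate` tokens/second."""

    def __init__(self, burst, rate, now=0.0):
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.rate = rate
        self.last = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # bucket empty: the event is silently skipped


class SpamFilter:
    """One token bucket per (source, involved object) pair.

    The defaults (burst 25, one token per 300 seconds) mirror what we assume
    client-go's event spam filter uses; verify against the vendored version.
    """

    def __init__(self, burst=25, rate=1 / 300):
        self.burst = burst
        self.rate = rate
        self.buckets = {}

    def allow(self, source, involved_object, now):
        key = (source, involved_object)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.burst, self.rate, now)
        return self.buckets[key].allow(now)


# Hypothetical scenario: the operator emits 30 events about one cluster
# object, one per second, then a MemberRecovered event at t=30.
spam = SpamFilter()
accepted = sum(spam.allow("couchbase-operator", "cluster/test-couchbase", t) for t in range(30))
print(accepted)                                                        # 25
print(spam.allow("couchbase-operator", "cluster/test-couchbase", 30))  # False
```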

            Ashwin Govindarajulu added a comment:

            Mike Wiederhold [X], please find the attached log file cbopinfo-20180817T171141+0000.tar.gz

            Also, in this case the cluster description reports the Balanced condition as True, but there is no "RebalanceCompleted" event in the event list.

            NAME                                  READY     STATUS    RESTARTS   AGE
            couchbase-operator-56699895c4-2k826   1/1       Running   0          11m
            docker-registry-1-9fznr               1/1       Running   1          10d
            registry-console-1-jcn85              1/1       Running   1          10d
            router-1-mkhbw                        1/1       Running   1          10d
            test-couchbase-gfs6b-0000             1/1       Running   0          7m
            test-couchbase-gfs6b-0001             1/1       Running   0          6m
            test-couchbase-gfs6b-0002             1/1       Running   0          5m
            test-couchbase-gfs6b-0003             1/1       Running   0          4m
             
             
            Status:
              Buckets:
                PV Bucket:
                  Conflict Resolution:  seqno
                  Enable Flush:         true
                  Eviction Policy:      fullEviction
                  Io Priority:          high
                  Memory Quota:         100
                  Name:                 PVBucket
                  Replicas:             2
                  Type:                 couchbase
              Cluster Id:               763e6e473071d61afb4ede1b26cee19b
              Conditions:
                Available:
                  Last Transition Time:  2018-08-17T17:11:19Z
                  Last Update Time:      2018-08-17T17:11:19Z
                  Reason:                Cluster available
                  Status:                True
                Balanced:
                  Last Transition Time:  2018-08-17T17:11:40Z
                  Last Update Time:      2018-08-17T17:11:40Z
                  Message:               Data is equally distributed across all nodes in the cluster
                  Reason:                Cluster is balanced
                  Status:                True
              Control Paused:            false
              Current Version:           enterprise-5.5.0
              Members:
                Index:  4
                Ready:
                  Name:  test-couchbase-gfs6b-0000
                  Name:  test-couchbase-gfs6b-0001
                  Name:  test-couchbase-gfs6b-0002
                  Name:  test-couchbase-gfs6b-0003
              Phase:     Running
              Reason:
              Size:      4
             
            Events:
              Type     Reason               Age              From                                 Message
              ----     ------               ----             ----                                 -------
              Normal   NewMemberAdded       10m              couchbase-operator-56699895c4-2k826  New member test-couchbase-gfs6b-0000 added to cluster
              Normal   NewMemberAdded       9m               couchbase-operator-56699895c4-2k826  New member test-couchbase-gfs6b-0001 added to cluster
              Normal   NewMemberAdded       9m               couchbase-operator-56699895c4-2k826  New member test-couchbase-gfs6b-0002 added to cluster
              Normal   NewMemberAdded       8m               couchbase-operator-56699895c4-2k826  New member test-couchbase-gfs6b-0003 added to cluster
              Normal   RebalanceStarted     8m               couchbase-operator-56699895c4-2k826  A rebalance has been started to balance data across the cluster
              Normal   RebalanceCompleted   8m               couchbase-operator-56699895c4-2k826  A rebalance has completed
              Normal   BucketCreated        8m               couchbase-operator-56699895c4-2k826  A new bucket `PVBucket` was created
              Warning  MemberDown           6m (x7 over 7m)  couchbase-operator-56699895c4-2k826  Existing member test-couchbase-gfs6b-0003 down
              Warning  MemberDown           6m (x8 over 7m)  couchbase-operator-56699895c4-2k826  Existing member test-couchbase-gfs6b-0001 down
              Normal   MemberRecovered      5m               couchbase-operator-56699895c4-2k826  Existing member test-couchbase-gfs6b-0001 recovered
              Warning  MemberDown           5m (x8 over 7m)  couchbase-operator-56699895c4-2k826  Existing member test-couchbase-gfs6b-0002 down
              Normal   MemberRecovered      4m               couchbase-operator-56699895c4-2k826  Existing member test-couchbase-gfs6b-0002 recovered
              Normal   RebalanceStarted     3m               couchbase-operator-56699895c4-2k826  A rebalance has been started to balance data across the cluster
              Normal   RebalanceIncomplete  3m               couchbase-operator-56699895c4-2k826  A rebalance is incomplete
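            The inconsistency described in this comment can be written as a simple check: if the Balanced condition is True, the most recent rebalance-related event would be expected to be RebalanceCompleted. A minimal sketch, assuming the condition status and the event reasons have already been extracted from the `kubectl describe` output; `last_rebalance_reason` and `balanced_matches_events` are hypothetical helpers, and the check covers only the Balanced=True case.

```python
REBALANCE_REASONS = ("RebalanceStarted", "RebalanceCompleted", "RebalanceIncomplete")


def last_rebalance_reason(reasons):
    """Return the most recent rebalance-related event reason, or None."""
    for reason in reversed(reasons):
        if reason in REBALANCE_REASONS:
            return reason
    return None


def balanced_matches_events(balanced, reasons):
    """Return False if Balanced=True but the latest rebalance event is not RebalanceCompleted."""
    if not balanced:
        return True  # this sketch only checks the Balanced=True case
    return last_rebalance_reason(reasons) in (None, "RebalanceCompleted")


# Event reasons from the list above, oldest first.
reasons = (
    ["NewMemberAdded"] * 4
    + ["RebalanceStarted", "RebalanceCompleted", "BucketCreated",
       "MemberDown", "MemberDown", "MemberRecovered",
       "MemberDown", "MemberRecovered",
       "RebalanceStarted", "RebalanceIncomplete"]
)
print(last_rebalance_reason(reasons))          # RebalanceIncomplete
print(balanced_matches_events(True, reasons))  # False
```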


            People

              Tommie McAfee
              Ashwin Govindarajulu