Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-37200

[System test]: Eventing rebalance hung ,failed with timeout

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Duplicate
    • 6.5.0
    • 6.5.0
    • eventing

    Description

      Build: 6.5.0-4926 , not seen on 6.5.0-4908

      Test: MH longevity

      Day: 2nd

      Cycle: 1st and 2nd

      Seeing Multiple occurrences of timeout 

      [2019-12-07T02:40:31-08:00, sequoiatools/couchbase-cli:6.5:80497e] server-add -c 172.23.108.103:8091 --server-add https://172.23.106.100 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2019-12-07T02:41:15-08:00, sequoiatools/couchbase-cli:6.5:2767f4] rebalance -c 172.23.108.103:8091 -u Administrator -p password
      → 
       
       
      Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
       
      docker logs 2767f4
      docker start 2767f4
       
       
      *Unable to display progress bar on this os
      JERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2019-12-07T03:06:45-08:00, sequoiatools/cmd:a79d85] 60
       
       
       
      [user:error,2019-12-07T03:06:13.881-08:00,ns_1@172.23.108.103:<0.26696.357>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {worker_died,
                                     {'EXIT',<0.13972.1071>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"eventing rebalance hasn't made progress for past 600 secs">>}}}}}. 

      [2019-12-07T03:20:52-08:00, sequoiatools/couchbase-cli:6.5:ab9dde] server-add -c 172.23.108.103:8091 --server-add https://172.23.104.5 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2019-12-07T03:32:36-08:00, sequoiatools/couchbase-cli:6.5:72069a] server-add -c 172.23.108.103:8091 --server-add https://172.23.99.20 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2019-12-07T03:33:14-08:00, sequoiatools/couchbase-cli:6.5:2d5e1f] rebalance -c 172.23.108.103:8091 -u Administrator -p password
      → 
       
       
      Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
       
      docker logs 2d5e1f
      docker start 2d5e1f
       
       
      *Unable to display progress bar on this os
      JERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2019-12-07T03:56:17-08:00, sequoiatools/cmd:454a13] 60
       
       
       
       
       
       
      [user:error,2019-12-07T03:55:40.784-08:00,ns_1@172.23.108.103:<0.26696.357>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {worker_died,
                                     {'EXIT',<0.414.1104>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"eventing rebalance hasn't made progress for past 600 secs">>}}}}}.
      Rebalance Operation Id = ddf558dfc73434f31e1f474c60eb81a4 

      [2019-12-07T08:13:46-08:00, sequoiatools/couchbase-cli:6.5:4b9823] server-add -c 172.23.108.103:8091 --server-add https://172.23.104.5 -u Administrator -p password --server-add-username Administrator --server-add-password password --services eventing
      [2019-12-07T08:14:15-08:00, sequoiatools/couchbase-cli:6.5:d82136] rebalance -c 172.23.108.103:8091 -u Administrator -p password
      → 
       
       
      Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
       
      docker logs d82136
      docker start d82136
       
       
      *Unable to display progress bar on this os
      JERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2019-12-07T08:31:29-08:00, sequoiatools/cmd:7a8168] 60
       
       
       
       
      [user:error,2019-12-07T08:30:47.970-08:00,ns_1@172.23.108.103:<0.26696.357>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {worker_died,
                                     {'EXIT',<0.6373.1283>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"eventing rebalance hasn't made progress for past 600 secs">>}}}}}.
      Rebalance Operation Id = f59864e651935d9a6b4867f44b063d9c 

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          ankit.prabhu Ankit Prabhu added a comment -

          At node .98.135, eventing producer tried to send the updated vbs to consumer but worker_bucket_op_2 doesn't receive the message. Due to this initWorkerMap function never returned and isPlannerRunning flag was set to true.

          https://github.com/couchbase/eventing/blob/a9baa5b7e4e264caab7a9bb85bfb2f7e07182a34/producer/producer.go#L219
          https://github.com/couchbase/eventing/blob/a23aa8fbf8b9fac1da8308137373cf8c6595d867/producer/vbucket_node_assignment.go#L302

          2019-12-07T00:02:18.796-08:00 [Info] Producer::initWorkerVbMap [bucket_op_curl:3] Sending workerVbucketMap: map[worker_bucket_op_curl_0:[683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796] worker_bucket_op_curl_1:[797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910] worker_bucket_op_curl_2:[911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023]] to all consumers
          2019-12-07T00:02:18.797-08:00 [Info] Consumer::SendAssignedVbs [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] Sending assigned vbuckets map: [683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796]
          2019-12-07T00:02:18.797-08:00 [Info] Consumer::SendAssignedVbs [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] Sending assigned vbuckets map: [797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910]
          2019-12-07T00:02:18.797-08:00 [Info] eventing-consumer [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] Updating vbucket map, vbmap :797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910
          2019-12-07T00:02:18.797-08:00 [Info] eventing-consumer [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] Updating vbucket map, vbmap :683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796
          

           
          When KV node rebalance started, eventing received STREAMEND for those vbs which belong to that kv node. But since planner was not yet completed eventing starts skipping stream request for those vbs.

          669662:2019-12-07T00:37:20.411-08:00 [Info] Consumer::handleFailoverLog [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 got STREAMEND
          669724:2019-12-07T00:37:21.853-08:00 [Info] Consumer::controlRoutine [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691, reclaiming it back by restarting dcp stream
          669725:2019-12-07T00:37:21.854-08:00 [Info] Consumer::checkIfAlreadyEnqueued [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 not enqueued
          669726:2019-12-07T00:37:21.854-08:00 [Info] Consumer::addToEnqueueMap [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 enqueuing
          669727:2019-12-07T00:37:21.854-08:00 [Info] Consumer::updateVbOwnerAndStartDCPStream [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Sending streamRequestInfo size: 0
          669729:2019-12-07T00:37:21.854-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 reqStreamCh size: 0 msg: &consumer.streamRequestInfo{startSeqNo:0x98a50, vb:0x2b3, vbBlob:(*consumer.vbucketKVBlob)(0xc43111f680)} Got request to stream
          669730:2019-12-07T00:37:21.854-08:00 [Info] Consumer::deleteFromEnqueueMap [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 deleting from enqueue list
          669731:2019-12-07T00:37:21.854-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Skipping stream request as planner is running
          669877:2019-12-07T00:37:24.851-08:00 [Info] Consumer::controlRoutine [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691, reclaiming it back by restarting dcp stream
          669878:2019-12-07T00:37:24.852-08:00 [Info] Consumer::checkIfAlreadyEnqueued [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 not enqueued
          669879:2019-12-07T00:37:24.852-08:00 [Info] Consumer::addToEnqueueMap [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 enqueuing
          669880:2019-12-07T00:37:24.852-08:00 [Info] Consumer::updateVbOwnerAndStartDCPStream [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Sending streamRequestInfo size: 0
          669882:2019-12-07T00:37:24.852-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 reqStreamCh size: 0 msg: &consumer.streamRequestInfo{startSeqNo:0x98a50, vb:0x2b3, vbBlob:(*consumer.vbucketKVBlob)(0xc4204df680)} Got request to stream
          

          Since eventing starts skipping stream request to those vbs all the successive rebalance failed because there was no progress.

          [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Skipping stream request as planner is running
          922152:2019-12-07T03:27:06.725-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 889 Skipping stream request as planner is running
          922189:2019-12-07T03:27:06.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 790 Skipping stream request as planner is running
          922192:2019-12-07T03:27:07.725-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 811 Skipping stream request as planner is running
          922195:2019-12-07T03:27:07.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 791 Skipping stream request as planner is running
          922200:2019-12-07T03:27:08.725-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 812 Skipping stream request as planner is running
          922203:2019-12-07T03:27:08.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 738 Skipping stream request as planner is running
          922208:2019-12-07T03:27:09.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 851 Skipping stream request as planner is running
          922245:2019-12-07T03:27:09.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 739 Skipping stream request as planner is running
          922252:2019-12-07T03:27:10.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 823 Skipping stream request as planner is running
          922259:2019-12-07T03:27:10.985-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 794 Skipping stream request as planner is running
          922278:2019-12-07T03:27:11.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 824 Skipping stream request as planner is running
          922285:2019-12-07T03:27:11.985-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 747 Skipping stream request as planner is running
          922308:2019-12-07T03:27:12.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 852 Skipping stream request as planner is running
          922347:2019-12-07T03:27:12.986-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 748 Skipping stream request as planner is running
          

          896869:2019-12-07T03:06:01.918-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
          896945:2019-12-07T03:06:05.005-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796]
          896947:2019-12-07T03:06:05.006-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903]
          896949:2019-12-07T03:06:05.007-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
          897011:2019-12-07T03:06:07.915-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796]
          897013:2019-12-07T03:06:07.916-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903]
          897015:2019-12-07T03:06:07.918-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
          897091:2019-12-07T03:06:10.963-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796]
          897093:2019-12-07T03:06:10.964-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903]
          897095:2019-12-07T03:06:10.966-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
          897210:2019-12-07T03:06:13.772-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796]
          897212:2019-12-07T03:06:13.773-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903]
          

          I took the goroutine dump, but it looks like it came out of this situation and log also got rotated so we don't know when and why it came out of this situation. Vikas Chaudhary could you please take goroutine dump when eventing rebalance fails?

          ankit.prabhu Ankit Prabhu added a comment - At node .98.135, eventing producer tried to send the updated vbs to consumer but worker_bucket_op_2 doesn't receive the message. Due to this initWorkerMap function never returned and isPlannerRunning flag was set to true. https://github.com/couchbase/eventing/blob/a9baa5b7e4e264caab7a9bb85bfb2f7e07182a34/producer/producer.go#L219 https://github.com/couchbase/eventing/blob/a23aa8fbf8b9fac1da8308137373cf8c6595d867/producer/vbucket_node_assignment.go#L302 2019-12-07T00:02:18.796-08:00 [Info] Producer::initWorkerVbMap [bucket_op_curl:3] Sending workerVbucketMap: map[worker_bucket_op_curl_0:[683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796] worker_bucket_op_curl_1:[797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910] worker_bucket_op_curl_2:[911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023]] to all consumers 2019-12-07T00:02:18.797-08:00 [Info] Consumer::SendAssignedVbs [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] Sending assigned vbuckets map: [683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796] 2019-12-07T00:02:18.797-08:00 [Info] Consumer::SendAssignedVbs [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] Sending assigned vbuckets map: [797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910] 2019-12-07T00:02:18.797-08:00 [Info] eventing-consumer [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] Updating vbucket map, vbmap :797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 2019-12-07T00:02:18.797-08:00 [Info] eventing-consumer [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] Updating vbucket map, vbmap :683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796   When KV node rebalance started, eventing received STREAMEND for those vbs which belong to that kv node. But since planner was not yet completed eventing starts skipping stream request for those vbs. 669662:2019-12-07T00:37:20.411-08:00 [Info] Consumer::handleFailoverLog [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 got STREAMEND 669724:2019-12-07T00:37:21.853-08:00 [Info] Consumer::controlRoutine [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691, reclaiming it back by restarting dcp stream 669725:2019-12-07T00:37:21.854-08:00 [Info] Consumer::checkIfAlreadyEnqueued [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 not enqueued 669726:2019-12-07T00:37:21.854-08:00 [Info] Consumer::addToEnqueueMap [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 enqueuing 669727:2019-12-07T00:37:21.854-08:00 [Info] Consumer::updateVbOwnerAndStartDCPStream [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Sending streamRequestInfo size: 0 669729:2019-12-07T00:37:21.854-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 reqStreamCh size: 0 msg: &consumer.streamRequestInfo{startSeqNo:0x98a50, vb:0x2b3, vbBlob:(*consumer.vbucketKVBlob)(0xc43111f680)} Got request to stream 669730:2019-12-07T00:37:21.854-08:00 [Info] Consumer::deleteFromEnqueueMap [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 deleting from enqueue list 669731:2019-12-07T00:37:21.854-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Skipping stream request as planner is running 669877:2019-12-07T00:37:24.851-08:00 [Info] Consumer::controlRoutine [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691, reclaiming it back by restarting dcp stream 669878:2019-12-07T00:37:24.852-08:00 [Info] Consumer::checkIfAlreadyEnqueued [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 not enqueued 669879:2019-12-07T00:37:24.852-08:00 [Info] Consumer::addToEnqueueMap [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 enqueuing 669880:2019-12-07T00:37:24.852-08:00 [Info] Consumer::updateVbOwnerAndStartDCPStream [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Sending streamRequestInfo size: 0 669882:2019-12-07T00:37:24.852-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 reqStreamCh size: 0 msg: &consumer.streamRequestInfo{startSeqNo:0x98a50, vb:0x2b3, vbBlob:(*consumer.vbucketKVBlob)(0xc4204df680)} Got request to stream Since eventing starts skipping stream request to those vbs all the successive rebalance failed because there was no progress. [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 691 Skipping stream request as planner is running 922152:2019-12-07T03:27:06.725-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 889 Skipping stream request as planner is running 922189:2019-12-07T03:27:06.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 790 Skipping stream request as planner is running 922192:2019-12-07T03:27:07.725-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 811 Skipping stream request as planner is running 922195:2019-12-07T03:27:07.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 791 Skipping stream request as planner is running 922200:2019-12-07T03:27:08.725-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 812 Skipping stream request as planner is running 922203:2019-12-07T03:27:08.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 738 Skipping stream request as planner is running 922208:2019-12-07T03:27:09.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 851 Skipping stream request as planner is running 922245:2019-12-07T03:27:09.984-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 739 Skipping stream request as planner is running 922252:2019-12-07T03:27:10.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 823 Skipping stream request as planner is running 922259:2019-12-07T03:27:10.985-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 794 Skipping stream request as planner is running 922278:2019-12-07T03:27:11.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 824 Skipping stream request as planner is running 922285:2019-12-07T03:27:11.985-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 747 Skipping stream request as planner is running 922308:2019-12-07T03:27:12.726-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] vb: 852 Skipping stream request as planner is running 922347:2019-12-07T03:27:12.986-08:00 [Info] Consumer::processReqStreamMessages [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] vb: 748 Skipping stream request as planner is running 896869:2019-12-07T03:06:01.918-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: [] 896945:2019-12-07T03:06:05.005-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796] 896947:2019-12-07T03:06:05.006-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903] 896949:2019-12-07T03:06:05.007-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: [] 897011:2019-12-07T03:06:07.915-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796] 897013:2019-12-07T03:06:07.916-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903] 897015:2019-12-07T03:06:07.918-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: [] 897091:2019-12-07T03:06:10.963-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796] 897093:2019-12-07T03:06:10.964-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903] 897095:2019-12-07T03:06:10.966-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_42136478.sock:12556] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: [] 897210:2019-12-07T03:06:13.772-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_0:/tmp/127.0.0.1:8091_0_42136478.sock:12547] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 30 dump: [691, 738-739, 747-750, 765, 768-776, 779-782, 786-791, 794-796] 897212:2019-12-07T03:06:13.773-08:00 [Info] Consumer::RebalanceTaskProgress [worker_bucket_op_curl_1:/tmp/127.0.0.1:8091_1_42136478.sock:12546] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 58 dump: [797-800, 805, 807-825, 827-835, 845-852, 854-860, 864-869, 887-889, 903] I took the goroutine dump, but it looks like it came out of this situation and log also got rotated so we don't know when and why it came out of this situation. Vikas Chaudhary could you please take goroutine dump when eventing rebalance fails?
          ankit.prabhu Ankit Prabhu added a comment - - edited

          Resolving it as duplicate of MB-37179

          ankit.prabhu Ankit Prabhu added a comment - - edited Resolving it as duplicate of  MB-37179

          People

            satya.nand Satya Nand (Inactive)
            vikas.chaudhary Vikas Chaudhary
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty