Couchbase Server / MB-7855

mixed cluster (1.8.1, 2.0.1): memcached crashed during rebalance: Port server memcached on node A exited with status 139

Details

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: 2.1.0
    • Affects Version/s: 2.0.1
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels: None
    • Environment: CentOS5.7 x64

    Description

      http://qa.hq.northscale.net/job/centos-64-2.0-new-rebalance-mixed-cluster/51/consoleFull
      ./testrunner -i /tmp/rebalance_in.ini get-logs=True,wait_timeout=180,GROUP=P0,EXCLUDE_GROUP=FROM_2_0,get-cbcollect-info=True -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_with_warming_up,nodes_out=3,items=500000,replicas=2,max_verify=100000,GROUP=OUT;P0

      1.8.1-937-rel nodes
      [10.3.3.92]
      [10.3.3.93]
      [10.3.3.94]

      2.0.1-170 nodes
      [10.3.3.99]
      [10.3.3.82]
      [10.3.3.91]
      [10.3.3.97]

      test logs & UI logs:

      [2013-03-03 14:41:45,119] - [rest_client:804] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.97%2Cns_1%4010.3.3.94&user=Administrator&knownNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.92%2Cns_1%4010.3.3.94%2Cns_1%4010.3.3.82%2Cns_1%4010.3.3.93%2Cns_1%4010.3.3.99%2Cns_1%4010.3.3.97
      [2013-03-03 14:41:45,148] - [rest_client:808] INFO - rebalance operation started
      [2013-03-03 14:41:45,190] - [rest_client:905] INFO - rebalance percentage : 0 %
      [2013-03-03 14:41:55,201] - [rest_client:905] INFO - rebalance percentage : 3.18635171224 %
      [2013-03-03 14:42:05,214] - [rest_client:905] INFO - rebalance percentage : 7.26072098483 %
      [2013-03-03 14:42:15,228] - [rest_client:905] INFO - rebalance percentage : 11.2492449683 %
      [2013-03-03 14:42:25,264] - [rest_client:905] INFO - rebalance percentage : 15.1684030474 %
      [2013-03-03 14:42:35,291] - [rest_client:888] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
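
The "rebalance params" line above is URL-encoded, which makes it hard to see which nodes were meant to stay in the cluster. The following is a minimal sketch, not part of testrunner, that decodes the logged query string with Python's standard library and derives the nodes to keep as knownNodes minus ejectedNodes. The helper name `split_nodes` is hypothetical.

```python
from urllib.parse import parse_qs

# Query string copied from the "rebalance params" log line.
params = (
    "password=password"
    "&ejectedNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.97%2Cns_1%4010.3.3.94"
    "&user=Administrator"
    "&knownNodes=ns_1%4010.3.3.91%2Cns_1%4010.3.3.92%2Cns_1%4010.3.3.94"
    "%2Cns_1%4010.3.3.82%2Cns_1%4010.3.3.93%2Cns_1%4010.3.3.99%2Cns_1%4010.3.3.97"
)


def split_nodes(query, key):
    """Return the comma-separated node list stored under `key` (hypothetical helper)."""
    values = parse_qs(query).get(key, [""])
    return [node for node in values[0].split(",") if node]


known = split_nodes(params, "knownNodes")
ejected = split_nodes(params, "ejectedNodes")
# Nodes that should remain after the rebalance: known minus ejected, order preserved.
keep = [node for node in known if node not in ejected]

print("eject:", ejected)
print("keep: ", keep)
```

The resulting keep list can be compared against the KeepNodes list that ns_orchestrator reports in its "Starting rebalance" message in the UI logs.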
      [2013-03-03 14:42:35,292] - [rest_client:889] INFO - Latest logs from UI:
      [2013-03-03 14:42:35,362] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.99', u'code': 1, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.3.99\' in 2 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1362350789337.0, u'type': u'info'}

      [2013-03-03 14:42:35,362] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 2, u'text': u"Rebalance exited with reason badmatch,{error,closed,\n {gen_server,call,\n [{'ns_memcached-default','ns_1@10.3.3.99'},\n {set_vbucket,791,replica},\n 180000]}}\n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1362350785366.0, u'type': u'info'}
      [2013-03-03 14:42:35,363] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.99', u'code': 4, u'text': u"Control connection to memcached on 'ns_1@10.3.3.99' disconnected: {badmatch,\n {error,\n closed}}", u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1362350785325.0, u'type': u'info'}
      [2013-03-03 14:42:35,363] - [rest_client:890] ERROR -

      {u'node': u'ns_1@10.3.3.99', u'code': 0, u'text': u'Port server memcached on node \'ns_1@10.3.3.99\' exited with status 139. Restarting. Messages: Sun Mar 3 14:46:24.556535 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 884 does not exist\nSun Mar 3 14:46:24.564771 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 885 does not exist\nSun Mar 3 14:46:24.573549 PST 3: TAP (Producer) eq_tapq:replication_building_1021_\'ns_1@10.3.3.93\' - Backfill is completed with VBuckets 1021, \nSun Mar 3 14:46:24.573598 PST 3: TAP (Producer) eq_tapq:replication_building_1021_\'ns_1@10.3.3.93\' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 1021\nSun Mar 3 14:46:24.583592 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.93 - disconnected, keep alive for 300 seconds\nSun Mar 3 14:46:24.586125 PST 3: TAP (Producer) eq_tapq:replication_building_1021_\'ns_1@10.3.3.92\' - Backfill is completed with VBuckets 1021, \nSun Mar 3 14:46:24.586246 PST 3: TAP (Producer) eq_tapq:replication_building_1021_\'ns_1@10.3.3.92\' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 1021\nSun Mar 3 14:46:24.590024 PST 3: Schedule cleanup of "eq_tapq:anon_393"\nSun Mar 3 14:46:24.592166 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 886 does not exist\nSun Mar 3 14:46:24.594568 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 887 does not exist\nSun Mar 3 14:46:24.598553 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 888 does not 
exist\nSun Mar 3 14:46:24.600084 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 889 does not exist\nSun Mar 3 14:46:24.601800 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 890 does not exist\nSun Mar 3 14:46:24.603173 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 891 does not exist\nSun Mar 3 14:46:24.604500 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 892 does not exist\nSun Mar 3 14:46:24.605959 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.93 - Connection is re-established. Rollback unacked messages...\nSun Mar 3 14:46:24.606019 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.93 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0\nSun Mar 3 14:46:24.606036 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.93 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0\nSun Mar 3 14:46:24.606235 PST 3: Schedule cleanup of "eq_tapq:anon_405"\nSun Mar 3 14:46:24.607351 PST 3: TAP (Producer) eq_tapq:anon_405 - Clear the tap queues by force\nSun Mar 3 14:46:24.608265 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 893 does not exist\nSun Mar 3 14:46:24.610252 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 894 does not exist\nSun Mar 3 14:46:24.611725 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the 
TAP checkpoint state for vbucket 895 does not exist\nSun Mar 3 14:46:24.613358 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 896 does not exist\nSun Mar 3 14:46:24.614894 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 897 does not exist\nSun Mar 3 14:46:24.616714 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 898 does not exist\nSun Mar 3 14:46:24.618934 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 899 does not exist\nSun Mar 3 14:46:24.621083 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 900 does not exist\nSun Mar 3 14:46:24.630865 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 901 does not exist\nSun Mar 3 14:46:24.633184 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 902 does not exist\nSun Mar 3 14:46:24.636660 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 903 does not exist\nSun Mar 3 14:46:24.638895 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 904 does not exist\nSun Mar 3 14:46:24.641598 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open 
checkpoint because the TAP checkpoint state for vbucket 905 does not exist\nSun Mar 3 14:46:24.644453 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 906 does not exist\nSun Mar 3 14:46:24.647332 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 907 does not exist\nSun Mar 3 14:46:24.650084 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 908 does not exist\nSun Mar 3 14:46:24.652297 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 909 does not exist\nSun Mar 3 14:46:24.654505 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 910 does not exist\nSun Mar 3 14:46:24.656543 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 911 does not exist\nSun Mar 3 14:46:24.658728 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 912 does not exist\nSun Mar 3 14:46:24.660870 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 913 does not exist\nSun Mar 3 14:46:24.675750 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 914 does not exist\nSun Mar 3 14:46:24.678573 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the 
TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 915 does not exist\nSun Mar 3 14:46:24.681267 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 916 does not exist\nSun Mar 3 14:46:24.684201 PST 3: TAP (Consumer) eq_tapq:anon_400 - disconnected\nSun Mar 3 14:46:24.684266 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 917 does not exist\nSun Mar 3 14:46:24.688212 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 918 does not exist\nSun Mar 3 14:46:24.692260 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 919 does not exist\nSun Mar 3 14:46:24.693720 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 920 does not exist\nSun Mar 3 14:46:24.695347 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 921 does not exist\nSun Mar 3 14:46:24.697559 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 922 does not exist\nSun Mar 3 14:46:24.699837 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 923 does not exist\nSun Mar 3 14:46:24.701187 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 924 does not 
exist\nSun Mar 3 14:46:24.702445 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 925 does not exist\nSun Mar 3 14:46:24.703742 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 926 does not exist\nSun Mar 3 14:46:24.705023 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 927 does not exist\nSun Mar 3 14:46:24.706293 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 928 does not exist\nSun Mar 3 14:46:24.707913 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 929 does not exist\nSun Mar 3 14:46:24.709382 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 930 does not exist\nSun Mar 3 14:46:24.710855 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 931 does not exist\nSun Mar 3 14:46:24.712323 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 932 does not exist\nSun Mar 3 14:46:24.713741 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 933 does not exist\nSun Mar 3 14:46:24.715278 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for 
vbucket 934 does not exist\nSun Mar 3 14:46:24.717121 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 935 does not exist\nSun Mar 3 14:46:24.719087 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 936 does not exist\nSun Mar 3 14:46:24.720379 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 937 does not exist\nSun Mar 3 14:46:24.720409 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.82 - disconnected, keep alive for 300 seconds\nSun Mar 3 14:46:24.722399 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 938 does not exist\nSun Mar 3 14:46:24.722649 PST 3: Schedule cleanup of "eq_tapq:anon_400"\nSun Mar 3 14:46:24.723864 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 939 does not exist\nSun Mar 3 14:46:24.725572 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 940 does not exist\nSun Mar 3 14:46:24.727473 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 941 does not exist\nSun Mar 3 14:46:24.729436 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 942 does not exist\nSun Mar 3 14:46:24.731316 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP 
checkpoint state for vbucket 943 does not exist\nSun Mar 3 14:46:24.733465 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - disconnected, keep alive for 300 seconds\nSun Mar 3 14:46:24.733533 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 944 does not exist\nSun Mar 3 14:46:24.735516 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 945 does not exist\nSun Mar 3 14:46:24.737256 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 946 does not exist\nSun Mar 3 14:46:24.739452 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 947 does not exist\nSun Mar 3 14:46:24.741646 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 948 does not exist\nSun Mar 3 14:46:24.744002 PST 3: TAP (Consumer) eq_tapq:anon_406 - Reset vbucket 565 was completed succecssfully.\nSun Mar 3 14:46:24.751701 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 949 does not exist\nSun Mar 3 14:46:24.775075 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 950 does not exist\nSun Mar 3 14:46:24.777240 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 951 does not exist\nSun Mar 3 14:46:24.778791 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed 
to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 952 does not exist\nSun Mar 3 14:46:24.780117 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 953 does not exist\nSun Mar 3 14:46:24.781488 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 954 does not exist\nSun Mar 3 14:46:24.782688 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 955 does not exist\nSun Mar 3 14:46:24.784157 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 956 does not exist\nSun Mar 3 14:46:24.788277 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 957 does not exist\nSun Mar 3 14:46:24.789895 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 958 does not exist\nSun Mar 3 14:46:24.791705 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Connection is re-established. 
Rollback unacked messages...\nSun Mar 3 14:46:24.791773 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0\nSun Mar 3 14:46:24.791803 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0\nSun Mar 3 14:46:24.792373 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 959 does not exist\nSun Mar 3 14:46:24.792410 PST 3: Schedule cleanup of "eq_tapq:anon_407"\nSun Mar 3 14:46:24.793240 PST 3: TAP (Producer) eq_tapq:anon_407 - Clear the tap queues by force\nSun Mar 3 14:46:24.794013 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 960 does not exist\nSun Mar 3 14:46:24.795510 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 961 does not exist\nSun Mar 3 14:46:24.797628 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.94 - Failed to set the TAP cursor to the open checkpoint because the TAP checkpoint state for vbucket 962 does not exist', u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1362350785317.0, u'type': u'info'}

      [2013-03-03 14:42:35,364] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.3.3.92','ns_1@10.3.3.82',\n 'ns_1@10.3.3.93','ns_1@10.3.3.99'], EjectNodes = ['ns_1@10.3.3.91',\n 'ns_1@10.3.3.97',\n 'ns_1@10.3.3.94']\n (repeated 1 times)", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1362350764434.0, u'type': u'info'}
      [2013-03-03 14:42:35,364] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 0, u'text': u'Bucket "default" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1362350742736.0, u'type': u'info'}
      [2013-03-03 14:42:35,365] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.91', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'module': u'ns_rebalancer', u'tstamp': 1362350742057.0, u'type': u'info'}
      [2013-03-03 14:42:35,365] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.82', u'code': 1, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.3.82\' in 2 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1362350739122.0, u'type': u'info'}
      [2013-03-03 14:42:35,366] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.82', u'code': 1, u'text': u"Couchbase Server has started on web port 8091 on node 'ns_1@10.3.3.82'.", u'shortText': u'web start ok', u'module': u'menelaus_sup', u'tstamp': 1362350734988.0, u'type': u'info'}
      [2013-03-03 14:42:35,366] - [rest_client:890] ERROR - {u'node': u'ns_1@10.3.3.97', u'code': 4, u'text': u"Node 'ns_1@10.3.3.97' saw that node 'ns_1@10.3.3.82' came up.", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1362350734608.0, u'type': u'info'}
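
The UI log entries above are printed newest first, and every one carries the same test-runner timestamp, so the actual order of events is easier to read from the `tstamp` field (milliseconds since the Unix epoch). The following is a minimal sketch that sorts a hand-copied subset of those entries chronologically and prints UTC times. Only the `module`, `node`, and `tstamp` values are taken from the logs; the `note` strings and the `to_utc` helper are illustrative additions.

```python
from datetime import datetime, timezone

# Subset of the UI log entries above: module, node and tstamp are copied from
# the log; "note" is a shortened, hand-written summary of each message text.
events = [
    {"module": "ns_memcached", "node": "ns_1@10.3.3.99",
     "tstamp": 1362350789337.0, "note": "bucket loaded again"},
    {"module": "ns_orchestrator", "node": "ns_1@10.3.3.91",
     "tstamp": 1362350785366.0, "note": "rebalance exited"},
    {"module": "ns_memcached", "node": "ns_1@10.3.3.99",
     "tstamp": 1362350785325.0, "note": "control connection disconnected"},
    {"module": "ns_port_server", "node": "ns_1@10.3.3.99",
     "tstamp": 1362350785317.0, "note": "memcached exited with status 139"},
    {"module": "ns_orchestrator", "node": "ns_1@10.3.3.91",
     "tstamp": 1362350764434.0, "note": "starting rebalance"},
]


def to_utc(tstamp_ms):
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(tstamp_ms / 1000.0, tz=timezone.utc)


# Oldest first, so cause precedes effect when reading top to bottom.
ordered = sorted(events, key=lambda e: e["tstamp"])

for e in ordered:
    stamp = to_utc(e["tstamp"]).strftime("%H:%M:%S.%f")[:-3]
    print(f"{stamp} UTC  {e['module']:<16} {e['node']:<16} {e['note']}")
```

Read in this order, the memcached exit on 10.3.3.99 comes first, followed within a few tens of milliseconds by the lost control connection and the rebalance failure on the orchestrator.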

      andrey@baranouski:~/repository/testrunner$ ssh root@10.3.3.99
      root@10.3.3.99's password:
      Permission denied, please try again.
      root@10.3.3.99's password:
      Last login: Tue Feb 5 07:46:07 2013 from 10.32.26.65
      [root@caper-007 ~]# cd /tmp/
      [root@caper-007 tmp]# sudo gdb /opt/couchbase/bin/memcached core.memcached.17695
      GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
      Copyright (C) 2009 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law. Type "show copying"
      and "show warranty" for details.
      This GDB was configured as "x86_64-redhat-linux-gnu".
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>...
      Reading symbols from /opt/couchbase/bin/memcached...done.
      [New Thread 17718]
      [New Thread 17720]
      [New Thread 17719]
      [New Thread 17717]
      [New Thread 17716]
      [New Thread 17715]
      [New Thread 17713]
      [New Thread 17712]
      [New Thread 17711]
      [New Thread 17710]
      [New Thread 17709]
      [New Thread 17704]
      [New Thread 17703]
      [New Thread 17695]
      Reading symbols from /opt/couchbase/lib/memcached/libmemcached_utilities.so.0...done.
      Loaded symbols for /opt/couchbase/lib/memcached/libmemcached_utilities.so.0
      Reading symbols from /opt/couchbase/lib/libevent-2.0.so.5...done.
      Loaded symbols for /opt/couchbase/lib/libevent-2.0.so.5
      Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libdl.so.2
      Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libm.so.6
      Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib64/librt.so.1
      Reading symbols from /opt/couchbase/lib/libtcmalloc_minimal.so.4...done.
      Loaded symbols for /opt/couchbase/lib/libtcmalloc_minimal.so.4
      Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
      [Thread debugging using libthread_db enabled]
      Loaded symbols for /lib64/libpthread.so.0
      Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libc.so.6
      Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/ld-linux-x86-64.so.2
      Reading symbols from /usr/lib64/libstdc++.so.6...(no debugging symbols found)...done.
      Loaded symbols for /usr/lib64/libstdc++.so.6
      Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libgcc_s.so.1
      Reading symbols from /opt/couchbase/lib/memcached/stdin_term_handler.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/stdin_term_handler.so
      Reading symbols from /opt/couchbase/lib/memcached/file_logger.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/file_logger.so
      Reading symbols from /opt/couchbase/lib/memcached/bucket_engine.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/bucket_engine.so
      Reading symbols from /opt/couchbase/lib/memcached/ep.so...done.
      Loaded symbols for /opt/couchbase/lib/memcached/ep.so
      Reading symbols from /opt/couchbase/lib/libcouchstore.so.1...done.
      Loaded symbols for /opt/couchbase/lib/libcouchstore.so.1
      Reading symbols from /opt/couchbase/lib/libsnappy.so.1...done.
      Loaded symbols for /opt/couchbase/lib/libsnappy.so.1
      Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
      Loaded symbols for /lib64/libnss_files.so.2

      warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff819fd000
      Core was generated by `/opt/couchbase/bin/memcached -X /opt/couchbase/lib/memcached/stdin_term_handler'.
      Program terminated with signal 11, Segmentation fault.
      #0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722
      722 daemon/thread.c: No such file or directory.
      in daemon/thread.c
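
The exit status 139 reported by ns_port_server and the "signal 11, Segmentation fault" reported by gdb describe the same event. The following is a minimal sketch that decodes such a status, assuming the common convention that a process killed by signal N is reported as 128 + N. The function name `describe_exit_status` is hypothetical, and the signal names come from the platform running the script.

```python
import signal


def describe_exit_status(status):
    """Interpret an exit status, assuming values above 128 mean 128 + signal number."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not map to a named signal on this platform.
            name = "unknown"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(139))
```

On Linux this prints "killed by signal 11 (SIGSEGV)", which agrees with the core dump analysed here.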
      (gdb) t a a bt

      Thread 14 (Thread 0x2b9d8dcb6240 (LWP 17695)):
      #0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546000, tv=<value optimized out>) at epoll.c:404
      #2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546000, flags=<value optimized out>) at event.c:1558
      #3 0x0000000000409742 in main (argc=<value optimized out>, argv=<value optimized out>) at daemon/memcached.c:7918

      Thread 13 (Thread 17703):
      #0 0x00002b9d8d51445b in read () from /lib64/libc.so.6
      #1 0x00002b9d8d4ba677 in _IO_new_file_underflow () from /lib64/libc.so.6
      #2 0x00002b9d8d4bb03e in _IO_default_uflow_internal () from /lib64/libc.so.6
      #3 0x00002b9d8d4b0124 in _IO_getline_info_internal () from /lib64/libc.so.6
      #4 0x00002b9d8d4aefc9 in fgets () from /lib64/libc.so.6
      #5 0x00002b9d8dcb7939 in check_stdin_thread (arg=<value optimized out>) at extensions/daemon/stdin_check.c:37
      #6 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #7 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 12 (Thread 17704):
      #0 0x00002b9d8d23f1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00002aaaaaaae4d6 in logger_thead_main (arg=0x11cde040) at extensions/loggers/file_logger.c:368
      #2 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #3 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 11 (Thread 17709):
      #0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546500, tv=<value optimized out>) at epoll.c:404
      #2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546500, flags=<value optimized out>) at event.c:1558
      #3 0x0000000000414504 in worker_libevent (arg=0x11ce1900) at daemon/thread.c:301
      #4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 10 (Thread 17710):
      #0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546280, tv=<value optimized out>) at epoll.c:404
      #2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546280, flags=<value optimized out>) at event.c:1558
      #3 0x0000000000414504 in worker_libevent (arg=0x11ce19f8) at daemon/thread.c:301
      #4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 9 (Thread 17711):
      #0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546c80, tv=<value optimized out>) at epoll.c:404
      #2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546c80, flags=<value optimized out>) at event.c:1558
      #3 0x0000000000414504 in worker_libevent (arg=0x11ce1af0) at daemon/thread.c:301
      #4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 8 (Thread 17712):
      #0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546a00, tv=<value optimized out>) at epoll.c:404
      #2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546a00, flags=<value optimized out>) at event.c:1558
      #3 0x0000000000414504 in worker_libevent (arg=0x11ce1be8) at daemon/thread.c:301
      #4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 7 (Thread 17713):
      #0 0x00002b9d8d522648 in epoll_wait () from /lib64/libc.so.6
      #1 0x00002b9d8c72f576 in epoll_dispatch (base=0x16546780, tv=<value optimized out>) at epoll.c:404
      #2 0x00002b9d8c71ae44 in event_base_loop (base=0x16546780, flags=<value optimized out>) at event.c:1558
      #3 0x0000000000414504 in worker_libevent (arg=0x11ce1ce0) at daemon/thread.c:301
      #4 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #5 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 6 (Thread 17715):
      #0 0x00002b9d8d4e8221 in nanosleep () from /lib64/libc.so.6
      #1 0x00002b9d8d51bba4 in usleep () from /lib64/libc.so.6
      #2 0x00002aaaaaf31945 in updateStatsThread (arg=0x11cde4c0) at src/memory_tracker.cc:31
      #3 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #4 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 5 (Thread 17716):
      #0 0x00002b9d8d005ed9 in (anonymous namespace)::GetSizeWithCallback(void const*, unsigned long (void const*)) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #1 0x00002b9d8d006669 in TCMallocImplementation::GetAllocatedSize(void const*) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #2 0x00002b9d8d0172e8 in MallocExtension_GetAllocatedSize () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #3 0x00002aaaaaf316b4 in NewHook (ptr=0x1a6fa3c0) at src/memory_tracker.cc:48
      #4 0x00002b9d8d0138e5 in MallocHook::InvokeNewHookSlow(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #5 0x00002b9d8d006ca7 in MallocHook::InvokeNewHook(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #6 0x00002b9d8d0198a4 in tc_new () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #7 0x00002aaaaaf6f638 in CouchKVStore::set (this=0x165b0000, itm=..., cb=<value optimized out>) at src/couch-kvstore/couch-kvstore.cc:343
      #8 0x00002aaaaaefd234 in EventuallyPersistentStore::flushOneDelOrSet (this=0x1653e480, qi=..., rejectQueue=std::queue wrapping: std::deque with 0 elements, vb=...) at src/ep.cc:2420
      #9 0x00002aaaaaefd4fb in EventuallyPersistentStore::flushOne (this=0x1653e480, queue=<value optimized out>, rejectQueue=std::queue wrapping: std::deque with 0 elements, vb=...) at src/ep.cc:2468
      #10 0x00002aaaaaf00ff5 in EventuallyPersistentStore::flushVBQueue (this=0x1653e480, vb=..., vb_queue=std::queue wrapping: std::deque with 250 elements = {...}, vbid=565, data_age=0) at src/ep.cc:2022
      #11 0x00002aaaaaf0224c in EventuallyPersistentStore::flushOutgoingQueue (this=0x1653e480, flushQueue=0x1653e748, flushPhase=@0x1653c570, nextVbid=@0x1653c578) at src/ep.cc:1964
      #12 0x00002aaaaaf2b9cc in Flusher::doFlush (this=0x1653c480) at src/flusher.cc:245
      #13 0x00002aaaaaf2c805 in Flusher::step (this=0x1653c480, d=..., tid=...) at src/flusher.cc:158
      #14 0x00002aaaaaef473a in Dispatcher::run (this=0x16582c40) at src/dispatcher.cc:173
      #15 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16582c40) at src/dispatcher.cc:28
      #16 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #17 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 4 (Thread 17717):
      #0 0x00002b9d8d23f1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00002aaaaaef2078 in wait (this=0x165c6090, d=...) at src/syncobject.hh:58
      #2 IdleTask::run (this=0x165c6090, d=...) at src/dispatcher.cc:336
      #3 0x00002aaaaaef473a in Dispatcher::run (this=0x16582a80) at src/dispatcher.cc:173
      #4 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16582a80) at src/dispatcher.cc:28
      #5 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #6 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 3 (Thread 17719):
      #0 get (mem=48) at src/atomic.hh:86
      #1 operator bool (mem=48) at src/atomic.hh:95
      #2 ObjectRegistry::memoryAllocated (mem=48) at src/objectregistry.cc:137
      #3 0x00002b9d8d0138e5 in MallocHook::InvokeNewHookSlow(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #4 0x00002b9d8d006ca7 in MallocHook::InvokeNewHook(void const*, unsigned long) () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #5 0x00002b9d8d0198a4 in tc_new () from /opt/couchbase/lib/libtcmalloc_minimal.so.4
      #6 0x00002b9d8d842861 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib64/libstdc++.so.6
      #7 0x00002b9d8d843365 in ?? () from /usr/lib64/libstdc++.so.6
      #8 0x00002b9d8d84345a in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&) () from /usr/lib64/libstdc++.so.6
      #9 0x00002aaaaaee3dd9 in getKey (this=0x16d5fe00, v=<value optimized out>) at src/stored-value.hh:195
      #10 BackFillVisitor::visit (this=0x16d5fe00, v=<value optimized out>) at src/backfill.cc:143
      #11 0x00002aaaaaf36c75 in HashTable::visit (this=0x171bbc08, visitor=...) at src/stored-value.cc:404
      #12 0x00002aaaaaef8bc2 in VBCBAdaptor::callback (this=0x1e5197a0, d=..., t=...) at src/ep.cc:2850
      #13 0x00002aaaaaef473a in Dispatcher::run (this=0x165836c0) at src/dispatcher.cc:173
      #14 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x165836c0) at src/dispatcher.cc:28
      #15 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #16 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 2 (Thread 17720):
      #0 0x00002b9d8d23f1c0 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1 0x00002aaaaaf10eaf in wait (this=0x16542000) at src/syncobject.hh:58
      #2 wait (this=0x16542000) at src/syncobject.hh:74
      #3 wait (this=0x16542000) at src/tapconnmap.hh:169
      #4 EventuallyPersistentEngine::notifyPendingConnections (this=0x16542000) at src/ep_engine.cc:3423
      #5 0x00002aaaaaf10f93 in EvpNotifyPendingConns (arg=0x16542000) at src/ep_engine.cc:1145
      #6 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #7 0x00002b9d8d52225d in clone () from /lib64/libc.so.6

      Thread 1 (Thread 0x47718940 (LWP 17718)):
      #0 add_conn_to_pending_io_list (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:722
      #1 notify_io_complete (cookie=0x164c7340, status=ENGINE_SUCCESS) at daemon/thread.c:488
      #2 0x00002aaaaaf4a4fd in notifyIOComplete (this=<value optimized out>, tc=0x16d13400) at src/ep_engine.h:439
      #3 TapConnMap::notifyPausedConnection_UNLOCKED (this=<value optimized out>, tc=0x16d13400) at src/tapconnmap.cc:347
      #4 0x00002aaaaaee4901 in performTapOp<void*> (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/tapconnmap.hh:119
      #5 BackfillDiskLoad::callback (this=0x173d3f80, d=<value optimized out>, t=<value optimized out>) at src/backfill.cc:78
      #6 0x00002aaaaaef473a in Dispatcher::run (this=0x16583880) at src/dispatcher.cc:173
      #7 0x00002aaaaaef503b in launch_dispatcher_thread (arg=0x16583880) at src/dispatcher.cc:28
      #8 0x00002b9d8d23a77d in start_thread () from /lib64/libpthread.so.0
      #9 0x00002b9d8d52225d in clone () from /lib64/libc.so.6
      (gdb)
      (gdb) Quit
      (gdb) quit

          People

            mikew Mike Wiederhold [X] (Inactive)
            andreibaranouski Andrei Baranouski