  1. Couchbase Server
  2. MB-31074

After cluster upgrade, clusterCompatibility still at old value


Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Duplicate
    • 6.0.0
    • 6.0.0
    • ns_server
    •  6.0.0 build 1567

    Description

      Title
      Original was: Eventing API's detect wrong version after upgrade when upgraded using online upgrade using failover
      Updated to reflect that the problem is seen in the /pools/default listing itself and so is not specific to eventing.

      Script to Repro

      ./testrunner -i /tmp/upgrade3.ini -p get-cbcollect-info=True -t eventing.eventing_upgrade.EventingUpgrade.test_online_upgrade_with_failover_rebalance_with_eventing,nodes_init=4,dataset=default,groups=simple,skip_cleanup=True,initial_version=5.5.0-2958,doc-per-day=2,upgrade_version=6.0.0-1567
      

      Steps to Repro

      • Create a 4-node cluster kv-eventing-index-n1ql on 5.5.0-2958.
      • Deploy a bucket op function.
      • Add 4 Alice (6.0.0-1567) nodes kv-eventing-index-n1ql.
      • Fail over all the old Vulcan (5.5.0) nodes and rebalance them out.
      • Deploy a timer function using the following API. This fails with "Function requires 6.0 but cluster is at 5.5".

      Request

      2018-08-28 02:30:11,061 - root - ERROR - POST http://172.23.104.91:8091/_p/event/setApplication/?name=test_import_function_2 body:{  
         "depcfg":{  
            "buckets":[  
               {  
                  "alias":"dst_bucket",
                  "bucket_name":"dst_bucket1"
               }
            ],
            "source_bucket":"src_bucket",
            "metadata_bucket":"metadata"
         },
         "appcode":"function OnUpdate(doc,meta) {\n    var expiry = new Date();\n    expiry.setSeconds(expiry.getSeconds() + 5);\n\n    var context = {docID : meta.id};\n    createTimer(NDtimerCallback,  expiry, meta.id, context);\n}\nfunction NDtimerCallback(context) {\n    dst_bucket[context.docID] = 'from NDtimerCallback';\n}",
         "id":0,
         "settings":{  
            "enable_recursive_mutation":false,
            "app_log_max_files":10,
            "curl_timeout":500,
            "skip_timer_threshold":86400,
            "dcp_stream_boundary":"everything",
            "use_memory_manager":true,
            "persist_interval":5000,
            "sock_batch_size":100,
            "dcp_num_connections":1,
            "enable_snapshot_smr":false,
            "log_level":"TRACE",
            "min_page_items":50,
            "fuzz_offset":0,
            "max_delta_chain_len":200,
            "xattr_doc_timer_entry_prune_threshold":100,
            "worker_feedback_queue_cap":10000,
            "tick_duration":60000,
            "deadline_timeout":3,
            "app_log_max_size":10485760,
            "max_page_items":400,
            "worker_count":3,
            "lss_read_ahead_size":1048576,
            "deployment_status":true,
            "lss_cleaner_threshold":30,
            "description":"",
            "dcp_gen_chan_size":10000,
            "lss_cleaner_max_threshold":70,
            "feedback_batch_size":100,
            "auto_swapper":true,
            "worker_queue_cap":100000,
            "cpp_worker_thread_count":2,
            "cron_timers_per_doc":1000,
            "feedback_read_buffer_size":65536,
            "execution_timeout":1,
            "processing_status":true,
            "cleanup_timers":false,
            "timer_processing_tick_interval":500,
            "breakpad_on":true,
            "lcb_inst_capacity":5,
            "vb_ownership_giveup_routine_count":3,
            "data_chan_size":10000,
            "vb_ownership_takeover_routine_count":3,
            "checkpoint_interval":10000
         },
         "appname":"test_import_function_2"
      }
      

      Response

      headers:{  
         'Content-type':'application/json',
         'Authorization':'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n'
      }
      error: 406 reason: unknown
      {  
         "name":"ERR_CLUSTER_VERSION",
         "code":42,
         "description":"This function syntax is unsupported on current cluster version",
         "attributes":null,
         "runtime_info":{  
            "code":42,
            "info":"Function requires 6.0 but cluster is at 5.5"
         }
      }
      

      However, the entire cluster is already on 6.0.0 before we make this request. This does not seem to happen from the UI, only through this API, which we use extensively in automation.
      The same API works fine when we upgrade through swap rebalance or regular rebalance.

      Logs attached.

      Automation Log : http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/56/consoleText


          Activity

            Balakumaran.Gopal Balakumaran Gopal created issue -
            Balakumaran.Gopal Balakumaran Gopal made changes -
                Field: Summary
                Original Value: Eventing API's detect wrong version after upgrade when upgrade through online upgrade using failover
                New Value: Eventing API's detect wrong version after upgrade when upgraded using online upgrade using failover
            jeelan.poola Jeelan Poola made changes -
                Field: Assignee
                Original Value: Jeelan Poola [ jeelan.poola ]
                New Value: Sriram Melkote [ siri ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
                Field: Labels
                New Value: functional-test

            Bala, I checked our code: we fetch /pools/default and examine the node versions each time we deploy. So we should eliminate the possibility that /pools/default itself lagged behind the cluster status (which would then be an ns_server issue). As we discussed, could you please log the output of /pools/default just before doing the deployment? Thank you so much for making this change; I'm requesting it because the issue is timing related.

            siri Sriram Melkote (Inactive) added a comment -
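
The logging requested in the comment above could look roughly like the following. This is only a sketch: it assumes the test already has the parsed /pools/default JSON as a Python dict, and the helper names `summarize_nodes` and `format_summary` are hypothetical, not part of testrunner. It reads only the `hostname`, `version` and `clusterCompatibility` fields of each entry under `nodes`.

```python
def summarize_nodes(pools_default):
    """Return one (hostname, version, clusterCompatibility) tuple per node.

    `pools_default` is the parsed JSON body of GET /pools/default.
    Missing fields come back as None rather than raising.
    """
    return [
        (node.get("hostname"), node.get("version"), node.get("clusterCompatibility"))
        for node in pools_default.get("nodes", [])
    ]


def format_summary(pools_default):
    """Render the per-node summary as one log line per node."""
    return "\n".join(
        "%s version=%s clusterCompatibility=%s" % row
        for row in summarize_nodes(pools_default)
    )


# Hypothetical two-node payload, trimmed to the fields used here.
sample = {
    "nodes": [
        {"hostname": "172.23.104.90:8091", "version": "6.0.0-1567-enterprise",
         "clusterCompatibility": 327685},
        {"hostname": "172.23.104.91:8091", "version": "6.0.0-1567-enterprise",
         "clusterCompatibility": 327685},
    ]
}
print(format_summary(sample))
```

Logging this summary immediately before the setApplication POST keeps the timing-sensitive evidence next to the failing request in the automation log.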

            Sriram Melkote - I printed the /pools/default result after the upgrade completed. It looks like ns_server is returning the correct version.

            Output

            {
                "alerts": [], 
                "alertsSilenceURL": "/controller/resetAlerts?uuid=0c9da2be4b4dd709d908262f19b19ae9&token=0", 
                "autoCompactionSettings": {
                    "databaseFragmentationThreshold": {
                        "percentage": 30, 
                        "size": "undefined"
                    }, 
                    "indexCircularCompaction": {
                        "daysOfWeek": "Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday", 
                        "interval": {
                            "abortOutside": false, 
                            "fromHour": 0, 
                            "fromMinute": 0, 
                            "toHour": 0, 
                            "toMinute": 0
                        }
                    }, 
                    "indexCompactionMode": "circular", 
                    "indexFragmentationThreshold": {
                        "percentage": 30
                    }, 
                    "parallelDBAndViewCompaction": false, 
                    "viewFragmentationThreshold": {
                        "percentage": 30, 
                        "size": "undefined"
                    }
                }, 
                "balanced": true, 
                "buckets": {
                    "terseBucketsBase": "/pools/default/b/", 
                    "terseStreamingBucketsBase": "/pools/default/bs/", 
                    "uri": "/pools/default/buckets?v=30828773&uuid=0c9da2be4b4dd709d908262f19b19ae9"
                }, 
                "cbasMemoryQuota": 1117, 
                "checkPermissionsURI": "/pools/default/checkPermissions?v=N%2BNsWbIb5Z1FV3EXsBhHKEFdp1I%3D", 
                "clusterName": "", 
                "controllers": {
                    "addNode": {
                        "uri": "/controller/addNodeV2?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "clusterLogsCollection": {
                        "cancelURI": "/controller/cancelLogsCollection?uuid=0c9da2be4b4dd709d908262f19b19ae9", 
                        "startURI": "/controller/startLogsCollection?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "ejectNode": {
                        "uri": "/controller/ejectNode?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "failOver": {
                        "uri": "/controller/failOver?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "reAddNode": {
                        "uri": "/controller/reAddNode?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "reFailOver": {
                        "uri": "/controller/reFailOver?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "rebalance": {
                        "uri": "/controller/rebalance?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "replication": {
                        "createURI": "/controller/createReplication?uuid=0c9da2be4b4dd709d908262f19b19ae9", 
                        "validateURI": "/controller/createReplication?just_validate=1"
                    }, 
                    "setAutoCompaction": {
                        "uri": "/controller/setAutoCompaction?uuid=0c9da2be4b4dd709d908262f19b19ae9", 
                        "validateURI": "/controller/setAutoCompaction?just_validate=1"
                    }, 
                    "setRecoveryType": {
                        "uri": "/controller/setRecoveryType?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }, 
                    "startGracefulFailover": {
                        "uri": "/controller/startGracefulFailover?uuid=0c9da2be4b4dd709d908262f19b19ae9"
                    }
                }, 
                "counters": {
                    "failover": 4, 
                    "failover_complete": 4, 
                    "rebalance_start": 5, 
                    "rebalance_success": 4
                }, 
                "eventingMemoryQuota": 256, 
                "ftsMemoryQuota": 256, 
                "indexMemoryQuota": 512, 
                "indexStatusURI": "/indexStatus?v=21137658", 
                "maxBucketCount": 10, 
                "memoryQuota": 1884, 
                "name": "default", 
                "nodeStatusesUri": "/nodeStatuses", 
                "nodes": [
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.90:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.90:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.90:8091", 
                        "interestingStats": {
                            "cmd_get": 487, 
                            "couch_docs_actual_disk_size": 146832343, 
                            "couch_docs_data_size": 121259891, 
                            "couch_spatial_data_size": 0, 
                            "couch_spatial_disk_size": 0, 
                            "couch_views_actual_disk_size": 0, 
                            "couch_views_data_size": 0, 
                            "curr_items": 9090, 
                            "curr_items_tot": 9090, 
                            "ep_bg_fetched": 0, 
                            "get_hits": 487, 
                            "mem_used": 96246480, 
                            "ops": 487, 
                            "vb_active_num_non_resident": 0, 
                            "vb_replica_curr_items": 0
                        }, 
                        "mcdMemoryAllocated": 3220, 
                        "mcdMemoryReserved": 3220, 
                        "memoryFree": 3130527744, 
                        "memoryTotal": 4220846080, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.90", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "kv"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 11.45038167938931, 
                            "mem_free": 3130527744, 
                            "mem_total": 4220846080, 
                            "swap_total": 3758092288, 
                            "swap_used": 87867392
                        }, 
                        "thisNode": true, 
                        "uptime": "1123", 
                        "version": "6.0.0-1567-enterprise"
                    }, 
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.91:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.91:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.91:8091", 
                        "interestingStats": {}, 
                        "mcdMemoryAllocated": 3220, 
                        "mcdMemoryReserved": 3220, 
                        "memoryFree": 3177287680, 
                        "memoryTotal": 4220846080, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.91", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "eventing"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 24.28940568475452, 
                            "mem_free": 3177287680, 
                            "mem_total": 4220846080, 
                            "swap_total": 3758092288, 
                            "swap_used": 0
                        }, 
                        "uptime": "1123", 
                        "version": "6.0.0-1567-enterprise"
                    }, 
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.92:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.92:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.92:8091", 
                        "interestingStats": {}, 
                        "mcdMemoryAllocated": 3220, 
                        "mcdMemoryReserved": 3220, 
                        "memoryFree": 3267891200, 
                        "memoryTotal": 4220846080, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.92", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "index"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 2, 
                            "mem_free": 3267891200, 
                            "mem_total": 4220846080, 
                            "swap_total": 3758092288, 
                            "swap_used": 0
                        }, 
                        "uptime": "1128", 
                        "version": "6.0.0-1567-enterprise"
                    }, 
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.97:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.97:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.97:8091", 
                        "interestingStats": {}, 
                        "mcdMemoryAllocated": 3189, 
                        "mcdMemoryReserved": 3189, 
                        "memoryFree": 3212210176, 
                        "memoryTotal": 4179963904, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.97", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "n1ql"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 1.005025125628141, 
                            "mem_free": 3212210176, 
                            "mem_total": 4179963904, 
                            "swap_total": 0, 
                            "swap_used": 0
                        }, 
                        "uptime": "1115", 
                        "version": "6.0.0-1567-enterprise"
                    }
                ], 
                "rebalanceProgressUri": "/pools/default/rebalanceProgress", 
                "rebalanceStatus": "running", 
                "remoteClusters": {
                    "uri": "/pools/default/remoteClusters?uuid=0c9da2be4b4dd709d908262f19b19ae9", 
                    "validateURI": "/pools/default/remoteClusters?just_validate=1"
                }, 
                "serverGroupsUri": "/pools/default/serverGroups?v=62824633", 
                "stopRebalanceUri": "/controller/stopRebalance?uuid=0c9da2be4b4dd709d908262f19b19ae9", 
                "storageTotals": {
                    "hdd": {
                        "free": 28951971472, 
                        "quotaTotal": 33278128128, 
                        "total": 33278128128, 
                        "used": 4326156656, 
                        "usedByData": 146832343
                    }, 
                    "ram": {
                        "quotaTotal": 1975517184, 
                        "quotaTotalPerNode": 1975517184, 
                        "quotaUsed": 419430400, 
                        "quotaUsedPerNode": 419430400, 
                        "total": 4220846080, 
                        "used": 2551402496, 
                        "usedByData": 96246480
                    }
                }, 
                "tasks": {
                    "uri": "/pools/default/tasks?v=76297930"
                }
            } 
            

            Detailed automation Log runs available @ http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/57/console
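
In the output above, every node reports `version` 6.0.0-1567-enterprise while `clusterCompatibility` is 327685. The sketch below decodes that integer under the assumption that it is packed as major * 65536 + minor. The ticket does not state this encoding; it is an assumption that happens to match the "5.5" in the eventing error. The function names are hypothetical.

```python
def decode_cluster_compat(value):
    """Split a clusterCompatibility integer into (major, minor).

    Assumes the value is encoded as major * 65536 + minor.
    """
    return divmod(value, 0x10000)


def version_major_minor(version):
    """Extract (major, minor) from a string like '6.0.0-1567-enterprise'."""
    major, minor = version.split("-")[0].split(".")[:2]
    return int(major), int(minor)


def compat_lags_version(node):
    """True if the node's clusterCompatibility is older than its own version."""
    compat = decode_cluster_compat(node["clusterCompatibility"])
    return compat < version_major_minor(node["version"])


# Values copied from the first node in the /pools/default output.
node = {"clusterCompatibility": 327685, "version": "6.0.0-1567-enterprise"}
print(decode_cluster_compat(node["clusterCompatibility"]))  # (5, 5)
print(compat_lags_version(node))                            # True
```

Under that assumption the nodes run 6.0 binaries but still advertise 5.5 compatibility, which is consistent with the updated issue title.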

            Balakumaran.Gopal Balakumaran Gopal added a comment -
            jeelan.poola Jeelan Poola made changes -
                Field: Due Date
                New Value: 26/Sep/18
            jeelan.poola Jeelan Poola made changes -
                Field: Due Date
                Original Value: 26/Sep/18
                New Value: 05/Sep/18

            Bala, thanks a lot. Would it be possible to query the same node to which the POST is sent, so we can eliminate the possibility that the nodes themselves disagree on the node versions?

            siri Sriram Melkote (Inactive) added a comment -
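
The cross-node check asked for above can be sketched as a comparison of the /pools/default payloads collected from each queried node. This assumes the test has already fetched and parsed one payload per queried host. `find_disagreements` is a hypothetical helper, and it compares only `version` and `clusterCompatibility`.

```python
def find_disagreements(payloads):
    """Report nodes whose version or clusterCompatibility differs between views.

    `payloads` maps the host that was queried to its parsed /pools/default
    JSON. The result maps each disputed node hostname to a dict of
    {queried_host: (version, clusterCompatibility)}; it is empty when all
    queried hosts agree.
    """
    views = {}
    for queried, payload in payloads.items():
        for node in payload.get("nodes", []):
            seen = (node.get("version"), node.get("clusterCompatibility"))
            views.setdefault(node.get("hostname"), {})[queried] = seen
    return {
        host: by_queried
        for host, by_queried in views.items()
        if len(set(by_queried.values())) > 1
    }


# Hypothetical payloads from the master and the eventing node, trimmed.
entry = {"hostname": "172.23.104.91:8091", "version": "6.0.0-1567-enterprise",
         "clusterCompatibility": 327685}
payloads = {
    "172.23.104.90:8091": {"nodes": [dict(entry)]},
    "172.23.104.91:8091": {"nodes": [dict(entry)]},
}
print(find_disagreements(payloads))  # {} when both views agree
```

An empty result only shows that the queried nodes agree with each other. It does not show that the agreed value is the expected one.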

            Sriram Melkote - I have printed the output from both the master and the eventing node. They look consistent to me.

            Below is the output from the eventing node.

            {
                "alerts": [], 
                "alertsSilenceURL": "/controller/resetAlerts?uuid=d1669ef4edf3775e32ad2c3dbadd098d&token=0", 
                "autoCompactionSettings": {
                    "databaseFragmentationThreshold": {
                        "percentage": 30, 
                        "size": "undefined"
                    }, 
                    "indexCircularCompaction": {
                        "daysOfWeek": "Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday", 
                        "interval": {
                            "abortOutside": false, 
                            "fromHour": 0, 
                            "fromMinute": 0, 
                            "toHour": 0, 
                            "toMinute": 0
                        }
                    }, 
                    "indexCompactionMode": "circular", 
                    "indexFragmentationThreshold": {
                        "percentage": 30
                    }, 
                    "parallelDBAndViewCompaction": false, 
                    "viewFragmentationThreshold": {
                        "percentage": 30, 
                        "size": "undefined"
                    }
                }, 
                "balanced": true, 
                "buckets": {
                    "terseBucketsBase": "/pools/default/b/", 
                    "terseStreamingBucketsBase": "/pools/default/bs/", 
                    "uri": "/pools/default/buckets?v=30828773&uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                }, 
                "cbasMemoryQuota": 1117, 
                "checkPermissionsURI": "/pools/default/checkPermissions?v=jBtQiQbeIGrZGhLZ4UFz2Ibcf%2Fg%3D", 
                "clusterName": "", 
                "controllers": {
                    "addNode": {
                        "uri": "/controller/addNodeV2?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "clusterLogsCollection": {
                        "cancelURI": "/controller/cancelLogsCollection?uuid=d1669ef4edf3775e32ad2c3dbadd098d", 
                        "startURI": "/controller/startLogsCollection?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "ejectNode": {
                        "uri": "/controller/ejectNode?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "failOver": {
                        "uri": "/controller/failOver?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "reAddNode": {
                        "uri": "/controller/reAddNode?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "reFailOver": {
                        "uri": "/controller/reFailOver?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "rebalance": {
                        "uri": "/controller/rebalance?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "replication": {
                        "createURI": "/controller/createReplication?uuid=d1669ef4edf3775e32ad2c3dbadd098d", 
                        "validateURI": "/controller/createReplication?just_validate=1"
                    }, 
                    "setAutoCompaction": {
                        "uri": "/controller/setAutoCompaction?uuid=d1669ef4edf3775e32ad2c3dbadd098d", 
                        "validateURI": "/controller/setAutoCompaction?just_validate=1"
                    }, 
                    "setRecoveryType": {
                        "uri": "/controller/setRecoveryType?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }, 
                    "startGracefulFailover": {
                        "uri": "/controller/startGracefulFailover?uuid=d1669ef4edf3775e32ad2c3dbadd098d"
                    }
                }, 
                "counters": {
                    "failover": 4, 
                    "failover_complete": 4, 
                    "rebalance_start": 5, 
                    "rebalance_success": 4
                }, 
                "eventingMemoryQuota": 256, 
                "ftsMemoryQuota": 256, 
                "indexMemoryQuota": 512, 
                "indexStatusURI": "/indexStatus?v=21137658", 
                "maxBucketCount": 10, 
                "memoryQuota": 1884, 
                "name": "default", 
                "nodeStatusesUri": "/nodeStatuses", 
                "nodes": [
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.90:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.90:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.90:8091", 
                        "interestingStats": {
                            "cmd_get": 1, 
                            "couch_docs_actual_disk_size": 144068788, 
                            "couch_docs_data_size": 118933363, 
                            "couch_spatial_data_size": 0, 
                            "couch_spatial_disk_size": 0, 
                            "couch_views_actual_disk_size": 0, 
                            "couch_views_data_size": 0, 
                            "curr_items": 9090, 
                            "curr_items_tot": 9090, 
                            "ep_bg_fetched": 0, 
                            "get_hits": 1, 
                            "mem_used": 96189648, 
                            "ops": 1, 
                            "vb_active_num_non_resident": 0, 
                            "vb_replica_curr_items": 0
                        }, 
                        "mcdMemoryAllocated": 3220, 
                        "mcdMemoryReserved": 3220, 
                        "memoryFree": 3111829504, 
                        "memoryTotal": 4220846080, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.90", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "kv"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 12.21374045801527, 
                            "mem_free": 3111829504, 
                            "mem_total": 4220846080, 
                            "swap_total": 3758092288, 
                            "swap_used": 87785472
                        }, 
                        "uptime": "1120", 
                        "version": "6.0.0-1567-enterprise"
                    }, 
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.91:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.91:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.91:8091", 
                        "interestingStats": {}, 
                        "mcdMemoryAllocated": 3220, 
                        "mcdMemoryReserved": 3220, 
                        "memoryFree": 3174424576, 
                        "memoryTotal": 4220846080, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.91", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "eventing"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 1.763224181360201, 
                            "mem_free": 3174424576, 
                            "mem_total": 4220846080, 
                            "swap_total": 3758092288, 
                            "swap_used": 0
                        }, 
                        "thisNode": true, 
                        "uptime": "1123", 
                        "version": "6.0.0-1567-enterprise"
                    }, 
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.92:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.92:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.92:8091", 
                        "interestingStats": {}, 
                        "mcdMemoryAllocated": 3220, 
                        "mcdMemoryReserved": 3220, 
                        "memoryFree": 3268419584, 
                        "memoryTotal": 4220846080, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.92", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "index"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 1.745635910224439, 
                            "mem_free": 3268419584, 
                            "mem_total": 4220846080, 
                            "swap_total": 3758092288, 
                            "swap_used": 0
                        }, 
                        "uptime": "1124", 
                        "version": "6.0.0-1567-enterprise"
                    }, 
                    {
                        "clusterCompatibility": 327685, 
                        "clusterMembership": "active", 
                        "couchApiBase": "http://172.23.104.97:8092/", 
                        "couchApiBaseHTTPS": "https://172.23.104.97:18092/", 
                        "cpuCount": 4, 
                        "hostname": "172.23.104.97:8091", 
                        "interestingStats": {}, 
                        "mcdMemoryAllocated": 3189, 
                        "mcdMemoryReserved": 3189, 
                        "memoryFree": 3195666432, 
                        "memoryTotal": 4179963904, 
                        "os": "x86_64-unknown-linux-gnu", 
                        "otpNode": "ns_1@172.23.104.97", 
                        "ports": {
                            "direct": 11210, 
                            "httpsCAPI": 18092, 
                            "httpsMgmt": 18091, 
                            "proxy": 11211
                        }, 
                        "recoveryType": "none", 
                        "services": [
                            "n1ql"
                        ], 
                        "status": "healthy", 
                        "systemStats": {
                            "cpu_utilization_rate": 1.5, 
                            "mem_free": 3195666432, 
                            "mem_total": 4179963904, 
                            "swap_total": 0, 
                            "swap_used": 0
                        }, 
                        "uptime": "1116", 
                        "version": "6.0.0-1567-enterprise"
                    }
                ], 
                "rebalanceProgressUri": "/pools/default/rebalanceProgress", 
                "rebalanceStatus": "running", 
                "remoteClusters": {
                    "uri": "/pools/default/remoteClusters?uuid=d1669ef4edf3775e32ad2c3dbadd098d", 
                    "validateURI": "/pools/default/remoteClusters?just_validate=1"
                }, 
                "serverGroupsUri": "/pools/default/serverGroups?v=62824633", 
                "stopRebalanceUri": "/controller/stopRebalance?uuid=d1669ef4edf3775e32ad2c3dbadd098d", 
                "storageTotals": {
                    "hdd": {
                        "free": 28951971472, 
                        "quotaTotal": 33278128128, 
                        "total": 33278128128, 
                        "used": 4326156656, 
                        "usedByData": 144068788
                    }, 
                    "ram": {
                        "quotaTotal": 1975517184, 
                        "quotaTotalPerNode": 1975517184, 
                        "quotaUsed": 419430400, 
                        "quotaUsedPerNode": 419430400, 
                        "total": 4220846080, 
                        "used": 2575118336, 
                        "usedByData": 96189648
                    }
                }, 
                "tasks": {
                    "uri": "/pools/default/tasks?v=109190074"
                }
            } 
            

            Detailed logs can be found @ http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/59/console

            Balakumaran.Gopal Balakumaran Gopal added a comment - Sriram Melkote - I have printed the output from both the master and the eventing node. It looks consistent to me. The output from the eventing node is the JSON listing above.

            siri Sriram Melkote (Inactive) added a comment

            Toy build http://server.jenkins.couchbase.com/view/Toys/job/toy-unix/3227/ has additional logs. Please see if we can run the upgrade test on this build, as it logs more information about what is happening.
            Balakumaran.Gopal Balakumaran Gopal added a comment

            Sriram Melkote - Logs - http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/60/console

            cbcollect_info
            https://s3.amazonaws.com/bugdb/jira/upgrade_MB-31074/collectinfo-2018-09-04T111118-ns_1%40172.23.104.107.zip
            https://s3.amazonaws.com/bugdb/jira/upgrade_MB-31074/collectinfo-2018-09-04T111118-ns_1%40172.23.104.108.zip --->eventing
            https://s3.amazonaws.com/bugdb/jira/upgrade_MB-31074/collectinfo-2018-09-04T111118-ns_1%40172.23.104.109.zip
            https://s3.amazonaws.com/bugdb/jira/upgrade_MB-31074/collectinfo-2018-09-04T111118-ns_1%40172.23.104.90.zip
            siri Sriram Melkote (Inactive) added a comment - - edited

            Bala, it seems the test may have an issue. At start of the cluster setup, I see (393216 is 6.0):

            2018-09-04 03:44:17,738 - root - INFO - server: ip:172.23.104.108 ... 'clusterCompatibility': 393216
            

            And later, I see (327685 is 5.5):

            2018-09-04 04:05:34,543 - root - INFO - Output of pools/default from eventing node after upgrade is
            ... "clusterCompatibility": 327685, "couchApiBase": "http://172.23.104.108:8092/" ...
            

            In run http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/60/console but perhaps others as well

            siri Sriram Melkote (Inactive) made changes -
            Assignee: Sriram Melkote [ siri ] -> Balakumaran Gopal [ balakumaran.gopal ]
            siri Sriram Melkote (Inactive) made changes -
            Component/s test-execution [ 10231 ]
            Component/s eventing [ 14026 ]
            siri Sriram Melkote (Inactive) made changes -
            Attachment consoleText-2018-09-04040534 [ 58178 ]
            Balakumaran.Gopal Balakumaran Gopal added a comment - - edited

            Sriram Melkote - this is not a test issue; it is the hack that lets us run a toy build in the upgrade test.

            The test normally performs the following steps:

            1. Start with 8 nodes with no Couchbase Server installed.
            2. Install 5.5 on 4 nodes.
            3. Install 6.0 on 4 nodes.
            4. Create a 4-node 5.5 cluster.
            5. Deploy a bucket op eventing function.
            6. Add all 4 6.0 nodes to the 5.5 cluster.
            7. Fail over and rebalance out all 4 5.5 nodes.
            8. Deploy the timer function, which was failing.

            Since there was no way to install the toy build in step 3, I installed the 6.0 toy build on all 8 nodes (externally) and commented out step 3, which gives the same result.

            We ensure that all nodes are at 5.5 at step 5 and at 6.0 at step 8, as confirmed by the output of pools/default. I have also validated this by logging into the cluster manually while the script was running.

             

            siri Sriram Melkote (Inactive) added a comment - - edited

            Bala, even in the non-toy run, http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/57/console - we see:

            2018-08-29 01:31:54,890 - root - INFO - Output of pools/default after upgrade is 
            ... "clusterCompatibility": 327685, ...
            

            Note - clusterCompatibility/65536 = major ver, clusterCompatibility%65536 = minor ver

            Which indicates all nodes after upgrade are consistently at 5.5 – same pattern seen with toy build. In the toy build, the logging shows the same as well. Are you sure you didn't failover 6.0 nodes by mistake?
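The arithmetic in the note above can be checked with a few lines of Python. This is a minimal sketch: the function name `decode_cluster_compat` is made up for illustration and is not part of any Couchbase tooling, and it assumes only the encoding stated in the note (major * 65536 + minor).

```python
from typing import Tuple


def decode_cluster_compat(value: int) -> Tuple[int, int]:
    """Split a clusterCompatibility integer into (major, minor)."""
    # Encoding per the note above: major * 65536 + minor.
    return divmod(value, 65536)


# The two values that appear in the test logs quoted in this ticket.
for raw in (327685, 393216):
    major, minor = decode_cluster_compat(raw)
    print(f"{raw} -> {major}.{minor}")
```

Under that encoding, 327685 decodes to 5.5 and 393216 decodes to 6.0, matching the values quoted in the comments.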

            Balakumaran.Gopal Balakumaran Gopal added a comment - - edited

            Yes, but in the non-toy run we install 6.0 (step 3). It is almost the same, unless this runs as part of a conf file, as we need to reset the upgrade version. Doesn't the cbcollect concur with the steps I shared?

            The upgraded cluster is still available: http://172.23.104.108:8091/ui/index.html#!/servers/list

            Old nodes: ns_1@172.23.104.103,ns_1@172.23.104.104,ns_1@172.23.104.105,ns_1@172.23.104.106
             

            siri Sriram Melkote (Inactive) made changes -
            Component/s test-execution [ 10231 ]

            siri Sriram Melkote (Inactive) added a comment

            Bala, I'm not really sure what's happening then. As you'll see in run http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/59/console, at 2018-08-29 03:03:16 the test says the upgrade is done, but /pools/default reports clusterCompatibility: 327685 (5.5) for all nodes, and the master and eventing nodes agree on this.

            So eventing's behaviour is correct: as /pools/default is reporting that the cluster is at 5.5, we're refusing to deploy 6.0 syntax.

            You mentioned this is a regression. We need to consider the possibility that something has changed in ns_server. Would it be possible to run the test on an older build? If we see that the clusterCompatibility reported by /pools/default changed between builds, we can ask the ns_server team for help.
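The mismatch described in this comment (nodes running 6.0.0 binaries while `clusterCompatibility` still decodes to 5.5) can be flagged mechanically. The following is only a sketch: `lagging_nodes` is a hypothetical helper, its input is assumed to be the already-parsed `/pools/default` JSON, and the sample data is trimmed from the output shown earlier in this ticket.

```python
from typing import Dict, List


def lagging_nodes(pools_default: Dict) -> List[str]:
    """Return hostnames whose binary version is newer than their clusterCompatibility."""
    lagging = []
    for node in pools_default.get("nodes", []):
        # "version" looks like "6.0.0-1567-enterprise"; keep major and minor only.
        major, minor = (int(part) for part in node["version"].split(".")[:2])
        # clusterCompatibility encodes major * 65536 + minor.
        compat = divmod(node["clusterCompatibility"], 65536)
        if (major, minor) > compat:
            lagging.append(node["hostname"])
    return lagging


# Two node entries trimmed from the /pools/default output in this ticket.
sample = {
    "nodes": [
        {"hostname": "172.23.104.90:8091", "version": "6.0.0-1567-enterprise",
         "clusterCompatibility": 327685},
        {"hostname": "172.23.104.91:8091", "version": "6.0.0-1567-enterprise",
         "clusterCompatibility": 327685},
    ]
}
print(lagging_nodes(sample))
```

With the sample above, both hosts are reported, since version 6.0 is newer than the decoded compatibility of 5.5.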

            Balakumaran.Gopal Balakumaran Gopal added a comment

            Sure, I can test on an older build. However, it's not a regression. This is the first time we are testing this.
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Is this a Regression?: Unknown [ 10452 ] -> No [ 10451 ]

            siri Sriram Melkote (Inactive) added a comment

            Bala, as it's not a regression, it's not worth testing older builds. Waiting longer after the rebalance finishes may be worth trying. You could poll /pools/default until "clusterCompatibility": 327685 no longer appears before proceeding to the next step.
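The polling suggested in this comment could look roughly like the sketch below. It is not taken from testrunner: `fetch_pools_default` and `wait_for_compat` are hypothetical names, the timeout and interval defaults are arbitrary, and authentication is omitted even though a real cluster would require credentials.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_pools_default(base_url: str) -> Dict:
    """GET <base_url>/pools/default and parse the JSON body (authentication omitted)."""
    with urllib.request.urlopen(base_url + "/pools/default") as resp:
        return json.load(resp)


def wait_for_compat(fetch: Callable[[], Dict], old_value: int = 327685,
                    timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll until no node reports old_value; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        nodes = fetch().get("nodes", [])
        # Succeed only when at least one node is listed and none is at the old value.
        if nodes and all(n.get("clusterCompatibility") != old_value for n in nodes):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A test could call `wait_for_compat(lambda: fetch_pools_default("http://172.23.104.91:8091"))` after the rebalance and continue only if it returns `True`. Passing the fetch function as a parameter keeps the loop testable without a live cluster.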

            Balakumaran.Gopal Balakumaran Gopal added a comment

            I tried it on an older build, 6.0.0-1432. It worked fine. "clusterCompatibility": 327685 appears there too.

            Log: http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/61/console

            siri Sriram Melkote (Inactive) added a comment

            1432 did not enforce cluster version rules as it did not have this commit. Hence it would have ignored the issue.
            7c40b1a MB-30940: Do not allow timers if cluster has vulcan nodes
            lynn.straus Lynn Straus made changes -
            Component/s eventing [ 14026 ]
            siri Sriram Melkote (Inactive) made changes -
            Component/s eventing [ 14026 ]
            siri Sriram Melkote (Inactive) made changes -
            Summary: Eventing API's detect wrong version after upgrade when upgraded using online upgrade using failover -> After cluster upgrade, clusterCompatibility still reports old number
            siri Sriram Melkote (Inactive) made changes -
            Summary: After cluster upgrade, clusterCompatibility still reports old number -> After cluster upgrade, clusterCompatibility still at old value
            siri Sriram Melkote (Inactive) made changes -
            Description +Script to Repro+
            {noformat}./testrunner -i /tmp/upgrade3.ini -p get-cbcollect-info=True -t eventing.eventing_upgrade.EventingUpgrade.test_online_upgrade_with_failover_rebalance_with_eventing,nodes_init=4,dataset=default,groups=simple,skip_cleanup=True,initial_version=5.5.0-2958,doc-per-day=2,upgrade_version=6.0.0-1567
            {noformat}
            +Steps to Repro+
             * Create a 4 node cluster kv-eventing-index-n1ql in 5.5.0-2958
             * Deploy a bucket op function
             * Add 4 alice nodes kv-eventing-index-n1ql
             * Failover all the old vulcan nodes and rebalance out all the nodes.
             * Deploy a timer function using the following API which fails with "*Function requires 6.0 but cluster is at 5.5"*

            +Request+
            {noformat}2018-08-28 02:30:11,061 - root - ERROR - POST http://172.23.104.91:8091/_p/event/setApplication/?name=test_import_function_2 body:{
               "depcfg":{
                  "buckets":[
                     {
                        "alias":"dst_bucket",
                        "bucket_name":"dst_bucket1"
                     }
                  ],
                  "source_bucket":"src_bucket",
                  "metadata_bucket":"metadata"
               },
               "appcode":"function OnUpdate(doc,meta) {\n var expiry = new Date();\n expiry.setSeconds(expiry.getSeconds() + 5);\n\n var context = {docID : meta.id};\n createTimer(NDtimerCallback, expiry, meta.id, context);\n}\nfunction NDtimerCallback(context) {\n dst_bucket[context.docID] = 'from NDtimerCallback';\n}",
               "id":0,
               "settings":{
                  "enable_recursive_mutation":false,
                  "app_log_max_files":10,
                  "curl_timeout":500,
                  "skip_timer_threshold":86400,
                  "dcp_stream_boundary":"everything",
                  "use_memory_manager":true,
                  "persist_interval":5000,
                  "sock_batch_size":100,
                  "dcp_num_connections":1,
                  "enable_snapshot_smr":false,
                  "log_level":"TRACE",
                  "min_page_items":50,
                  "fuzz_offset":0,
                  "max_delta_chain_len":200,
                  "xattr_doc_timer_entry_prune_threshold":100,
                  "worker_feedback_queue_cap":10000,
                  "tick_duration":60000,
                  "deadline_timeout":3,
                  "app_log_max_size":10485760,
                  "max_page_items":400,
                  "worker_count":3,
                  "lss_read_ahead_size":1048576,
                  "deployment_status":true,
                  "lss_cleaner_threshold":30,
                  "description":"",
                  "dcp_gen_chan_size":10000,
                  "lss_cleaner_max_threshold":70,
                  "feedback_batch_size":100,
                  "auto_swapper":true,
                  "worker_queue_cap":100000,
                  "cpp_worker_thread_count":2,
                  "cron_timers_per_doc":1000,
                  "feedback_read_buffer_size":65536,
                  "execution_timeout":1,
                  "processing_status":true,
                  "cleanup_timers":false,
                  "timer_processing_tick_interval":500,
                  "breakpad_on":true,
                  "lcb_inst_capacity":5,
                  "vb_ownership_giveup_routine_count":3,
                  "data_chan_size":10000,
                  "vb_ownership_takeover_routine_count":3,
                  "checkpoint_interval":10000
               },
               "appname":"test_import_function_2"
            }
            {noformat}
            +Response+
            {noformat}headers:{
               'Content-type':'application/json',
               'Authorization':'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n'
            }error:406reason:unknown{
               "name":"ERR_CLUSTER_VERSION",
               "code":42,
               "description":"This function syntax is unsupported on current cluster version",
               "attributes":null,
               "runtime_info":{
                  "code":42,
                  "info":"Function requires 6.0 but cluster is at 5.5"
               }
            }
            {noformat}
            However, the entire cluster is already on 6.0.0 before we send this request. This does not seem to happen from the UI, only through this API, which we use extensively in automation.
             At the same time, this API works fine when we upgrade through swap rebalance or regular rebalance.

            Logs attached.

            Automation Log : http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/56/consoleText
            lynn.straus Lynn Straus made changes -
            Component/s eventing [ 14026 ]
            siri Sriram Melkote (Inactive) made changes -
            Component/s eventing [ 14026 ]

            siri Sriram Melkote (Inactive) added a comment - Lynn Straus More work needs to be done, please don't set component to eventing. There isn't enough reason to believe this is an eventing bug at this time.

            siri Sriram Melkote (Inactive) added a comment - - edited Balakumaran Gopal Can we please wait 10s after rebalance is done? Dave Finlay indicates this is the max lag. Thanks.

            Balakumaran.Gopal Balakumaran Gopal added a comment - Added a sleep of more than 10s; working fine now. Log: http://qa.sc.couchbase.com/job/test_bala_upgrade_new1/63/console
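
The fixed sleep above could also be written as a bounded poll, so the test proceeds as soon as every node reports the new compatibility version. The following is only a minimal sketch, not the testrunner change that was actually made. It assumes each node entry from /pools/default carries an integer `clusterCompatibility` field encoded as major * 65536 + minor. `fetch_nodes` is a hypothetical callable standing in for whatever REST helper returns the node list, and `decode_compat` and `wait_for_compat` are illustrative names.

```python
import time
from typing import Callable, Iterable, Mapping, Tuple


def decode_compat(value: int) -> Tuple[int, int]:
    """Split an encoded compatibility value into (major, minor).

    Assumption: the value is encoded as major * 65536 + minor.
    """
    return value >> 16, value & 0xFFFF


def wait_for_compat(
    fetch_nodes: Callable[[], Iterable[Mapping[str, int]]],
    expected: Tuple[int, int],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every node reports at least the expected version.

    fetch_nodes is a hypothetical helper that returns the node entries
    (for example, the "nodes" list from /pools/default).
    Returns True on success and False if the timeout expires first.
    """
    deadline = clock() + timeout_s
    while True:
        nodes = list(fetch_nodes())
        # An empty node list is treated as "not ready yet".
        if nodes and all(
            decode_compat(node["clusterCompatibility"]) >= expected
            for node in nodes
        ):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A test could call `wait_for_compat(fetch_nodes, (6, 0), timeout_s=30)` right after the rebalance completes and deploy the timer function only when it returns True. The 30-second bound is an arbitrary choice, set above the 10s maximum lag mentioned in the comments.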
            siri Sriram Melkote (Inactive) made changes -
            Component/s ns_server [ 10019 ]
            siri Sriram Melkote (Inactive) made changes -
            Link This issue relates to MB-22002 [ MB-22002 ]
            siri Sriram Melkote (Inactive) made changes -
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Resolved [ 5 ]

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.0-1610 contains eventing commit 607c899 with commit message: MB-31074 : Add more logging to diagnose cluster version issue

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-1294 contains eventing commit 607c899 with commit message: MB-31074 : Add more logging to diagnose cluster version issue

            mihir.kamdar Mihir Kamdar (Inactive) added a comment - Bulk closing all bugs with resolution != Fixed/Done
            mihir.kamdar Mihir Kamdar (Inactive) made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal