Let me explain the underlying bug, and I think that will answer your first questions.
In a normal Server upgrade, Windows Installer (the Windows subsystem that handles software installation, uninstallation, and upgrades) removes the files for the older version of Server and then replaces them with the files from the newer version. In the 7.0.4->7.1.0 upgrade in particular, though, Windows Installer gets confused because a few files are actually newer (have a higher version number) in the "older" Server version 7.0.4. It ends up removing those files but never replacing them, so when 7.1.0 tries to start, it finds some files missing. This causes (at least) memcached and cbas to fail, making the node unusable.
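To make that concrete, here is a minimal sketch of the behaviour. The file name and version numbers are made up, and this is only an approximation of Windows Installer's default file-versioning rule, not its actual implementation:

```python
def parse_version(v):
    """Turn a dotted version string like '5.2.1.100' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def packaged_file_wins(installed, packaged):
    """Approximation of the default file-versioning rule: a packaged file only
    overwrites an installed file whose version is strictly lower; a missing
    file is always copied."""
    if installed is None:
        return True
    return parse_version(packaged) > parse_version(installed)

# A hypothetical file that happens to carry a HIGHER file version in the
# 7.0.4 release than in the 7.1.0 release.
file_version_shipped_with_704 = "5.2.1.100"
file_version_in_710_package = "5.2.0.900"

# The install plan decides, against the pre-upgrade state, that the 7.1.0 copy
# does not need to be laid down (the on-disk file looks newer)...
print(packaged_file_wins(file_version_shipped_with_704,
                         file_version_in_710_package))  # False
# ...but the upgrade still removes the 7.0.4 copy, so nothing replaces it and
# the file ends up missing, which is what breaks memcached and cbas.
```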
When the customer invokes the Repair operation on this broken 7.1.0 installation, Windows Installer verifies all Server files on the drive, and replaces any that don't match what the 7.1.0 installer says they should be, including any files that are outright missing. So after the Repair operation completes, the Server installation now looks exactly like it should have looked after the initial upgrade. Therefore Server can start and all functionality should be normal.
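For reference, the same Repair can also be driven non-interactively with msiexec instead of the Programs and Features UI. This is only a sketch: the MSI path, the log path, and the choice of the /fa (reinstall all files) repair mode are my assumptions, not a documented procedure.

```python
# Hedged sketch: running the Repair non-interactively via msiexec.
# Both paths below are placeholders; point msi_path at wherever the 7.1.0
# installer package actually lives on the node.
import subprocess

msi_path = r"C:\temp\couchbase-server-7.1.0.msi"   # placeholder path
log_path = r"C:\temp\cb-repair.log"                # placeholder path

subprocess.run(
    [
        "msiexec",
        "/fa", msi_path,   # /fa = force all files to be reinstalled
        "/qn",             # quiet, no UI
        "/l*v", log_path,  # verbose log, useful if the Repair itself misbehaves
    ],
    check=True,            # raise if msiexec reports a non-zero exit code
)
```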
Hopefully that gives you the detail you need to understand the bug. FYI, earlier when I said that "7.0.4->7.1.0 is broken and can't be fixed", what I meant was that we cannot fix the bug, not that there's no way to fix a customer's installation. Any customer doing any upgrade from 7.0.4->7.1.0 will trigger this bug and wind up with a broken Server installation, and there is nothing we can do about that. However, once they've hit the bug and have a broken Server, using Repair should fix their installation.
To reply to your other questions:
"Will this have any side effects if the node is running just CBAS vs. CBAS plus other services (KV, GSI, etc.)?"
To the best of my knowledge, the services that are on the node won't matter. The node will be unusable until the Repair operation is performed.
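If it helps, one hedged way to confirm which services the node is running and that it has come back healthy after the Repair is to query the admin REST API on the standard port 8091. The host and credentials below are placeholders.

```python
# Hedged sketch: ask the node which services it runs and whether it reports
# healthy again after the Repair. Host and credentials are placeholders.
import base64
import json
import urllib.request

url = "http://127.0.0.1:8091/pools/default"
credentials = base64.b64encode(b"Administrator:password").decode()  # placeholder creds

req = urllib.request.Request(url, headers={"Authorization": "Basic " + credentials})
with urllib.request.urlopen(req) as resp:
    pool = json.load(resp)

for node in pool["nodes"]:
    # 'services' lists kv, index, n1ql, cbas, ...; 'status' should be 'healthy'
    print(node["hostname"], node.get("services"), node["status"])
```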
"Also, do customers need to eject the node completely out of the cluster to apply this change, or is failover sufficient?"
If they're doing an Upgrade on Windows, they are implicitly shutting the node down as part of an Offline Cluster Upgrade, is that right? The individual node Upgrade process now has one additional step, the Repair operation, and the node will not re-enter the cluster until the Repair is complete. I don't believe this additional step changes either the decision about whether an Offline Cluster Upgrade should be performed or the process by which it is done.
I do not currently believe that this bug causes any additional risk of data loss during the upgrade procedure. However, I certainly cannot swear to that. If the customer is following the Offline Cluster Upgrade procedure properly, they should have a full data backup prior to starting.
"Also, at a later time, will customers need to follow any different approach to upgrade from 7.1.0 to 7.1.1?"
The bug here only affects Upgrade from exactly 7.0.4 to exactly 7.1.0. No other pair of Server versions is known to be affected. In particular, 7.0.4->7.1.1 is known to work normally, and so far as I know 7.1.0->7.1.1 works normally as well. And if a customer upgrades from 7.0.4 to 7.1.0, hits this bug, and Repairs their installation, there should be no reason they couldn't later follow the normal upgrade process from 7.1.0 to any future Server version.
It is possible that other pairs of Server versions will be affected by variants of this same bug in the future. This is why I'm strongly considering disallowing node upgrade on Windows entirely; that's far outside the scope of this ticket, though.
"Do we have QE sign-off on this interim solution, or is it yet to be validated?"
You would need to talk to QE about that. My local testing has only been with trivial one-node configurations.
DOC-10124