Error handling for the transactions protocol involves turning all errors into a generic form (TransactionOperationFailed) which contains a few top-level instructions to the core loop to control transactional flow.
The core loop is intentionally abstracted away from details such as transient failures, CAS mismatches, write-write conflicts and the like. It is up to the internals of each operation, following the transactions spec, to turn all of these complexities into a simple TransactionOperationFailed.
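As a rough illustration of that split, here is a minimal Python sketch of a core loop that only ever looks at the instructions carried by a TransactionOperationFailed. The field names (retry, rollback, to_raise), the FinalError enum, the run_transaction function and the max_attempts limit are all hypothetical and chosen for this example. They are not taken from the spec or from any SDK.

```python
from enum import Enum, auto


class FinalError(Enum):
    """Which error the user should ultimately see (hypothetical enum)."""
    TRANSACTION_FAILED = auto()
    TRANSACTION_COMMIT_AMBIGUOUS = auto()


class TransactionFailed(Exception):
    pass


class TransactionCommitAmbiguous(Exception):
    pass


class TransactionOperationFailed(Exception):
    """Generic error form; the field names here are assumptions."""

    def __init__(self, cause, retry=False, rollback=True,
                 to_raise=FinalError.TRANSACTION_FAILED):
        super().__init__(str(cause))
        self.cause = cause
        self.retry = retry        # should the core loop start a new attempt?
        self.rollback = rollback  # should this attempt be rolled back first?
        self.to_raise = to_raise  # error to surface if the transaction ends


def run_transaction(attempt, rollback, max_attempts=3):
    """Core loop: acts only on the instructions, never on the root cause."""
    for _ in range(max_attempts):
        try:
            return attempt()
        except TransactionOperationFailed as err:
            if err.rollback:
                rollback()
            if err.retry:
                continue
            if err.to_raise is FinalError.TRANSACTION_COMMIT_AMBIGUOUS:
                raise TransactionCommitAmbiguous(str(err)) from err
            raise TransactionFailed(str(err)) from err
    raise TransactionFailed("retry limit reached")
```

The loop never inspects err.cause. Whether the root cause was a CAS mismatch or a transient failure is decided inside the operation, and the loop sees only the resulting instructions.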
Now we have a situation where the core loop is the transactions layer on the client side, and operations are executed on the query backend.
In addition, if the user is running queries outside of the SDK, then they must essentially implement the core loop.
We need to find a way to continue the TransactionOperationFailed process across the query interface. Currently the core loop only receives errors like:
But a) the core loop is intended to be abstracted from these, b) if the user is driving transactions from outside the SDK, they have to decide how to handle all of them, and c) it's impossible for the core loop to know what to do with them. A timeout fetching the ATR may need to be handled differently from a timeout fetching the document. That is the intent behind TransactionOperationFailed: the part of the code that knows what's going on, and is best placed to make the decision, chooses what happens to the transaction next.
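To make the ATR versus document point concrete, the sketch below shows two operations that catch the same low-level TimeoutError and translate it into different instructions. Everything here is hypothetical: the function names, the cut-down TransactionOperationFailed stand-in, and especially the retry/rollback values, which are placeholders rather than the policies the transactions spec actually prescribes.

```python
class TransactionOperationFailed(Exception):
    """Cut-down stand-in carrying only the core loop's instructions."""

    def __init__(self, message, retry, rollback):
        super().__init__(message)
        self.retry = retry
        self.rollback = rollback


def fetch_atr(fetch):
    """Fetch the ATR, translating a timeout using ATR-specific knowledge."""
    try:
        return fetch()
    except TimeoutError as err:
        # Placeholder policy: the real decision belongs to the spec.
        raise TransactionOperationFailed(
            "timeout fetching ATR", retry=False, rollback=False) from err


def fetch_document(fetch):
    """Fetch a document, translating a timeout using document-specific knowledge."""
    try:
        return fetch()
    except TimeoutError as err:
        # Placeholder policy, deliberately different from the ATR case.
        raise TransactionOperationFailed(
            "timeout fetching document", retry=True, rollback=True) from err
```

The raw error is identical in both functions. Only the code at the call site knows which of the two situations it is in, which is why the translation has to happen there and not in the core loop.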
The impact is that any time the transaction needs to retry, it currently does not. This prevents handling of write-write conflicts, amongst other things. It also means that the correct exception cannot be raised to the user to indicate, for example, TransactionFailed vs TransactionCommitAmbiguous.