Provide more fault-tolerant batch mutations

Description

Currently the Spark connector tries to perform all mutations concurrently, which does not scale well.  It should be possible to batch operations.  (Update: it does in fact already write in batches of 128 operations per executor.)
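
As a conceptual sketch of the difference (not the connector's actual code; write() here is just a stand-in for a single SDK mutation):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for a single asynchronous mutation against the SDK.
def write(doc: String): Future[Unit] = Future { /* upsert the document */ }

// Everything in flight at once: simple, but does not scale well.
def saveAllAtOnce(docs: Seq[String]): Unit =
  Await.result(Future.sequence(docs.map(write)), 10.minutes)

// Bounded batches (e.g. 128 per executor): wait for each batch before starting the next.
def saveInBatches(docs: Seq[String], batchSize: Int = 128): Unit =
  docs.grouped(batchSize).foreach { batch =>
    Await.result(Future.sequence(batch.map(write)), 10.minutes)
  }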

The specifics of this ticket are being left vague until I can talk with the customer and make sure we nail down their requirements.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Activity


Graham Pople May 23, 2019 at 3:35 PM

Thanks for the suggestions. I've added some documentation to the patch that details this, along with some suggestions on which tunables the app can use.

Raymundo Flores May 23, 2019 at 3:18 PM
Edited

Hello Graham,

Thanks for your help with this. I think that is a good starting point for users to extend and adapt the loader to their requirements.

As we discussed via Slack, it would be great to document the relation between executors and maxConcurrent operations. This will help our users forecast the write load and fit the loader to their cluster sizing.
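
For illustration (hypothetical numbers only): if a job runs 10 executors and each executor caps its in-flight writes at maxConcurrent = 128, the cluster can see up to 10 × 128 = 1,280 concurrent mutations, so the expected write load scales linearly with both settings.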

Thanks again

Graham Pople May 23, 2019 at 2:27 PM

I haven't gone as far as you've suggested for this initial patch.  Here I've just added application control over the batch size, but it's a decent start and I think it will cover most of the cases where users don't need to push the envelope.

For those users that truly want to maximise performance, more advanced flow control will be required.  But at that point I think apps may be better off implementing the saveToCouchbase logic themselves; if you look at the saveToCouchbase code, you'll see it's very simple.  It's very tricky to write a truly generic doc loader like this, since the app may also want its own logging, profiling and so on in there.  So my feeling is that saveToCouchbase should aim to cover 80% of the cases, and apps that want to push things further will want to do it themselves.
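
For reference, a rough sketch of what such an app-side loader could look like (assuming the Java SDK 2.x API; the host, bucket name and batch size are placeholders, and a real app would add its own logging, profiling and retry logic):

import com.couchbase.client.java.CouchbaseCluster
import com.couchbase.client.java.document.JsonDocument
import com.couchbase.client.java.document.json.JsonObject
import org.apache.spark.rdd.RDD

def saveManually(rdd: RDD[(String, JsonObject)], batchSize: Int = 128): Unit = {
  rdd.foreachPartition { partition =>
    // One connection per executor task; host and bucket name are placeholders.
    val cluster = CouchbaseCluster.create("couchbase-host")
    val bucket  = cluster.openBucket("example-bucket")
    try {
      partition.grouped(batchSize).foreach { batch =>
        // App-specific logging, profiling, throttling or retries can go here.
        batch.foreach { case (id, content) => bucket.upsert(JsonDocument.create(id, content)) }
      }
    } finally {
      cluster.disconnect()
    }
  }
}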

What are your thoughts?

Raymundo Flores May 14, 2019 at 9:17 AM

Example of loader:

https://github.com/rfmvlc/spark-loader

 

We are able to control the timeouts and retry delay. Something that would be great is to have the ability to configure the following (see the sketch after this list):

  • Throttling

  • Fallback mechanism

  • # of retries configuration
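
A hypothetical sketch of those knobs (names and defaults are made up for illustration): retry a mutation a configurable number of times with a delay between attempts, and fall back to a caller-supplied handler if every attempt fails.

import scala.util.{Failure, Success, Try}

def withRetry[T](op: () => T,
                 maxRetries: Int = 3,
                 retryDelayMs: Long = 100,
                 fallback: Throwable => T): T = {
  var attempt = 0
  var last: Throwable = null
  while (attempt <= maxRetries) {
    Try(op()) match {
      case Success(result) => return result
      case Failure(e) =>
        last = e
        attempt += 1
        Thread.sleep(retryDelayMs)   // crude throttling between attempts
    }
  }
  fallback(last)                     // fallback mechanism once retries are exhausted
}

A loader could wrap each write in withRetry and, in the fallback, log the failed document or route it to a dead-letter store.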

 

Fixed

Details

Created May 10, 2019 at 2:20 PM
Updated April 24, 2020 at 8:06 PM
Resolved May 29, 2019 at 2:15 PM