
Details

    • New Feature
    • Resolution: Unresolved
    • Major
    • None
    • None
    • None
    • 0

    Description

      The primary goal is to partition large data-read queries so that they make optimal use of Spark workers.

       

      From field

      1. High-level description of what we need: the ability to partition reads when a SQL++ query is executed in Spark. The main idea is to run analytics queries in the same way as the Couchbase Analytics service, but using Spark together with the Couchbase Query and Index services instead (no need to deploy the Couchbase Analytics service).
      2. Objective of the feature: we need to read large amounts of data from the Spark connector via SQL++. The load must be split across several Spark executors, because a single executor does not have enough capacity. Memory backpressure alone is not enough to meet the business requirements. One possible solution is to add a feature to the Couchbase Query service that allows the same query to be executed concurrently from multiple Spark executors. An alternative is to support this in the Spark connector itself, so that it splits the query into multiple parallel chunks, something like the JDBC parameters mentioned in the description.
      3. Success criteria: the ability to read large amounts of data in parallel across multiple Spark executors when a SQL++ query is executed.
      4. Assumptions: BBVA nodes are limited to 64 GB of RAM, on both the Couchbase nodes and the Spark executor nodes.
      5. Milestones: support 100M+ documents in the current project, and probably 1000+ in the next projects.
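
      To make the "split the query into multiple parallel chunks" alternative concrete, here is a minimal sketch of range-based splitting, loosely modeled on the lower bound / upper bound / partition count idea behind Spark's JDBC partitioning options. It is not the connector's API. The function names, the `orders` keyspace and the numeric `order_id` field are all hypothetical, and the sketch assumes the query exposes an integer field whose approximate range is known in advance.

```python
from typing import List


def partition_predicates(column: str, lower: int, upper: int,
                         num_partitions: int) -> List[str]:
    """Return non-overlapping range predicates on `column` (hypothetical helper).

    The first and last ranges are open-ended, so values outside
    [lower, upper) are still read by exactly one partition.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")

    # Never create more partitions than there are distinct integer values.
    n = min(num_partitions, upper - lower)
    if n == 1:
        return ["TRUE"]  # a single partition reads everything

    stride = (upper - lower) // n
    # Interior boundaries between consecutive partitions.
    bounds = [lower + i * stride for i in range(1, n)]

    predicates = [f"{column} < {bounds[0]}"]
    for lo, hi in zip(bounds, bounds[1:]):
        predicates.append(f"{column} >= {lo} AND {column} < {hi}")
    predicates.append(f"{column} >= {bounds[-1]}")
    return predicates


def partitioned_queries(base_query: str, predicates: List[str]) -> List[str]:
    """Wrap the base query so that each copy returns only one range."""
    return [f"SELECT p.* FROM ({base_query}) AS p WHERE {pred}"
            for pred in predicates]


if __name__ == "__main__":
    # Assumed example: `orders` and `order_id` are illustrative names only.
    preds = partition_predicates("p.order_id", 0, 100, 4)
    for query in partitioned_queries("SELECT o.* FROM orders AS o", preds):
        print(query)
```

      Each generated statement could then be handed to a different Spark executor. Note that, when more than one partition is generated, documents whose partition field is NULL or MISSING would match none of these predicates, so a real implementation would need an extra partition or predicate for them. Evenly sized ranges also only balance the load if the field's values are roughly uniformly distributed.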


          People

            graham.pople Graham Pople
            priya.rajagopal Priya Rajagopal
