  1. Couchbase Server
  2. MB-22583

Improve performance of cbimport json list

Details

    • Improvement
    • Resolution: Done
    • Major
    • 6.6.0
    • 5.0.0
    • tools

    Description

      I have spent a long time making performance improvements for cbimport json with list formats, and I've verified through various tests that the bottleneck is now the Go encoding/json library. To make any further improvements we will need to submit changes to Go itself.

      There are two issues I am currently seeing that are candidates for improving encoding/json. The first thing to note is that our architecture for loading data consists of a single reader that passes the JSON objects it reads to multiple worker threads. When reading objects from the JSON list file, the reader only extracts the raw bytes of each object; unmarshaling is done on the worker threads to improve performance. The raw bytes of each object are obtained by passing a json.RawMessage to the decoder.

      One problem is that although we are asking only for the bytes, encoding/json internally re-scans them (during the unmarshal step) to ensure they form a valid object. This is unnecessary, since the bytes were already scanned when they were read. At the moment it looks like the code is written in a more general form and does not handle this edge case.

      In the current library the above issue uses only a small amount of CPU, but the cost grows as the decoder's internal buffer gets larger. This presents another problem: for large JSON files we generally want a larger buffer, so that reading data from disk takes fewer system calls. What I have found is that making the buffer larger gives big gains when reading the file from disk, but these gains are lost to the extra time spent in the unnecessary scanning step. If we could increase the buffer size and skip the unnecessary unmarshal step, we could get a 2x performance improvement.

            People

              carlos.gonzalez Carlos Gonzalez Betancort (Inactive)
              mikew Mike Wiederhold [X] (Inactive)