Details
Improvement
Resolution: Unresolved
Major
master
0
Description
What is the problem?
We currently support two JSON formats: a file containing an array of documents, or a file with one document per line. There is also a third, internal format used by the sample buckets: a zip archive with one file per document, plus some extra metadata for things like creating indexes. This format is documented, but it is unclear whether it is supported, because the documentation says:
"This format is intended to load Couchbase sample data sets."
We do know, however, that customers have tried to use it.
The first two formats are poorly suited to large datasets. JSON documents are often highly compressible, but both formats require the data as a single plain file on disk, so none of that compressibility is exploited. The sample format is not suitable either, because we should reserve the right to change it to suit our own needs.
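A minimal Python sketch of reading the two existing formats (what `cbimport json` exposes as `--format list` and `--format lines`) makes the single-file constraint concrete. It is an illustration only, not cbimport's implementation, and the function names `read_list` and `read_lines` are hypothetical. Both readers consume one plain text stream, and `read_list` must parse the entire array before it can return the first document.

```python
import io
import json
from typing import Iterator, TextIO


def read_lines(stream: TextIO) -> Iterator[dict]:
    """'lines' format: one JSON document per line; blank lines are skipped."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_list(stream: TextIO) -> Iterator[dict]:
    """'list' format: a single top-level JSON array of documents.

    json.load parses the whole array at once, so the entire file is read
    into memory before the first document is yielded.
    """
    docs = json.load(stream)
    if not isinstance(docs, list):
        raise ValueError("expected a top-level JSON array")
    yield from docs


if __name__ == "__main__":
    print(list(read_lines(io.StringIO('{"id": 1}\n{"id": 2}\n'))))
    print(list(read_list(io.StringIO('[{"id": 1}, {"id": 2}]'))))
```

Either way, the input is one uncompressed stream; the sketch has no natural place to exploit how well JSON compresses.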
What is the solution?
We should remove the documentation of the sample format and introduce a new zip format with one file per document. This format should be documented, and we should keep it backward compatible.
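One possible shape for the proposed format, sketched in Python under loudly stated assumptions: the entry naming scheme `docs/<key>.json`, the use of the document key as the file name, and the helper names `write_archive`/`read_archive` are all hypothetical, not a decided design. Each document is stored as its own deflate-compressed zip entry, so the archive benefits from JSON's compressibility while documents can still be read one at a time.

```python
import io
import json
import zipfile
from typing import Dict, Iterator, Tuple

# HYPOTHETICAL layout: every document is stored as "docs/<key>.json".
# No index definitions or other metadata are included.
DOC_PREFIX = "docs/"
DOC_SUFFIX = ".json"


def write_archive(target, docs: Dict[str, dict]) -> None:
    """Write one deflate-compressed zip entry per document.

    `target` may be a path or a writable binary file object.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for key, doc in docs.items():
            zf.writestr(f"{DOC_PREFIX}{key}{DOC_SUFFIX}", json.dumps(doc))


def read_archive(source) -> Iterator[Tuple[str, dict]]:
    """Yield (key, document) pairs one entry at a time.

    Entries outside the hypothetical docs/ layout are skipped rather than
    rejected, so later additions to the archive do not break this reader.
    """
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            name = info.filename
            if info.is_dir() or not (name.startswith(DOC_PREFIX) and name.endswith(DOC_SUFFIX)):
                continue
            key = name[len(DOC_PREFIX):-len(DOC_SUFFIX)]
            with zf.open(info) as f:
                yield key, json.load(f)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_archive(buf, {"user::1": {"name": "a"}, "user::2": {"name": "b"}})
    buf.seek(0)
    print(dict(read_archive(buf)))
```

Skipping unrecognised entries is one way to honour the backward-compatibility goal: a later version could add optional files without breaking importers written against an earlier one. Deflate compresses each entry independently, though, so very small documents compress less well than they would in a single compressed stream.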
We can consider whether to include index definitions and similar metadata, but I would lean towards not doing so.