Description
The "tracks3" data set, from Couchbase training, has a schema that wreaks havoc with the schema inferencer. Each track has a subdocument called "reviews", and that subdocument has a different field for each review, where the field name is the id of the person who did the review. Thus there is a huge number of field names, resulting in a schema description hundreds of thousands of lines long. The inferencer needs to do something smarter in cases like this, perhaps having a parameterized maximum number of fields.
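One way to picture the "parameterized maximum number of fields" idea is the minimal sketch below. It is an illustration only, not the change that was merged for this ticket: the function name `infer_schema`, the `max_fields` parameter, the output layout, and the `rating` field in the sample document are all hypothetical. When an object has more keys than `max_fields`, the sketch stops listing each key and instead records the number of keys plus the distinct shapes of the values.

```python
from typing import Any

def infer_schema(value: Any, max_fields: int = 50) -> dict:
    """Describe the shape of a JSON-like value (hypothetical sketch).

    Objects with more than max_fields keys are treated as "dictionaries":
    their keys are assumed to be data (e.g. reviewer ids), so only the
    distinct value shapes are recorded rather than one entry per key.
    """
    if isinstance(value, dict):
        if len(value) > max_fields:
            # Dictionary pattern: collapse all keys, keep distinct value shapes.
            value_schemas = []
            for v in value.values():
                s = infer_schema(v, max_fields)
                if s not in value_schemas:
                    value_schemas.append(s)
            return {"type": "dictionary",
                    "key_count": len(value),
                    "values": value_schemas}
        # Ordinary object: describe every field by name.
        return {"type": "object",
                "fields": {k: infer_schema(v, max_fields)
                           for k, v in value.items()}}
    if isinstance(value, list):
        item_schemas = []
        for v in value:
            s = infer_schema(v, max_fields)
            if s not in item_schemas:
                item_schemas.append(s)
        return {"type": "array", "items": item_schemas}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    return {"type": "unknown"}

# A made-up track shaped like the one described above: "reviews" has one
# field per reviewer id.
track = {
    "title": "example",
    "reviews": {f"user{i}": {"rating": i % 5} for i in range(1000)},
}
schema = infer_schema(track, max_fields=50)
```

With this sample, the description of `reviews` stays a few lines long no matter how many reviewers there are, because the 1000 reviewer ids are summarized by a key count and a single value shape. A key-count threshold is a crude heuristic; a real inferencer might also compare the value shapes before deciding that the keys are data.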
Attachments
For Gerrit Dashboard: MB-18536

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
62939,2 | MB-18536 - Handle the "dictionary" pattern in schema inferencing. | master | cbq-gui | MERGED | +2 | +1 |