Description
The "tracks3" data set from Couchbase training has a schema that wreaks havoc with the schema inferencer. Each track has a subdocument called "reviews", and that subdocument has a separate field for each review, named with the id of the reviewer. The result is a huge number of distinct field names and a schema description hundreds of thousands of lines long. The inferencer should handle cases like this more gracefully, perhaps by having a parameterized maximum number of fields.
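One possible shape for the parameterized limit, sketched in Python under stated assumptions: when an object has more than `max_fields` distinct field names, the sketch stops listing them individually. Instead it records one schema shared by all the values, borrowing JSON Schema's `additionalProperties` notation. The function names (`infer`, `merge`, `infer_collection`), the default of 50, and the output format are all hypothetical. This is not the Couchbase inferencer's actual algorithm or output format.

```python
from typing import Any, Dict, Iterable, List

Schema = Dict[str, Any]

# Hypothetical default cap on distinct field names per object; not a real Couchbase setting.
DEFAULT_MAX_FIELDS = 50


def infer(value: Any, max_fields: int = DEFAULT_MAX_FIELDS) -> Schema:
    """Infer a simplified, JSON-Schema-like description of one JSON value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        items: Schema = {}
        for v in value:
            items = merge(items, infer(v, max_fields), max_fields)
        return {"type": "array", "items": items}
    if isinstance(value, dict):
        props = {k: infer(v, max_fields) for k, v in value.items()}
        return collapse_if_needed({"type": "object", "properties": props}, max_fields)
    raise TypeError(f"unsupported JSON value: {value!r}")


def collapse_if_needed(schema: Schema, max_fields: int) -> Schema:
    """If an object has more than max_fields field names (e.g. reviewer ids used
    as keys), replace the per-field list with one schema shared by all values."""
    props = schema["properties"]
    if len(props) <= max_fields:
        return schema
    value_schema: Schema = {}
    for s in props.values():
        value_schema = merge(value_schema, s, max_fields)
    return {"type": "object", "additionalProperties": value_schema}


def merge_objects(a: Schema, b: Schema, max_fields: int) -> Schema:
    """Merge two object schemas, collapsing if either side is already collapsed."""
    if "additionalProperties" in a or "additionalProperties" in b:
        value_schema: Schema = {}
        for side in (a, b):
            parts = ([side["additionalProperties"]] if "additionalProperties" in side
                     else list(side["properties"].values()))
            for s in parts:
                value_schema = merge(value_schema, s, max_fields)
        return {"type": "object", "additionalProperties": value_schema}
    props = dict(a["properties"])
    for k, s in b["properties"].items():
        props[k] = merge(props[k], s, max_fields) if k in props else s
    # The union across documents may exceed the cap even if no single document does.
    return collapse_if_needed({"type": "object", "properties": props}, max_fields)


def _alternatives(s: Schema) -> List[Schema]:
    return s["anyOf"] if "anyOf" in s else [s]


def merge(a: Schema, b: Schema, max_fields: int) -> Schema:
    """Combine two schemas; the empty schema {} acts as the identity."""
    if not a:
        return b
    if not b:
        return a
    by_type: Dict[str, Schema] = {}
    for s in _alternatives(a) + _alternatives(b):
        t = s["type"]
        if t not in by_type:
            by_type[t] = s
        elif t == "object":
            by_type[t] = merge_objects(by_type[t], s, max_fields)
        elif t == "array":
            by_type[t] = {"type": "array",
                          "items": merge(by_type[t]["items"], s["items"], max_fields)}
        # Scalars of the same type are identical, so there is nothing to merge.
    alts = [by_type[t] for t in sorted(by_type)]
    return alts[0] if len(alts) == 1 else {"anyOf": alts}


def infer_collection(docs: Iterable[Any], max_fields: int = DEFAULT_MAX_FIELDS) -> Schema:
    """Fold the schemas of all documents in a collection into one schema."""
    schema: Schema = {}
    for doc in docs:
        schema = merge(schema, infer(doc, max_fields), max_fields)
    return schema
```

With "tracks3"-style documents, the reviewer-id fields under "reviews" collapse into a single entry describing one review. As a result, the schema size no longer grows with the number of reviewers. The trade-off is that a genuinely wide object with more than `max_fields` meaningful fields is also collapsed. That is why the threshold is a parameter rather than a constant.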