Details

Type: Improvement
Resolution: Unresolved
Priority: Major
Fix Version/s: Morpheus
Affects Version/s: Morpheus
Component/s: analytics
Labels:
- gfn
- triaged

Story Points: 1

Description

When creating an external dataset from a CSV file, the system should be able to infer the attribute names from the file header, if present, and to sample records to infer the attributes' data types. For example, a CREATE statement could accept an "infer" option that takes the number of records to scan, as in the following example in the open-source syntax (here the number is 10):

CREATE EXTERNAL DATASET Employee() USING localfs (("path"="localhost:///employees.csv"), ("format"="delimited-text"), ("delimiter"=","), ("header"=true), ("infer"=10))

One could imagine offering several "infer" options, e.g.:
- "infer" = N: look at the header plus the first N rows to infer a likely schema
- "infer" = ALL: look at the whole file to infer a (bullet-proof) schema
- "infer" = SAMPLE(N): pick N rows at random to infer a schema (it is unclear whether this is useful)
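
To make the three options concrete, here is a minimal Python sketch of how such inference might behave. It is an illustration only, not the analytics service's implementation: the function names (`infer_schema`, `infer_value_type`, `merge_types`, `reservoir_sample`), the type names ("bigint", "double", "boolean", "string", "null"), and the widening rules are all assumptions made for this example.

```python
import csv
import io
import itertools
import random


def infer_value_type(text):
    """Return the narrowest (hypothetical) type name for one CSV field.

    Note: Python's int()/float() accept some forms (e.g. "1_000", "nan")
    that a real loader might reject; this is only a sketch.
    """
    if text == "":
        return "null"
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "bigint"
    except ValueError:
        pass
    try:
        float(text)
        return "double"
    except ValueError:
        return "string"


def merge_types(a, b):
    """Widen two observed types to one that can hold both (assumed rules)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"bigint", "double"}:
        return "double"
    return "string"


def reservoir_sample(rows, n, seed=None):
    """Pick up to n rows uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for index, row in enumerate(rows):
        if index < n:
            sample.append(row)
        else:
            j = rng.randint(0, index)  # inclusive on both ends
            if j < n:
                sample[j] = row
    return sample


def infer_schema(stream, infer="ALL", delimiter=",", seed=None):
    """Infer (name, type) pairs from a CSV stream that has a header line.

    infer is an int N (first N rows), "ALL", or ("SAMPLE", N).
    """
    reader = csv.reader(stream, delimiter=delimiter)
    names = next(reader)  # attribute names come from the header
    if infer == "ALL":
        rows = reader
    elif isinstance(infer, int):
        rows = itertools.islice(reader, infer)
    else:
        _, n = infer
        rows = reservoir_sample(reader, n, seed)
    types = ["null"] * len(names)
    for row in rows:
        for i, field in enumerate(row[: len(names)]):
            types[i] = merge_types(types[i], infer_value_type(field))
    return list(zip(names, types))


if __name__ == "__main__":
    data = "id,name,salary,active\n1,Ann,1000,true\n2,Bob,1200.5,false\n3,,900,true\n"
    print(infer_schema(io.StringIO(data), infer=1))
    print(infer_schema(io.StringIO(data), infer="ALL"))
```

With this sample data, scanning only the first row types `salary` as "bigint", while scanning the whole file widens it to "double". This is the trade-off between a likely schema ("infer" = N) and a bullet-proof one ("infer" = ALL).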

One could also imagine no-header versions of the above, where the field names come from the CREATE statement but the data types are inferred, though that seems less worthwhile (as it saves the user less work).

The overall goal would be to make the user experience for CSV/TSV as close as we can get to the ease of dealing with JSON.

Issue Links

links to

*DB issue

People

Assignee: Till Westmann

Reporter: Till Westmann

Votes: 0

Watchers: 1

Dates

Created: 17/May/21 2:14 PM

Updated: 11/Apr/24 6:05 PM

[CX] Infer schema from CSV header