Couchbase Server / MB-53286

[CX] Parquet fails with bucket names containing a "dot" character between numbers


Details

    • Untriaged
    • 1
    • Unknown
    • Analytics Sprint 10, Analytics Sprint 11, Analytics Sprint 12, Analytics Sprint 13, Analytics Sprint 14

    Description

      Bucket names containing the "dot" character (.) between numbers, such as my-bucket-1.1, fail when reading Parquet data (currently with an internal error). This is due to the internal mechanism by which Hadoop converts the URI to path-style access, which leads Hadoop to not pick up the bucket name correctly.

      This is a known issue in both Hadoop and AWS, and the recommendation is to avoid bucket names that contain a "dot" (.).

      This issue is to ensure that we return a proper error, rather than an internal error, when such bucket names are encountered.
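      A minimal Java sketch of the kind of guard this issue asks for, assuming the root cause is that java.net.URI cannot parse such a bucket name as a hostname. java.net.URI requires the rightmost hostname label to start with a letter, so for a URI like s3a://my-bucket-1.1/... it falls back to a registry-based authority and getHost() returns null. The class BucketNameCheck, its methods hasParseableHost and validateBucket, the sample path, and the error message are all hypothetical; they are not part of Couchbase Analytics, AsterixDB, or Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical sketch: detect bucket names that java.net.URI cannot parse as a
 * host, so the caller can report a clear error instead of an internal error.
 * Not an actual Couchbase Analytics, AsterixDB, or Hadoop API.
 */
public class BucketNameCheck {

    /** Returns true if java.net.URI parses the bucket name as a server-based host. */
    static boolean hasParseableHost(String bucket) {
        try {
            // The path below is illustrative only.
            URI uri = new URI("s3a://" + bucket + "/path/to/file.parquet");
            // For names such as "my-bucket-1.1" the rightmost label ("1") starts with
            // a digit, so URI keeps the authority but leaves the host null.
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Names that are not even valid registry-based authorities also fail.
            return false;
        }
    }

    /** Fails fast with a descriptive, user-facing message (hypothetical wording). */
    static void validateBucket(String bucket) {
        if (!hasParseableHost(bucket)) {
            throw new IllegalArgumentException(
                "Bucket name '" + bucket + "' is not a valid URI hostname and cannot be "
                + "read through Hadoop; consider using a bucket name without dots");
        }
    }

    public static void main(String[] args) {
        for (String bucket : new String[] {"my-bucket-1", "my-bucket-1.1"}) {
            System.out.println(bucket + " -> host parsed: " + hasParseableHost(bucket));
        }
    }
}
```

      Running main should report that my-bucket-1 parses as a host while my-bucket-1.1 does not. IllegalArgumentException is only a stand-in here; a real fix would raise whatever user-facing error type the product already uses.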

      References:

      https://issues.apache.org/jira/browse/HADOOP-17241

      https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/

            People

              Hussain Towaileb