  1. Couchbase Documentation
  2. DOC-122

Doc Request: Instructions for using non-latin characters in views/queries

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: tech-debt
    • Component/s: dev guide and SDKs
    • Labels:
      None

      Description

      Could we have some specific instructions on dealing with non-latin characters in views and queries?

      The question is about ensuring that documents and values that mix Latin and non-Latin characters in the same key range are included in the query results, and about giving developers best practices from both the client-side library and the Couchbase query syntax perspective.
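
As a possible starting point for the range-query side of this request, here is a minimal sketch of building a view query whose key range mixes Latin and non-Latin bounds. It assumes the views REST endpoint shape `/{bucket}/_design/{ddoc}/_view/{view}` on port 8092; the host, the bucket `people`, the design document `names`, and the view `by_name` are hypothetical names chosen for illustration. Which documents actually fall inside the range depends on the server's collation order, not on anything this code does.

```python
import json
from urllib.parse import urlencode


def view_range_url(base, bucket, ddoc, view, start, end):
    """Build a view query URL covering the key range start..end."""
    params = {
        # View keys are JSON values, so each bound is JSON-encoded first.
        "startkey": json.dumps(start, ensure_ascii=False),
        "endkey": json.dumps(end, ensure_ascii=False),
    }
    # urlencode percent-encodes the UTF-8 bytes of every non-ASCII character.
    return f"{base}/{bucket}/_design/{ddoc}/_view/{view}?{urlencode(params)}"


# Hypothetical host, bucket, design document, and view names.
url = view_range_url("http://localhost:8092", "people", "names", "by_name", "a", "я")
print(url)
```

The point of the sketch is the two encoding steps: JSON first, then UTF-8 percent-encoding. An SDK may perform both internally, so this is mainly relevant when a query string is built by hand.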


        Activity

        kzeller kzeller added a comment -

        Hi Aaron,

        I've been asked to bother you to get more information about how to use non-latin characters in views/queries. Do you have some information you can send via email?

        Thanks,

        Karen

        aaron Aaron Miller (Inactive) added a comment - - edited

        Views have no problems with non-Latin characters in data, as long as the data consists of UTF-8 encoded Unicode strings.
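
To make the UTF-8 requirement concrete, here is a small sketch of serializing a document with mixed scripts to UTF-8 JSON and checking that a byte payload really is valid UTF-8 before storing it. The document fields and the helper name `is_valid_utf8` are made up for illustration; how a given SDK encodes documents is not covered here.

```python
import json


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical document mixing Greek, Japanese, Latin, and Cyrillic text.
doc = {"name": "Ωμέγα", "city": "東京", "note": "mixed Latin and кириллица"}

# ensure_ascii=False keeps the characters themselves instead of \uXXXX escapes;
# both forms are valid JSON, but this one shows the UTF-8 bytes directly.
payload = json.dumps(doc, ensure_ascii=False).encode("utf-8")
restored = json.loads(payload.decode("utf-8"))

# The same text in a legacy encoding is not valid UTF-8 and should be
# re-encoded before it is stored.
legacy = "東京".encode("shift_jis")
print(is_valid_utf8(payload), is_valid_utf8(legacy))
```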

        As for how they're ordered, that's defined by a fairly complex set of rules and data tables: the Unicode Collation Algorithm (UCA) as implemented by the ICU library.

        ICU collation is documented at: http://userguide.icu-project.org/collation/architecture
        The Unicode Collation Algorithm standard is at: http://www.unicode.org/reports/tr10/

        We use ICU with the "root locale" (meaning none of the customizations or tailorings mentioned in the UCA document are at play, I believe).

        A much easier way to figure out how your data will be ordered is just to try it.
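
One way to "just try it" without a running cluster is to compare plain code-point sorting with ICU root-locale sorting on a handful of keys. The sketch below assumes the third-party PyICU package for the second function, and the ICU version installed locally may differ from the one the server is built against, so treat the result as an approximation rather than a guarantee of view order.

```python
def codepoint_order(keys):
    """Sort by Unicode code point, which is what a naive client-side sort does."""
    return sorted(keys)


def root_collation_order(keys):
    """Sort with ICU's root-locale collator. Requires PyICU (assumed installed)."""
    import icu  # third-party; imported lazily so the rest runs without it

    collator = icu.Collator.createInstance(icu.Locale.getRoot())
    return sorted(keys, key=collator.getSortKey)


keys = ["z", "\u00e9", "B", "a"]  # "\u00e9" is the precomposed letter é

# Code-point order puts the uppercase letter first and the accented letter last.
print(codepoint_order(keys))

# With PyICU installed, compare against: root_collation_order(keys)
```

The UCA compares base letters before it considers accents and case, so the two orderings generally differ for mixed-case or accented keys. That difference is the usual source of surprise in range queries.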

        perry Perry Krug added a comment -

        Thanks Aaron. Unfortunately "just try it" doesn't really match the needs of customers who are trying to figure out how to make it work. Documenting and providing examples will do that.

        kzeller kzeller added a comment -

        Hi Aaron,

        See Perry's questions above. To fulfill this need for documentation, I believe we need the following underlying information:

        1) List all the rules we use for non-latin collation in the engine

        2) What is the order of precedence between the rules? What does this imply for certain characters vs. others?

        3) 3-4 examples of queries and the results you get based on the rules.

        I can't think of anything else, but this can at least get me started drafting the information.

        Perry - if you know specific areas of confusion/need for info. from customers, let me know.

        Thanks,

        Karen

        aaron Aaron Miller (Inactive) added a comment -

        The rules are at http://www.unicode.org/reports/tr10/ where there are also examples. There are a lot of them, and they are not simple.

        The rules also require looking at data tables to classify characters, mostly this one: http://www.unicode.org/Public/UCA/latest/allkeys.txt
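
For anyone who wants to look up a character in that table, here is a rough sketch of parsing one `allkeys.txt` entry into its primary, secondary, and tertiary weights. The function name and the sample line are illustrative; the weight values shown are not taken from any particular UCA release.

```python
import re

# One collation element looks like [.pppp.ssss.tttt]; a leading "*" instead
# of "." marks a variable element (for example spaces and punctuation).
ELEMENT = re.compile(r"\[([.*])([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)\]")


def parse_allkeys_line(line):
    """Return (code_points, elements) for a data line, or None otherwise."""
    entry = line.partition("#")[0].strip()  # drop the trailing comment
    if not entry or entry.startswith("@") or ";" not in entry:
        return None  # blank line, comment line, or @directive
    chars, _, elements = entry.partition(";")
    code_points = tuple(int(cp, 16) for cp in chars.split())
    weights = [
        (flag == "*", int(p, 16), int(s, 16), int(t, 16))
        for flag, p, s, t in ELEMENT.findall(elements)
    ]
    return code_points, weights


# Illustrative line in the allkeys.txt layout; the weights are made up.
sample = "0061  ; [.1CAD.0020.0002] # LATIN SMALL LETTER A"
print(parse_allkeys_line(sample))
```

Each tuple is `(is_variable, primary, secondary, tertiary)`. Comparing primary weights first, then secondary, then tertiary is the core of how the UCA orders two strings.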

        cihan Cihan Biyikoglu added a comment -

        Aaron is no longer here. Assigning to Ruth.

        akurtzman Amy Kurtzman added a comment -

        Check with Siri M. to find out whether this is needed for secondary indexes. If not, close this issue. If it is needed, then schedule it for Sherlock.

        perry Perry Krug added a comment -

        Whether or not it is needed for secondary indexes, it is still valid for views and their queries. The biggest value here will come from describing the "how to do it" rather than the "how it works", and that extends well beyond just the views themselves and into the SDK. It's probably more of a "developer guide" documentation piece with some tie-ins to the underlying server.


          People

          • Assignee:
            marija Marija Jovanovic
            Reporter:
            perry Perry Krug