  1. Couchbase Server
  2. MB-47476

[FTS] Sort by field breaks with missing values on a partitioned index

Details

    • Untriaged
    • 1
    • Unknown

    Description

      I've observed this breakage on a Couchbase cluster with 2 nodes under these conditions:

      • the following sort order: {numericField, stringField}
      • the numericField is missing in a few documents, while stringField is available for all

      It occurs only with these 2 sort specifications:

      • [{"type": "number", "field": "numericField", "missing": "last", "desc": false},{"type": "text", "field": "stringField"}] 

      • [{"type": "number", "field": "numericField", "missing": "first", "desc": true},{"type": "text", "field": "stringField"}] 

      What the 2 sort orders above have in common is that documents with "missing" entries use this string encoding as their value (for sorting purposes):

      HighTerm = strings.Repeat(string([]byte{0xff}), 10) 
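To illustrate why exactly those 2 combinations end up on HighTerm, here is a minimal sketch of the mapping from ("missing", "desc") to a sentinel sort value. It is inferred from the behaviour described above rather than copied from bleve; `missingSortValue` is a hypothetical helper and the `lowTerm` value is an assumption, since this report only quotes HighTerm.

```go
package main

import (
	"fmt"
	"strings"
)

// highTerm mirrors the HighTerm value quoted in this report.
var highTerm = strings.Repeat(string([]byte{0xff}), 10)

// lowTerm is an assumed stand-in for the opposite sentinel;
// the report does not state its actual value.
var lowTerm = string([]byte{0x00})

// missingSortValue (hypothetical) picks the sentinel for a document that
// lacks the sort field. Keys are compared ascending and "desc" reverses the
// result, so "last"+ascending and "first"+descending both need the highest key.
func missingSortValue(missing string, desc bool) string {
	if (missing == "last") != desc {
		return highTerm
	}
	return lowTerm
}

func main() {
	for _, missing := range []string{"first", "last"} {
		for _, desc := range []bool{false, true} {
			v := missingSortValue(missing, desc)
			fmt.Printf("missing=%s desc=%v uses HighTerm: %v\n", missing, desc, v == highTerm)
		}
	}
}
```

Under this sketch the other 2 combinations ("first" ascending, "last" descending) use the low sentinel, which would be consistent with them not showing the breakage.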

      After adding extra logging to the bleve MultiSearch code that merges and sorts results, I see that sometimes, when doc1.Sort[0] and doc2.Sort[0] are both expected to be HighTerm, the following expression returns -1 where 0 is expected:

      strings.Compare(doc1.Sort[0], doc2.Sort[0]) 
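When two values that should be identical compare as unequal, logging their raw bytes usually shows where they differ. The helper below is a small sketch of that kind of instrumentation; `describeCompare` is a hypothetical name and not part of bleve.

```go
package main

import (
	"fmt"
	"strings"
)

// describeCompare (hypothetical) reports the strings.Compare result together
// with the hex-encoded bytes and byte length of both sort keys, so that keys
// which print identically but differ in encoding can be told apart.
func describeCompare(a, b string) string {
	return fmt.Sprintf("cmp=%d a=%x (len %d) b=%x (len %d)",
		strings.Compare(a, b), a, len(a), b, len(b))
}

func main() {
	highTerm := strings.Repeat(string([]byte{0xff}), 10)
	// Two untouched copies of HighTerm are byte-identical, so cmp is 0 here.
	fmt.Println(describeCompare(highTerm, highTerm))
}
```

Calling such a helper on doc1.Sort[0] and doc2.Sort[0] whenever the comparison is non-zero would show whether one of the two values still holds the original 0xff bytes.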

      That said, I have NOT been able to reproduce this at the unit-test level (yet). The following unit test approximates the use case:

      func TestMB47476(t *testing.T) {
      	ei1 := &stubIndex{err: nil, searchResult: &SearchResult{
      		Total: 3,
      		Status: &SearchStatus{},
      		Hits: []*search.DocumentMatch{
      			{
      				IndexInternalID: index.IndexInternalID("2"),
      				Sort: []string{" \u0001@\u001e>\n\u001e\\\u0014=8", "3D Systems Corp"},
      				// Sort[0] is encoding of 28.97
      			},
      			{
      				IndexInternalID: index.IndexInternalID("5"),
      				Sort: []string{search.HighTerm, "10X Genomics Inc"},
      			},
      			{
      				IndexInternalID: index.IndexInternalID("6"),
      				Sort: []string{search.HighTerm, "2U Inc"},
      			},
      		},
      	}}
      	ei2 := &stubIndex{err: nil, searchResult: &SearchResult{
      		Total: 7,
      		Status: &SearchStatus{},
      		Hits: []*search.DocumentMatch{
      			{
      				IndexInternalID: index.IndexInternalID("1"),
      				Sort: []string{" \u0001@\u0010\u0001#kBGW\u0005", "Achieve Life Sciences Inc"},
      				// Sort[0] is encoding of 8.01
      			},
      			{
      				IndexInternalID: index.IndexInternalID("3"),
      				Sort: []string{" \u0001@!%a#kBGW", "1Life Healthcare Inc"},
      				// Sort[0] is encoding of 37.18
      			},
      			{
      				IndexInternalID: index.IndexInternalID("4"),
      				Sort: []string{search.HighTerm, "1-800-Flowers.com Inc"},
      			},
      			{
      				IndexInternalID: index.IndexInternalID("7"),
      				Sort: []string{search.HighTerm, "8x8 Inc"},
      			},
      			{
      				IndexInternalID: index.IndexInternalID("8"),
      				Sort: []string{search.HighTerm, "AAR Corp"},
      			},
      			{
      				IndexInternalID: index.IndexInternalID("9"),
      				Sort: []string{search.HighTerm, "Accenture"},
      			},
      			{
      				IndexInternalID: index.IndexInternalID("10"),
      				Sort: []string{search.HighTerm, "Samsung Electronics Co Ltd"},
      			},
      		},
      	}}

      	sr := NewSearchRequest(NewMatchAllQuery())
      	sr.SortBy([]string{"price", "name"})

      	results, err := MultiSearch(context.Background(), sr, ei1, ei2)
      	if err != nil {
      		// Fatal rather than Error: results is unusable if MultiSearch failed.
      		t.Fatal(err)
      	}

      	if results.Total != 10 {
      		t.Fatalf("Unexpected number of hits: %v", results.Total)
      	}

      	for i := range results.Hits {
      		if string(results.Hits[i].IndexInternalID) != strconv.Itoa(i+1) {
      			t.Fatalf("Unexpected ordering of hits: %v", results.Hits)
      		}
      	}
      }

       => Since the unit test above passes, the merge-and-sort logic itself appears sound. The issue more likely lies in HighTerm (the sort value for missing entries) getting corrupted before the merge, while results from multiple indexes are obtained over the network.
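One way such corruption could arise, offered only as a hypothesis: HighTerm consists of 0xff bytes, which are not valid UTF-8, and Go's encoding/json replaces invalid UTF-8 in strings with U+FFFD when marshaling. The sketch below assumes that remote results pass through a JSON encoding step (this report does not confirm the transport format) and compares a round-tripped HighTerm with a local one. `roundTripJSON` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// highTerm mirrors the HighTerm value quoted in this report.
var highTerm = strings.Repeat(string([]byte{0xff}), 10)

// roundTripJSON (hypothetical) marshals s to JSON and decodes it again,
// standing in for a sort value that crossed the network as JSON.
func roundTripJSON(s string) (string, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	var out string
	if err := json.Unmarshal(data, &out); err != nil {
		return "", err
	}
	return out, nil
}

func main() {
	remote, err := roundTripJSON(highTerm)
	if err != nil {
		panic(err)
	}
	fmt.Printf("local  = %x\n", highTerm)
	fmt.Printf("remote = %x\n", remote)
	// Each 0xff byte became U+FFFD (bytes ef bf bd), so remote sorts
	// before local and the comparison is no longer 0.
	fmt.Println("compare:", strings.Compare(remote, highTerm))
}
```

If this is what happens, a hit merged from a remote partition would compare as smaller than a locally produced hit with the same missing value, which matches the -1 seen in the logs. A unit test that feeds search.HighTerm directly into stub indexes would not exercise that path.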

          People

            girish.benakappa Girish Benakappa
            abhinav Abhi Dangeti