Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-38524

Bulk Collection creation leads to missing DCP system events

    XMLWordPrintable

Details

    • Bug
    • Resolution: User Error
    • Major
    • 7.0.0
    • Cheshire-Cat
    • couchbase-bucket
    • None

    Description

      I’m doing some very basic scalability scripting where for a bucket B1, scope S1, I create X number of collections. The following email is all without any I/O on any collection.
      I know there’s a bulk create in the works (or perhaps already avail), but for now the script is just simply looping X times creating collection with X diff names (directly to ns_server).

      XDCR has a stat called “changes_left” where, in layman’s terms:

      Changes_left = sum(for_each_vb(highSeqno)) – sum(for_each_vb(highestSeqnoSeen))
      

      When X is lower than a threshold, all is well. XDCR’s DCP receives all the system events as the collection creation comes down, and changes_left goes to 0.
      When X is above a threshold, (i.e. 50 or more collections being created), XDCR starts seeing non-0 changes_left.

      Digging in, it looks like all the vb’s are missing a same number of system events.

      The following is the difference between highseqno and XDCR’s throughSeqno (what it has received from DCP) for vb’s that are mismatched:

      2020-03-31T14:45:51.687-07:00 INFO GOXDCR.StatsMgr: 4059309627b374c3d685d47fdf65955d/B1/B2 total_docs=54272, docs_processed=35934, changes_left=18338
      2020-03-31T14:45:51.688-07:00 INFO GOXDCR.StatsMgr: Difference map: map[5:53 6:53 8:53 13:53 22:53 23:53 24:53 25:53 26:53 30:53 31:53 32:53 34:53 40:53 41:53 45:53 48:53 50:53 51:53 53:53 54:53 56:53 58:53 60:53 63:53 64:53 66:53 69:53 71:53 72:53 75:53 79:53 80:53 88:53 92:53 93:53 94:53 96:53 100:53 102:53 104:53 111:53 118:53 120:53 122:53 125:53 128:53 129:53 130:53 131:53 133:53 135:53 139:53 144:53 145:53 147:53 149:53 159:53 161:53 162:53 163:53 164:52 166:53 168:53 170:53 173:53 177:53 178:53 181:53 184:53 186:53 188:53 190:53 194:53 200:53 204:53 206:53 209:53 210:53 215:53 219:53 220:53 223:53 224:53 227:53 228:53 229:53 234:53 236:53 242:53 243:53 249:53 250:53 252:53 254:53 258:53 264:53 265:53 267:53 269:53 273:53 274:53 275:53 277:53 282:53 286:53 287:53 292:53 293:53 298:53 300:53 301:53 305:53 308:53 313:53 316:53 317:53 327:53 334:53 337:53 340:53 341:53 342:53 345:53 355:53 357:53 359:53 362:53 366:53 375:53 377:53 380:53 381:53 390:53 391:53 393:53 394:53 397:53 400:53 403:53 404:53 405:53 406:53 412:53 415:53 417:53 429:53 436:53 439:53 440:53 445:53 449:53 454:53 456:53 473:53 474:53 476:53 480:53 482:53 484:53 486:53 487:53 488:53 490:53 491:53 492:53 493:53 496:53 499:53 502:53 503:53 507:53 509:53 513:53 514:53 516:53 517:53 519:53 524:53 525:53 528:53 530:53 531:53 540:53 543:53 546:53 548:53 550:53 551:53 557:53 559:35 561:53 562:53 564:53 567:53 568:53 570:53 571:53 575:53 583:53 586:27 587:53 588:53 591:53 592:53 599:30 601:41 602:53 604:53 605:53 610:53 612:53 616:53 621:53 623:53 626:53 635:53 636:53 637:53 640:53 641:53 645:53 646:53 651:53 658:53 659:53 668:53 669:27 673:53 674:53 677:53 681:53 685:53 687:53 690:53 691:53 694:53 696:53 700:53 703:53 707:53 708:53 709:53 714:53 715:53 723:53 726:53 729:53 730:53 734:53 737:53 738:53 741:53 747:53 757:53 759:53 763:53 765:53 768:53 769:53 770:53 771:53 773:53 775:53 784:53 787:53 788:53 790:53 791:53 792:53 794:53 796:53 800:53 801:53 802:53 808:53 818:53 822:53 823:53 826:53 827:53 828:53 831:53 832:53 833:53 834:53 838:53 839:53 842:53 843:53 845:53 847:53 849:53 853:53 854:53 862:53 877:53 879:53 884:53 887:53 888:53 889:53 891:53 892:53 893:53 901:53 902:53 903:53 904:53 906:53 907:53 908:53 909:53 912:53 913:53 919:53 920:53 921:53 922:53 925:53 926:53 927:53 930:53 931:53 936:53 939:53 944:53 945:53 946:53 957:53 962:53 964:53 966:53 973:53 979:53 980:53 983:53 991:53 996:53 998:53 1003:53 1004:53 1011:53 1018:53 1019:53 1020:53 1021:53 1022:53]
      

      In KV’s log, I see a bunch of

      2020-03-31T14:44:38.136855-07:00 WARNING (B1) 38: Invalid streaming connection: cookie:0x000000010d0edfc0
      

      And also a bunch of backfills (not sure if the 53 matches what I’m seeing is a coincidence)

      2020-03-31T14:44:38.150837-07:00 INFO 53: (B1) DCP (Producer) eq_dcpq:xdcr:dcp_4059309627b374c3d685d47fdf65955d/B1/B2_127.0.0.1:12000_1:fY168VhvUTgsrrsvGowuSQ== - (vb:1018) Scheduling backfill from 1 to 53, reschedule flag : False
       

      I’m attaching the Memcached log… Could you please give me some pointers on what is happening?
      XDCR keeps track of potential deduplications from DCP, and it doesn’t seem like that path was hit in my setup.

      Attachments

        1. results.zip
          7.19 MB
        2. n_0.zip
          802 kB
        3. memcached.log.000000.txt.zip
          524 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              neil.huang Neil Huang
              neil.huang Neil Huang
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty