Couchbase Server / MB-51403

500s on upsert REST api: badrpc 'EXIT' handle_mutation_rv


Details

    • Untriaged
    • 1
    • No

    Description

      A recent regression: we are seeing failures when upserting docs through the REST API after some number of docs have been upserted. In this example we are upserting 30 docs, and the failure first appears after a dozen or so successful upserts. The failing key returns 500 on every retry. When we move on to the next doc it works, but then we start seeing the same failure on all keys after that:

      n_0:

      172.18.0.3 - couchbase [10/Mar/2022:10:02:44 -0800] "POST /pools/default/buckets/testBucket/docs/key-0 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 4283
      ...
      172.18.0.3 - couchbase [10/Mar/2022:10:03:22 -0800] "POST /pools/default/buckets/testBucket/docs/key-12 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 2724
      172.18.0.3 - - [10/Mar/2022:10:03:25 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 1865
      172.18.0.3 - - [10/Mar/2022:10:03:30 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2718
      172.18.0.3 - - [10/Mar/2022:10:03:34 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2330
      172.18.0.3 - - [10/Mar/2022:10:03:39 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2501
      172.18.0.3 - - [10/Mar/2022:10:03:44 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2762
      172.18.0.3 - - [10/Mar/2022:10:03:48 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2478
      172.18.0.3 - - [10/Mar/2022:10:03:53 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2602
      172.18.0.3 - - [10/Mar/2022:10:03:58 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2767
      172.18.0.3 - - [10/Mar/2022:10:04:03 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2614
      172.18.0.3 - - [10/Mar/2022:10:04:08 -0800] "POST /pools/default/buckets/testBucket/docs/key-13 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2403
      172.18.0.3 - couchbase [10/Mar/2022:10:04:10 -0800] "POST /pools/default/buckets/testBucket/docs/key-14 HTTP/1.1" 200 2 - "Apache-HttpClient/4.5.13 (Java/11)" 2315
      172.18.0.3 - - [10/Mar/2022:10:04:13 -0800] "POST /pools/default/buckets/testBucket/docs/key-15 HTTP/1.1" 500 44 - "Apache-HttpClient/4.5.13 (Java/11)" 2373
      ...
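The pattern in the access log above (one 200 per key, then a run of 500s on a single key) is easier to see when tallied per key. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt; the names `LINE_RE` and `tally` are hypothetical helpers, not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Matches the request path and status code of access-log lines shaped like
# the excerpt above: "POST .../docs/<key> HTTP/1.1" <status> ...
LINE_RE = re.compile(r'"POST \S*/docs/(?P<key>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def tally(lines):
    """Return {key: Counter({status: count})} for every matching log line."""
    result = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts = result.setdefault(match.group("key"), Counter())
            counts[int(match.group("status"))] += 1
    return result
```

Calling `tally(open(path))` on an access log (the path is whatever file holds these lines) gives one counter per document key, so a key that only ever returned 500 stands out.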
      

      [ns_server:error,2022-03-10T10:22:47.348-08:00,n_0@172.18.0.3:<0.24422.2>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
                                       {path,
                                        "/pools/default/buckets/testBucket/docs/key-21"},
                                       {method,'POST'},
                                       {type,error},
                                       {what,
                                        {case_clause,
                                         {badrpc,
                                          {'EXIT',
                                           {function_clause,
                                            [{capi_crud,handle_mutation_rv,
                                              [{mc_header,1,134,0,0,0,0,0,undefined},
                                               {mc_entry,undefined,undefined,0,0,0,
                                                undefined,0}],
                                              [{file,"src/capi_crud.erl"},
                                               {line,28}]},
                                             {capi_crud,set,6,[]}]}}}}},
                                       {trace,
                                        [{menelaus_web_crud,handle_post,4,
                                          [{file,"src/menelaus_web_crud.erl"},
                                           {line,334}]},
                                         {request_tracker,request,2,
                                          [{file,"src/request_tracker.erl"},
                                           {line,40}]},
                                         {menelaus_util,handle_request,2,
                                          [{file,"src/menelaus_util.erl"},
                                           {line,221}]},
                                         {mochiweb_http,headers,6,
                                          [{file,
                                            "/home/couchbase/jenkins/workspace/cbas-cbcluster-stress-oraclejdk11/couchdb/src/mochiweb/mochiweb_http.erl"},
                                           {line,153}]},
                                         {proc_lib,init_p_do_apply,3,
                                          [{file,"proc_lib.erl"},{line,226}]}]}]
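The `mc_header` tuple in the trace above carries the numeric fields `1` and `134`. In the memcached binary protocol, status 0x0086 (decimal 134) is "temporary failure", which matches the etmpfail discussion in the comments. The following is a minimal sketch of that decoding. It assumes the first two fields after the record name are opcode and status (an assumption about the record layout, not confirmed in this ticket); the status table is deliberately partial, and `decode_mc_header` is a hypothetical helper.

```python
import re

# A few status codes from the memcached binary protocol (not exhaustive).
STATUS_NAMES = {
    0x0000: "success",
    0x0001: "key not found",
    0x0002: "key exists",
    0x0003: "value too large",
    0x0082: "out of memory",
    0x0086: "temporary failure (etmpfail)",
}

# Assumption: the tuple is {mc_header, Opcode, Status, ...}.
HEADER_RE = re.compile(r"\{mc_header,(\d+),(\d+),")


def decode_mc_header(text):
    """Return (opcode, status, status_name) for the first mc_header tuple, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    opcode, status = int(match.group(1)), int(match.group(2))
    name = STATUS_NAMES.get(status, "unknown (0x%04x)" % status)
    return opcode, status, name
```

Under that layout assumption, the tuple in the trace decodes to opcode 1 with status 0x0086, a status that `capi_crud:handle_mutation_rv` had no clause for, hence the `function_clause` error.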
      
      

      Attachments

        1. cbcollect_info_n_0.zip
          6.08 MB
        2. cbcollect_info_n_1.zip
          4.57 MB
        3. cbcollect_info_n_2.zip
          5.22 MB
        4. screenshot-1.png
          60 kB


          Activity

            steve.watanabe Steve Watanabe added a comment - - edited

            Michael Blow, is there a small test or script that I can run to reproduce this issue (e.g. the docs and commands used in "Our test tries to upsert 30 2MB docs")? I would use it to test the bubbling up of the etmpfail errors. I'm not going to investigate why memcached is returning the error. As suggested above, you might open a separate ticket to track that.

            michael.blow Michael Blow added a comment -

            I don't have a standalone repro script at present, but the following steps are what our test is doing.

            • Create cluster w/ 2 KV nodes
            • Create couchbase bucket w/ 100 MB memory quota
            • Attempt to upsert 20 MB docs into the bucket, 30 times via REST api
            • Observe 500 failures about 1/3 into the upserts, reportedly due to ETMPFAIL from memcached

            I have cloned this issue for kv_engine to investigate the ETMPFAILs. Not handling ETMPFAILs doesn't appear to be a regression, so the ns_server issue seems less critical.
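The upsert step in the list above can be sketched as a small standalone loop. This is a hedged sketch, not the actual test: the base URL, credentials, and the `key-N` naming are placeholders, and the request body encoding is an assumption (the raw JSON body here mirrors the `-d` style of the curl command elsewhere in this ticket; the endpoint may expect a different encoding). `doc_url`, `make_doc`, and `run_upserts` are hypothetical helper names. Cluster and bucket creation are not covered.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request


def doc_url(base_url, bucket, key):
    """Build the docs endpoint URL seen in the access log."""
    return "%s/pools/default/buckets/%s/docs/%s" % (
        base_url.rstrip("/"),
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )


def make_doc(size_bytes):
    """Return a JSON document padded to at least size_bytes characters."""
    return json.dumps({"pad": "x" * size_bytes})


def run_upserts(base_url, bucket, user, password, count=30,
                size_bytes=20 * 1024 * 1024):
    """POST `count` large docs and return a list of (key, http_status)."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    body = make_doc(size_bytes).encode()  # assumption: raw JSON body
    results = []
    for i in range(count):
        key = "key-%d" % i
        req = urllib.request.Request(doc_url(base_url, bucket, key),
                                     data=body, method="POST")
        req.add_header("Authorization", "Basic " + token)
        try:
            with urllib.request.urlopen(req) as resp:
                results.append((key, resp.status))
        except urllib.error.HTTPError as err:
            # 4xx/5xx responses (such as the 500s reported here) land here.
            results.append((key, err.code))
    return results
```

Running `run_upserts` against a bucket with a 100 MB quota and printing the result list would show where the non-200 statuses begin, if the failure reproduces.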


            steve.watanabe Steve Watanabe added a comment -

            Michael Blow, please attach or point me to one or more of the 20 MB docs being used. Since this issue isn't a regression in Neo, and MB-51408 tracks the KV issue leading to the etmpfail errors, I'm moving this to Morpheus.

            steve.watanabe Steve Watanabe added a comment - - edited

            Michael Blow, no need for the documents; I injected the error instead. With my proposed change, the endpoint returns HTTP 503 along with the reason.

            $ curl -s -u Administrator:asdasd localhost:9000/pools/default/buckets/travel-sample/docs/airline_10123 -d '"{\"id\":10642,\"type\":\"airline\",\"name\":\"Jc royal.britannica\",\"iata\":null,\"icao\":\"JRB\",\"callsign\":null,\"country\":\"United Kingdom\"}"' | jq
            {
              "error": "retry_needed",
              "reason": "etmpfail returned from memcached"
            }
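A client of this endpoint could treat the response above as a signal to retry. The following is a minimal sketch, assuming the response is HTTP 503 with a JSON body whose `error` field is `retry_needed`, as shown above. The backoff policy, the attempt limit, and the names `is_retryable` and `upsert_with_retry` are illustrative choices, not part of the proposed ns_server change.

```python
import json
import time


def is_retryable(status, body_text):
    """True for a 503 whose JSON body has "error": "retry_needed"."""
    if status != 503:
        return False
    try:
        payload = json.loads(body_text)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("error") == "retry_needed"


def upsert_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable response.

    send() performs one upsert and returns (status, body_text).
    Delays double after each retryable response (exponential backoff).
    Returns the last (status, body_text) seen.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status, body):
            break
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

Passing `send` as a callable keeps the retry policy independent of the HTTP library in use; other 5xx responses are returned to the caller without retrying.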
            


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.2.0-1025 contains ns_server commit 1ce340a with commit message:
            MB-51403 Handle etmpfail in CRUD endpoints

            People

              steve.watanabe Steve Watanabe
              michael.blow Michael Blow