Couchbase Server: MB-42825

Join queries hang without returning results

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Cheshire-Cat, 6.6.1
    • Fix Version/s: 6.6.1, 7.0.0
    • Component/s: query
    • Environment:
      Docker for Windows 10 20H2
      Image: couchbase/server:7.0.0-beta
      All services enabled
      Plasma indexes
      Loaded beer-sample and travel-sample buckets and created an empty default bucket
    • Triage:
      Untriaged
    • Link to Log File, atop/blg, CBCollectInfo, Core dump:
      CBCollectInfo is attached
    • Story Points:
      1
    • Is this a Regression?:
      Yes

      Description

      Queries involving joins may hang unexpectedly and never return results. This query in particular, generated by the Linq2Couchbase integration tests, seems to trigger the hang often:

      SELECT `Extent2`.`airportname` as `airportName`, `Extent1`.`airline` as `airline`
      FROM `travel-sample` as `Extent1`
      INNER JOIN `travel-sample` as `Extent2` ON (`Extent1`.`destinationairport` = `Extent2`.`faa`) 
      WHERE ((`Extent1`.`type` = 'route') AND (`Extent2`.`type` = 'airport')) 
      LIMIT 1
      

      I have reproduced this repeatedly from the .NET SDK, with cURL, and in the Query Workbench. The HTTP request appears to hang and never returns a value; in the case of cURL, it also never times out. However, sometimes (rarely) valid results are returned.
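
Because the request can hang indefinitely, a client-side deadline keeps callers from blocking forever while this bug is present. The following Go sketch is one way to do that and is not taken from the ticket: it assumes the query service is reachable at http://localhost:8093/query/service and accepts the `statement` form parameter with basic authentication, and it reuses the Administrator/password credentials from the cbq repro command in this ticket. `queryWithTimeout` is a hypothetical helper name.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// queryWithTimeout posts a N1QL statement to a query service endpoint and
// gives up after timeoutMillis, so a hung request cannot block the caller
// forever. The endpoint and credentials are supplied by the caller.
func queryWithTimeout(endpoint, user, pass, statement string, timeoutMillis int) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()

	form := url.Values{}
	form.Set("statement", statement)

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint,
		strings.NewReader(form.Encode()))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.SetBasicAuth(user, pass)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// Covers the deadline expiring while the server stays silent.
		return "", err
	}
	defer resp.Body.Close()

	// The same deadline also bounds reading the response body.
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// The statement from the ticket description.
	stmt := "SELECT `Extent2`.`airportname` as `airportName`, `Extent1`.`airline` as `airline` " +
		"FROM `travel-sample` as `Extent1` " +
		"INNER JOIN `travel-sample` as `Extent2` ON (`Extent1`.`destinationairport` = `Extent2`.`faa`) " +
		"WHERE ((`Extent1`.`type` = 'route') AND (`Extent2`.`type` = 'airport')) " +
		"LIMIT 1"

	// Assumed endpoint and credentials; adjust for the cluster under test.
	body, err := queryWithTimeout("http://localhost:8093/query/service",
		"Administrator", "password", stmt, 10000)
	if err != nil {
		fmt.Println("query did not complete:", err)
		return
	}
	fmt.Println(body)
}
```

From the command line, cURL's `--max-time` option bounds the request in the same way.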


            Activity

            Sitaram.Vemulapalli Sitaram Vemulapalli added a comment - - edited

            Repro:
            install travel-sample
            run the query a couple of times (in the UI or the cbq shell)

            This looks like a synchronization issue: the goroutine below is parked in sync.Cond.Wait inside baseDone.

            goroutine 184 [sync.Cond.Wait]:
            runtime.goparkunlock(...)
                    /usr/local/go/src/runtime/proc.go:310
            sync.runtime_notifyListWait(0xc001a62648, 0x7fd000000000)
                    /usr/local/go/src/runtime/sema.go:510 +0xf8
            sync.(*Cond).Wait(0xc001a62638)
                    /usr/local/go/src/sync/cond.go:56 +0x9d
            github.com/couchbase/query/execution.(*base).baseDone(0xc001a62480)
                    /root/master/query/src/github.com/couchbase/query/execution/base.go:307 +0x247
            github.com/couchbase/query/execution.(*IntersectScan).Done(0xc001a62480)
                    /root/master/query/src/github.com/couchbase/query/execution/scan_intersect.go:257 +0x31
            github.com/couchbase/query/execution.(*Sequence).Done(0xc001a62b40)
                    /root/master/query/src/github.com/couchbase/query/execution/sequence.go:172 +0xc9
            github.com/couchbase/query/execution.(*NLJoin).Done(0xc001a62d80)
                    /root/master/query/src/github.com/couchbase/query/execution/join_nl.go:288 +0x6e
            github.com/couchbase/query/execution.(*Sequence).Done(0xc001a63200)
                    /root/master/query/src/github.com/couchbase/query/execution/sequence.go:172 +0xc9
            github.com/couchbase/query/execution.(*Sequence).Done(0xc001a63440)
                    /root/master/query/src/github.com/couchbase/query/execution/sequence.go:172 +0xc9
            github.com/couchbase/query/execution.(*Authorize).Done(0xc001a63b00)
                    /root/master/query/src/github.com/couchbase/query/execution/authorize.go:158 +0x72
            github.com/couchbase/query/server.(*BaseRequest).CompleteRequest(0xc001c04000, 0x2ef372d1, 0x2eef6824, 0x1, 0x42, 0x0, 0xc001b60200, 0xc000a4cd20)
            
            

            marco.greco Marco Greco added a comment -

            I don't have the complete picture yet, but it seems that some children of the intersect scan have gone away without notifying the parent:

            goroutine 91047 [semacquire, 1 minutes]:
            sync.runtime_Semacquire(0xc0000e7c54)
            /usr/local/go/src/runtime/sema.go:56 +0x42
            sync.(*WaitGroup).Wait(0xc0000e7c54)
            /usr/local/go/src/sync/waitgroup.go:130 +0x64
            github.com/couchbase/query/execution.(*valueExchange).retrieveChildNoStop(0xc0000e7b80, 0x1)
            /home/marco/query/src/github.com/couchbase/query/execution/exchange.go:488 +0xbc
            github.com/couchbase/query/execution.(*base).childrenWaitNoStop(0xc0000e7b80, 0x1)
            /home/marco/query/src/github.com/couchbase/query/execution/base.go:780 +0x50
            github.com/couchbase/query/execution.(*IntersectScan).RunOnce.func1()
            /home/marco/query/src/github.com/couchbase/query/execution/scan_intersect.go:151 +0x86c
            github.com/couchbase/query/util.(*Once).Do(0xc0000e7cb8, 0xc0019ddf50)
            /home/marco/query/src/github.com/couchbase/query/util/sync.go:55 +0x4a
            github.com/couchbase/query/execution.(*IntersectScan).RunOnce(0xc0000e7b80, 0xc000e0c000, 0x28b8940, 0xc003b2fe00)
            /home/marco/query/src/github.com/couchbase/query/execution/scan_intersect.go:65 +0x82
            github.com/couchbase/query/execution.execOp(0x28c08e0, 0xc0000e7b80, 0xc000e0c000, 0x28b8940, 0xc003b2fe00)
            /home/marco/query/src/github.com/couchbase/query/execution/base.go:495 +0x54
            created by github.com/couchbase/query/execution.(*base).fork
            /home/marco/query/src/github.com/couchbase/query/execution/base.go:505 +0xfa
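
As a rough illustration of the failure mode described above (a parent waiting on its children, one of which exits without notifying it), here is a minimal Go sketch. It is not the query engine's code: `runChildren`, its parameters, and the early-return path are all hypothetical, and the real operators use their own exchange and wait logic, as the trace shows.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runChildren starts n child goroutines and reports whether the parent's
// wait completed within timeoutMillis. When skipNotify is true, child 0
// returns early without calling Done, mimicking a child that goes away
// without notifying its parent.
func runChildren(n int, skipNotify bool, timeoutMillis int) bool {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(id int) {
			if skipNotify && id == 0 {
				return // early exit path that forgets to notify the parent
			}
			wg.Done()
		}(i)
	}

	finished := make(chan struct{})
	go func() {
		wg.Wait() // parent-side wait; blocks forever if a Done is missing
		close(finished)
	}()

	// The timeout exists only so the demonstration can observe the hang.
	// In the hanging case the waiting goroutine is deliberately leaked.
	select {
	case <-finished:
		return true
	case <-time.After(time.Duration(timeoutMillis) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("all children notify, wait completed:", runChildren(3, false, 500))
	fmt.Println("one child exits silently, wait completed:", runChildren(3, true, 500))
}
```

A common way to rule out this class of bug is to `defer wg.Done()` as the first statement of each child, so that every exit path notifies the parent. Whether the fix for this issue took that shape is not stated in this ticket.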

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3906 contains query commit f40fd5e with commit message:
            MB-42825 hangs on intersect scan

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9200 contains query commit 3c7ab4b with commit message:
            MB-42825 hangs on intersect scan

            pierre.regazzoni Pierre Regazzoni added a comment - - edited

            I could easily reproduce the issue on build 6.6.1-9192.

            I then tried on 6.6.1-9200:

            • A first run of 50 iterations of the query worked.
            • A second run of 100 iterations hung on iteration 46.

            [root@localhost ~]# cat query-46.out 
            SELECT `Extent2`.`airportname` as `airportName`, `Extent1`.`airline` as `airline` FROM `travel-sample` as `Extent1` INNER JOIN `travel-sample` as `Extent2` ON (`Extent1`.`destinationairport` = `Extent2`.`faa`) WHERE ((`Extent1`.`type` = 'route') AND (`Extent2`.`type` = 'airport')) LIMIT 1 

            You can check the cluster at http://172.23.104.92:8091/, which runs 6.6.1-9200.

            To reproduce, run the following on the server node:

            cd /root/
            for (( i=0; i<100; ++i)); do /opt/couchbase/bin/cbq -u Administrator -p password -f query.sql -o query-${i}.out; done 
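
The shell loop above blocks forever on the first iteration that hangs. A variant with a per-iteration deadline reports the hang and stops instead. This Go sketch assumes the same cbq path, credentials, and query.sql file as the loop above; `runWithTimeout` and the 30 second limit are hypothetical choices.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout runs a command and reports hung=true if the command was
// still running after timeoutMillis and had to be killed.
func runWithTimeout(timeoutMillis int, name string, args ...string) (hung bool, err error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()

	// CommandContext kills the process once the context deadline passes.
	err = exec.CommandContext(ctx, name, args...).Run()
	if err != nil && errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return true, err
	}
	return false, err
}

func main() {
	for i := 0; i < 100; i++ {
		out := fmt.Sprintf("query-%d.out", i)
		hung, err := runWithTimeout(30000, "/opt/couchbase/bin/cbq",
			"-u", "Administrator", "-p", "password", "-f", "query.sql", "-o", out)
		if hung {
			fmt.Printf("iteration %d: no result after 30s, treating as a hang\n", i)
			return
		}
		if err != nil {
			fmt.Printf("iteration %d: cbq failed: %v\n", i, err)
			return
		}
	}
	fmt.Println("100 iterations completed without a hang")
}
```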

             

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3920 contains query commit 4beab79 with commit message:
            MB-42825 reopen allowed on a stop signal

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9204 contains query commit fe59898 with commit message:
            MB-42825 reopen allowed on a stop signal

            pierre.regazzoni Pierre Regazzoni added a comment -

            Verified with the following builds:

            • 6.6.1-9204
            • 7.0.0-3920

            Ran the query over 100 times, multiple times, without issue.
            Also did a continuous run for 5 minutes without hangs.
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-3934 contains query commit ddc4d65 with commit message:
            MB-42825 tighten children waiting

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9209 contains query commit b65840a with commit message:
            MB-42825 tighten children waiting

            mihir.kamdar Mihir Kamdar added a comment -

            Marco Greco, are we waiting for any further fixes? If not, can this be resolved now?


              People

              Assignee:
              pierre.regazzoni Pierre Regazzoni
              Reporter:
              btburnett3 Brant Burnett
              Votes:
              0
              Watchers:
              8
