Couchbase Server / MB-6219

items are not marked as deleted/expired in couchstore after they expire (View query results with stale=false include expired items)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Security Level: Public
    • Labels:
      None
    • Environment:
      build #1580 on Ubuntu 64-bit

      Description

      View query results with stale=false include expired items.

      Steps to reproduce (build #1580); a rough script following these steps is sketched below:
      1. Create the default bucket.
      2. Load 10 JSON docs with expiry set to 30 seconds.
      3. Create a view (default map function) and query with stale=false.
      4. Wait for 2-3 minutes.
      5. Query the view again with stale=false.

      Some of the items are still returned in the query results even after the index is rebuilt.
      I observed that the number of rows returned by the view query is always the same as curr_items.

      Diagnostics are attached.
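
      A rough script following the steps above, assuming a local 2.0 node whose default bucket is reachable through the moxi port 11211 and which already has a development view "test" in design document "dev_test" (hosts, ports and names here are placeholders, not taken from this ticket):

        import json
        import time

        import memcache   # python-memcached
        import requests

        mc = memcache.Client(["127.0.0.1:11211"])

        # Step 2: load 10 JSON docs that expire after 30 seconds.
        for i in range(10):
            mc.set("doc%d" % i, json.dumps({"name": "doc%d" % i}), time=30)

        VIEW_URL = "http://127.0.0.1:8092/default/_design/dev_test/_view/test"

        # Step 3: stale=false forces an index update before the response is returned.
        rows = requests.get(VIEW_URL, params={"stale": "false"}).json()["rows"]
        print("rows right after load:", len(rows))

        # Steps 4-5: wait well past the expiry and query again. Until the expiry
        # pager has run and the index has been rebuilt, the expired docs can still
        # show up here, and the row count keeps matching curr_items.
        time.sleep(180)
        rows = requests.get(VIEW_URL, params={"stale": "false"}).json()["rows"]
        print("rows after expiry window:", len(rows))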


        Activity

        peter added a comment -

        Filipe explained in the comments how this works. To speed up deletion from the indexes, the expiry pager interval can be changed, which may have an adverse effect on performance.
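
        One way to do what is described above, assuming the cbepctl tool shipped with the server is on the PATH and the node's data port is 11210 (the 600-second value is only an example, and a shorter interval means more frequent background pager runs):

          import subprocess

          # Ask ep-engine to run the expiry pager every 10 minutes instead of the
          # default hourly run, so expired items are purged (and dropped from the
          # indexes on the next update) sooner.
          subprocess.check_call([
              "cbepctl", "127.0.0.1:11210",
              "set", "flush_param",
              "exp_pager_stime", "600",
          ])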

        Filipe Manana (Inactive) added a comment -

        Thanks for the suggestion, Perry.
        Unfortunately it wouldn't work for several reasons.

        First, the view engine currently has no way to communicate with memcached.

        Second, it would slow things down significantly.

        Third, how could that work for reduces? For precomputed reduce values, which are the strength of CouchDB's btrees + mapreduce, how do you "unreduce", i.e. exclude the values produced for expired documents and re-compute the reductions? Not only would you need to know the map values produced by the expired documents, you would also need to know the map values for the non-expired documents. Not to mention the big performance penalty here.

        There are a lot of other technical issues that would impact correctness, the incremental view update approach, or performance. The three listed above are just the ones that people not familiar with the implementation/design would grasp quickly.
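
        A toy illustration of the "unreduce" problem (plain Python, not Couchbase code), using a _sum-style reduction: an inner btree node stores only the aggregate of its subtree, so excluding expired documents at query time needs per-document map values that live further down the tree:

          # Map output per document; in the real index these rows sit in btree
          # leaf nodes, grouped under inner nodes.
          map_rows = {"doc1": 10, "doc2": 25, "doc3": 7}

          # An inner node stores only the precomputed reduction of its subtree.
          precomputed_sum = sum(map_rows.values())      # 42, stored once

          expired = {"doc2"}

          # To answer a reduce query that excludes doc2, the engine would have to
          # subtract exactly what doc2 contributed. The aggregate alone is not
          # enough; the per-document values have to be re-read from the leaves and
          # the reduction re-computed, which defeats the point of precomputing it.
          corrected = precomputed_sum - sum(v for k, v in map_rows.items() if k in expired)
          print(precomputed_sum, corrected)             # 42 35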

        Perry Krug added a comment -

        Just a thought as I came across this bug. What if, for each query result, the query engine contacted memcached to see whether each doc was still valid before including it in the query response? That way, the view engine wouldn't have to keep track of all documents in all vbuckets, only the ones it is sending out. This would take care not only of expiration (since memcached would return "not_found") but also of deleted documents that have not yet been removed from disk. Rather than doing a 'get' (which would fetch the item from disk in DGM), we could use the "stats key" operation to just check whether the key is still valid within memcached. Since there would be a bit of (albeit small) overhead on the query response, this could be an optional check?

        The rows would eventually get cleaned from the index; this would just prevent the client from getting a massive number of already expired items at the minute-59 mark before the hourly process is run.
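
        A minimal sketch of this per-row check, with is_key_live standing in for the hypothetical memcached lookup (e.g. a per-key stats call) that the query engine would make before emitting a row:

          def filter_expired_rows(rows, is_key_live):
              """Drop view rows whose backing document memcached no longer considers live.

              rows        -- iterable of row dicts with at least an 'id' field
              is_key_live -- callable(key) -> bool; hypothetical validity check that
                             does not fetch the value from disk
              """
              return [row for row in rows if is_key_live(row["id"])]

          # Stand-in usage: pretend doc2 has expired.
          rows = [{"id": "doc1", "key": "a", "value": 1},
                  {"id": "doc2", "key": "b", "value": 2}]
          live_keys = {"doc1"}
          print(filter_expired_rows(rows, lambda k: k in live_keys))   # only doc1's row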

        Deepkaran Salooja added a comment -

        Yes, that's correct. Once the expiry pager has run and indexes have been updated, the queries do not return the expired items.

        peter added a comment -

        Deep, can you confirm that after an hour, once the expiry pager has run and the indexes are next updated, the expired items disappear from the view? If so, then we don't have a bug. There is still a valid discussion going on about how the situation around queries can be improved, but I want to find out whether things are working as designed for now.


          People

          • Assignee:
            peter
            Reporter:
            Deepkaran Salooja
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes