  1. Couchbase Server
  2. MB-6219

items are not marked as deleted/expired in couchstore after they expire (View query results with stale=false include expired items)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Security Level: Public
    • Labels:
      None
    • Environment:
      build #1580 on Ubuntu 64-bit

      Description

      View query results with stale=false include expired items.

      Steps to reproduce (build #1580):
      1. Create default bucket.
      2. Load 10 JSON docs with expiry set to 30 seconds.
      3. Create a view (default map func) and query with stale=false.
      4. Wait for 2-3 minutes.
      5. Query the view again with stale=false.

      Some of the items are still returned in the query results even after the index is rebuilt.
      I observed that the number of rows returned by the view query is always the same as curr_items.

      Diagnostics are attached.
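
      For reference, a rough sketch of the reproduction steps (a sketch only: it assumes the Python SDK's Bucket.upsert() with a ttl argument, the view REST API on port 8092, and that a development design doc "dev_test" with a view "all" using the default map function already exists; host, bucket, and document names are placeholders):

      import time
      import requests
      from couchbase.bucket import Bucket

      HOST = '127.0.0.1'  # placeholder host
      VIEW_URL = 'http://%s:8092/default/_design/dev_test/_view/all' % HOST

      bucket = Bucket('couchbase://%s/default' % HOST)

      # Step 2: load 10 JSON docs with a 30-second expiry.
      for i in range(10):
          bucket.upsert('doc-%d' % i, {'value': i}, ttl=30)

      def query_rows():
          # Steps 3/5: query the view with stale=false so the index is updated first.
          resp = requests.get(VIEW_URL, params={'stale': 'false'})
          return resp.json().get('rows', [])

      print('rows right after load:', len(query_rows()))

      # Step 4: wait well past the 30-second expiry.
      time.sleep(180)

      # Expected 0 rows; observed on build #1580: the expired items are still
      # returned until the expiry pager has run and the index has been rebuilt.
      print('rows after expiry:', len(query_rows()))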


        Activity

        FilipeManana Filipe Manana (Inactive) added a comment -

        That's not unexpected.
        Items are lazily expired by ep-engine, meaning that it does not delete the documents from the database as soon as the 30 seconds elapse.

        There's no way for the view-engine to control that or to know about it.

        farshid Farshid Ghods (Inactive) added a comment -

        It seems like something could be modified in ep-engine so that when items expire we don't see them in views anymore.

        peter peter added a comment -

        Chiyoung, is this something Jin or Mike can help out with if it's in ep_engine? If not, it may need to be passed to Aaron. Thank you.

        FilipeManana Filipe Manana (Inactive) added a comment -

        This was discussed internally a few times, but I don't think any decision was made.

        Mike gave some info in the forum to a user about this:

        http://www.couchbase.com/forums/thread/expiration-time-docs-dp4

        chiyoung Chiyoung Seo added a comment -

        The item or expiry pager hadn't been scheduled yet to clear all the expired items from the memory hashtable and disk. That's why you still see those expired items in the view query.

        The item pager is scheduled when the current memory usage goes above the high water mark. The expiry pager is scheduled once every hour by default, but you can change its interval to a shorter period (e.g., 5 minutes) at runtime.
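
        For example, a minimal sketch of shortening the expiry pager interval at runtime by calling cbepctl (assuming cbepctl is on the PATH and that exp_pager_stime, in seconds, is the relevant engine parameter; exact flags can differ between releases):

        import subprocess

        # Ask ep-engine to run the expiry pager every 5 minutes instead of hourly.
        subprocess.check_call([
            'cbepctl', 'localhost:11210',
            'set', 'flush_param', 'exp_pager_stime', '300',
        ])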

        farshid Farshid Ghods (Inactive) added a comment -

        Dipti,

        this means that users will see the expired items in the index for up to an hour in some cases, which is the default interval for the expiry pager.

        dipti Dipti Borkar added a comment -

        Peter, as discussed, this is something we should be able to do at query time. We do need to fix this for 2.0. Can you please help work through the options with the view engine team?
        Let me know if you need additional feedback from me.

        FilipeManana Filipe Manana (Inactive) added a comment -

        There's no efficient way to do this in the view engine. It would mean that, for each stale=false request, we would have to scan all documents in every vbucket and check whether they have expired, not to mention other smaller issues.

        peter peter added a comment -

        Deep, can you confirm that after an hour, once the expiry pager has run and the next time the indexes are updated, the expired items disappear from the view? If so, then we don't have a bug. There is still a valid discussion going on about how the situation around queries can be improved, but I want to find out whether things are working as designed for now.

        deepkaran.salooja Deepkaran Salooja added a comment -

        Yes, that's correct. Once the expiry pager has run and indexes have been updated, the queries do not return the expired items.

        perry Perry Krug added a comment -

        Just a thought as I came across this bug. What if, for each query result, the query engine contacted memcached to see whether each doc was still valid before including it in the query response? That way, the view engine wouldn't have to keep track of all documents in all vbuckets, only the ones it is sending out. This would not only take care of expiration (since memcached would return "not_found") but also of deleted documents that have not yet been removed from disk. Rather than doing a 'get' (which would fetch the item from disk in DGM), we could use the "stats key" operation to just check whether the key is still valid within memcached. Since there would be a small amount of overhead on the query response, this could be an optional check?

        The rows would eventually get cleaned from the index; this is just about preventing the client from getting a massive number of already-expired items at the minute-59 mark, before the hourly process runs.
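
        A minimal client-side sketch of that kind of post-filter (a sketch only: it uses a plain get per row for simplicity rather than the lighter-weight "stats key" check mentioned above, and it assumes the Python SDK's Bucket.get() raises NotFoundError for missing or expired keys; host and view names are placeholders):

        import requests
        from couchbase.bucket import Bucket
        from couchbase.exceptions import NotFoundError

        bucket = Bucket('couchbase://127.0.0.1/default')
        resp = requests.get('http://127.0.0.1:8092/default/_design/dev_test/_view/all',
                            params={'stale': 'false'})

        live_rows = []
        for row in resp.json().get('rows', []):
            try:
                bucket.get(row['id'])   # expired/deleted keys raise NotFoundError
                live_rows.append(row)
            except NotFoundError:
                pass                    # drop rows whose document no longer exists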

        FilipeManana Filipe Manana (Inactive) added a comment -

        Thanks for the suggestion, Perry.
        Unfortunately it wouldn't work, for several reasons.

        First, the view-engine currently has no way to communicate with memcached.

        Second, it would slow things down significantly.

        Third, how could that work for reduces? For precomputed reduce values, which are the strength of CouchDB's btrees + map/reduce, how do you "unreduce", that is, exclude the values produced for expired documents and re-compute the reductions? Not only would you need to know the map values produced by the expired documents, you would also need to know the map values for the non-expired documents. Not to mention the big performance penalty here.

        There are plenty of other technical issues that would impact correctness, the incremental view-update approach, or performance. The three listed above are just the ones that people not familiar with the implementation/design would grasp quickly.

        peter peter added a comment -

        Filipe explained in the comments how this works. To speed up deletion from the indexes, the expiry pager interval can be changed, which may have an adverse effect on performance.


          People

          • Assignee:
            peter
          • Reporter:
            Deepkaran Salooja
          • Votes:
            0
          • Watchers:
            4


              Gerrit Reviews

              There are no open Gerrit changes