Couchbase Server / MB-6538

In rare cases, CRC codes don't match when reading data from a couch file

    Details

      Description

      I experimented with building an index on 6 cluster_run nodes with 9E6 (9 million) simple docs. Everything went fine and the results appeared correct, but I'm seeing

      [ns_server:debug,2012-09-05T21:31:14.218,n_5@10.17.21.241:compaction_daemon:compaction_daemon:schedule_next_compaction:1204]Finished compaction too soon. Next run will be in 30s
      [couchdb:error,2012-09-05T21:31:14.296,n_2@10.17.21.241:<0.9681.0>:couch_log:error:42]Set view `default`, replica group `_design/dev_t`, doc loader error
      error: {file_corruption,<<"file corruption">>}
      stacktrace: [{couch_file,pread_iolist,2},
                   {couch_db,open_doc_int,3},
                   {couch_set_view_updater,load_doc,4},
                   {couch_set_view_updater,'-load_changes/7-fun-0-',6},
                   {couch_btree,stream_kv_node2,8},
                   {couch_btree,stream_kp_node,7},
                   {couch_btree,fold,4},
                   {couch_db,enum_docs_since,5}]
      [couchdb:error,2012-09-05T21:31:14.297,n_2@10.17.21.241:<0.6715.0>:couch_log:error:42]Set view `default`, replica group `_design/dev_t`, received error from updater:
      {file_corruption, <<"file corruption">>}
      [couchdb:info,2012-09-05T21:31:17.856,n_2@10.17.21.241:<0.6715.0>:couch_log:info:39]Starting updater for set view `default`, replica group `_design/dev_t`
      [couchdb:info,2012-09-05T21:31:17.856,n_2@10.17.21.241:<0.9753.0>:couch_log:info:39]Updater for set view `default`, replica group `_design/dev_t` started

      in the logs. I will attach logs from this box.

      1. 158.couch.4.xz (350 kB, Aleksey Kondratenko)
      2. 252.couch.1.xz (1.13 MB, Aleksey Kondratenko)
      3. 253.couch.1.xz (1.13 MB, Aleksey Kondratenko)
      4. ns-diag-20120905213312.txt.xz (411 kB, Aleksey Kondratenko)
      5. ns-diag-20121023170207.txt.xz (832 kB, Aleksey Kondratenko)

      1. corrupt2.png (166 kB)
      2. corruption.png (383 kB)
      3. Untitled 2 vs Untitled.png (167 kB)

        Activity

        Karan Kumar (Inactive) added a comment:

        Which build?

        Karan Kumar (Inactive) added a comment:

        ohh. cluster_run

        Filipe Manana (Inactive) added a comment:

        This happens when reading from a database file, not from an index file.

        Aleksey Kondratenko (Inactive) added a comment:

        corrupted files attached

        Aaron Miller (Inactive) added a comment:

        In the corrupted doc in 252.couch, it looks like the file got stomped on by one byte. Both docs have the same CRC and should have the same data, but this one byte got changed somehow.

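Aaron's observation can be illustrated with a small sketch. Assuming, as the issue title suggests, that each stored document body is protected by a CRC32 checksum, changing a single byte is always enough to make the recomputed CRC disagree with the stored one (CRC32 detects every error burst of 32 bits or fewer), which is presumably the mismatch surfaced as `file_corruption` in the log. The framing below (4-byte length, 4-byte CRC, then data) and the names `write_chunk`, `read_chunk`, `flip_byte`, and `FileCorruption` are hypothetical simplifications, not couchstore's actual on-disk format or API.

```python
import struct
import zlib


class FileCorruption(Exception):
    """Hypothetical error raised when a chunk's stored CRC does not match its data."""


def write_chunk(data: bytes) -> bytes:
    # Hypothetical framing: 4-byte big-endian length, 4-byte CRC32, then the data.
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return struct.pack(">II", len(data), crc) + data


def read_chunk(buf: bytes) -> bytes:
    # Recompute the CRC over the data and compare it with the stored value.
    length, stored_crc = struct.unpack_from(">II", buf, 0)
    data = buf[8:8 + length]
    if zlib.crc32(data) & 0xFFFFFFFF != stored_crc:
        raise FileCorruption("file corruption")
    return data


def flip_byte(buf: bytes, offset: int) -> bytes:
    # Simulate a single "stomped" byte, e.g. a bit error from faulty RAM.
    corrupted = bytearray(buf)
    corrupted[offset] ^= 0xFF
    return bytes(corrupted)


if __name__ == "__main__":
    doc = b'{"_id": "doc-1", "value": 42}'
    chunk = write_chunk(doc)
    assert read_chunk(chunk) == doc
    try:
        read_chunk(flip_byte(chunk, 8 + 5))  # one byte inside the document body
    except FileCorruption as e:
        print("detected:", e)
```

Note that the CRC only detects the damage; it cannot say whether the byte changed on disk, in the page cache, or in a process's buffer before it was written.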
        Aaron Miller (Inactive) added a comment:

        see attached screenshot

        Aaron Miller (Inactive) added a comment:

        other file (253.couch.1)

        Aaron Miller (Inactive) added a comment:

        I don't understand the name change here. The files in question were never compacted.

        kzeller added a comment:

        Added to beta release notes: In rare cases, the codes used to test for data corruption (CRC checksums) do not match when reading data from a couch file.

        Farshid Ghods (Inactive) added a comment:

        Aliaksey,

        did you use a RAM disk for persistence when running this test?

        Aleksey Kondratenko (Inactive) added a comment:

        No. I don't understand why this would matter. Any write to the filesystem (except with direct I/O) goes to the kernel's page cache first.

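Aleksey's point can be sketched in generic POSIX terms using Python's standard `os` module (this is an illustration, not Couchbase code): `os.write` hands bytes to the kernel, which normally buffers them in the page cache in RAM, and only `os.fsync` asks for those dirty pages to be flushed to the device. Because buffered writes pass through main memory whether the backing store is a RAM disk or a physical disk, the choice of device does not change that step. The function name `buffered_write` is hypothetical.

```python
import os
import tempfile


def buffered_write(path: str, data: bytes, durable: bool = True) -> None:
    """Write data through the kernel page cache, optionally forcing it to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write copies bytes into the kernel (normally the page cache)
            # and may return before anything reaches the disk, or write only
            # part of the buffer, so loop until everything is handed over.
            written = os.write(fd, view)
            view = view[written:]
        if durable:
            # fsync asks the kernel to flush this file's dirty pages to the device.
            os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "example.couch")
        buffered_write(p, b"hello")
        with open(p, "rb") as f:
            print(f.read())
```

Bypassing the page cache would require opening the file with direct I/O (for example `O_DIRECT` on Linux), which imposes alignment requirements and is not shown here.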
        damien added a comment:

        We think this was a regression, possibly a dangling pointer, in ep-engine that has since been fixed. Please reopen if there is another instance of this.

        Aleksey Kondratenko (Inactive) added a comment:

        got this again

        Aleksey Kondratenko (Inactive) added a comment:

        The vbucket in question was in the bucket "other", which was populated by incoming XDCR.

        Aleksey Kondratenko (Inactive) added a comment:

        Attaching diags from the node with that badness.

        Aaron Miller (Inactive) added a comment:

        Single-byte error again.

        Aleksey Kondratenko (Inactive) added a comment:

        Sorry folks, found that my box actually has bad RAM.

          People

          • Assignee: Aaron Miller (Inactive)
          • Reporter: Aleksey Kondratenko (Inactive)
          • Votes: 0
          • Watchers: 2

              Gerrit Reviews

              There are no open Gerrit changes