Couchbase Server / MB-5095

Backing up data with a large WAL size on 1.8.1 fails with error: database disk image is malformed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.1
    • Fix Version/s: 2.0
    • Component/s: tools
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Operating System : Ubuntu Single Node
      Branch : 1.8.1-753

      Description

      Backup fails for files with a large WAL [5M plus]. Backup succeeds for smaller WAL sizes [default: 1000].

      Steps to Reproduce this issue
      ------------------------------------------------------

      Load 1M items of data: /opt/couchbase/bin/memcachetest -h localhost:11211 -i 1000000 -M 1024 -K cb_2 -l
      Run a backup of the data while the database is online: sudo -su couchbase /opt/couchbase/bin/cbbackup /opt/couchbase/var/lib/couchbase/data/default-data/default /tmp/rev7

      Errors
      --------------------------------------------------------
      ..
      Backup of default done, check integrity now
      ok
      Vacuum of default done
      Backup of default-0.mb done, check integrity now

      *** in database main ***
      On tree page 395901 cell 3: 2nd reference to page 395903
      On tree page 426067 cell 0: 2nd reference to page 410862
      On tree page 426067 cell 1: 2nd reference to page 410863
      .
      .
      .
      .

      On tree page 461483 cell 2: 2nd reference to page 462480
      On tree page 467250 cell 1: 2nd reference to page 440802
      On tree page 472512 cell 1: 2nd reference to page 471560
      On tree page 358585 cell 1: 2nd reference to page 358586
      On tree page 474942 cell 1: 2nd reference to page 473623
      Page 491172: btreeInitPage() returns error code 11
      On tree page 485795 cell 89: Child page depth differs
      Page 491173: btreeInitPage() returns error code 11
      Page 491175: btreeInitPage() returns error code 11
      On tree page 336498 cell 0: 2nd reference to page 336497
      On tree page 385778 cell 2: 2nd reference to page 385777
      On tree page 394903 cell 1: 2nd reference to page 394901
      On tree page 427473 cell 1: 2nd reference to page 427474
      On tree page 436046 cell 0: 2nd reference to page 436045
      On tree page 437909 cell 0: 2nd reference to page 437907
      On tree page 444827 cell 1: 2nd reference to page 444046
      On tree page 467244 cell 0: 2nd reference to page 471558
      On tree page 483494 cell 0: 2nd reference to page 469621
      On tree page 376439 cell 1: 2nd reference to page 376440
      On tree page 376453 cell 1: 2nd reference to page 376412
      On tree page 399713 cell 2: 2nd reference to page 399714
      On tree page 411158 cell 1: 2nd reference to page 412526
      On tree page 426543 cell 0: 2nd reference to page 362805
      Page 491168: btreeInitPage() returns error code 11
      On tree page 482558 cell 86: Child page depth differs
      Page 491169: btreeInitPage() returns error code 11
      Page 491170: btreeInitPage() returns error code 11
      Page 491171: btreeInitPage() returns error code 11
      On tree page 385413 cell 1: 2nd reference to page 385411
      On tree page 407221 cell 1: 2nd reference to page 407222
      On tree page 416240 cell 1: 2nd reference to page 416241
      On tree page 426539 cell 1: 2nd reference to page 427355
      On tree page 430171 cell 2: 2nd reference to page 429186
      On tree page 433173 cell 0: 2nd reference to page 433172
      On tree page 434095 cell 1: 2nd reference to page 434096
      On tree page 439678 cell 0: 2nd reference to page 439677
      On tree page 479441 cell 1: 2nd reference to page 479442
      Page 491162: btreeInitPage() returns error code 11
      On tree page 487719 cell 79: Child page depth differs
      Page 491163: btreeInitPage() returns error code 11
      Page 491166: btreeInitPage() returns error code 11
      On tree page 376307 cell 1: 2nd reference to page 376308
      On tree page 385403 cell 1: 2nd reference to page 388156
      On tree page 428240 cell 1: 2nd reference to page 428241
      On tree page 430165 cell 3: 2nd reference to page 430166
      Error: database disk image is malformed
      Vacuum of default-3.mb done
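
      The "2nd reference to page" and "btreeInitPage()" lines above have the shape of SQLite's integrity-check output. A minimal sketch of running the same kind of check against a backed-up file from Python follows, assuming the backup files are ordinary SQLite databases. The function names `check_integrity` and `is_healthy` are hypothetical helpers, not part of cbbackup.

```python
import sqlite3


def check_integrity(db_path, max_errors=100):
    """Return the messages reported by SQLite's PRAGMA integrity_check.

    Note: sqlite3.connect() creates an empty database if db_path does not
    exist, so callers should confirm the file is present first.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "PRAGMA integrity_check(%d)" % int(max_errors)
        ).fetchall()
    except sqlite3.DatabaseError as exc:
        # A badly damaged file can fail before the check completes,
        # for example with "database disk image is malformed".
        return [str(exc)]
    finally:
        conn.close()
    return [row[0] for row in rows]


def is_healthy(db_path):
    # SQLite reports a single row containing "ok" for a sound database.
    return check_integrity(db_path) == ["ok"]
```

      For example, `is_healthy("/tmp/rev7/default-0.mb")` would be one way to re-check a single backup file; the exact file name inside the backup directory is an assumption here.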


        Activity

        ketaki Ketaki Gangal created issue -
        steve Steve Yen made changes -
        Field Original Value New Value
        Assignee Bin Cui [ bcui ] Chiyoung Seo [ chiyoung ]
        steve Steve Yen made changes -
        Assignee Chiyoung Seo [ chiyoung ] Steve Yen [ steve ]
        karan Karan Kumar (Inactive) made changes -
        Fix Version/s 1.8.1 [ 10295 ]
        Fix Version/s 1.8.1-release-candidate [ 10299 ]
        Sprint Status Current Sprint
        Sprint Priority 0
        dipti Dipti Borkar added a comment -

        Is there a workaround?

        karan Karan Kumar (Inactive) added a comment -

        Oh, sorry, I posted the wrong comment.

        karan Karan Kumar (Inactive) made changes -
        Comment [ Can we close this? ]
        karan Karan Kumar (Inactive) added a comment -

        @Steve: Not sure if you have taken a look at this. This bug was wrongly tagged and did not show up in the filter.

        peter peter made changes -
        Affects Version/s 1.8.1 [ 10295 ]
        Affects Version/s 1.8.1-release-candidate [ 10299 ]
        peter peter made changes -
        Fix Version/s 2.0 [ 10114 ]
        Fix Version/s 1.8.1 [ 10295 ]
        steve Steve Yen added a comment -

        Still trying to reproduce. Running single-node CentOS 1.8.1-910, with 3M items created via a concurrent mix of memcachetest and mcsoda...

        /opt/couchbase/bin/memcachetest -h localhost:11211 -i 1000000 -M 1024 -K cb_0 -l -P 95
        ./pytests/performance/mcsoda.py membase://HOST:8091 max-items=200000 ratio-sets=1.0 vbuckets=1024 doc-gen=0

        While the client load tools were running, backup took forever...

        /opt/couchbase/bin/cbbackup /opt/couchbase/var/lib/couchbase/data/default-data/default /tmp/backup0

        After stopping the client load tools, the backup eventually finished.

        WAL sizes were >1MB, but no malformed issues so far...

        ls -al /opt/couchbase/var/lib/couchbase/data/default-data/
        total 1443244
        drwxr-xr-x 2 couchbase couchbase 4096 2012-06-15 05:11 .
        drwxr-xr-x 3 couchbase couchbase 4096 2012-06-14 08:48 ..
        -rw-r--r-- 1 couchbase couchbase 52224 2012-06-15 04:57 default
        -rw-r--r-- 1 couchbase couchbase 489541632 2012-06-15 05:09 default-0.mb
        -rw-r--r-- 1 couchbase couchbase 32768 2012-06-15 05:08 default-0.mb-shm
        -rw-r--r-- 1 couchbase couchbase 2618984 2012-06-15 05:09 default-0.mb-wal
        -rw-r--r-- 1 couchbase couchbase 326151168 2012-06-15 05:09 default-1.mb
        -rw-r--r-- 1 couchbase couchbase 32768 2012-06-15 05:08 default-1.mb-shm
        -rw-r--r-- 1 couchbase couchbase 1618144 2012-06-15 05:09 default-1.mb-wal
        -rw-r--r-- 1 couchbase couchbase 327179264 2012-06-15 05:09 default-2.mb
        -rw-r--r-- 1 couchbase couchbase 32768 2012-06-15 05:08 default-2.mb-shm
        -rw-r--r-- 1 couchbase couchbase 1473520 2012-06-15 05:09 default-2.mb-wal
        -rw-r--r-- 1 couchbase couchbase 326189056 2012-06-15 05:09 default-3.mb
        -rw-r--r-- 1 couchbase couchbase 32768 2012-06-15 05:08 default-3.mb-shm
        -rw-r--r-- 1 couchbase couchbase 1768008 2012-06-15 05:09 default-3.mb-wal
        -rw-r--r-- 1 couchbase couchbase 32768 2012-06-15 05:11 default-shm
        -rw-r--r-- 1 couchbase couchbase 1084712 2012-06-15 05:11 default-wal
        steve Steve Yen added a comment -

        When I run memcachetest/mcsoda at full speed, then cbbackup appears to not make any progress. It's likely that cbbackup is unable to acquire file locks, since ep-engine has them and isn't letting go.

        When I run client-load-tools at a slower ops/second (max-ops-per-sec), then cbbackup does finish...

        ./pytests/performance/mcsoda.py membase://10.3.121.192:8091 max-items=200000 ratio-sets=0.1 vbuckets=1024 doc-gen=0 cur-items=200000 max-ops-per-sec=10

        In either case, I haven't reproduced the "image is malformed" issue yet.

        dipti Dipti Borkar added a comment -

        Is this using the old cbbackup or tap backup? Is this still current sprint / P0?

        farshid Farshid Ghods (Inactive) added a comment -

        Wait until the WAL file size is less than 10 MB before running cbbackup.
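
        The suggestion above could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: it treats every file ending in `-wal` in the data directory as a WAL file, reads "10 MB" as 10 MiB, and takes the cbbackup path and argument order from the commands quoted earlier in this issue. The helper names and the polling and timeout values are hypothetical.

```python
import glob
import os
import subprocess
import time

# "10 MB" from the comment above, interpreted here as 10 MiB (an assumption).
WAL_LIMIT_BYTES = 10 * 1024 * 1024


def largest_wal(data_dir):
    """Return the size in bytes of the largest *-wal file (0 if none exist)."""
    largest = 0
    for path in glob.glob(os.path.join(data_dir, "*-wal")):
        try:
            largest = max(largest, os.path.getsize(path))
        except OSError:
            pass  # the file vanished between listing and stat
    return largest


def wait_for_small_wal(data_dir, limit=WAL_LIMIT_BYTES,
                       poll_seconds=5, timeout=600):
    """Poll until every WAL file is below limit; return False on timeout."""
    deadline = time.monotonic() + timeout
    while largest_wal(data_dir) >= limit:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


def backup_when_quiet(data_dir, db_file, dest):
    """Run cbbackup only once the WAL files have shrunk below the limit."""
    if not wait_for_small_wal(data_dir):
        raise RuntimeError("WAL files did not shrink below the limit in time")
    # Path and argument order as used in the commands quoted in this issue.
    subprocess.check_call(
        ["/opt/couchbase/bin/cbbackup", os.path.join(data_dir, db_file), dest]
    )
```

        With the paths from this report, a call might look like `backup_when_quiet("/opt/couchbase/var/lib/couchbase/data/default-data", "default", "/tmp/rev7")`. The check and the backup are not atomic, so the WAL can grow again after the check passes if clients keep writing.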

        steve Steve Yen made changes -
        Sprint Status Current Sprint
        Sprint Priority 0
        peter peter made changes -
        Component/s tools [ 10223 ]
        Component/s backup_restore [ 10070 ]
        farshid Farshid Ghods (Inactive) made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Won't Fix [ 2 ]
        farshid Farshid Ghods (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Steve Yen
          • Reporter: Ketaki Gangal
          • Votes: 0
          • Watchers: 0
