Details
Type: Bug
Resolution: Won't Fix
Priority: Major
Description
The error occurs when attempting to write a block that is larger than the remaining space in the current segment:
1. The first write and the growLog call succeed, so a new segment is added.
2. The write to this new segment fails.
The whole Append operation then fails, but tailOffset is left unchanged, still pointing at the previous segment even though a new segment has already been created and added. On retry, the write incorrectly goes to the new segment as though it were the previous one, corrupting the log.
The root cause is the assumption that tailOffset always points to the last segment.
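The failure sequence can be sketched with a toy segmented log (hypothetical types and fields, not the plasma implementation; only the stale-tailOffset mechanism is modeled):

```go
package main

import (
	"errors"
	"fmt"
)

// toyLog is a hypothetical model of a segmented log: segments holds the
// bytes used in each segment, and tailOffset is the index of the segment
// that Append assumes is the last one.
type toyLog struct {
	segmentSize int
	segments    []int // bytes written per segment
	tailOffset  int   // index of the segment Append writes to
	failNext    bool  // inject one write error on the grow path
}

func (l *toyLog) Append(n int) error {
	// If the block does not fit in the remaining space of the tail
	// segment, grow the log by adding a new segment first.
	if l.segments[l.tailOffset]+n > l.segmentSize {
		l.segments = append(l.segments, 0) // growLog succeeded
		if l.failNext {
			l.failNext = false
			// BUG: the new segment stays in the log, but tailOffset
			// is not advanced and still points at the previous segment.
			return errors.New("injected write error")
		}
		l.tailOffset = len(l.segments) - 1
	}
	l.segments[l.tailOffset] += n
	return nil
}

func main() {
	l := &toyLog{segmentSize: 1024, segments: []int{1008}, failNext: true}
	if err := l.Append(1008); err != nil {
		fmt.Println("append failed:", err)
	}
	// After the failed Append, the state is inconsistent: a new segment
	// exists, but tailOffset still names the old one, so a retry runs
	// against stale state.
	fmt.Println("segments:", len(l.segments), "tailOffset:", l.tailOffset) // segments: 2 tailOffset: 0
}
```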
The problem was simulated in a GSI test with 90% of writes failing via error injection. The recoveryCleaner panics on a checksum failure:
panic: LSS /tmp/gsi_testdir/devel_0_0/docIndex/recovery(shard16603717930189774736) : fatal: LSS Block is corrupted

goroutine 227 [running]:
github.com/couchbase/plasma.(*Shard).raisePanic(...)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/shard.go:1277
github.com/couchbase/plasma.(*LSSCtx).raiseCorrupted(0xc00041c3c0, {0x100b6c9e0?, 0xc0003ea190})
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/lssctx.go:202 +0x9d
github.com/couchbase/plasma.(*RecoveryCleaner).CleanLSS.func3(0x1, 0xd?, {0x100b6c9e0?, 0xc0003ea190})
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/recovery_cleaner.go:315 +0xd9
github.com/couchbase/plasma.(*RecoveryCleaner).CleanLSS.func6(0x1fffff72, 0x1fffff76?, {0x134ae6f76, 0x2042, 0x2a08a}, 0x0)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/recovery_cleaner.go:396 +0x15b7
github.com/couchbase/plasma.(*lsStore).RunCleaner.func1(0xc00046e7e0?, {0x134ae6f76?, 0xc001194880?, 0x100012d48?}, 0x101a4e108?)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/lss.go:616 +0x47
github.com/couchbase/plasma.(*lsStore).Visitor2(0xc00053fb08?, 0xc0634be8c2?, 0x20711000, 0xc00053fb38, 0xc001194880)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/lss.go:652 +0x11c
github.com/couchbase/plasma.(*lsStore).RunCleaner(0xc00046e7e0, 0xc0017fc370, 0xc00053fd08?)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/lss.go:630 +0x125
github.com/couchbase/plasma.(*LSSCtx).LSSRunCleaner(...)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/lssctx.go:1016
github.com/couchbase/plasma.(*RecoveryCleaner).CleanLSS(0xc00061a100, 0xc0000b9f00)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/recovery_cleaner.go:526 +0x84e
github.com/couchbase/plasma.(*RecoveryCleaner).lssCleanerDaemon(0xc00061a100)
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/recovery_cleaner.go:651 +0x22e
created by github.com/couchbase/plasma.(*RecoveryCleaner).Start
	/Users/rvncerr/couchbase/goproj/src/github.com/couchbase/plasma/recovery_cleaner.go:755 +0x5a
Head and tail offsets of the corrupted block were additionally printed:

RECOVERY CLEANER 536870770 536879032

This block was located across two segments.
16:27:17 plasma % ls -la /tmp/gsi_testdir/devel_0_0/docIndex/recovery
total 1132512
drwxr-xr-x  6 rvncerr  wheel        192 Apr 23 15:47 .
drwxr-xr-x  6 rvncerr  wheel        192 Apr 23 15:30 ..
-rwxr-xr-x  1 rvncerr  wheel        107 Apr 23 15:47 error
-rwxr-xr-x  1 rvncerr  wheel       8192 Apr 23 15:47 header.data
-rwxr-xr-x  1 rvncerr  wheel  536870912 Apr 23 15:41 log.00000000000000.data
-rwxr-xr-x  1 rvncerr  wheel  536965120 Apr 23 15:47 log.00000000000001.data
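With the 536870912-byte (512 MiB) segment size visible in the directory listing, the printed head and tail offsets can be mapped to segment files directly (a quick arithmetic check, not plasma code):

```go
package main

import "fmt"

// segmentIndex maps a log offset to the index of the segment file that
// contains it, given a fixed segment size.
func segmentIndex(offset, segmentSize int64) int64 {
	return offset / segmentSize
}

func main() {
	const segmentSize = int64(536870912) // size of log.00000000000000.data
	head, tail := int64(536870770), int64(536879032)
	// The head falls in segment 0 and the tail in segment 1, confirming
	// that the corrupted block straddles the two log files.
	fmt.Println(segmentIndex(head, segmentSize)) // 0: log.00000000000000.data
	fmt.Println(segmentIndex(tail, segmentSize)) // 1: log.00000000000001.data
}
```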
The issue is reproducible with the following unit test:
func testLogWriteFailAcrossFiles(t *testing.T, path string) {
	// blockSize < segmentSize < 2*blockSize
	segmentSize := int64(1024)
	blockSize := int64(1008)

	Env = env.NewUnstableEnvironment(0, 0, 100)
	defer func() {
		Env = &env.DefaultEnvironment
	}()

	l, _ := newLog(path, segmentSize, syncMode, ioMode, true, env.DefaultEnvironment)

	makeBlock := func(sz int64, val byte) []byte {
		rv := make([]byte, sz)
		for i := range rv {
			rv[i] = val
		}
		return rv
	}

	checkBlock := func(bs []byte, val byte) bool {
		for i := range bs {
			if bs[i] != val {
				return false
			}
		}
		return true
	}

	bs0 := makeBlock(blockSize, 0)
	bs1 := makeBlock(blockSize, 1)

	Env.SetErrorRateW(99)
	for {
		err := l.Append(bs0)
		if err != nil {
			fmt.Printf("bs0 append err : %v\n", err)
		} else {
			break
		}
	}
	Env.SetErrorRateW(0)
	l.Commit()

	Env.SetErrorRateW(99)
	for {
		err := l.Append(bs1)
		if err != nil {
			fmt.Printf("bs1 append err : %v\n", err)
		} else {
			break
		}
	}
	Env.SetErrorRateW(0)
	l.Commit()

	rbs0 := make([]byte, blockSize)
	rbs1 := make([]byte, blockSize)
	l.Read(rbs0, 0)
	l.Read(rbs1, blockSize)

	if !checkBlock(rbs0, 0) {
		t.Fatalf("rbs0 is corrupted : %v", rbs0)
	}
	if !checkBlock(rbs1, 1) {
		t.Fatalf("rbs1 is corrupted : %v", rbs1)
	}

	l.Close()
}
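The sizing comment in the test, blockSize < segmentSize < 2*blockSize, guarantees that the second block can never fit in the space remaining after the first, so every retried append exercises the growLog path where the injected errors hit. A quick check of that constraint with the test's constants (hypothetical helper, for illustration only):

```go
package main

import "fmt"

// crossesBoundary reports whether a block appended at the given offset
// would spill past the end of its current segment.
func crossesBoundary(offset, blockSize, segmentSize int64) bool {
	return offset%segmentSize+blockSize > segmentSize
}

func main() {
	segmentSize, blockSize := int64(1024), int64(1008)
	// Block 0 starts at offset 0 and fits in segment 0; block 1 starts at
	// offset 1008 and, because segmentSize < 2*blockSize, must force a
	// grow into segment 1.
	fmt.Println(crossesBoundary(0, blockSize, segmentSize))         // false
	fmt.Println(crossesBoundary(blockSize, blockSize, segmentSize)) // true
}
```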
Issue Links
- is a backport of MB-61641 "Append corrupts log when disk errors are present" (Resolved)