Commit a10b177

YuKuai-huawei authored and vegard committed
md/raid5: avoid BUG_ON() while continue reshape after reassembling
[ Upstream commit 305a5170dc5cf3d395bb4c4e9239bca6d0b54b49 ]

Currently, mdadm supports --revert-reshape to abort the reshape while
reassembling, as exercised by the test 07revert-grow. However, the
following BUG_ON() can be triggered by the test:

kernel BUG at drivers/md/raid5.c:6278!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
irq event stamp: 158985
CPU: 6 PID: 891 Comm: md0_reshape Not tainted 6.9.0-03335-g7592a0b0049a #94
RIP: 0010:reshape_request+0x3f1/0xe60
Call Trace:
 <TASK>
 raid5_sync_request+0x43d/0x550
 md_do_sync+0xb7a/0x2110
 md_thread+0x294/0x2b0
 kthread+0x147/0x1c0
 ret_from_fork+0x59/0x70
 ret_from_fork_asm+0x1a/0x30
 </TASK>

The root cause is that --revert-reshape updates raid_disks from 5 to 4
while the reshape position is still set. After the array is reassembled,
the reshape position is read from the superblock, and during reshape the
check on 'writepos', which is calculated from the old reshape position,
fails.

Fix this panic the easy way first, by converting the BUG_ON() to
WARN_ON() and stopping the reshape if the checks fail.

Note that mdadm must fix --revert-reshape as well, and md/raid should
probably enhance metadata validation too. However, that means reassembly
will fail, and there must be user tools to fix the wrong metadata.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240611132251.1967786-13-yukuai1@huaweicloud.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2c92f8c1c456d556f15cbf51667b385026b2e6a0)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
1 parent 20cb648 commit a10b177

1 file changed

Lines changed: 13 additions & 7 deletions

drivers/md/raid5.c

@@ -5814,7 +5814,9 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
 	safepos = conf->reshape_safe;
 	sector_div(safepos, data_disks);
 	if (mddev->reshape_backwards) {
-		BUG_ON(writepos < reshape_sectors);
+		if (WARN_ON(writepos < reshape_sectors))
+			return MaxSector;
+
 		writepos -= reshape_sectors;
 		readpos += reshape_sectors;
 		safepos += reshape_sectors;
@@ -5832,14 +5834,18 @@ static sector_t reshape_request(struct mddev *mddev, sector_t sector_nr, int *sk
 	 * to set 'stripe_addr' which is where we will write to.
 	 */
 	if (mddev->reshape_backwards) {
-		BUG_ON(conf->reshape_progress == 0);
+		if (WARN_ON(conf->reshape_progress == 0))
+			return MaxSector;
+
 		stripe_addr = writepos;
-		BUG_ON((mddev->dev_sectors &
-			~((sector_t)reshape_sectors - 1))
-		       - reshape_sectors - stripe_addr
-		       != sector_nr);
+		if (WARN_ON((mddev->dev_sectors &
+		    ~((sector_t)reshape_sectors - 1)) -
+		    reshape_sectors - stripe_addr != sector_nr))
+			return MaxSector;
 	} else {
-		BUG_ON(writepos != sector_nr + reshape_sectors);
+		if (WARN_ON(writepos != sector_nr + reshape_sectors))
+			return MaxSector;
+
 		stripe_addr = sector_nr;
 	}
 
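
The pattern in this change (warn and return a sentinel instead of crashing) can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel code: `warn_on()`, `MAX_SECTOR`, and `pick_stripe_addr()` are hypothetical stand-ins for `WARN_ON()`, `MaxSector`, and the relevant part of `reshape_request()`, and the function takes the state as plain arguments rather than reading it from `mddev` and `conf`. It assumes `reshape_sectors` is a nonzero power of two, as the mask in the diff implies.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint64_t sector_t;

/* Hypothetical sentinel mirroring the role of MaxSector in the diff. */
#define MAX_SECTOR (~(sector_t)0)

/*
 * Userspace stand-in for WARN_ON(): report the failed check and return
 * whether it fired, so the caller can bail out instead of aborting.
 */
static int warn_on(int cond, const char *expr)
{
	if (cond)
		fprintf(stderr, "WARNING: check failed: %s\n", expr);
	return cond;
}

#define WARN_ON_SKETCH(cond) warn_on((cond) ? 1 : 0, #cond)

/*
 * Simplified model of the four checks touched by the diff.
 * Returns the stripe address to write to, or MAX_SECTOR when the
 * positions are inconsistent (for example after stale metadata).
 */
static sector_t pick_stripe_addr(int backwards, sector_t dev_sectors,
				 sector_t reshape_progress, sector_t writepos,
				 sector_t reshape_sectors, sector_t sector_nr)
{
	if (backwards) {
		/* First hunk: writepos must leave room to step back. */
		if (WARN_ON_SKETCH(writepos < reshape_sectors))
			return MAX_SECTOR;
		writepos -= reshape_sectors;

		/* Second hunk: a backwards reshape cannot start at zero. */
		if (WARN_ON_SKETCH(reshape_progress == 0))
			return MAX_SECTOR;

		/* Second hunk: the rounded device size must line up. */
		if (WARN_ON_SKETCH((dev_sectors & ~(reshape_sectors - 1)) -
				   reshape_sectors - writepos != sector_nr))
			return MAX_SECTOR;
		return writepos;
	}

	/* Forward case: writepos must be one chunk ahead of sector_nr. */
	if (WARN_ON_SKETCH(writepos != sector_nr + reshape_sectors))
		return MAX_SECTOR;
	return sector_nr;
}
```

A caller would compare the result against `MAX_SECTOR` and stop the reshape on a match, which corresponds to the `return MaxSector` paths added by the patch.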
