kernel/drivers/md/bcache/journal.c, branch linux-5.1.y

bcache: Revert "bcache: free heap cache_set->flush_btree in bch_journal_free"

2019-07-26T07:12:55Z

commit ba82c1ac1667d6efb91a268edb13fc9cdaecec9b upstream. This reverts commit 6268dc2c4703aabfb0b35681be709acf4c2826c6. This patch depends on commit c4dc2497d50d ("bcache: fix high CPU occupancy during journal") which is reverted in previous patch. So revert this one too. Fixes: 6268dc2c4703 ("bcache: free heap cache_set->flush_btree in bch_journal_free") Signed-off-by: Coly Li Cc: stable@vger.kernel.org Cc: Shenghui Wang Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

bcache: Revert "bcache: fix high CPU occupancy during journal"

2019-07-26T07:12:55Z

commit 249a5f6da57c28a903c75d81505d58ec8c10030d upstream. This reverts commit c4dc2497d50d9c6fb16aa0d07b6a14f3b2adb1e0. This patch enlarges a race between normal btree flush code path and flush_btree_write(), which causes deadlock when journal space is exhausted. Reverts this patch makes the race window from 128 btree nodes to only 1 btree nodes. Fixes: c4dc2497d50d ("bcache: fix high CPU occupancy during journal") Signed-off-by: Coly Li Cc: stable@vger.kernel.org Cc: Tang Junhui Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

bcache: check CACHE_SET_IO_DISABLE bit in bch_journal()

2019-07-26T07:12:46Z

[ Upstream commit 383ff2183ad16a8842d1fbd9dd3e1cbd66813e64 ] When too many I/O errors happen on cache set and CACHE_SET_IO_DISABLE bit is set, bch_journal() may continue to work because the journaling bkey might be still in write set yet. The caller of bch_journal() may believe the journal still work but the truth is in-memory journal write set won't be written into cache device any more. This behavior may introduce potential inconsistent metadata status. This patch checks CACHE_SET_IO_DISABLE bit at the head of bch_journal(), if the bit is set, bch_journal() returns NULL immediately to notice caller to know journal does not work. Signed-off-by: Coly Li Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

bcache: fix failure in journal relplay

2019-05-31T13:43:24Z

[ Upstream commit 631207314d88e9091be02fbdd1fdadb1ae2ed79a ] journal replay failed with messages: Sep 10 19:10:43 ceph kernel: bcache: error on bb379a64-e44e-4812-b91d-a5599871a3b1: bcache: journal entries 2057493-2057567 missing! (replaying 2057493-2076601), disabling caching The reason is in journal_reclaim(), when discard is enabled, we send discard command and reclaim those journal buckets whose seq is old than the last_seq_now, but before we write a journal with last_seq_now, the machine is restarted, so the journal with the last_seq_now is not written to the journal bucket, and the last_seq_wrote in the newest journal is old than last_seq_now which we expect to be, so when we doing replay, journals from last_seq_wrote to last_seq_now are missing. It's hard to write a journal immediately after journal_reclaim(), and it harmless if those missed journal are caused by discarding since those journals are already wrote to btree node. So, if miss seqs are started from the beginning journal, we treat it as normal, and only print a message to show the miss journal, and point out it maybe caused by discarding. Patch v2 add a judgement condition to ignore the missed journal only when discard enabled as Coly suggested. (Coly Li: rebase the patch with other changes in bch_journal_replay()) Signed-off-by: Tang Junhui Tested-by: Dennis Schridde Signed-off-by: Coly Li Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

bcache: return error immediately in bch_journal_replay()

2019-05-31T13:43:24Z

[ Upstream commit 68d10e6979a3b59e3cd2e90bfcafed79c4cf180a ] When failure happens inside bch_journal_replay(), calling cache_set_err_on() and handling the failure in async way is not a good idea. Because after bch_journal_replay() returns, registering code will continue to execute following steps, and unregistering code triggered by cache_set_err_on() is running in same time. First it is unnecessary to handle failure and unregister cache set in an async way, second there might be potential race condition to run register and unregister code for same cache set. So in this patch, if failure happens in bch_journal_replay(), we don't call cache_set_err_on(), and just print out the same error message to kernel message buffer, then return -EIO immediately caller. Then caller can detect such failure and handle it in synchrnozied way. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin

bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()

2019-05-22T05:39:55Z

commit 1bee2addc0c8470c8aaa65ef0599eeae96dd88bc upstream. In journal_reclaim() ja->cur_idx of each cache will be update to reclaim available journal buckets. Variable 'int n' is used to count how many cache is successfully reclaimed, then n is set to c->journal.key by SET_KEY_PTRS(). Later in journal_write_unlocked(), a for_each_cache() loop will write the jset data onto each cache. The problem is, if all jouranl buckets on each cache is full, the following code in journal_reclaim(), 529 for_each_cache(ca, c, iter) { 530 struct journal_device *ja = &ca->journal; 531 unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets; 532 533 /* No space available on this device */ 534 if (next == ja->discard_idx) 535 continue; 536 537 ja->cur_idx = next; 538 k->ptr[n++] = MAKE_PTR(0, 539 bucket_to_sector(c, ca->sb.d[ja->cur_idx]), 540 ca->sb.nr_this_dev); 541 } 542 543 bkey_init(k); 544 SET_KEY_PTRS(k, n); If there is no available bucket to reclaim, the if() condition at line 534 will always true, and n remains 0. Then at line 544, SET_KEY_PTRS() will set KEY_PTRS field of c->journal.key to 0. Setting KEY_PTRS field of c->journal.key to 0 is wrong. Because in journal_write_unlocked() the journal data is written in following loop, 649 for (i = 0; i < KEY_PTRS(k); i++) { 650-671 submit journal data to cache device 672 } If KEY_PTRS field is set to 0 in jouranl_reclaim(), the journal data won't be written to cache device here. If system crahed or rebooted before bkeys of the lost journal entries written into btree nodes, data corruption will be reported during bcache reload after rebooting the system. Indeed there is only one cache in a cache set, there is no need to set KEY_PTRS field in journal_reclaim() at all. But in order to keep the for_each_cache() logic consistent for now, this patch fixes the above problem by not setting 0 KEY_PTRS of journal key, if there is no bucket available to reclaim. Signed-off-by: Coly Li Reviewed-by: Hannes Reinecke Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman

bcache: print number of keys in trace_bcache_journal_write

2018-12-13T15:15:54Z

Sometimes flush journal may be very frequent, so it's useful to dump number of keys every time write journal. Signed-off-by: Guoju Fang Signed-off-by: Coly Li Signed-off-by: Jens Axboe

bcache: add separate workqueue for journal_write to avoid deadlock

2018-09-27T15:47:01Z

After write SSD completed, bcache schedules journal_write work to system_wq, which is a public workqueue in system, without WQ_MEM_RECLAIM flag. system_wq is also a bound wq, and there may be no idle kworker on current processor. Creating a new kworker may unfortunately need to reclaim memory first, by shrinking cache and slab used by vfs, which depends on bcache device. That's a deadlock. This patch create a new workqueue for journal_write with WQ_MEM_RECLAIM flag. It's rescuer thread will work to avoid the deadlock. Signed-off-by: Guoju Fang Cc: stable@vger.kernel.org Signed-off-by: Coly Li Signed-off-by: Jens Axboe

bcache: style fixes for lines over 80 characters

2018-08-11T21:46:41Z

This patch fixes the lines over 80 characters into more lines, to minimize warnings by checkpatch.pl. There are still some lines exceed 80 characters, but it is better to be a single line and I don't change them. Signed-off-by: Coly Li Reviewed-by: Shenghui Wang Signed-off-by: Jens Axboe

bcache: add identifier names to arguments of function definitions

2018-08-11T21:46:41Z

There are many function definitions do not have identifier argument names, scripts/checkpatch.pl complains warnings like this, WARNING: function definition argument 'struct bcache_device *' should also have an identifier name #16735: FILE: writeback.h:120: +void bch_sectors_dirty_init(struct bcache_device *); This patch adds identifier argument names to all bcache function definitions to fix such warnings. Signed-off-by: Coly Li Reviewed: Shenghui Wang Signed-off-by: Jens Axboe