summaryrefslogtreecommitdiff
path: root/fs/exfat
AgeCommit message (Collapse)Author
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-23exfat: resolve memory leak from exfat_create_upcase_table()Daniel Yang
If exfat_load_upcase_table reaches end and returns -EINVAL, allocated memory doesn't get freed and while exfat_load_default_upcase_table allocates more memory, leading to a memory leak. Here's link to syzkaller crash report illustrating this issue: https://syzkaller.appspot.com/text?tag=CrashReport&x=1406c201980000 Reported-by: syzbot+e1c69cadec0f1a078e3d@syzkaller.appspotmail.com Fixes: a13d1a4de3b0 ("exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper") Cc: stable@vger.kernel.org Signed-off-by: Daniel Yang <danielyangkang@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-23exfat: move extend valid_size into ->page_mkwrite()Yuezhang Mo
It is not a good way to extend valid_size to the end of the mmap area by writing zeros in mmap. Because after calling mmap, no data may be written, or only a small amount of data may be written to the head of the mmap area. This commit moves extending valid_size to exfat_page_mkwrite(). In exfat_page_mkwrite() only extend valid_size to the starting position of new data writing, which reduces unnecessary writing of zeros. If the block is not mapped and is marked as new after being mapped for writing, block_write_begin() will zero the page cache corresponding to the block, so there is no need to call zero_user_segment() in exfat_file_zeroed_range(). And after moving extending valid_size to exfat_page_mkwrite(), the data written by mmap will be copied to the page cache but the page cache may be not mapped to the disk. Calling zero_user_segment() will cause the data written by mmap to be cleared. So this commit removes calling zero_user_segment() from exfat_file_zeroed_range() and renames exfat_file_zeroed_range() to exfat_extend_valid_size(). Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-18exfat: fix memory leak in exfat_load_bitmap()Yuezhang Mo
If the first directory entry in the root directory is not a bitmap directory entry, 'bh' will not be released and reassigned, which will cause a memory leak. Fixes: 1e49a94cf707 ("exfat: add bitmap operations") Cc: stable@vger.kernel.org Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-18exfat: Implement sops->shutdown and ioctlDongliang Cui
We found that when writing a large file through buffer write, if the disk is inaccessible, exFAT does not return an error normally, which leads to the writing process not stopping properly. To easily reproduce this issue, you can follow the steps below: 1. format a device to exFAT and then mount (with a full disk erase) 2. dd if=/dev/zero of=/exfat_mount/test.img bs=1M count=8192 3. eject the device You may find that the dd process does not stop immediately and may continue for a long time. The root cause of this issue is that during buffer write process, exFAT does not need to access the disk to look up directory entries or the FAT table (whereas FAT would do) every time data is written. Instead, exFAT simply marks the buffer as dirty and returns, delegating the writeback operation to the writeback process. If the disk cannot be accessed at this time, the error will only be returned to the writeback process, and the original process will not receive the error, so it cannot be returned to the user side. When the disk cannot be accessed normally, an error should be returned to stop the writing process. Implement sops->shutdown and ioctl to shut down the file system when underlying block device is marked dead. Signed-off-by: Dongliang Cui <dongliang.cui@unisoc.com> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-17exfat: do not fallback to buffered writeYuezhang Mo
After commit(11a347fb6cef exfat: change to get file size from DataLength), the remaining area or hole had been filled with zeros before calling exfat_direct_IO(), so there is no need to fallback to buffered write, and ->i_size_aligned is no longer needed, drop it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-09-17exfat: drop ->i_size_ondiskYuezhang Mo
->i_size_ondisk is no longer used by exfat_write_begin() after commit(11a347fb6cef exfat: change to get file size from DataLength), drop it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-08-07fs: Convert aops->write_begin to take a folioMatthew Wilcox (Oracle)
Convert all callers from working on a page to working on one page of a folio (support for working on an entire folio can come later). Removes a lot of folio->page->folio conversions. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-07fs: Convert aops->write_end to take a folioMatthew Wilcox (Oracle)
Most callers have a folio, and most implementations operate on a folio, so remove the conversion from folio->page->folio to fit through this interface. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-17Merge tag 'exfat-for-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Fix deadlock issue reported by syzbot - Handle idmapped mounts * tag 'exfat-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix potential deadlock on __exfat_get_dentry_set exfat: handle idmapped mounts
2024-07-15exfat: fix potential deadlock on __exfat_get_dentry_setSungjong Seo
When accessing a file with more entries than ES_MAX_ENTRY_NUM, the bh-array is allocated in __exfat_get_entry_set. The problem is that the bh-array is allocated with GFP_KERNEL. It does not make sense. In the following cases, a deadlock for sbi->s_lock between the two processes may occur. CPU0 CPU1 ---- ---- kswapd balance_pgdat lock(fs_reclaim) exfat_iterate lock(&sbi->s_lock) exfat_readdir exfat_get_uniname_from_ext_entry exfat_get_dentry_set __exfat_get_dentry_set kmalloc_array ... lock(fs_reclaim) ... evict exfat_evict_inode lock(&sbi->s_lock) To fix this, let's allocate bh-array with GFP_NOFS. Fixes: a3ff29a95fde ("exfat: support dynamic allocate bh for exfat_entry_set_cache") Cc: stable@vger.kernel.org # v6.2+ Reported-by: syzbot+412a392a2cd4a65e71db@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/000000000000fef47e0618c0327f@google.com Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-07-15exfat: handle idmapped mountsMichael Jeanson
Pass the idmapped mount information to the different helper functions. Adapt the uid/gid checks in exfat_setattr to use the vfsuid/vfsgid helpers. Based on the fat implementation in commit 4b7899368108 ("fat: handle idmapped mounts") by Christian Brauner. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-07-02exfat: Convert to new uid/gid option parsing helpersEric Sandeen
Convert to new uid/gid option parsing helpers Signed-off-by: Eric Sandeen <sandeen@redhat.com> Link: https://lore.kernel.org/r/dda575de-11a7-4139-8a25-07957d311ed3@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-25exfat: zero the reserved fields of file and stream extension dentriesYuezhang Mo
From exFAT specification, the reserved fields should initialize to zero and should not use for any purpose. If create a new dentry set in the UNUSED dentries, all fields had been zeroed when allocating cluster to parent directory. But if create a new dentry set in the DELETED dentries, the reserved fields in file and stream extension dentries may be non-zero. Because only the valid bit of the type field of the dentry is cleared in exfat_remove_entries(), if the type of dentry is different from the original(For example, a dentry that was originally a file name dentry, then set to deleted dentry, and then set as a file dentry), the reserved fields is non-zero. So this commit initializes the dentry to 0 before createing file dentry and stream extension dentry. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-31exfat: fix timing of synchronizing bitmap and inodeYuezhang Mo
Commit(f55c096f62f1 exfat: do not zero the extended part) changed the timing of synchronizing bitmap and inode in exfat_cont_expand(). The change caused xfstests generic/013 to fail if 'dirsync' or 'sync' is enabled. So this commit restores the timing. Fixes: f55c096f62f1 ("exfat: do not zero the extended part") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-21Merge tag 'exfat-for-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Improve dirsync performance by syncing on a dentry-set rather than on a per-directory entry * tag 'exfat-for-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: remove duplicate update parent dir exfat: do not sync parent dir if just update timestamp exfat: remove unused functions exfat: convert exfat_find_empty_entry() to use dentry cache exfat: convert exfat_init_ext_entry() to use dentry cache exfat: move free cluster out of exfat_init_ext_entry() exfat: convert exfat_remove_entries() to use dentry cache exfat: convert exfat_add_entry() to use dentry cache exfat: add exfat_get_empty_dentry_set() helper exfat: add __exfat_get_dentry_set() helper
2024-03-19exfat: remove duplicate update parent dirYuezhang Mo
For renaming, the directory only needs to be updated once if it is in the same directory. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: do not sync parent dir if just update timestampYuezhang Mo
When sync or dir_sync is enabled, there is no need to sync the parent directory's inode if only for updating its timestamp. 1. If an unexpected power failure occurs, the timestamp of the parent directory is not updated to the storage, which has no impact on the user. 2. The number of writes will be greatly reduced, which can not only improve performance, but also prolong device life. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: remove unused functionsYuezhang Mo
exfat_count_ext_entries() is no longer called, remove it. exfat_update_dir_chksum() is no longer called, remove it and rename exfat_update_dir_chksum_with_entry_set() to it. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_find_empty_entry() to use dentry cacheYuezhang Mo
Before this conversion, each dentry traversed needs to be read from the storage device or page cache. There are at least 16 dentries in a sector. This will result in frequent page cache searches. After this conversion, if all directory entries in a sector are used, the sector only needs to be read once. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_init_ext_entry() to use dentry cacheYuezhang Mo
Before this conversion, in exfat_init_ext_entry(), to init the dentries in a dentry set, the sync times is equals the dentry number if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: move free cluster out of exfat_init_ext_entry()Yuezhang Mo
exfat_init_ext_entry() is an init function, it's a bit strange to free cluster in it. And the argument 'inode' will be removed from exfat_init_ext_entry(). So this commit changes to free the cluster in exfat_remove_entries(). Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_remove_entries() to use dentry cacheYuezhang Mo
Before this conversion, in exfat_remove_entries(), to mark the dentries in a dentry set as deleted, the sync times is equals the dentry numbers if 'dirsync' or 'sync' is enabled. That affects not only performance but also device life. After this conversion, only needs to be synchronized once if 'dirsync' or 'sync' is enabled. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: convert exfat_add_entry() to use dentry cacheYuezhang Mo
After this conversion, if "dirsync" or "sync" is enabled, the number of synchronized dentries in exfat_add_entry() will change from 2 to 1. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: add exfat_get_empty_dentry_set() helperYuezhang Mo
This helper is used to lookup empty dentry set. If there are no enough empty dentries at the input location, this helper will return the number of dentries that need to be skipped for the next lookup. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-19exfat: add __exfat_get_dentry_set() helperYuezhang Mo
Since exfat_get_dentry_set() invokes the validate functions of exfat_validate_entry(), it only supports getting a directory entry set of an existing file, doesn't support getting an empty entry set. To remove the limitation, add this helper. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-03-12mm, slab: remove last vestiges of SLAB_MEM_SPREADLinus Torvalds
Yes, yes, I know the slab people were planning on going slow and letting every subsystem fight this thing on their own. But let's just rip off the band-aid and get it over and done with. I don't want to see a number of unnecessary pull requests just to get rid of a flag that no longer has any meaning. This was mainly done with a couple of 'sed' scripts and then some manual cleanup of the end result. Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-03-01Merge tag 'exfat-for-6.8-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fix from Namjae Jeon: - Fix ftruncate failure when allocating non-contiguous clusters * tag 'exfat-for-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: fix appending discontinuous clusters to empty file
2024-02-25Merge tag 'pull-fixes.pathwalk-rcu-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU pathwalk fixes from Al Viro: "We still have some races in filesystem methods when exposed to RCU pathwalk. This series is a result of code audit (the second round of it) and it should deal with most of that stuff. Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate(). Up to maintainers (a note for NTFS folks - when documentation says that a method may not block, it *does* imply that blocking allocations are to be avoided. Really)" [ More explanations for people who aren't familiar with the vagaries of RCU path walking: most of it is hidden from filesystems, but if a filesystem actively participates in the low-level path walking it needs to make sure the fields involved in that walk are RCU-safe. That "actively participate in low-level path walking" includes things like having its own ->d_hash()/->d_compare() routines, or by having its own directory permission function that doesn't just use the common helpers. Having a ->d_revalidate() function will also have this issue. Note that instead of making everything RCU safe you can also choose to abort the RCU pathwalk if your operation cannot be done safely under RCU, but that obviously comes with a performance penalty. One common pattern is to allow the simple cases under RCU, and abort only if you need to do something more complicated. So not everything needs to be RCU-safe, and things like the inode etc that the VFS itself maintains obviously already are. But these fixes tend to be about properly RCU-delaying things like ->s_fs_info that are maintained by the filesystem and that got potentially released too early. - Linus ] * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ext4_get_link(): fix breakage in RCU mode cifs_get_link(): bail out in unsafe case fuse: fix UAF in rcu pathwalks procfs: make freeing proc_fs_info rcu-delayed procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() nfs: fix UAF on pathwalk running into umount nfs: make nfs_set_verifier() safe for use in RCU pathwalk afs: fix __afs_break_callback() / afs_drop_open_mmap() race hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper affs: free affs_sb_info with kfree_rcu() rcu pathwalk: prevent bogus hard errors from may_lookup() fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
2024-02-25exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helperAl Viro
That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem that is in process of getting shut down. Besides, having nls and upcase table cleanup moved from ->put_super() towards the place where sbi is freed makes for simpler failure exits. Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-18exfat: fix appending discontinuous clusters to empty fileYuezhang Mo
Eric Hong found that when using ftruncate to expand an empty file, exfat_ent_set() will fail if discontinuous clusters are allocated. The reason is that the empty file does not have a cluster chain, but exfat_ent_set() attempts to append the newly allocated cluster to the cluster chain. In addition, exfat_find_last_cluster() only supports finding the last cluster in a non-empty file. So this commit adds a check whether the file is empty. If the file is empty, exfat_find_last_cluster() and exfat_ent_set() are no longer called as they do not need to be called. Fixes: f55c096f62f1 ("exfat: do not zero the extended part") Reported-by: Eric Hong <erichong@qnap.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-18exfat: fix zero the unwritten part for dio readYuezhang Mo
For dio read, bio will be leave in flight when a successful partial aio read have been setup, blockdev_direct_IO() will return -EIOCBQUEUED. In the case, iter->iov_offset will be not advanced, the oops reported by syzbot will occur if revert iter->iov_offset with iov_iter_revert(). The unwritten part had been zeroed by aio read, so there is no need to zero it in dio read. Reported-by: syzbot+fd404f6b03a58e8bc403@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=fd404f6b03a58e8bc403 Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: do not zero the extended partYuezhang Mo
Since the read operation beyond the ValidDataLength returns zero, if we just extend the size of the file, we don't need to zero the extended part, but only change the DataLength without changing the ValidDataLength. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: change to get file size from DataLengthYuezhang Mo
In stream extension directory entry, the ValidDataLength field describes how far into the data stream user data has been written, and the DataLength field describes the file size. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: using ffs instead of internal logicJohn Sanpe
Replaced the internal table lookup algorithm with ffs of the bitops library with better performance. Use it to increase the single processing length of the exfat_find_free_bitmap function, from single-byte search to long type. Signed-off-by: John Sanpe <sanpeqf@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-01-08exfat: using hweight instead of internal logicJohn Sanpe
Replace the internal table lookup algorithm with the hweight library, which has instruction set acceleration capabilities. Use it to increase the length of a single calculation of the exfat_find_free_bitmap function to the long type. Signed-off-by: John Sanpe <sanpeqf@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-11-03exfat: fix ctime is not updatedYuezhang Mo
Commit 4c72a36edd54 ("exfat: convert to new timestamp accessors") removed attr_copy() from exfat_set_attr(). It causes xfstests generic/221 to fail. In xfstests generic/221, it tests ctime should be updated even if futimens() update atime only. But in this case, ctime will not be updated if attr_copy() is removed. attr_copy() may also update other attributes, and removing it may cause other bugs, so this commit restores to call attr_copy() in exfat_set_attr(). Fixes: 4c72a36edd54 ("exfat: convert to new timestamp accessors") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-11-03exfat: fix setting uninitialized time to ctime/atimeYuezhang Mo
An uninitialized time is set to ctime/atime in __exfat_write_inode(). It causes xfstests generic/003 and generic/192 to fail. And since there will be a time gap between setting ctime/atime to the inode and writing back the inode, so ctime/atime should not be set again when writing back the inode. Fixes: 4c72a36edd54 ("exfat: convert to new timestamp accessors") Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31exfat: support create zero-size directoryYuezhang Mo
This commit adds mount option 'zero_size_dir'. If this option enabled, don't allocate a cluster to directory when creating it, and set the directory size to 0. On Windows, a cluster is allocated for a directory when it is created, so the mount option is disabled by default. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31exfat: support handle zero-size directoryYuezhang Mo
After repairing a corrupted file system with exfatprogs' fsck.exfat, zero-size directories may result. It is also possible to create zero-size directories in other exFAT implementation, such as Paragon ufsd dirver. As described in the specification, the lower directory size limits is 0 bytes. Without this commit, sub-directories and files cannot be created under a zero-size directory, and it cannot be removed. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-31exfat: add ioctls for accessing attributesJan Cincera
Add GET and SET attributes ioctls to enable attribute modification. We already do this in FAT and a few userspace utils made for it would benefit from this also working on exFAT, namely fatattr. Signed-off-by: Jan Cincera <hcincera@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-10-18exfat: convert to new timestamp accessorsJeff Layton
Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-31-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-29Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
2023-08-28Merge tag 'v6.6-vfs.super' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull superblock updates from Christian Brauner: "This contains the super rework that was ready for this cycle. The first part changes the order of how we open block devices and allocate superblocks, contains various cleanups, simplifications, and a new mechanism to wait on superblock state changes. This unblocks work to ultimately limit the number of writers to a block device. Jan has already scheduled follow-up work that will be ready for v6.7 and allows us to restrict the number of writers to a given block device. That series builds on this work right here. The second part contains filesystem freezing updates. Overview: The generic superblock changes are rougly organized as follows (ignoring additional minor cleanups): (1) Removal of the bd_super member from struct block_device. This was a very odd back pointer to struct super_block with unclear rules. For all relevant places we have other means to get the same information so just get rid of this. (2) Simplify rules for superblock cleanup. Roughly, everything that is allocated during fs_context initialization and that's stored in fs_context->s_fs_info needs to be cleaned up by the fs_context->free() implementation before the superblock allocation function has been called successfully. After sget_fc() returned fs_context->s_fs_info has been transferred to sb->s_fs_info at which point sb->kill_sb() if fully responsible for cleanup. Adhering to these rules means that cleanup of sb->s_fs_info in fill_super() is to be avoided as it's brittle and inconsistent. Cleanup shouldn't be duplicated between sb->put_super() as sb->put_super() is only called if sb->s_root has been set aka when the filesystem has been successfully born (SB_BORN). That complexity should be avoided. This also means that block devices are to be closed in sb->kill_sb() instead of sb->put_super(). More details in the lower section. (3) Make it possible to lookup or create a superblock before opening block devices There's a subtle dependency on (2) as some filesystems did rely on fill_super() to be called in order to correctly clean up sb->s_fs_info. All these filesystems have been fixed. (4) Switch most filesystem to follow the same logic as the generic mount code now does as outlined in (3). (5) Use the superblock as the holder of the block device. We can now easily go back from block device to owning superblock. (6) Export and extend the generic fs_holder_ops and use them as holder ops everywhere and remove the filesystem specific holder ops. (7) Call from the block layer up into the filesystem layer when the block device is removed, allowing to shut down the filesystem without risk of deadlocks. (8) Get rid of get_super(). We can now easily go back from the block device to owning superblock and can call up from the block layer into the filesystem layer when the device is removed. So no need to wade through all registered superblock to find the owning superblock anymore" Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/ * tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits) super: use higher-level helper for {freeze,thaw} super: wait until we passed kill super super: wait for nascent superblocks super: make locking naming consistent super: use locking helpers fs: simplify invalidate_inodes fs: remove get_super block: call into the file system for ioctl BLKFLSBUF block: call into the file system for bdev_mark_dead block: consolidate __invalidate_device and fsync_bdev block: drop the "busy inodes on changed media" log message dasd: also call __invalidate_device when setting the device offline amiflop: don't call fsync_bdev in FDFMTBEG floppy: call disk_force_media_change when changing the format block: simplify the disk_force_media_change interface nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl xfs use fs_holder_ops for the log and RT devices xfs: drop s_umount over opening the log and RT devices ext4: use fs_holder_ops for the log device ext4: drop s_umount over opening the log device ...
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-10exfat: free the sbi and iocharset in ->kill_sbChristoph Hellwig
As a rule of thumb everything allocated to the fs_context and moved into the super_block should be freed by ->kill_sb so that the teardown handling doesn't need to be duplicated between the fill_super error path and put_super. Implement an exfat-specific kill_sb method to do that and share the code with the mount contex free helper for the mount error handling case. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230809220545.1308228-11-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-10exfat: don't RCU-free the sbiChristoph Hellwig
There are no RCU critical sections for accessing any information in the sbi, so drop the call_rcu indirection for freeing the sbi. Signed-off-by: Christoph Hellwig <hch@lst.de> Message-Id: <20230809220545.1308228-10-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-09fs: pass the request_mask to generic_fillattrJeff Layton
generic_fillattr just fills in the entire stat struct indiscriminately today, copying data from the inode. There is at least one attribute (STATX_CHANGE_COOKIE) that can have side effects when it is reported, and we're looking at adding more with the addition of multigrain timestamps. Add a request_mask argument to generic_fillattr and have most callers just pass in the value that is passed to getattr. Have other callers (e.g. ksmbd) just pass in STATX_BASIC_STATS. Also move the setting of STATX_CHANGE_COOKIE into generic_fillattr. Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: "Paulo Alcantara (SUSE)" <pc@manguebit.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230807-mgctime-v7-2-d1dec143a704@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-06vfs: get rid of old '->iterate' directory operationLinus Torvalds
All users now just use '->iterate_shared()', which only takes the directory inode lock for reading. Filesystems that never got convered to shared mode now instead use a wrapper that drops the lock, re-takes it in write mode, calls the old function, and then downgrades the lock back to read mode. This way the VFS layer and other callers no longer need to care about filesystems that never got converted to the modern era. The filesystems that use the new wrapper are ceph, coda, exfat, jfs, ntfs, ocfs2, overlayfs, and vboxsf. Honestly, several of them look like they really could just iterate their directories in shared mode and skip the wrapper entirely, but the point of this change is to not change semantics or fix filesystems that haven't been fixed in the last 7+ years, but to finally get rid of the dual iterators. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-02fs: add CONFIG_BUFFER_HEADChristoph Hellwig
Add a new config option that controls building the buffer_head code, and select it from all file systems and stacking drivers that need it. For the block device nodes and alternative iomap based buffered I/O path is provided when buffer_head support is not enabled, and iomap needs a a small tweak to define the IOMAP_F_BUFFER_HEAD flag to 0 to not call into the buffer_head code when it doesn't exist. Otherwise this is just Kconfig and ifdef changes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230801172201.1923299-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>