Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"21 hotfixes. 12 are cc:stable and the remainder pertain to post-6.7
issues or aren't considered to be needed in earlier kernel versions"
* tag 'mm-hotfixes-stable-2024-02-10-11-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
nilfs2: fix potential bug in end_buffer_async_write
mm/damon/sysfs-schemes: fix wrong DAMOS tried regions update timeout setup
nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()
MAINTAINERS: Leo Yan has moved
mm/zswap: don't return LRU_SKIP if we have dropped lru lock
fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super
mailmap: switch email address for John Moon
mm: zswap: fix objcg use-after-free in entry destruction
mm/madvise: don't forget to leave lazy MMU mode in madvise_cold_or_pageout_pte_range()
arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros
mm/memory-failure: fix crash in split_huge_page_to_list from soft_offline_page
mm: memcg: optimize parent iteration in memcg_rstat_updated()
nilfs2: fix data corruption in dsync block recovery for small block sizes
mm/userfaultfd: UFFDIO_MOVE implementation should use ptep_get()
exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
fs/proc: do_task_stat: use sig->stats_lock to gather the threads/children stats
fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of lock_task_sighand()
getrusage: use sig->stats_lock rather than lock_task_sighand()
getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
...
|
|
Pull smb server fixes from Steve French:
"Two ksmbd server fixes:
- memory leak fix
- a minor kernel-doc fix"
* tag '6.8-rc3-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read fails
ksmbd: Add kernel-doc for ksmbd_extract_sharename() function
|
|
Pull smb client fixes from Steve French:
- reconnect fix
- multichannel channel selection fix
- minor mount warning fix
- reparse point fix
- null pointer check improvement
* tag '6.8-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb3: clarify mount warning
cifs: handle cases where multiple sessions share connection
cifs: change tcon status when need_reconnect is set on it
smb: client: set correct d_type for reparse points under DFS mounts
smb3: add missing null server pointer check
|
|
Pull ceph fixes from Ilya Dryomov:
"Some fscrypt-related fixups (sparse reads are used only for encrypted
files) and two cap handling fixes from Xiubo and Rishabh"
* tag 'ceph-for-6.8-rc4' of https://github.com/ceph/ceph-client:
ceph: always check dir caps asynchronously
ceph: prevent use-after-free in encode_cap_msg()
ceph: always set initial i_blkbits to CEPH_FSCRYPT_BLOCK_SHIFT
libceph: just wait for more data to be available on the socket
libceph: rename read_sparse_msg_*() to read_partial_sparse_msg_*()
libceph: fail sparse-read if the data length doesn't match
|
|
https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 fixes from Konstantin Komarov:
"Fixed:
- size update for compressed file
- some logic errors, overflows
- memory leak
- some code was refactored
Added:
- implement super_operations::shutdown
Improved:
- alternative boot processing
- reduced stack usage"
* tag 'ntfs3_for_6.8' of https://github.com/Paragon-Software-Group/linux-ntfs3: (28 commits)
fs/ntfs3: Slightly simplify ntfs_inode_printk()
fs/ntfs3: Add ioctl operation for directories (FITRIM)
fs/ntfs3: Fix oob in ntfs_listxattr
fs/ntfs3: Fix an NULL dereference bug
fs/ntfs3: Update inode->i_size after success write into compressed file
fs/ntfs3: Fixed overflow check in mi_enum_attr()
fs/ntfs3: Correct function is_rst_area_valid
fs/ntfs3: Use i_size_read and i_size_write
fs/ntfs3: Prevent generic message "attempt to access beyond end of device"
fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cache
fs/ntfs3: Use kvfree to free memory allocated by kvmalloc
fs/ntfs3: Disable ATTR_LIST_ENTRY size check
fs/ntfs3: Fix c/mtime typo
fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()
fs/ntfs3: Add and fix comments
fs/ntfs3: ntfs3_forced_shutdown use int instead of bool
fs/ntfs3: Implement super_operations::shutdown
fs/ntfs3: Drop suid and sgid bits as a part of fpunch
fs/ntfs3: Add file_modified
fs/ntfs3: Correct use bh_read
...
|
|
When a user tries to use the "sec=krb5p" mount parameter to encrypt
data on connection to a server (when authenticating with Kerberos), we
indicate that it is not supported, but do not note the equivalent
recommended mount parameter ("sec=krb5,seal") which turns on encryption
for that mount (and uses Kerberos for auth). Update the warning message.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Based on our implementation of multichannel, it is entirely
possible that a server struct may not be found in any channel
of an SMB session.
In such cases, we should be prepared to move on and search for
the server struct in the next session.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When a tcon is marked for need_reconnect, the intention
is to have it reconnected.
This change adjusts tcon->status in cifs_tree_connect
when need_reconnect is set. Also, this change has a minor
correction in resetting need_reconnect on success. It makes
sure that it is done with tc_lock held.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Send query dir requests with an info level of
SMB_FIND_FILE_FULL_DIRECTORY_INFO rather than
SMB_FIND_FILE_DIRECTORY_INFO when the client is generating its own
inode numbers (e.g. noserverino) so that reparse tags still
can be parsed directly from the responses, but server won't
send UniqueId (server inode number)
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Address static checker warning in cifs_ses_get_chan_index():
warn: variable dereferenced before check 'server'
To be consistent, and reduce risk, we should add another check
for null server pointer.
Fixes: 88675b22d34e ("cifs: do not search for channel if server is terminating")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
According to a syzbot report, end_buffer_async_write(), which handles the
completion of block device writes, may detect abnormal condition of the
buffer async_write flag and cause a BUG_ON failure when using nilfs2.
Nilfs2 itself does not use end_buffer_async_write(). But, the async_write
flag is now used as a marker by commit 7f42ec394156 ("nilfs2: fix issue
with race condition of competition between segments for dirty blocks") as
a means of resolving double list insertion of dirty blocks in
nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the
resulting crash.
This modification is safe as long as it is used for file data and b-tree
node blocks where the page caches are independent. However, it was
irrelevant and redundant to also introduce async_write for segment summary
and super root blocks that share buffers with the backing device. This
led to the possibility that the BUG_ON check in end_buffer_async_write
would fail as described above, if independent writebacks of the backing
device occurred in parallel.
The use of async_write for segment summary buffers has already been
removed in a previous change.
Fix this issue by removing the manipulation of the async_write flag for
the remaining super root block buffer.
Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com
Fixes: 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzbot reported a hang issue in migrate_pages_batch() called by mbind()
and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2.
While migrate_pages_batch() locks a folio and waits for the writeback to
complete, the log writer thread that should bring the writeback to
completion picks up the folio being written back in
nilfs_lookup_dirty_data_buffers() that it calls for subsequent log
creation and was trying to lock the folio. Thus causing a deadlock.
In the first place, it is unexpected that folios/pages in the middle of
writeback will be updated and become dirty. Nilfs2 adds a checksum to
verify the validity of the log being written and uses it for recovery at
mount, so data changes during writeback are suppressed. Since this is
broken, an unclean shutdown could potentially cause recovery to fail.
Investigation revealed that the root cause is that the wait for writeback
completion in nilfs_page_mkwrite() is conditional, and if the backing
device does not require stable writes, data may be modified without
waiting.
Fix these issues by making nilfs_page_mkwrite() wait for writeback to
finish regardless of the stable write requirement of the backing device.
Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com
Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When configuring a hugetlb filesystem via the fsconfig() syscall, there is
a possible NULL dereference in hugetlbfs_fill_super() caused by assigning
NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize
is non valid.
E.g: Taking the following steps:
fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC);
fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0);
fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced
with NULL, losing its previous value, and we will print an error:
...
...
case Opt_pagesize:
ps = memparse(param->string, &rest);
ctx->hstate = h;
if (!ctx->hstate) {
pr_err("Unsupported page size %lu MB\n", ps / SZ_1M);
return -EINVAL;
}
return 0;
...
...
This is a problem because later on, we will dereference ctxt->hstate in
hugetlbfs_fill_super()
...
...
sb->s_blocksize = huge_page_size(ctx->hstate);
...
...
Causing below Oops.
Fix this by replacing cxt->hstate value only when then pagesize is known
to be valid.
kernel: hugetlbfs: Unsupported page size 0 MB
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0
kernel: Oops: 0000 [#1] PREEMPT SMP PTI
kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f
kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017
kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
kernel: Call Trace:
kernel: <TASK>
kernel: ? __die_body+0x1a/0x60
kernel: ? page_fault_oops+0x16f/0x4a0
kernel: ? search_bpf_extables+0x65/0x70
kernel: ? fixup_exception+0x22/0x310
kernel: ? exc_page_fault+0x69/0x150
kernel: ? asm_exc_page_fault+0x22/0x30
kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10
kernel: ? hugetlbfs_fill_super+0xb4/0x1a0
kernel: ? hugetlbfs_fill_super+0x28/0x1a0
kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10
kernel: vfs_get_super+0x40/0xa0
kernel: ? __pfx_bpf_lsm_capable+0x10/0x10
kernel: vfs_get_tree+0x25/0xd0
kernel: vfs_cmd_create+0x64/0xe0
kernel: __x64_sys_fsconfig+0x395/0x410
kernel: do_syscall_64+0x80/0x160
kernel: ? syscall_exit_to_user_mode+0x82/0x240
kernel: ? do_syscall_64+0x8d/0x160
kernel: ? syscall_exit_to_user_mode+0x82/0x240
kernel: ? do_syscall_64+0x8d/0x160
kernel: ? exc_page_fault+0x69/0x150
kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
kernel: RIP: 0033:0x7ffbc0cb87c9
kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48
kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af
kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9
kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003
kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000
kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000
kernel: </TASK>
kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E)
kernel: mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E)
kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1
kernel: CR2: 0000000000000028
kernel: ---[ end trace 0000000000000000 ]---
kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0
kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28
kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004
kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000
kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004
kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000
kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400
kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0
Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de
Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The helper function nilfs_recovery_copy_block() of
nilfs_recovery_dsync_blocks(), which recovers data from logs created by
data sync writes during a mount after an unclean shutdown, incorrectly
calculates the on-page offset when copying repair data to the file's page
cache. In environments where the block size is smaller than the page
size, this flaw can cause data corruption and leak uninitialized memory
bytes during the recovery process.
Fix these issues by correcting this byte offset calculation on the page.
Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
lock_task_sighand() can trigger a hard lockup. If NR_CPUS threads call
do_task_stat() at the same time and the process has NR_THREADS, it will
spin with irqs disabled O(NR_CPUS * NR_THREADS) time.
Change do_task_stat() to use sig->stats_lock to gather the statistics
outside of ->siglock protected section, in the likely case this code will
run lockless.
Link: https://lkml.kernel.org/r/20240123153357.GA21857@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
lock_task_sighand()
Patch series "fs/proc: do_task_stat: use sig->stats_".
do_task_stat() has the same problem as getrusage() had before "getrusage:
use sig->stats_lock rather than lock_task_sighand()": a hard lockup. If
NR_CPUS threads call lock_task_sighand() at the same time and the process
has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS *
NR_THREADS) time.
This patch (of 3):
thread_group_cputime() does its own locking, we can safely shift
thread_group_cputime_adjusted() which does another for_each_thread loop
outside of ->siglock protected section.
Not only this removes for_each_thread() from the critical section with
irqs disabled, this removes another case when stats_lock is taken with
siglock held. We want to remove this dependency, then we can change the
users of stats_lock to not disable irqs.
Link: https://lkml.kernel.org/r/20240123153313.GA21832@redhat.com
Link: https://lkml.kernel.org/r/20240123153355.GA21854@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dylan Hatch <dylanbhatch@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in
shmget() call. If SHM_NORESERVE flags is specified then the hugetlb pages
are not reserved. However when the shared memory is attached with the
shmat() call the hugetlb pages are getting reserved incorrectly for
SHM_HUGETLB shared memory created with SHM_NORESERVE which is a bug.
-------------------------------
Following test shows the issue.
$cat shmhtb.c
int main()
{
int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE;
int shmid;
shmid = shmget(SKEY, SHMSZ, shmflags);
if (shmid < 0)
{
printf("shmat: shmget() failed, %d\n", errno);
return 1;
}
printf("After shmget()\n");
system("cat /proc/meminfo | grep -i hugepages_");
shmat(shmid, NULL, 0);
printf("\nAfter shmat()\n");
system("cat /proc/meminfo | grep -i hugepages_");
shmctl(shmid, IPC_RMID, NULL);
return 0;
}
#sysctl -w vm.nr_hugepages=20
#./shmhtb
After shmget()
HugePages_Total: 20
HugePages_Free: 20
HugePages_Rsvd: 0
HugePages_Surp: 0
After shmat()
HugePages_Total: 20
HugePages_Free: 20
HugePages_Rsvd: 5 <--
HugePages_Surp: 0
--------------------------------
Fix is to ensure that hugetlb pages are not reserved for SHM_HUGETLB shared
memory in the shmat() call.
Link: https://lkml.kernel.org/r/1706040282-12388-1-git-send-email-prakash.sangappa@oracle.com
Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
ksmbd_iov_pin_rsp_read() doesn't free the provided aux buffer if it
fails. Seems to be the caller's responsibility to clear the buffer in
error case.
Found by Linux Verification Center (linuxtesting.org).
Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The ksmbd_extract_sharename() function lacked a complete kernel-doc
comment. This patch adds parameter descriptions and detailed function
behavior to improve code readability and maintainability.
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- Address a deadlock regression in RELEASE_LOCKOWNER
* tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: don't take fi_lock in nfsd_break_deleg_cb()
|
|
The MDS will issue the 'Fr' caps for async dirop, while there is
buggy in kclient and it could miss releasing the async dirop caps,
which is 'Fsxr'. And then the MDS will complain with:
"[WRN] client.xxx isn't responding to mclientcaps(revoke) ..."
So when releasing the dirop async requests or when they fail we
should always make sure that being revoked caps could be released.
Link: https://tracker.ceph.com/issues/50223
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Milind Changire <mchangir@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In fs/ceph/caps.c, in encode_cap_msg(), "use after free" error was
caught by KASAN at this line - 'ceph_buffer_get(arg->xattr_buf);'. This
implies before the refcount could be increment here, it was freed.
In same file, in "handle_cap_grant()" refcount is decremented by this
line - 'ceph_buffer_put(ci->i_xattrs.blob);'. It appears that a race
occurred and resource was freed by the latter line before the former
line could increment it.
encode_cap_msg() is called by __send_cap() and __send_cap() is called by
ceph_check_caps() after calling __prep_cap(). __prep_cap() is where
arg->xattr_buf is assigned to ci->i_xattrs.blob. This is the spot where
the refcount must be increased to prevent "use after free" error.
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/59259
Signed-off-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
The fscrypt code will use i_blkbits to setup ci_data_unit_bits when
allocating the new inode, but ceph will initiate i_blkbits ater when
filling the inode, which is too late. Since ci_data_unit_bits will only
be used by the fscrypt framework so initiating i_blkbits with
CEPH_FSCRYPT_BLOCK_SHIFT is safe.
Link: https://tracker.ceph.com/issues/64035
Fixes: 5b1188847180 ("fscrypt: support crypto data unit size less than filesystem block size")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- two fixes preventing deletion and manual creation of subvolume qgroup
- unify error code returned for unknown send flags
- fix assertion during subvolume creation when anonymous device could
be allocated by other thread (e.g. due to backref walk)
* tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: do not ASSERT() if the newly created subvolume already got read
btrfs: forbid deleting live subvol qgroup
btrfs: forbid creating subvol qgroups
btrfs: send: return EOPNOTSUPP on unknown flags
|
|
Pull bcachefs fixes from Kent Overstreet:
"Two serious ones here that we'll want to backport to stable: a fix for
a race in the thread_with_file code, and another locking fixup in the
subvolume deletion path"
* tag 'bcachefs-2024-02-05' of https://evilpiepirate.org/git/bcachefs:
bcachefs: time_stats: Check for last_event == 0 when updating freq stats
bcachefs: install fd later to avoid race with close
bcachefs: unlock parent dir if entry is not found in subvolume deletion
bcachefs: Fix build on parisc by avoiding __multi3()
|
|
A recent change to check_for_locks() changed it to take ->flc_lock while
holding ->fi_lock. This creates a lock inversion (reported by lockdep)
because there is a case where ->fi_lock is taken while holding
->flc_lock.
->flc_lock is held across ->fl_lmops callbacks, and
nfsd_break_deleg_cb() is one of those and does take ->fi_lock. However
it doesn't need to.
Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each
delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list
and so needed the lock. Since then it doesn't walk the list and doesn't
need the lock.
Two actions are performed under the lock. One is to call
nfsd_break_one_deleg which calls nfsd4_run_cb(). These doesn't act on
the nfs4_file at all, so don't need the lock.
The other is to set ->fi_had_conflict which is in the nfs4_file.
This field is only ever set here (except when initialised to false)
so there is no possible problem will multiple threads racing when
setting it.
The field is tested twice in nfs4_set_delegation(). The first test does
not hold a lock and is documented as an opportunistic optimisation, so
it doesn't impose any need to hold ->fi_lock while setting
->fi_had_conflict.
The second test in nfs4_set_delegation() *is* make under ->fi_lock, so
removing the locking when ->fi_had_conflict is set could make a change.
The change could only be interesting if ->fi_had_conflict tested as
false even though nfsd_break_one_deleg() ran before ->fi_lock was
unlocked. i.e. while hash_delegation_locked() was running.
As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb()
there can be no importance to this interaction.
So this patch removes the locking from nfsd_break_one_deleg() and moves
the final test on ->fi_had_conflict out of the locked region to make it
clear that locking isn't important to the test. It is still tested
*after* vfs_setlease() has succeeded. This might be significant and as
vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called
under ->flc_lock this "after" is a true ordering provided by a spinlock.
Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This fixes spurious outliers in the frequency stats.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Calling fd_install() makes a file reachable for userland, including the
possibility to close the file descriptor, which leads to calling its
'release' hook. If that happens before the code had a chance to bump the
reference of the newly created task struct, the release callback will
call put_task_struct() too early, leading to the premature destruction
of the kernel thread.
Avoid that race by calling fd_install() later, after all the setup is
done.
Fixes: 1c6fdbd8f246 ("bcachefs: Initial commit")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous bug fixes and cleanups in ext4's multi-block allocator
and extent handling code"
* tag 'for-linus-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: make ext4_set_iomap() recognize IOMAP_DELALLOC map type
ext4: make ext4_map_blocks() distinguish delalloc only extent
ext4: add a hole extent entry in cache after punch
ext4: correct the hole length returned by ext4_map_blocks()
ext4: convert to exclusive lock while inserting delalloc extents
ext4: refactor ext4_da_map_blocks()
ext4: remove 'needed' in trace_ext4_discard_preallocations
ext4: remove unnecessary parameter "needed" in ext4_discard_preallocations
ext4: remove unused return value of ext4_mb_release_group_pa
ext4: remove unused return value of ext4_mb_release_inode_pa
ext4: remove unused return value of ext4_mb_release
ext4: remove unused ext4_allocation_context::ac_groups_considered
ext4: remove unneeded return value of ext4_mb_release_context
ext4: remove unused parameter ngroup in ext4_mb_choose_next_group_*()
ext4: remove unused return value of __mb_check_buddy
ext4: mark the group block bitmap as corrupted before reporting an error
ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()
ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()
ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap corrupt
ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()
...
|
|
Pull smb client fixes from Steve French:
"Five smb3 client fixes, mostly multichannel related:
- four multichannel fixes including fix for channel allocation when
multiple inactive channels, fix for unneeded race in channel
deallocation, correct redundant channel scaling, and redundant
multichannel disabling scenarios
- add warning if max compound requests reached"
* tag 'v6.8-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: increase number of PDUs allowed in a compound request
cifs: failure to add channel on iface should bump up weight
cifs: do not search for channel if server is terminating
cifs: avoid redundant calls to disable multichannel
cifs: make sure that channel scaling is done only once
|
|
Pull xfs fixes from Chandan Babu:
- Clear XFS_ATTR_INCOMPLETE filter on removing xattr from a node format
attribute fork
- Remove conditional compilation of realtime geometry validator
functions to prevent confusing error messages from being printed on
the console during the mount operation
* tag 'xfs-6.8-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove conditional building of rt geometry validator functions
xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing and eventfs fixes from Steven Rostedt:
- Fix the return code for ring_buffer_poll_wait()
It was returing a -EINVAL instead of EPOLLERR.
- Zero out the tracefs_inode so that all fields are initialized.
The ti->private could have had stale data, but instead of just
initializing it to NULL, clear out the entire structure when it is
allocated.
- Fix a crash in timerlat
The hrtimer was initialized at read and not open, but is canceled at
close. If the file was opened and never read the close will pass a
NULL pointer to hrtime_cancel().
- Rewrite of eventfs.
Linus wrote a patch series to remove the dentry references in the
eventfs_inode and to use ref counting and more of proper VFS
interfaces to make it work.
- Add warning to put_ei() if ei is not set to free. That means
something is about to free it when it shouldn't.
- Restructure the eventfs_inode to make it more compact, and remove the
unused llist field.
- Remove the fsnotify*() funtions for when the inodes were being
created in the lookup code. It doesn't make sense to notify about
creation just because something is being looked up.
- The inode hard link count was not accurate.
It was being updated when a file was looked up. The inodes of
directories were updating their parent inode hard link count every
time the inode was created. That means if memory reclaim cleaned a
stale directory inode and the inode was lookup up again, it would
increment the parent inode again as well. Al Viro said to just have
all eventfs directories have a hard link count of 1. That tells user
space not to trust it.
* tag 'trace-v6.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
eventfs: Keep all directory links at 1
eventfs: Remove fsnotify*() functions from lookup()
eventfs: Restructure eventfs_inode structure to be more condensed
eventfs: Warn if an eventfs_inode is freed without is_freed being set
tracing/timerlat: Move hrtimer_init to timerlat_fd open()
eventfs: Get rid of dentry pointers without refcounts
eventfs: Clean up dentry ops and add revalidate function
eventfs: Remove unused d_parent pointer field
tracefs: dentry lookup crapectomy
tracefs: Avoid using the ei->dentry pointer unnecessarily
eventfs: Initialize the tracefs inode properly
tracefs: Zero out the tracefs_inode when allocating it
ring-buffer: Clean ring_buffer_poll_wait() error return
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 revert from Andreas Gruenbacher:
"It turns out that the commit to use GL_NOBLOCK flag for non-blocking
lookups has several issues, and not all of them have a simple fix"
* tag 'gfs2-v6.8-rc2-revert' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
Revert "gfs2: Use GL_NOBLOCK flag for non-blocking lookups"
|
|
Commit "gfs2: Use GL_NOBLOCK flag for non-blocking lookups" has several
issues, some of which are non-trivial to fix, so revert it for now:
https://lore.kernel.org/gfs2/20240202050230.GA875515@ZenIV/T/
This reverts commit dd00aaeb343255a8a30de671bd27bde79a47c8e5.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Since ext4_map_blocks() can recognize a delayed allocated only extent,
make ext4_set_iomap() can also recognize it, and remove the useless
separate check in ext4_iomap_begin_report().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Add a new map flag EXT4_MAP_DELAYED to indicate the mapping range is a
delayed allocated only (not unwritten) one, and making
ext4_map_blocks() can distinguish it, no longer mixing it with holes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In order to cache hole extents in the extent status tree and keep the
hole length as long as possible, re-add a hole entry to the cache just
after punching a hole.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In ext4_map_blocks(), if we can't find a range of mapping in the
extents cache, we are calling ext4_ext_map_blocks() to search the real
path and ext4_ext_determine_hole() to determine the hole range. But if
the querying range was partially or completely overlaped by a delalloc
extent, we can't find it in the real extent path, so the returned hole
length could be incorrect.
Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc
extent, but it searches start from the expanded hole_start, doesn't
start from the querying range, so the delalloc extent found could not be
the one that overlaped the querying range, plus, it also didn't adjust
the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle
delalloc and insert adjusted hole extent in ext4_ext_determine_hole().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
ext4_da_map_blocks() only hold i_data_sem in shared mode and i_rwsem
when inserting delalloc extents, it could be raced by another querying
path of ext4_map_blocks() without i_rwsem, .e.g buffered read path.
Suppose we buffered read a file containing just a hole, and without any
cached extents tree, then it is raced by another delayed buffered write
to the same area or the near area belongs to the same hole, and the new
delalloc extent could be overwritten to a hole extent.
pread() pwrite()
filemap_read_folio()
ext4_mpage_readpages()
ext4_map_blocks()
down_read(i_data_sem)
ext4_ext_determine_hole()
//find hole
ext4_ext_put_gap_in_cache()
ext4_es_find_extent_range()
//no delalloc extent
ext4_da_map_blocks()
down_read(i_data_sem)
ext4_insert_delayed_block()
//insert delalloc extent
ext4_es_insert_extent()
//overwrite delalloc extent to hole
This race could lead to inconsistent delalloc extents tree and
incorrect reserved space counter. Fix this by converting to hold
i_data_sem in exclusive mode when adding a new delalloc extent in
ext4_da_map_blocks().
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Refactor and cleanup ext4_da_map_blocks(), reduce some unnecessary
parameters and branches, no logic changes.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240127015825.1608160-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fix from Namjae Jeon:
- Fix BUG in iov_iter_revert reported from syzbot
* tag 'exfat-for-6.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: fix zero the unwritten part for dio read
|
|
With the introduction of SMB2_OP_QUERY_WSL_EA, the client may now send
5 commands in a single compound request in order to query xattrs from
potential WSL reparse points, which should be fine as we currently
allow up to 5 PDUs in a single compound request. However, if
encryption is enabled (e.g. 'seal' mount option) or enforced by the
server, current MAX_COMPOUND(5) won't be enough as we require an extra
PDU for the transform header.
Fix this by increasing MAX_COMPOUND to 7 and, while we're at it, add
an WARN_ON_ONCE() and return -EIO instead of -ENOMEM in case we
attempt to send a compound request that couldn't include the extra
transform header.
Signed-off-by: Paulo Alcantara <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
After the interface selection policy change to do a weighted
round robin, each iface maintains a weight_fulfilled. When the
weight_fulfilled reaches the total weight for the iface, we know
that the weights can be reset and ifaces can be allocated from
scratch again.
During channel allocation failures on a particular channel,
weight_fulfilled is not incremented. If a few interfaces are
inactive, we could end up in a situation where the active
interfaces are all allocated for the total_weight, and inactive
ones are all that remain. This can cause a situation where
no more channels can be allocated further.
This change fixes it by increasing weight_fulfilled, even when
channel allocation failure happens. This could mean that if
there are temporary failures in channel allocation, the iface
weights may not strictly be adhered to. But that's still okay.
Fixes: a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
In order to scale down the channels, the following sequence
of operations happen:
1. server struct is marked for terminate
2. the channel is deallocated in the ses->chans array
3. at a later point the cifsd thread actually terminates the server
Between 2 and 3, there can be calls to find the channel for
a server struct. When that happens, there can be an ugly warning
that's logged. But this is expected.
So this change does two things:
1. in cifs_ses_get_chan_index, if server->terminate is set, return
2. always make sure server->terminate is set with chan_lock held
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
When the server reports query network interface info call
as unsupported following a tree connect, it means that
multichannel is unsupported, even if the server capabilities
report otherwise.
When this happens, cifs_chan_skip_or_disable is called to
disable multichannel on the client. However, we only need
to call this when multichannel is currently setup.
Fixes: f591062bdbf4 ("cifs: handle servers that still advertise multichannel after disabling")
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
The directory link count in eventfs was somewhat bogus. It was only being
updated when a directory child was being looked up and not on creation.
One solution would be to update in get_attr() the link count by iterating
the ei->children list and then adding 2. But that could slow down simple
stat() calls, especially if it's done on all directories in eventfs.
Another solution would be to add a parent pointer to the eventfs_inode
and keep track of the number of sub directories it has on creation. But
this adds overhead for something not really worthwhile.
The solution decided upon is to keep all directory links in eventfs as 1.
This tells user space not to rely on the hard links of directories. Which
in this case it shouldn't.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.339968298@goodmis.org
Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The dentries and inodes are created when referenced in the lookup code.
There's no reason to call fsnotify_*() functions when they are created by
a reference. It doesn't make any sense.
Link: https://lore.kernel.org/linux-trace-kernel/20240201002719.GS2087318@ZenIV/
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.166973329@goodmis.org
Cc: stable@vger.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Fixes: a376007917776 ("eventfs: Implement functions to create files and dirs when accessed");
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Some of the eventfs_inode structure has holes in it. Rework the structure
to be a bit more condensed, and also remove the no longer used llist
field.
Link: https://lore.kernel.org/linux-trace-kernel/20240201161617.002321438@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There should never be a case where an evenfs_inode is being freed without
is_freed being set. Add a WARN_ON_ONCE() if it ever happens. That would
mean there was one too many put_ei()s.
Link: https://lore.kernel.org/linux-trace-kernel/20240201161616.843551963@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The eventfs inode had pointers to dentries (and child dentries) without
actually holding a refcount on said pointer. That is fundamentally
broken, and while eventfs tried to then maintain coherence with dentries
going away by hooking into the '.d_iput' callback, that doesn't actually
work since it's not ordered wrt lookups.
There were two reasonms why eventfs tried to keep a pointer to a dentry:
- the creation of a 'events' directory would actually have a stable
dentry pointer that it created with tracefs_start_creating().
And it needed that dentry when tearing it all down again in
eventfs_remove_events_dir().
This use is actually ok, because the special top-level events
directory dentries are actually stable, not just a temporary cache of
the eventfs data structures.
- the 'eventfs_inode' (aka ei) needs to stay around as long as there
are dentries that refer to it.
It then used these dentry pointers as a replacement for doing
reference counting: it would try to make sure that there was only
ever one dentry associated with an event_inode, and keep a child
dentry array around to see which dentries might still refer to the
parent ei.
This gets rid of the invalid dentry pointer use, and renames the one
valid case to a different name to make it clear that it's not just any
random dentry.
The magic child dentry array that is kind of a "reverse reference list"
is simply replaced by having child dentries take a ref to the ei. As
does the directory dentries. That makes the broken use case go away.
Link: https://lore.kernel.org/linux-trace-kernel/202401291043.e62e89dc-oliver.sang@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20240131185513.280463000@goodmis.org
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Ajay Kaher <ajay.kaher@broadcom.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: c1504e510238 ("eventfs: Implement eventfs dir creation functions")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|