author    Jens Axboe <axboe@kernel.dk>    2022-03-24 10:17:44 -0600
committer Jens Axboe <axboe@kernel.dk>    2022-03-24 17:09:26 -0600
commit    34d2bfe7d4b65b375d0edf704133a6b6970f9d81 (patch)
tree      f1386961219cc5fdae8697f05aac801b7ae95e47 /fs/io_uring.c
parent    a73825ba70c93e1eb39a845bb3d9885a787f8ffe (diff)
io_uring: improve task work cache utilization
While profiling task_work intensive workloads, I noticed that most of the time in tctx_task_work() is spent stalled on loading 'req'. This is one of the unfortunate side effects of using linked lists, particularly when they end up being passed around.

Prefetch the next request, if there is one. There's a sufficient amount of work in between that this makes it available for the next loop.

While fiddling with the cache layout, move the link outside of the hot completion cacheline. It's rarely used in hot workloads, so better to bring in kbuf, which is used for networked loads with provided buffers.

This reduces tctx_task_work() overhead from ~3% to 1-1.5% in my testing.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
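For readers unfamiliar with the pattern, the sketch below is an illustrative userspace analogue, not the kernel code itself: it walks a singly linked list and issues a prefetch for the next node before processing the current one, so the memory load overlaps with the current iteration's work. The work_node type, process() helper, and walk_list() are hypothetical, and __builtin_prefetch() stands in for the kernel's prefetch().

#include <stdio.h>

struct work_node {
	struct work_node	*next;
	int			payload;
};

/* stand-in for the per-request work done in tctx_task_work() */
static void process(struct work_node *node)
{
	printf("payload=%d\n", node->payload);
}

static void walk_list(struct work_node *head)
{
	struct work_node *node = head;

	while (node) {
		struct work_node *next = node->next;

		/* start pulling the next node into cache while we work on this one */
		if (next)
			__builtin_prefetch(next);

		process(node);
		node = next;
	}
}

int main(void)
{
	struct work_node c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };

	walk_list(&a);
	return 0;
}

In the patch itself the prefetch is issued unconditionally on the container_of() result; the NULL check here just keeps the standalone example well defined.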
Diffstat (limited to 'fs/io_uring.c')
-rw-r--r--    fs/io_uring.c    6
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index a76e91fe277c..bb40c80fd9ca 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -928,7 +928,6 @@ struct io_kiocb {
 	struct io_wq_work_node		comp_list;
 	atomic_t			refs;
 	atomic_t			poll_refs;
-	struct io_kiocb			*link;
 	struct io_task_work		io_task_work;
 	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
 	struct hlist_node		hash_node;
@@ -939,6 +938,7 @@ struct io_kiocb {
 	/* custom credentials, valid IFF REQ_F_CREDS is set */
 	/* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */
 	struct io_buffer		*kbuf;
+	struct io_kiocb			*link;
 	const struct cred		*creds;
 	struct io_wq_work		work;
 };
@@ -2451,6 +2451,8 @@ static void handle_prev_tw_list(struct io_wq_work_node *node,
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    io_task_work.node);
 
+		prefetch(container_of(next, struct io_kiocb, io_task_work.node));
+
 		if (req->ctx != *ctx) {
 			if (unlikely(!*uring_locked && *ctx))
 				ctx_commit_and_unlock(*ctx);
@@ -2483,6 +2485,8 @@ static void handle_tw_list(struct io_wq_work_node *node,
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    io_task_work.node);
 
+		prefetch(container_of(next, struct io_kiocb, io_task_work.node));
+
 		if (req->ctx != *ctx) {
 			ctx_flush_and_put(*ctx, locked);
 			*ctx = req->ctx;