From 67890d579402804b1d32b3280d9860073542528e Mon Sep 17 00:00:00 2001 From: Wesley Cheng Date: Wed, 9 Apr 2025 12:47:40 -0700 Subject: ALSA: Add USB audio device jack type Add an USB jack type, in order to support notifying of a valid USB audio device. Since USB audio devices can have a slew of different configurations that reach beyond the basic headset and headphone use cases, classify these devices differently. Reviewed-by: Pierre-Louis Bossart Signed-off-by: Wesley Cheng Acked-by: Mark Brown Link: https://lore.kernel.org/r/20250409194804.3773260-8-quic_wcheng@quicinc.com Signed-off-by: Greg Kroah-Hartman --- include/uapi/linux/input-event-codes.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/uapi') diff --git a/include/uapi/linux/input-event-codes.h b/include/uapi/linux/input-event-codes.h index 5a199f3d4a26..3b2524e4b667 100644 --- a/include/uapi/linux/input-event-codes.h +++ b/include/uapi/linux/input-event-codes.h @@ -925,7 +925,8 @@ #define SW_MUTE_DEVICE 0x0e /* set = device disabled */ #define SW_PEN_INSERTED 0x0f /* set = pen inserted */ #define SW_MACHINE_COVER 0x10 /* set = cover closed */ -#define SW_MAX 0x10 +#define SW_USB_INSERT 0x11 /* set = USB audio device connected */ +#define SW_MAX 0x11 #define SW_CNT (SW_MAX+1) /* -- cgit v1.2.3 From 178af54a678d08735233e070a9329651e1589587 Mon Sep 17 00:00:00 2001 From: Krishna Chaitanya Chundru Date: Fri, 28 Mar 2025 15:58:32 +0530 Subject: PCI: Add lane equalization register offsets As per PCIe spec 6.0.1, add PCIe lane equalization register offset for data rates 8.0 GT/s, 32.0 GT/s and 64.0 GT/s. Also add a macro for defining data rate 64.0 GT/s physical layer capability ID. Signed-off-by: Krishna Chaitanya Chundru Signed-off-by: Manivannan Sadhasivam Reviewed-by: Manivannan Sadhasivam Link: https://patch.msgid.link/20250328-preset_v6-v9-4-22cfa0490518@oss.qualcomm.com --- include/uapi/linux/pci_regs.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include/uapi') diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h index ba326710f9c8..a3a3e942dedf 100644 --- a/include/uapi/linux/pci_regs.h +++ b/include/uapi/linux/pci_regs.h @@ -750,7 +750,8 @@ #define PCI_EXT_CAP_ID_NPEM 0x29 /* Native PCIe Enclosure Management */ #define PCI_EXT_CAP_ID_PL_32GT 0x2A /* Physical Layer 32.0 GT/s */ #define PCI_EXT_CAP_ID_DOE 0x2E /* Data Object Exchange */ -#define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_DOE +#define PCI_EXT_CAP_ID_PL_64GT 0x31 /* Physical Layer 64.0 GT/s */ +#define PCI_EXT_CAP_ID_MAX PCI_EXT_CAP_ID_PL_64GT #define PCI_EXT_CAP_DSN_SIZEOF 12 #define PCI_EXT_CAP_MCAST_ENDPOINT_SIZEOF 40 @@ -1144,12 +1145,21 @@ #define PCI_DLF_CAP 0x04 /* Capabilities Register */ #define PCI_DLF_EXCHANGE_ENABLE 0x80000000 /* Data Link Feature Exchange Enable */ +/* Secondary PCIe Capability 8.0 GT/s */ +#define PCI_SECPCI_LE_CTRL 0x0c /* Lane Equalization Control Register */ + /* Physical Layer 16.0 GT/s */ #define PCI_PL_16GT_LE_CTRL 0x20 /* Lane Equalization Control Register */ #define PCI_PL_16GT_LE_CTRL_DSP_TX_PRESET_MASK 0x0000000F #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK 0x000000F0 #define PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT 4 +/* Physical Layer 32.0 GT/s */ +#define PCI_PL_32GT_LE_CTRL 0x20 /* Lane Equalization Control Register */ + +/* Physical Layer 64.0 GT/s */ +#define PCI_PL_64GT_LE_CTRL 0x20 /* Lane Equalization Control Register */ + /* Native PCIe Enclosure Management */ #define PCI_NPEM_CAP 0x04 /* NPEM capability register */ #define PCI_NPEM_CAP_CAPABLE 0x00000001 /* NPEM Capable */ -- cgit v1.2.3 From 7734fb4ad98c3fdaf0fde82978ef8638195a5285 Mon Sep 17 00:00:00 2001 From: Kevin Wolf Date: Tue, 29 Apr 2025 18:50:18 +0200 Subject: dm mpath: Interface for explicit probing of active paths Multipath cannot directly provide failover for ioctls in the kernel because it doesn't know what each ioctl means and which result could indicate a path error. Userspace generally knows what the ioctl it issued means and if it might be a path error, but neither does it know which path the ioctl took nor does it necessarily have the privileges to fail a path using the control device. In order to allow userspace to address this situation, implement a DM_MPATH_PROBE_PATHS ioctl that prompts the dm-mpath driver to probe all active paths in the current path group to see whether they still work, and fail them if not. If this returns success, userspace can retry the ioctl and expect that the previously hit bad path is now failed (or working again). The immediate motivation for this is the use of SG_IO in QEMU for SCSI passthrough. Following a failed SG_IO ioctl, QEMU will trigger probing to ensure that all active paths are actually alive, so that retrying SG_IO at least has a lower chance of failing due to a path error. However, the problem is broader than just SG_IO (it affects any ioctl), and if applications need failover support for other ioctls, the same probing can be used. This is not implemented on the DM control device, but on the DM mpath block devices, to allow all users who have access to such a block device to make use of this interface, specifically to implement failover for ioctls. For the same reason, it is also unprivileged. Its implementation is effectively just a bunch of reads, which could already be issued by userspace, just without any guarantee that all the rights paths are selected. The probing implemented here is done fully synchronously path by path; probing all paths concurrently is left as an improvement for the future. Co-developed-by: Hanna Czenczek Signed-off-by: Hanna Czenczek Signed-off-by: Kevin Wolf Reviewed-by: Benjamin Marzinski Signed-off-by: Benjamin Marzinski Signed-off-by: Mikulas Patocka --- drivers/md/dm-ioctl.c | 1 + drivers/md/dm-mpath.c | 100 +++++++++++++++++++++++++++++++++++++++++- include/uapi/linux/dm-ioctl.h | 9 +++- 3 files changed, 107 insertions(+), 3 deletions(-) (limited to 'include/uapi') diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c index d42eac944eb5..4165fef4c170 100644 --- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1885,6 +1885,7 @@ static ioctl_fn lookup_ioctl(unsigned int cmd, int *ioctl_flags) {DM_DEV_SET_GEOMETRY_CMD, 0, dev_set_geometry}, {DM_DEV_ARM_POLL_CMD, IOCTL_FLAGS_NO_PARAMS, dev_arm_poll}, {DM_GET_TARGET_VERSION_CMD, 0, get_target_version}, + {DM_MPATH_PROBE_PATHS_CMD, 0, NULL}, /* block device ioctl */ }; if (unlikely(cmd >= ARRAY_SIZE(_ioctls))) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 909ed6890ba5..53861ad5dd1d 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -2021,6 +2021,94 @@ out: return r; } +/* + * Perform a minimal read from the given path to find out whether the + * path still works. If a path error occurs, fail it. + */ +static int probe_path(struct pgpath *pgpath) +{ + struct block_device *bdev = pgpath->path.dev->bdev; + unsigned int read_size = bdev_logical_block_size(bdev); + struct page *page; + struct bio *bio; + blk_status_t status; + int r = 0; + + if (WARN_ON_ONCE(read_size > PAGE_SIZE)) + return -EINVAL; + + page = alloc_page(GFP_KERNEL); + if (!page) + return -ENOMEM; + + /* Perform a minimal read: Sector 0, length read_size */ + bio = bio_alloc(bdev, 1, REQ_OP_READ, GFP_KERNEL); + if (!bio) { + r = -ENOMEM; + goto out; + } + + bio->bi_iter.bi_sector = 0; + __bio_add_page(bio, page, read_size, 0); + submit_bio_wait(bio); + status = bio->bi_status; + bio_put(bio); + + if (status && blk_path_error(status)) + fail_path(pgpath); + +out: + __free_page(page); + return r; +} + +/* + * Probe all active paths in current_pg to find out whether they still work. + * Fail all paths that do not work. + * + * Return -ENOTCONN if no valid path is left (even outside of current_pg). We + * cannot probe paths in other pgs without switching current_pg, so if valid + * paths are only in different pgs, they may or may not work. Additionally + * we should not probe paths in a pathgroup that is in the process of + * Initializing. Userspace can submit a request and we'll switch and wait + * for the pathgroup to be initialized. If the request fails, it may need to + * probe again. + */ +static int probe_active_paths(struct multipath *m) +{ + struct pgpath *pgpath; + struct priority_group *pg; + unsigned long flags; + int r = 0; + + mutex_lock(&m->work_mutex); + + spin_lock_irqsave(&m->lock, flags); + if (test_bit(MPATHF_QUEUE_IO, &m->flags)) + pg = NULL; + else + pg = m->current_pg; + spin_unlock_irqrestore(&m->lock, flags); + + if (pg) { + list_for_each_entry(pgpath, &pg->pgpaths, list) { + if (!pgpath->is_active) + continue; + + r = probe_path(pgpath); + if (r < 0) + goto out; + } + } + + if (!atomic_read(&m->nr_valid_paths)) + r = -ENOTCONN; + +out: + mutex_unlock(&m->work_mutex); + return r; +} + static int multipath_prepare_ioctl(struct dm_target *ti, struct block_device **bdev, unsigned int cmd, unsigned long arg, @@ -2031,6 +2119,16 @@ static int multipath_prepare_ioctl(struct dm_target *ti, unsigned long flags; int r; + if (_IOC_TYPE(cmd) == DM_IOCTL) { + *forward = false; + switch (cmd) { + case DM_MPATH_PROBE_PATHS: + return probe_active_paths(m); + default: + return -ENOTTY; + } + } + pgpath = READ_ONCE(m->current_pgpath); if (!pgpath || !mpath_double_check_test_bit(MPATHF_QUEUE_IO, m)) pgpath = choose_pgpath(m, 0); @@ -2182,7 +2280,7 @@ static int multipath_busy(struct dm_target *ti) */ static struct target_type multipath_target = { .name = "multipath", - .version = {1, 14, 0}, + .version = {1, 15, 0}, .features = DM_TARGET_SINGLETON | DM_TARGET_IMMUTABLE | DM_TARGET_PASSES_INTEGRITY, .module = THIS_MODULE, diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h index b08c7378164d..3225e025e30e 100644 --- a/include/uapi/linux/dm-ioctl.h +++ b/include/uapi/linux/dm-ioctl.h @@ -258,10 +258,12 @@ enum { DM_DEV_SET_GEOMETRY_CMD, DM_DEV_ARM_POLL_CMD, DM_GET_TARGET_VERSION_CMD, + DM_MPATH_PROBE_PATHS_CMD, }; #define DM_IOCTL 0xfd +/* Control device ioctls */ #define DM_VERSION _IOWR(DM_IOCTL, DM_VERSION_CMD, struct dm_ioctl) #define DM_REMOVE_ALL _IOWR(DM_IOCTL, DM_REMOVE_ALL_CMD, struct dm_ioctl) #define DM_LIST_DEVICES _IOWR(DM_IOCTL, DM_LIST_DEVICES_CMD, struct dm_ioctl) @@ -285,10 +287,13 @@ enum { #define DM_TARGET_MSG _IOWR(DM_IOCTL, DM_TARGET_MSG_CMD, struct dm_ioctl) #define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl) +/* Block device ioctls */ +#define DM_MPATH_PROBE_PATHS _IO(DM_IOCTL, DM_MPATH_PROBE_PATHS_CMD) + #define DM_VERSION_MAJOR 4 -#define DM_VERSION_MINOR 49 +#define DM_VERSION_MINOR 50 #define DM_VERSION_PATCHLEVEL 0 -#define DM_VERSION_EXTRA "-ioctl (2025-01-17)" +#define DM_VERSION_EXTRA "-ioctl (2025-04-28)" /* Status bits */ #define DM_READONLY_FLAG (1 << 0) /* In/Out */ -- cgit v1.2.3 From 80fa7a03378588582eb40f89b6f418c0c256cf24 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Tue, 20 May 2025 13:16:43 -0400 Subject: vt: bracketed paste support This is comprised of 3 aspects: - Take note of when applications advertise bracketed paste support via "\e[?2004h" and "\e[?2004l". - Insert bracketed paste markers ("\e[200~" and "\e[201~") around pasted content in paste_selection() when bracketed paste is active. - Add TIOCL_GETBRACKETEDPASTE to return bracketed paste status so user space daemons implementing cut-and-paste functionality (e.g. gpm, BRLTTY) may know when to insert bracketed paste markers. Link: https://en.wikipedia.org/wiki/Bracketed-paste Signed-off-by: Nicolas Pitre Reviewed-by: Jiri Slaby Link: https://lore.kernel.org/r/20250520171851.1219676-2-nico@fluxnic.net Signed-off-by: Greg Kroah-Hartman --- drivers/tty/vt/selection.c | 31 +++++++++++++++++++++++++++---- drivers/tty/vt/vt.c | 15 +++++++++++++++ include/linux/console_struct.h | 1 + include/uapi/linux/tiocl.h | 1 + 4 files changed, 44 insertions(+), 4 deletions(-) (limited to 'include/uapi') diff --git a/drivers/tty/vt/selection.c b/drivers/tty/vt/selection.c index 791e2f1f7c0b..24b0a53e5a79 100644 --- a/drivers/tty/vt/selection.c +++ b/drivers/tty/vt/selection.c @@ -403,6 +403,12 @@ int paste_selection(struct tty_struct *tty) DECLARE_WAITQUEUE(wait, current); int ret = 0; + bool bp = vc->vc_bracketed_paste; + static const char bracketed_paste_start[] = "\033[200~"; + static const char bracketed_paste_end[] = "\033[201~"; + const char *bps = bp ? bracketed_paste_start : NULL; + const char *bpe = bp ? bracketed_paste_end : NULL; + console_lock(); poke_blanked_console(); console_unlock(); @@ -414,7 +420,7 @@ int paste_selection(struct tty_struct *tty) add_wait_queue(&vc->paste_wait, &wait); mutex_lock(&vc_sel.lock); - while (vc_sel.buffer && vc_sel.buf_len > pasted) { + while (vc_sel.buffer && (vc_sel.buf_len > pasted || bpe)) { set_current_state(TASK_INTERRUPTIBLE); if (signal_pending(current)) { ret = -EINTR; @@ -427,10 +433,27 @@ int paste_selection(struct tty_struct *tty) continue; } __set_current_state(TASK_RUNNING); + + if (bps) { + bps += tty_ldisc_receive_buf(ld, bps, NULL, strlen(bps)); + if (*bps != '\0') + continue; + bps = NULL; + } + count = vc_sel.buf_len - pasted; - count = tty_ldisc_receive_buf(ld, vc_sel.buffer + pasted, NULL, - count); - pasted += count; + if (count) { + pasted += tty_ldisc_receive_buf(ld, vc_sel.buffer + pasted, + NULL, count); + if (vc_sel.buf_len > pasted) + continue; + } + + if (bpe) { + bpe += tty_ldisc_receive_buf(ld, bpe, NULL, strlen(bpe)); + if (*bpe == '\0') + bpe = NULL; + } } mutex_unlock(&vc_sel.lock); remove_wait_queue(&vc->paste_wait, &wait); diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index efb761454166..ed39d9cb4432 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -1870,6 +1870,14 @@ int mouse_reporting(void) return vc_cons[fg_console].d->vc_report_mouse; } +/* invoked via ioctl(TIOCLINUX) */ +static int get_bracketed_paste(struct tty_struct *tty) +{ + struct vc_data *vc = tty->driver_data; + + return vc->vc_bracketed_paste; +} + enum { CSI_DEC_hl_CURSOR_KEYS = 1, /* CKM: cursor keys send ^[Ox/^[[x */ CSI_DEC_hl_132_COLUMNS = 3, /* COLM: 80/132 mode switch */ @@ -1880,6 +1888,7 @@ enum { CSI_DEC_hl_MOUSE_X10 = 9, CSI_DEC_hl_SHOW_CURSOR = 25, /* TCEM */ CSI_DEC_hl_MOUSE_VT200 = 1000, + CSI_DEC_hl_BRACKETED_PASTE = 2004, }; /* console_lock is held */ @@ -1932,6 +1941,9 @@ static void csi_DEC_hl(struct vc_data *vc, bool on_off) case CSI_DEC_hl_MOUSE_VT200: vc->vc_report_mouse = on_off ? 2 : 0; break; + case CSI_DEC_hl_BRACKETED_PASTE: + vc->vc_bracketed_paste = on_off; + break; } } @@ -2157,6 +2169,7 @@ static void reset_terminal(struct vc_data *vc, int do_clear) vc->state.charset = 0; vc->vc_need_wrap = 0; vc->vc_report_mouse = 0; + vc->vc_bracketed_paste = 0; vc->vc_utf = default_utf8; vc->vc_utf_count = 0; @@ -3483,6 +3496,8 @@ int tioclinux(struct tty_struct *tty, unsigned long arg) break; case TIOCL_BLANKEDSCREEN: return console_blanked; + case TIOCL_GETBRACKETEDPASTE: + return get_bracketed_paste(tty); default: return -EINVAL; } diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h index 20f564e98552..59b4fec5f254 100644 --- a/include/linux/console_struct.h +++ b/include/linux/console_struct.h @@ -145,6 +145,7 @@ struct vc_data { unsigned int vc_need_wrap : 1; unsigned int vc_can_do_color : 1; unsigned int vc_report_mouse : 2; + unsigned int vc_bracketed_paste : 1; unsigned char vc_utf : 1; /* Unicode UTF-8 encoding */ unsigned char vc_utf_count; int vc_utf_char; diff --git a/include/uapi/linux/tiocl.h b/include/uapi/linux/tiocl.h index b32acc229024..88faba506c3d 100644 --- a/include/uapi/linux/tiocl.h +++ b/include/uapi/linux/tiocl.h @@ -36,5 +36,6 @@ struct tiocl_selection { #define TIOCL_BLANKSCREEN 14 /* keep screen blank even if a key is pressed */ #define TIOCL_BLANKEDSCREEN 15 /* return which vt was blanked */ #define TIOCL_GETKMSGREDIRECT 17 /* get the vt the kernel messages are restricted to */ +#define TIOCL_GETBRACKETEDPASTE 18 /* get whether paste may be bracketed */ #endif /* _LINUX_TIOCL_H */ -- cgit v1.2.3 From 81cf4d7d2379df853a0cbb8486286783c7380ac3 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Tue, 20 May 2025 13:16:44 -0400 Subject: vt: add VT_GETCONSIZECSRPOS to retrieve console size and cursor position The console dimension and cursor position are available through the /dev/vcsa interface already. However the /dev/vcsa header format uses single-byte fields therefore those values are clamped to 255. As surprizing as this may seem, some people do use 240-column 67-row screens (a 1920x1080 monitor with 8x16 pixel fonts) which is getting close to the limit. Monitors with higher resolution are not uncommon these days (3840x2160 producing a 480x135 character display) and it is just a matter of time before someone with, say, a braille display using the Linux VT console and BRLTTY on such a screen reports a bug about missing and oddly misaligned screen content. Let's add VT_GETCONSIZECSRPOS for the retrieval of console size and cursor position without byte-sized limitations. The actual console size limit as encoded in vt.c is 32767x32767 so using a short here is appropriate. Then this can be used to get the cursor position when /dev/vcsa reports 255. The screen dimension may already be obtained using TIOCGWINSZ and adding the same information to VT_GETCONSIZECSRPOS might be redundant. However applications that care about cursor position also care about display size and having 2 separate system calls to obtain them separately is wasteful. Also, the cursor position can be queried by writing "\e[6n" to a tty and reading back the result but that may be done only by the actual application using that tty and not a sideline observer. Signed-off-by: Nicolas Pitre Link: https://lore.kernel.org/r/20250520171851.1219676-3-nico@fluxnic.net Signed-off-by: Greg Kroah-Hartman --- drivers/tty/vt/vt_ioctl.c | 16 ++++++++++++++++ include/uapi/linux/vt.h | 11 +++++++++++ 2 files changed, 27 insertions(+) (limited to 'include/uapi') diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c index 1f2bdd2e1cc5..61342e06970a 100644 --- a/drivers/tty/vt/vt_ioctl.c +++ b/drivers/tty/vt/vt_ioctl.c @@ -951,6 +951,22 @@ int vt_ioctl(struct tty_struct *tty, (unsigned short __user *)arg); case VT_WAITEVENT: return vt_event_wait_ioctl((struct vt_event __user *)arg); + + case VT_GETCONSIZECSRPOS: + { + struct vt_consizecsrpos concsr; + + console_lock(); + concsr.con_cols = vc->vc_cols; + concsr.con_rows = vc->vc_rows; + concsr.csr_col = vc->state.x; + concsr.csr_row = vc->state.y; + console_unlock(); + if (copy_to_user(up, &concsr, sizeof(concsr))) + return -EFAULT; + return 0; + } + default: return -ENOIOCTLCMD; } diff --git a/include/uapi/linux/vt.h b/include/uapi/linux/vt.h index e9d39c48520a..e5b0c492aa18 100644 --- a/include/uapi/linux/vt.h +++ b/include/uapi/linux/vt.h @@ -2,6 +2,8 @@ #ifndef _UAPI_LINUX_VT_H #define _UAPI_LINUX_VT_H +#include +#include /* * These constants are also useful for user-level apps (e.g., VC @@ -84,4 +86,13 @@ struct vt_setactivate { #define VT_SETACTIVATE 0x560F /* Activate and set the mode of a console */ +/* get console size and cursor position */ +struct vt_consizecsrpos { + __u16 con_rows; /* number of console rows */ + __u16 con_cols; /* number of console columns */ + __u16 csr_row; /* current cursor's row */ + __u16 csr_col; /* current cursor's column */ +}; +#define VT_GETCONSIZECSRPOS _IOR('V', 0x10, struct vt_consizecsrpos) + #endif /* _UAPI_LINUX_VT_H */ -- cgit v1.2.3 From 35ac2034db72bbbc73609aab5f05ff6e0d38fdd0 Mon Sep 17 00:00:00 2001 From: Akshay Gupta Date: Mon, 28 Apr 2025 06:30:30 +0000 Subject: misc: amd-sbi: Add support for AMD_SBI IOCTL The present sbrmi module only support reporting power via hwmon. However, AMD data center range of processors support various system management functionality using custom protocols defined in Advanced Platform Management Link (APML) specification. Register a miscdevice, which creates a device /dev/sbrmiX with an IOCTL interface for the user space to invoke the APML Mailbox protocol, which is already defined in sbrmi_mailbox_xfer(). The APML protocols depend on a set of RMI registers. Having an IOCTL as a single entry point will help in providing synchronization among these protocols as multiple transactions on RMI register set may create race condition. Support for other protocols will be added in subsequent patches. APML mailbox protocol returns additional error codes written by SMU firmware in the out-bound register 0x37. These errors include, invalid core, message not supported over platform and others. This additional error codes can be used to provide more details to user space. Open-sourced and widely used https://github.com/amd/esmi_oob_library will continue to provide user-space programmable API. Reviewed-by: Naveen Krishna Chatradhi Signed-off-by: Akshay Gupta Link: https://lore.kernel.org/r/20250428063034.2145566-7-akshay.gupta@amd.com Signed-off-by: Greg Kroah-Hartman --- Documentation/misc-devices/index.rst | 1 + Documentation/userspace-api/ioctl/ioctl-number.rst | 2 + drivers/misc/amd-sbi/rmi-core.c | 92 +++++++++++++++++++--- drivers/misc/amd-sbi/rmi-core.h | 15 ++-- drivers/misc/amd-sbi/rmi-hwmon.c | 13 ++- drivers/misc/amd-sbi/rmi-i2c.c | 25 +++++- include/uapi/misc/amd-apml.h | 51 ++++++++++++ 7 files changed, 167 insertions(+), 32 deletions(-) create mode 100644 include/uapi/misc/amd-apml.h (limited to 'include/uapi') diff --git a/Documentation/misc-devices/index.rst b/Documentation/misc-devices/index.rst index 8c5b226d8313..081e79415e38 100644 --- a/Documentation/misc-devices/index.rst +++ b/Documentation/misc-devices/index.rst @@ -12,6 +12,7 @@ fit into other categories. :maxdepth: 2 ad525x_dpot + amd-sbi apds990x bh1770glc c2port diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 7a1409ecc238..3191d96ea4da 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -397,6 +397,8 @@ Code Seq# Include File Comments 0xF8 all arch/x86/include/uapi/asm/amd_hsmp.h AMD HSMP EPYC system management interface driver +0xF9 00-0F uapi/misc/amd-apml.h AMD side band system management interface driver + 0xFD all linux/dm-ioctl.h 0xFE all linux/isst_if.h ==== ===== ======================================================= ================================================================ diff --git a/drivers/misc/amd-sbi/rmi-core.c b/drivers/misc/amd-sbi/rmi-core.c index 1d5e2556ab88..7d13c049c98d 100644 --- a/drivers/misc/amd-sbi/rmi-core.c +++ b/drivers/misc/amd-sbi/rmi-core.c @@ -7,7 +7,10 @@ */ #include #include +#include #include +#include +#include #include #include #include "rmi-core.h" @@ -20,15 +23,17 @@ #define TRIGGER_MAILBOX 0x01 int rmi_mailbox_xfer(struct sbrmi_data *data, - struct sbrmi_mailbox_msg *msg) + struct apml_mbox_msg *msg) { - unsigned int bytes; + unsigned int bytes, ec; int i, ret; int sw_status; u8 byte; mutex_lock(&data->lock); + msg->fw_ret_code = 0; + /* Indicate firmware a command is to be serviced */ ret = regmap_write(data->regmap, SBRMI_INBNDMSG7, START_CMD); if (ret < 0) @@ -44,8 +49,8 @@ int rmi_mailbox_xfer(struct sbrmi_data *data, * Command Data In[31:0] to SBRMI::InBndMsg_inst[4:1] * SBRMI_x3C(MSB):SBRMI_x39(LSB) */ - for (i = 0; i < 4; i++) { - byte = (msg->data_in >> i * 8) & 0xff; + for (i = 0; i < AMD_SBI_MB_DATA_SIZE; i++) { + byte = (msg->mb_in_out >> i * 8) & 0xff; ret = regmap_write(data->regmap, SBRMI_INBNDMSG1 + i, byte); if (ret < 0) goto exit_unlock; @@ -69,29 +74,90 @@ int rmi_mailbox_xfer(struct sbrmi_data *data, if (ret) goto exit_unlock; + ret = regmap_read(data->regmap, SBRMI_OUTBNDMSG7, &ec); + if (ret || ec) + goto exit_clear_alert; /* * For a read operation, the initiator (BMC) reads the firmware * response Command Data Out[31:0] from SBRMI::OutBndMsg_inst[4:1] * {SBRMI_x34(MSB):SBRMI_x31(LSB)}. */ - if (msg->read) { - for (i = 0; i < 4; i++) { - ret = regmap_read(data->regmap, - SBRMI_OUTBNDMSG1 + i, &bytes); - if (ret < 0) - goto exit_unlock; - msg->data_out |= bytes << i * 8; - } + for (i = 0; i < AMD_SBI_MB_DATA_SIZE; i++) { + ret = regmap_read(data->regmap, + SBRMI_OUTBNDMSG1 + i, &bytes); + if (ret < 0) + break; + msg->mb_in_out |= bytes << i * 8; } +exit_clear_alert: /* * BMC must write 1'b1 to SBRMI::Status[SwAlertSts] to clear the * ALERT to initiator */ ret = regmap_write(data->regmap, SBRMI_STATUS, sw_status | SW_ALERT_MASK); - + if (ec) { + ret = -EPROTOTYPE; + msg->fw_ret_code = ec; + } exit_unlock: mutex_unlock(&data->lock); return ret; } + +static int apml_mailbox_xfer(struct sbrmi_data *data, struct apml_mbox_msg __user *arg) +{ + struct apml_mbox_msg msg = { 0 }; + int ret; + + /* Copy the structure from user */ + if (copy_from_user(&msg, arg, sizeof(struct apml_mbox_msg))) + return -EFAULT; + + /* Mailbox protocol */ + ret = rmi_mailbox_xfer(data, &msg); + if (ret && ret != -EPROTOTYPE) + return ret; + + return copy_to_user(arg, &msg, sizeof(struct apml_mbox_msg)); +} + +static long sbrmi_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) +{ + void __user *argp = (void __user *)arg; + struct sbrmi_data *data; + + data = container_of(fp->private_data, struct sbrmi_data, sbrmi_misc_dev); + switch (cmd) { + case SBRMI_IOCTL_MBOX_CMD: + return apml_mailbox_xfer(data, argp); + default: + return -ENOTTY; + } +} + +static const struct file_operations sbrmi_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = sbrmi_ioctl, + .compat_ioctl = compat_ptr_ioctl, +}; + +int create_misc_rmi_device(struct sbrmi_data *data, + struct device *dev) +{ + data->sbrmi_misc_dev.name = devm_kasprintf(dev, + GFP_KERNEL, + "sbrmi-%x", + data->dev_static_addr); + data->sbrmi_misc_dev.minor = MISC_DYNAMIC_MINOR; + data->sbrmi_misc_dev.fops = &sbrmi_fops; + data->sbrmi_misc_dev.parent = dev; + data->sbrmi_misc_dev.nodename = devm_kasprintf(dev, + GFP_KERNEL, + "sbrmi-%x", + data->dev_static_addr); + data->sbrmi_misc_dev.mode = 0600; + + return misc_register(&data->sbrmi_misc_dev); +} diff --git a/drivers/misc/amd-sbi/rmi-core.h b/drivers/misc/amd-sbi/rmi-core.h index 3a6028306d10..8ab31c6852d1 100644 --- a/drivers/misc/amd-sbi/rmi-core.h +++ b/drivers/misc/amd-sbi/rmi-core.h @@ -6,10 +6,12 @@ #ifndef _SBRMI_CORE_H_ #define _SBRMI_CORE_H_ +#include #include #include #include #include +#include /* SB-RMI registers */ enum sbrmi_reg { @@ -48,19 +50,15 @@ enum sbrmi_msg_id { /* Each client has this additional data */ struct sbrmi_data { + struct miscdevice sbrmi_misc_dev; struct regmap *regmap; + /* Mutex locking */ struct mutex lock; u32 pwr_limit_max; + u8 dev_static_addr; }; -struct sbrmi_mailbox_msg { - u8 cmd; - bool read; - u32 data_in; - u32 data_out; -}; - -int rmi_mailbox_xfer(struct sbrmi_data *data, struct sbrmi_mailbox_msg *msg); +int rmi_mailbox_xfer(struct sbrmi_data *data, struct apml_mbox_msg *msg); #ifdef CONFIG_AMD_SBRMI_HWMON int create_hwmon_sensor_device(struct device *dev, struct sbrmi_data *data); #else @@ -69,4 +67,5 @@ static inline int create_hwmon_sensor_device(struct device *dev, struct sbrmi_da return 0; } #endif +int create_misc_rmi_device(struct sbrmi_data *data, struct device *dev); #endif /*_SBRMI_CORE_H_*/ diff --git a/drivers/misc/amd-sbi/rmi-hwmon.c b/drivers/misc/amd-sbi/rmi-hwmon.c index 720e800db1f0..f4f015605daa 100644 --- a/drivers/misc/amd-sbi/rmi-hwmon.c +++ b/drivers/misc/amd-sbi/rmi-hwmon.c @@ -6,6 +6,7 @@ */ #include #include +#include #include "rmi-core.h" /* Do not allow setting negative power limit */ @@ -15,7 +16,7 @@ static int sbrmi_read(struct device *dev, enum hwmon_sensor_types type, u32 attr, int channel, long *val) { struct sbrmi_data *data = dev_get_drvdata(dev); - struct sbrmi_mailbox_msg msg = { 0 }; + struct apml_mbox_msg msg = { 0 }; int ret; if (!data) @@ -24,7 +25,6 @@ static int sbrmi_read(struct device *dev, enum hwmon_sensor_types type, if (type != hwmon_power) return -EINVAL; - msg.read = true; switch (attr) { case hwmon_power_input: msg.cmd = SBRMI_READ_PKG_PWR_CONSUMPTION; @@ -35,7 +35,7 @@ static int sbrmi_read(struct device *dev, enum hwmon_sensor_types type, ret = rmi_mailbox_xfer(data, &msg); break; case hwmon_power_cap_max: - msg.data_out = data->pwr_limit_max; + msg.mb_in_out = data->pwr_limit_max; ret = 0; break; default: @@ -44,7 +44,7 @@ static int sbrmi_read(struct device *dev, enum hwmon_sensor_types type, if (ret < 0) return ret; /* hwmon power attributes are in microWatt */ - *val = (long)msg.data_out * 1000; + *val = (long)msg.mb_in_out * 1000; return ret; } @@ -52,7 +52,7 @@ static int sbrmi_write(struct device *dev, enum hwmon_sensor_types type, u32 attr, int channel, long val) { struct sbrmi_data *data = dev_get_drvdata(dev); - struct sbrmi_mailbox_msg msg = { 0 }; + struct apml_mbox_msg msg = { 0 }; if (!data) return -ENODEV; @@ -68,8 +68,7 @@ static int sbrmi_write(struct device *dev, enum hwmon_sensor_types type, val = clamp_val(val, SBRMI_PWR_MIN, data->pwr_limit_max); msg.cmd = SBRMI_WRITE_PKG_PWR_LIMIT; - msg.data_in = val; - msg.read = false; + msg.mb_in_out = val; return rmi_mailbox_xfer(data, &msg); } diff --git a/drivers/misc/amd-sbi/rmi-i2c.c b/drivers/misc/amd-sbi/rmi-i2c.c index 7a9801273a4c..f891f5af4bc6 100644 --- a/drivers/misc/amd-sbi/rmi-i2c.c +++ b/drivers/misc/amd-sbi/rmi-i2c.c @@ -38,15 +38,14 @@ static int sbrmi_enable_alert(struct sbrmi_data *data) static int sbrmi_get_max_pwr_limit(struct sbrmi_data *data) { - struct sbrmi_mailbox_msg msg = { 0 }; + struct apml_mbox_msg msg = { 0 }; int ret; msg.cmd = SBRMI_READ_PKG_MAX_PWR_LIMIT; - msg.read = true; ret = rmi_mailbox_xfer(data, &msg); if (ret < 0) return ret; - data->pwr_limit_max = msg.data_out; + data->pwr_limit_max = msg.mb_in_out; return ret; } @@ -81,8 +80,25 @@ static int sbrmi_i2c_probe(struct i2c_client *client) if (ret < 0) return ret; + data->dev_static_addr = client->addr; dev_set_drvdata(dev, data); - return create_hwmon_sensor_device(dev, data); + + ret = create_hwmon_sensor_device(dev, data); + if (ret < 0) + return ret; + return create_misc_rmi_device(data, dev); +} + +static void sbrmi_i2c_remove(struct i2c_client *client) +{ + struct sbrmi_data *data = dev_get_drvdata(&client->dev); + + misc_deregister(&data->sbrmi_misc_dev); + /* Assign fops and parent of misc dev to NULL */ + data->sbrmi_misc_dev.fops = NULL; + data->sbrmi_misc_dev.parent = NULL; + dev_info(&client->dev, "Removed sbrmi-i2c driver\n"); + return; } static const struct i2c_device_id sbrmi_id[] = { @@ -105,6 +121,7 @@ static struct i2c_driver sbrmi_driver = { .of_match_table = of_match_ptr(sbrmi_of_match), }, .probe = sbrmi_i2c_probe, + .remove = sbrmi_i2c_remove, .id_table = sbrmi_id, }; diff --git a/include/uapi/misc/amd-apml.h b/include/uapi/misc/amd-apml.h new file mode 100644 index 000000000000..a5f086f84b06 --- /dev/null +++ b/include/uapi/misc/amd-apml.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright (C) 2021-2024 Advanced Micro Devices, Inc. + */ +#ifndef _AMD_APML_H_ +#define _AMD_APML_H_ + +#include + +/* Mailbox data size for data_in and data_out */ +#define AMD_SBI_MB_DATA_SIZE 4 + +struct apml_mbox_msg { + /* + * Mailbox Message ID + */ + __u32 cmd; + /* + * [0]...[3] mailbox 32bit input/output data + */ + __u32 mb_in_out; + /* + * Error code is returned in case of soft mailbox error + */ + __u32 fw_ret_code; +}; + +/* + * AMD sideband interface base IOCTL + */ +#define SB_BASE_IOCTL_NR 0xF9 + +/** + * DOC: SBRMI_IOCTL_MBOX_CMD + * + * @Parameters + * + * @struct apml_mbox_msg + * Pointer to the &struct apml_mbox_msg that will contain the protocol + * information + * + * @Description + * IOCTL command for APML messages using generic _IOWR + * The IOCTL provides userspace access to AMD sideband mailbox protocol + * - Mailbox message read/write(0x0~0xFF) + * - returning "-EFAULT" if none of the above + * "-EPROTOTYPE" error is returned to provide additional error details + */ +#define SBRMI_IOCTL_MBOX_CMD _IOWR(SB_BASE_IOCTL_NR, 0, struct apml_mbox_msg) + +#endif /*_AMD_APML_H_*/ -- cgit v1.2.3 From bb13a84ed6b78200952b264b4d7a024b730e8246 Mon Sep 17 00:00:00 2001 From: Akshay Gupta Date: Mon, 28 Apr 2025 06:30:31 +0000 Subject: misc: amd-sbi: Add support for CPUID protocol - AMD provides custom protocol to read Processor feature capabilities and configuration information through side band. The information is accessed by providing CPUID Function, extended function and thread ID to the protocol. Undefined function returns 0. Reviewed-by: Naveen Krishna Chatradhi Signed-off-by: Akshay Gupta Link: https://lore.kernel.org/r/20250428063034.2145566-8-akshay.gupta@amd.com Signed-off-by: Greg Kroah-Hartman --- drivers/misc/amd-sbi/rmi-core.c | 168 ++++++++++++++++++++++++++++++++++++++++ drivers/misc/amd-sbi/rmi-core.h | 5 +- include/uapi/misc/amd-apml.h | 37 +++++++++ 3 files changed, 209 insertions(+), 1 deletion(-) (limited to 'include/uapi') diff --git a/drivers/misc/amd-sbi/rmi-core.c b/drivers/misc/amd-sbi/rmi-core.c index 7d13c049c98d..471f50f0c368 100644 --- a/drivers/misc/amd-sbi/rmi-core.c +++ b/drivers/misc/amd-sbi/rmi-core.c @@ -17,11 +17,160 @@ /* Mask for Status Register bit[1] */ #define SW_ALERT_MASK 0x2 +/* Mask to check H/W Alert status bit */ +#define HW_ALERT_MASK 0x80 /* Software Interrupt for triggering */ #define START_CMD 0x80 #define TRIGGER_MAILBOX 0x01 +/* Default message lengths as per APML command protocol */ +/* CPUID */ +#define CPUID_RD_DATA_LEN 0x8 +#define CPUID_WR_DATA_LEN 0x8 +#define CPUID_RD_REG_LEN 0xa +#define CPUID_WR_REG_LEN 0x9 + +/* CPUID MSR Command Ids */ +#define CPUID_MCA_CMD 0x73 +#define RD_CPUID_CMD 0x91 + +/* CPUID MCAMSR mask & index */ +#define CPUID_MCA_THRD_MASK GENMASK(15, 0) +#define CPUID_MCA_THRD_INDEX 32 +#define CPUID_MCA_FUNC_MASK GENMASK(31, 0) +#define CPUID_EXT_FUNC_INDEX 56 + +/* input for bulk write to CPUID protocol */ +struct cpu_msr_indata { + u8 wr_len; /* const value */ + u8 rd_len; /* const value */ + u8 proto_cmd; /* const value */ + u8 thread; /* thread number */ + union { + u8 reg_offset[4]; /* input value */ + u32 value; + } __packed; + u8 ext; /* extended function */ +}; + +/* output for bulk read from CPUID protocol */ +struct cpu_msr_outdata { + u8 num_bytes; /* number of bytes return */ + u8 status; /* Protocol status code */ + union { + u64 value; + u8 reg_data[8]; + } __packed; +}; + +static inline void prepare_cpuid_input_message(struct cpu_msr_indata *input, + u8 thread_id, u32 func, + u8 ext_func) +{ + input->rd_len = CPUID_RD_DATA_LEN; + input->wr_len = CPUID_WR_DATA_LEN; + input->proto_cmd = RD_CPUID_CMD; + input->thread = thread_id << 1; + input->value = func; + input->ext = ext_func; +} + +static int sbrmi_get_rev(struct sbrmi_data *data) +{ + unsigned int rev; + u16 offset = SBRMI_REV; + int ret; + + ret = regmap_read(data->regmap, offset, &rev); + if (ret < 0) + return ret; + + data->rev = rev; + return 0; +} + +/* Read CPUID function protocol */ +static int rmi_cpuid_read(struct sbrmi_data *data, + struct apml_cpuid_msg *msg) +{ + struct cpu_msr_indata input = {0}; + struct cpu_msr_outdata output = {0}; + int val = 0; + int ret, hw_status; + u16 thread; + + mutex_lock(&data->lock); + /* cache the rev value to identify if protocol is supported or not */ + if (!data->rev) { + ret = sbrmi_get_rev(data); + if (ret < 0) + goto exit_unlock; + } + /* CPUID protocol for REV 0x10 is not supported*/ + if (data->rev == 0x10) { + ret = -EOPNOTSUPP; + goto exit_unlock; + } + + thread = msg->cpu_in_out << CPUID_MCA_THRD_INDEX & CPUID_MCA_THRD_MASK; + + /* Thread > 127, Thread128 CS register, 1'b1 needs to be set to 1 */ + if (thread > 127) { + thread -= 128; + val = 1; + } + ret = regmap_write(data->regmap, SBRMI_THREAD128CS, val); + if (ret < 0) + goto exit_unlock; + + prepare_cpuid_input_message(&input, thread, + msg->cpu_in_out & CPUID_MCA_FUNC_MASK, + msg->cpu_in_out >> CPUID_EXT_FUNC_INDEX); + + ret = regmap_bulk_write(data->regmap, CPUID_MCA_CMD, + &input, CPUID_WR_REG_LEN); + if (ret < 0) + goto exit_unlock; + + /* + * For RMI Rev 0x20, new h/w status bit is introduced. which is used + * by firmware to indicate completion of commands (0x71, 0x72, 0x73). + * wait for the status bit to be set by the hardware before + * reading the data out. + */ + ret = regmap_read_poll_timeout(data->regmap, SBRMI_STATUS, hw_status, + hw_status & HW_ALERT_MASK, 500, 2000000); + if (ret) + goto exit_unlock; + + ret = regmap_bulk_read(data->regmap, CPUID_MCA_CMD, + &output, CPUID_RD_REG_LEN); + if (ret < 0) + goto exit_unlock; + + ret = regmap_write(data->regmap, SBRMI_STATUS, + HW_ALERT_MASK); + if (ret < 0) + goto exit_unlock; + + if (output.num_bytes != CPUID_RD_REG_LEN - 1) { + ret = -EMSGSIZE; + goto exit_unlock; + } + if (output.status) { + ret = -EPROTOTYPE; + msg->fw_ret_code = output.status; + goto exit_unlock; + } + msg->cpu_in_out = output.value; +exit_unlock: + if (ret < 0) + msg->cpu_in_out = 0; + mutex_unlock(&data->lock); + return ret; +} + int rmi_mailbox_xfer(struct sbrmi_data *data, struct apml_mbox_msg *msg) { @@ -123,6 +272,23 @@ static int apml_mailbox_xfer(struct sbrmi_data *data, struct apml_mbox_msg __use return copy_to_user(arg, &msg, sizeof(struct apml_mbox_msg)); } +static int apml_cpuid_xfer(struct sbrmi_data *data, struct apml_cpuid_msg __user *arg) +{ + struct apml_cpuid_msg msg = { 0 }; + int ret; + + /* Copy the structure from user */ + if (copy_from_user(&msg, arg, sizeof(struct apml_cpuid_msg))) + return -EFAULT; + + /* CPUID Protocol */ + ret = rmi_cpuid_read(data, &msg); + if (ret && ret != -EPROTOTYPE) + return ret; + + return copy_to_user(arg, &msg, sizeof(struct apml_cpuid_msg)); +} + static long sbrmi_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; @@ -132,6 +298,8 @@ static long sbrmi_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) switch (cmd) { case SBRMI_IOCTL_MBOX_CMD: return apml_mailbox_xfer(data, argp); + case SBRMI_IOCTL_CPUID_CMD: + return apml_cpuid_xfer(data, argp); default: return -ENOTTY; } diff --git a/drivers/misc/amd-sbi/rmi-core.h b/drivers/misc/amd-sbi/rmi-core.h index 8ab31c6852d1..975ae858e9fd 100644 --- a/drivers/misc/amd-sbi/rmi-core.h +++ b/drivers/misc/amd-sbi/rmi-core.h @@ -15,7 +15,8 @@ /* SB-RMI registers */ enum sbrmi_reg { - SBRMI_CTRL = 0x01, + SBRMI_REV, + SBRMI_CTRL, SBRMI_STATUS, SBRMI_OUTBNDMSG0 = 0x30, SBRMI_OUTBNDMSG1, @@ -34,6 +35,7 @@ enum sbrmi_reg { SBRMI_INBNDMSG6, SBRMI_INBNDMSG7, SBRMI_SW_INTERRUPT, + SBRMI_THREAD128CS = 0x4b, }; /* @@ -56,6 +58,7 @@ struct sbrmi_data { struct mutex lock; u32 pwr_limit_max; u8 dev_static_addr; + u8 rev; }; int rmi_mailbox_xfer(struct sbrmi_data *data, struct apml_mbox_msg *msg); diff --git a/include/uapi/misc/amd-apml.h b/include/uapi/misc/amd-apml.h index a5f086f84b06..bb57dc75758a 100644 --- a/include/uapi/misc/amd-apml.h +++ b/include/uapi/misc/amd-apml.h @@ -25,6 +25,24 @@ struct apml_mbox_msg { __u32 fw_ret_code; }; +struct apml_cpuid_msg { + /* + * CPUID input + * [0]...[3] cpuid func, + * [4][5] cpuid: thread + * [6] cpuid: ext function & read eax/ebx or ecx/edx + * [7:0] -> bits [7:4] -> ext function & + * bit [0] read eax/ebx or ecx/edx + * CPUID output + */ + __u64 cpu_in_out; + /* + * Status code for CPUID read + */ + __u32 fw_ret_code; + __u32 pad; +}; + /* * AMD sideband interface base IOCTL */ @@ -48,4 +66,23 @@ struct apml_mbox_msg { */ #define SBRMI_IOCTL_MBOX_CMD _IOWR(SB_BASE_IOCTL_NR, 0, struct apml_mbox_msg) +/** + * DOC: SBRMI_IOCTL_CPUID_CMD + * + * @Parameters + * + * @struct apml_cpuid_msg + * Pointer to the &struct apml_cpuid_msg that will contain the protocol + * information + * + * @Description + * IOCTL command for APML messages using generic _IOWR + * The IOCTL provides userspace access to AMD sideband cpuid protocol + * - CPUID protocol to get CPU details for Function/Ext Function + * at thread level + * - returning "-EFAULT" if none of the above + * "-EPROTOTYPE" error is returned to provide additional error details + */ +#define SBRMI_IOCTL_CPUID_CMD _IOWR(SB_BASE_IOCTL_NR, 1, struct apml_cpuid_msg) + #endif /*_AMD_APML_H_*/ -- cgit v1.2.3 From 69b1ba83d21c4a89f6fcfbca1d515a60df65cf9e Mon Sep 17 00:00:00 2001 From: Akshay Gupta Date: Mon, 28 Apr 2025 06:30:32 +0000 Subject: misc: amd-sbi: Add support for read MCA register protocol - AMD provides custom protocol to read Machine Check Architecture(MCA) registers over sideband. The information is accessed for range of MCA registers by passing register address and thread ID to the protocol. MCA register read command using the register address to access Core::X86::Msr::MCG_CAP which determines the number of MCA banks. Access is read-only Reviewed-by: Naveen Krishna Chatradhi Signed-off-by: Akshay Gupta Link: https://lore.kernel.org/r/20250428063034.2145566-9-akshay.gupta@amd.com Signed-off-by: Greg Kroah-Hartman --- drivers/misc/amd-sbi/rmi-core.c | 114 ++++++++++++++++++++++++++++++++++++++++ include/uapi/misc/amd-apml.h | 33 ++++++++++++ 2 files changed, 147 insertions(+) (limited to 'include/uapi') diff --git a/drivers/misc/amd-sbi/rmi-core.c b/drivers/misc/amd-sbi/rmi-core.c index 471f50f0c368..171d6e871373 100644 --- a/drivers/misc/amd-sbi/rmi-core.c +++ b/drivers/misc/amd-sbi/rmi-core.c @@ -30,10 +30,16 @@ #define CPUID_WR_DATA_LEN 0x8 #define CPUID_RD_REG_LEN 0xa #define CPUID_WR_REG_LEN 0x9 +/* MSR */ +#define MSR_RD_REG_LEN 0xa +#define MSR_WR_REG_LEN 0x8 +#define MSR_RD_DATA_LEN 0x8 +#define MSR_WR_DATA_LEN 0x7 /* CPUID MSR Command Ids */ #define CPUID_MCA_CMD 0x73 #define RD_CPUID_CMD 0x91 +#define RD_MCA_CMD 0x86 /* CPUID MCAMSR mask & index */ #define CPUID_MCA_THRD_MASK GENMASK(15, 0) @@ -76,6 +82,16 @@ static inline void prepare_cpuid_input_message(struct cpu_msr_indata *input, input->ext = ext_func; } +static inline void prepare_mca_msr_input_message(struct cpu_msr_indata *input, + u8 thread_id, u32 data_in) +{ + input->rd_len = MSR_RD_DATA_LEN; + input->wr_len = MSR_WR_DATA_LEN; + input->proto_cmd = RD_MCA_CMD; + input->thread = thread_id << 1; + input->value = data_in; +} + static int sbrmi_get_rev(struct sbrmi_data *data) { unsigned int rev; @@ -171,6 +187,85 @@ exit_unlock: return ret; } +/* MCA MSR protocol */ +static int rmi_mca_msr_read(struct sbrmi_data *data, + struct apml_mcamsr_msg *msg) +{ + struct cpu_msr_outdata output = {0}; + struct cpu_msr_indata input = {0}; + int ret, val = 0; + int hw_status; + u16 thread; + + mutex_lock(&data->lock); + /* cache the rev value to identify if protocol is supported or not */ + if (!data->rev) { + ret = sbrmi_get_rev(data); + if (ret < 0) + goto exit_unlock; + } + /* MCA MSR protocol for REV 0x10 is not supported*/ + if (data->rev == 0x10) { + ret = -EOPNOTSUPP; + goto exit_unlock; + } + + thread = msg->mcamsr_in_out << CPUID_MCA_THRD_INDEX & CPUID_MCA_THRD_MASK; + + /* Thread > 127, Thread128 CS register, 1'b1 needs to be set to 1 */ + if (thread > 127) { + thread -= 128; + val = 1; + } + ret = regmap_write(data->regmap, SBRMI_THREAD128CS, val); + if (ret < 0) + goto exit_unlock; + + prepare_mca_msr_input_message(&input, thread, + msg->mcamsr_in_out & CPUID_MCA_FUNC_MASK); + + ret = regmap_bulk_write(data->regmap, CPUID_MCA_CMD, + &input, MSR_WR_REG_LEN); + if (ret < 0) + goto exit_unlock; + + /* + * For RMI Rev 0x20, new h/w status bit is introduced. which is used + * by firmware to indicate completion of commands (0x71, 0x72, 0x73). + * wait for the status bit to be set by the hardware before + * reading the data out. + */ + ret = regmap_read_poll_timeout(data->regmap, SBRMI_STATUS, hw_status, + hw_status & HW_ALERT_MASK, 500, 2000000); + if (ret) + goto exit_unlock; + + ret = regmap_bulk_read(data->regmap, CPUID_MCA_CMD, + &output, MSR_RD_REG_LEN); + if (ret < 0) + goto exit_unlock; + + ret = regmap_write(data->regmap, SBRMI_STATUS, + HW_ALERT_MASK); + if (ret < 0) + goto exit_unlock; + + if (output.num_bytes != MSR_RD_REG_LEN - 1) { + ret = -EMSGSIZE; + goto exit_unlock; + } + if (output.status) { + ret = -EPROTOTYPE; + msg->fw_ret_code = output.status; + goto exit_unlock; + } + msg->mcamsr_in_out = output.value; + +exit_unlock: + mutex_unlock(&data->lock); + return ret; +} + int rmi_mailbox_xfer(struct sbrmi_data *data, struct apml_mbox_msg *msg) { @@ -289,6 +384,23 @@ static int apml_cpuid_xfer(struct sbrmi_data *data, struct apml_cpuid_msg __user return copy_to_user(arg, &msg, sizeof(struct apml_cpuid_msg)); } +static int apml_mcamsr_xfer(struct sbrmi_data *data, struct apml_mcamsr_msg __user *arg) +{ + struct apml_mcamsr_msg msg = { 0 }; + int ret; + + /* Copy the structure from user */ + if (copy_from_user(&msg, arg, sizeof(struct apml_mcamsr_msg))) + return -EFAULT; + + /* MCAMSR Protocol */ + ret = rmi_mca_msr_read(data, &msg); + if (ret && ret != -EPROTOTYPE) + return ret; + + return copy_to_user(arg, &msg, sizeof(struct apml_mcamsr_msg)); +} + static long sbrmi_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) { void __user *argp = (void __user *)arg; @@ -300,6 +412,8 @@ static long sbrmi_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) return apml_mailbox_xfer(data, argp); case SBRMI_IOCTL_CPUID_CMD: return apml_cpuid_xfer(data, argp); + case SBRMI_IOCTL_MCAMSR_CMD: + return apml_mcamsr_xfer(data, argp); default: return -ENOTTY; } diff --git a/include/uapi/misc/amd-apml.h b/include/uapi/misc/amd-apml.h index bb57dc75758a..f718675d3966 100644 --- a/include/uapi/misc/amd-apml.h +++ b/include/uapi/misc/amd-apml.h @@ -43,6 +43,21 @@ struct apml_cpuid_msg { __u32 pad; }; +struct apml_mcamsr_msg { + /* + * MCAMSR input + * [0]...[3] mca msr func, + * [4][5] thread + * MCAMSR output + */ + __u64 mcamsr_in_out; + /* + * Status code for MCA/MSR access + */ + __u32 fw_ret_code; + __u32 pad; +}; + /* * AMD sideband interface base IOCTL */ @@ -85,4 +100,22 @@ struct apml_cpuid_msg { */ #define SBRMI_IOCTL_CPUID_CMD _IOWR(SB_BASE_IOCTL_NR, 1, struct apml_cpuid_msg) +/** + * DOC: SBRMI_IOCTL_MCAMSR_CMD + * + * @Parameters + * + * @struct apml_mcamsr_msg + * Pointer to the &struct apml_mcamsr_msg that will contain the protocol + * information + * + * @Description + * IOCTL command for APML messages using generic _IOWR + * The IOCTL provides userspace access to AMD sideband MCAMSR protocol + * - MCAMSR protocol to get MCA bank details for Function at thread level + * - returning "-EFAULT" if none of the above + * "-EPROTOTYPE" error is returned to provide additional error details + */ +#define SBRMI_IOCTL_MCAMSR_CMD _IOWR(SB_BASE_IOCTL_NR, 2, struct apml_mcamsr_msg) + #endif /*_AMD_APML_H_*/ -- cgit v1.2.3 From cf141287b77485ed7624ac1756b85cc801748c7c Mon Sep 17 00:00:00 2001 From: Akshay Gupta Date: Mon, 28 Apr 2025 06:30:33 +0000 Subject: misc: amd-sbi: Add support for register xfer - Provide user register access over IOCTL. Both register read and write are supported. - APML interface does not provide a synchronization method. By defining, a register access path, we use APML modules and library for all APML transactions. Without having to use external tools such as i2c-tools, which may cause race conditions. Reviewed-by: Naveen Krishna Chatradhi Signed-off-by: Akshay Gupta Link: https://lore.kernel.org/r/20250428063034.2145566-10-akshay.gupta@amd.com Signed-off-by: Greg Kroah-Hartman --- drivers/misc/amd-sbi/rmi-core.c | 29 +++++++++++++++++++++++++++++ include/uapi/misc/amd-apml.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+) (limited to 'include/uapi') diff --git a/drivers/misc/amd-sbi/rmi-core.c b/drivers/misc/amd-sbi/rmi-core.c index 171d6e871373..b653a21a909e 100644 --- a/drivers/misc/amd-sbi/rmi-core.c +++ b/drivers/misc/amd-sbi/rmi-core.c @@ -350,6 +350,33 @@ exit_unlock: return ret; } +static int apml_rmi_reg_xfer(struct sbrmi_data *data, + struct apml_reg_xfer_msg __user *arg) +{ + struct apml_reg_xfer_msg msg = { 0 }; + unsigned int data_read; + int ret; + + /* Copy the structure from user */ + if (copy_from_user(&msg, arg, sizeof(struct apml_reg_xfer_msg))) + return -EFAULT; + + mutex_lock(&data->lock); + if (msg.rflag) { + ret = regmap_read(data->regmap, msg.reg_addr, &data_read); + if (!ret) + msg.data_in_out = data_read; + } else { + ret = regmap_write(data->regmap, msg.reg_addr, msg.data_in_out); + } + + mutex_unlock(&data->lock); + + if (msg.rflag && !ret) + return copy_to_user(arg, &msg, sizeof(struct apml_reg_xfer_msg)); + return ret; +} + static int apml_mailbox_xfer(struct sbrmi_data *data, struct apml_mbox_msg __user *arg) { struct apml_mbox_msg msg = { 0 }; @@ -414,6 +441,8 @@ static long sbrmi_ioctl(struct file *fp, unsigned int cmd, unsigned long arg) return apml_cpuid_xfer(data, argp); case SBRMI_IOCTL_MCAMSR_CMD: return apml_mcamsr_xfer(data, argp); + case SBRMI_IOCTL_REG_XFER_CMD: + return apml_rmi_reg_xfer(data, argp); default: return -ENOTTY; } diff --git a/include/uapi/misc/amd-apml.h b/include/uapi/misc/amd-apml.h index f718675d3966..745b3338fc06 100644 --- a/include/uapi/misc/amd-apml.h +++ b/include/uapi/misc/amd-apml.h @@ -58,6 +58,21 @@ struct apml_mcamsr_msg { __u32 pad; }; +struct apml_reg_xfer_msg { + /* + * RMI register address offset + */ + __u16 reg_addr; + /* + * Register data for read/write + */ + __u8 data_in_out; + /* + * Register read or write + */ + __u8 rflag; +}; + /* * AMD sideband interface base IOCTL */ @@ -118,4 +133,20 @@ struct apml_mcamsr_msg { */ #define SBRMI_IOCTL_MCAMSR_CMD _IOWR(SB_BASE_IOCTL_NR, 2, struct apml_mcamsr_msg) +/** + * DOC: SBRMI_IOCTL_REG_XFER_CMD + * + * @Parameters + * + * @struct apml_reg_xfer_msg + * Pointer to the &struct apml_reg_xfer_msg that will contain the protocol + * information + * + * @Description + * IOCTL command for APML messages using generic _IOWR + * The IOCTL provides userspace access to AMD sideband register xfer protocol + * - Register xfer protocol to get/set hardware register for given offset + */ +#define SBRMI_IOCTL_REG_XFER_CMD _IOWR(SB_BASE_IOCTL_NR, 3, struct apml_reg_xfer_msg) + #endif /*_AMD_APML_H_*/ -- cgit v1.2.3 From ead7f9b8de65632ef8060b84b0c55049a33cfea1 Mon Sep 17 00:00:00 2001 From: Paul Chaignon Date: Thu, 29 May 2025 12:28:35 +0200 Subject: bpf: Fix L4 csum update on IPv6 in CHECKSUM_COMPLETE In Cilium, we use bpf_csum_diff + bpf_l4_csum_replace to, among other things, update the L4 checksum after reverse SNATing IPv6 packets. That use case is however not currently supported and leads to invalid skb->csum values in some cases. This patch adds support for IPv6 address changes in bpf_l4_csum_update via a new flag. When calling bpf_l4_csum_replace in Cilium, it ends up calling inet_proto_csum_replace_by_diff: 1: void inet_proto_csum_replace_by_diff(__sum16 *sum, struct sk_buff *skb, 2: __wsum diff, bool pseudohdr) 3: { 4: if (skb->ip_summed != CHECKSUM_PARTIAL) { 5: csum_replace_by_diff(sum, diff); 6: if (skb->ip_summed == CHECKSUM_COMPLETE && pseudohdr) 7: skb->csum = ~csum_sub(diff, skb->csum); 8: } else if (pseudohdr) { 9: *sum = ~csum_fold(csum_add(diff, csum_unfold(*sum))); 10: } 11: } The bug happens when we're in the CHECKSUM_COMPLETE state. We've just updated one of the IPv6 addresses. The helper now updates the L4 header checksum on line 5. Next, it updates skb->csum on line 7. It shouldn't. For an IPv6 packet, the updates of the IPv6 address and of the L4 checksum will cancel each other. The checksums are set such that computing a checksum over the packet including its checksum will result in a sum of 0. So the same is true here when we update the L4 checksum on line 5. We'll update it as to cancel the previous IPv6 address update. Hence skb->csum should remain untouched in this case. The same bug doesn't affect IPv4 packets because, in that case, three fields are updated: the IPv4 address, the IP checksum, and the L4 checksum. The change to the IPv4 address and one of the checksums still cancel each other in skb->csum, but we're left with one checksum update and should therefore update skb->csum accordingly. That's exactly what inet_proto_csum_replace_by_diff does. This special case for IPv6 L4 checksums is also described atop inet_proto_csum_replace16, the function we should be using in this case. This patch introduces a new bpf_l4_csum_replace flag, BPF_F_IPV6, to indicate that we're updating the L4 checksum of an IPv6 packet. When the flag is set, inet_proto_csum_replace_by_diff will skip the skb->csum update. Fixes: 7d672345ed295 ("bpf: add generic bpf_csum_diff helper") Signed-off-by: Paul Chaignon Acked-by: Daniel Borkmann Link: https://patch.msgid.link/96a6bc3a443e6f0b21ff7b7834000e17fb549e05.1748509484.git.paul.chaignon@gmail.com Signed-off-by: Jakub Kicinski --- include/uapi/linux/bpf.h | 2 ++ net/core/filter.c | 5 +++-- tools/include/uapi/linux/bpf.h | 2 ++ 3 files changed, 7 insertions(+), 2 deletions(-) (limited to 'include/uapi') diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 85180e4aaa5a..0b4a2f124d11 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2056,6 +2056,7 @@ union bpf_attr { * for updates resulting in a null checksum the value is set to * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates * that the modified header field is part of the pseudo-header. + * Flag **BPF_F_IPV6** should be set for IPv6 packets. * * This helper works in combination with **bpf_csum_diff**\ (), * which does not update the checksum in-place, but offers more @@ -6072,6 +6073,7 @@ enum { BPF_F_PSEUDO_HDR = (1ULL << 4), BPF_F_MARK_MANGLED_0 = (1ULL << 5), BPF_F_MARK_ENFORCE = (1ULL << 6), + BPF_F_IPV6 = (1ULL << 7), }; /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */ diff --git a/net/core/filter.c b/net/core/filter.c index f1de7bd8b547..327ca73f9cd7 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -1968,10 +1968,11 @@ BPF_CALL_5(bpf_l4_csum_replace, struct sk_buff *, skb, u32, offset, bool is_pseudo = flags & BPF_F_PSEUDO_HDR; bool is_mmzero = flags & BPF_F_MARK_MANGLED_0; bool do_mforce = flags & BPF_F_MARK_ENFORCE; + bool is_ipv6 = flags & BPF_F_IPV6; __sum16 *ptr; if (unlikely(flags & ~(BPF_F_MARK_MANGLED_0 | BPF_F_MARK_ENFORCE | - BPF_F_PSEUDO_HDR | BPF_F_HDR_FIELD_MASK))) + BPF_F_PSEUDO_HDR | BPF_F_HDR_FIELD_MASK | BPF_F_IPV6))) return -EINVAL; if (unlikely(offset > 0xffff || offset & 1)) return -EFAULT; @@ -1987,7 +1988,7 @@ BPF_CALL_5(bpf_l4_csum_replace, struct sk_buff *, skb, u32, offset, if (unlikely(from != 0)) return -EINVAL; - inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo, false); + inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo, is_ipv6); break; case 2: inet_proto_csum_replace2(ptr, skb, from, to, is_pseudo); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 85180e4aaa5a..0b4a2f124d11 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2056,6 +2056,7 @@ union bpf_attr { * for updates resulting in a null checksum the value is set to * **CSUM_MANGLED_0** instead. Flag **BPF_F_PSEUDO_HDR** indicates * that the modified header field is part of the pseudo-header. + * Flag **BPF_F_IPV6** should be set for IPv6 packets. * * This helper works in combination with **bpf_csum_diff**\ (), * which does not update the checksum in-place, but offers more @@ -6072,6 +6073,7 @@ enum { BPF_F_PSEUDO_HDR = (1ULL << 4), BPF_F_MARK_MANGLED_0 = (1ULL << 5), BPF_F_MARK_ENFORCE = (1ULL << 6), + BPF_F_IPV6 = (1ULL << 7), }; /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */ -- cgit v1.2.3 From ab03a61c66149327e022bdafa5843c6f82be267e Mon Sep 17 00:00:00 2001 From: Uday Shankar Date: Thu, 29 May 2025 17:47:10 -0600 Subject: ublk: have a per-io daemon instead of a per-queue daemon Currently, ublk_drv associates to each hardware queue (hctx) a unique task (called the queue's ubq_daemon) which is allowed to issue COMMIT_AND_FETCH commands against the hctx. If any other task attempts to do so, the command fails immediately with EINVAL. When considered together with the block layer architecture, the result is that for each CPU C on the system, there is a unique ublk server thread which is allowed to handle I/O submitted on CPU C. This can lead to suboptimal performance under imbalanced load generation. For an extreme example, suppose all the load is generated on CPUs mapping to a single ublk server thread. Then that thread may be fully utilized and become the bottleneck in the system, while other ublk server threads are totally idle. This issue can also be addressed directly in the ublk server without kernel support by having threads dequeue I/Os and pass them around to ensure even load. But this solution requires inter-thread communication at least twice for each I/O (submission and completion), which is generally a bad pattern for performance. The problem gets even worse with zero copy, as more inter-thread communication would be required to have the buffer register/unregister calls to come from the correct thread. Therefore, address this issue in ublk_drv by allowing each I/O to have its own daemon task. Two I/Os in the same queue are now allowed to be serviced by different daemon tasks - this was not possible before. Imbalanced load can then be balanced across all ublk server threads by having the ublk server threads issue FETCH_REQs in a round-robin manner. As a small toy example, consider a system with a single ublk device having 2 queues, each of depth 4. A ublk server having 4 threads could issue its FETCH_REQs against this device as follows (where each entry is the qid,tag pair that the FETCH_REQ targets): ublk server thread: T0 T1 T2 T3 0,0 0,1 0,2 0,3 1,3 1,0 1,1 1,2 This setup allows for load that is concentrated on one hctx/ublk_queue to be spread out across all ublk server threads, alleviating the issue described above. Add the new UBLK_F_PER_IO_DAEMON feature to ublk_drv, which ublk servers can use to essentially test for the presence of this change and tailor their behavior accordingly. Signed-off-by: Uday Shankar Reviewed-by: Caleb Sander Mateos Reviewed-by: Ming Lei Reviewed-by: Jens Axboe Link: https://lore.kernel.org/r/20250529-ublk_task_per_io-v8-1-e9d3b119336a@purestorage.com Signed-off-by: Jens Axboe --- drivers/block/ublk_drv.c | 111 +++++++++++++++++++++--------------------- include/uapi/linux/ublk_cmd.h | 9 ++++ 2 files changed, 65 insertions(+), 55 deletions(-) (limited to 'include/uapi') diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 6f51072776f1..c637ea010d34 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -69,7 +69,8 @@ | UBLK_F_USER_RECOVERY_FAIL_IO \ | UBLK_F_UPDATE_SIZE \ | UBLK_F_AUTO_BUF_REG \ - | UBLK_F_QUIESCE) + | UBLK_F_QUIESCE \ + | UBLK_F_PER_IO_DAEMON) #define UBLK_F_ALL_RECOVERY_FLAGS (UBLK_F_USER_RECOVERY \ | UBLK_F_USER_RECOVERY_REISSUE \ @@ -166,6 +167,8 @@ struct ublk_io { /* valid if UBLK_IO_FLAG_OWNED_BY_SRV is set */ struct request *req; }; + + struct task_struct *task; }; struct ublk_queue { @@ -173,11 +176,9 @@ struct ublk_queue { int q_depth; unsigned long flags; - struct task_struct *ubq_daemon; struct ublksrv_io_desc *io_cmd_buf; bool force_abort; - bool timeout; bool canceling; bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */ unsigned short nr_io_ready; /* how many ios setup */ @@ -1099,11 +1100,6 @@ static inline struct ublk_uring_cmd_pdu *ublk_get_uring_cmd_pdu( return io_uring_cmd_to_pdu(ioucmd, struct ublk_uring_cmd_pdu); } -static inline bool ubq_daemon_is_dying(struct ublk_queue *ubq) -{ - return !ubq->ubq_daemon || ubq->ubq_daemon->flags & PF_EXITING; -} - /* todo: handle partial completion */ static inline void __ublk_complete_rq(struct request *req) { @@ -1275,13 +1271,13 @@ static void ublk_dispatch_req(struct ublk_queue *ubq, /* * Task is exiting if either: * - * (1) current != ubq_daemon. + * (1) current != io->task. * io_uring_cmd_complete_in_task() tries to run task_work - * in a workqueue if ubq_daemon(cmd's task) is PF_EXITING. + * in a workqueue if cmd's task is PF_EXITING. * * (2) current->flags & PF_EXITING. */ - if (unlikely(current != ubq->ubq_daemon || current->flags & PF_EXITING)) { + if (unlikely(current != io->task || current->flags & PF_EXITING)) { __ublk_abort_rq(ubq, req); return; } @@ -1330,24 +1326,22 @@ static void ublk_cmd_list_tw_cb(struct io_uring_cmd *cmd, { struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); struct request *rq = pdu->req_list; - struct ublk_queue *ubq = pdu->ubq; struct request *next; do { next = rq->rq_next; rq->rq_next = NULL; - ublk_dispatch_req(ubq, rq, issue_flags); + ublk_dispatch_req(rq->mq_hctx->driver_data, rq, issue_flags); rq = next; } while (rq); } -static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l) +static void ublk_queue_cmd_list(struct ublk_io *io, struct rq_list *l) { - struct request *rq = rq_list_peek(l); - struct io_uring_cmd *cmd = ubq->ios[rq->tag].cmd; + struct io_uring_cmd *cmd = io->cmd; struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); - pdu->req_list = rq; + pdu->req_list = rq_list_peek(l); rq_list_init(l); io_uring_cmd_complete_in_task(cmd, ublk_cmd_list_tw_cb); } @@ -1355,13 +1349,10 @@ static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l) static enum blk_eh_timer_return ublk_timeout(struct request *rq) { struct ublk_queue *ubq = rq->mq_hctx->driver_data; + struct ublk_io *io = &ubq->ios[rq->tag]; if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) { - if (!ubq->timeout) { - send_sig(SIGKILL, ubq->ubq_daemon, 0); - ubq->timeout = true; - } - + send_sig(SIGKILL, io->task, 0); return BLK_EH_DONE; } @@ -1429,24 +1420,25 @@ static void ublk_queue_rqs(struct rq_list *rqlist) { struct rq_list requeue_list = { }; struct rq_list submit_list = { }; - struct ublk_queue *ubq = NULL; + struct ublk_io *io = NULL; struct request *req; while ((req = rq_list_pop(rqlist))) { struct ublk_queue *this_q = req->mq_hctx->driver_data; + struct ublk_io *this_io = &this_q->ios[req->tag]; - if (ubq && ubq != this_q && !rq_list_empty(&submit_list)) - ublk_queue_cmd_list(ubq, &submit_list); - ubq = this_q; + if (io && io->task != this_io->task && !rq_list_empty(&submit_list)) + ublk_queue_cmd_list(io, &submit_list); + io = this_io; - if (ublk_prep_req(ubq, req, true) == BLK_STS_OK) + if (ublk_prep_req(this_q, req, true) == BLK_STS_OK) rq_list_add_tail(&submit_list, req); else rq_list_add_tail(&requeue_list, req); } - if (ubq && !rq_list_empty(&submit_list)) - ublk_queue_cmd_list(ubq, &submit_list); + if (!rq_list_empty(&submit_list)) + ublk_queue_cmd_list(io, &submit_list); *rqlist = requeue_list; } @@ -1474,17 +1466,6 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq) /* All old ioucmds have to be completed */ ubq->nr_io_ready = 0; - /* - * old daemon is PF_EXITING, put it now - * - * It could be NULL in case of closing one quisced device. - */ - if (ubq->ubq_daemon) - put_task_struct(ubq->ubq_daemon); - /* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */ - ubq->ubq_daemon = NULL; - ubq->timeout = false; - for (i = 0; i < ubq->q_depth; i++) { struct ublk_io *io = &ubq->ios[i]; @@ -1495,6 +1476,17 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq) io->flags &= UBLK_IO_FLAG_CANCELED; io->cmd = NULL; io->addr = 0; + + /* + * old task is PF_EXITING, put it now + * + * It could be NULL in case of closing one quiesced + * device. + */ + if (io->task) { + put_task_struct(io->task); + io->task = NULL; + } } } @@ -1516,7 +1508,7 @@ static void ublk_reset_ch_dev(struct ublk_device *ub) for (i = 0; i < ub->dev_info.nr_hw_queues; i++) ublk_queue_reinit(ub, ublk_get_queue(ub, i)); - /* set to NULL, otherwise new ubq_daemon cannot mmap the io_cmd_buf */ + /* set to NULL, otherwise new tasks cannot mmap io_cmd_buf */ ub->mm = NULL; ub->nr_queues_ready = 0; ub->nr_privileged_daemon = 0; @@ -1783,6 +1775,7 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd, struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd); struct ublk_queue *ubq = pdu->ubq; struct task_struct *task; + struct ublk_io *io; if (WARN_ON_ONCE(!ubq)) return; @@ -1791,13 +1784,14 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd, return; task = io_uring_cmd_get_task(cmd); - if (WARN_ON_ONCE(task && task != ubq->ubq_daemon)) + io = &ubq->ios[pdu->tag]; + if (WARN_ON_ONCE(task && task != io->task)) return; if (!ubq->canceling) ublk_start_cancel(ubq); - WARN_ON_ONCE(ubq->ios[pdu->tag].cmd != cmd); + WARN_ON_ONCE(io->cmd != cmd); ublk_cancel_cmd(ubq, pdu->tag, issue_flags); } @@ -1930,8 +1924,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq) { ubq->nr_io_ready++; if (ublk_queue_ready(ubq)) { - ubq->ubq_daemon = current; - get_task_struct(ubq->ubq_daemon); ub->nr_queues_ready++; if (capable(CAP_SYS_ADMIN)) @@ -2084,6 +2076,7 @@ static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq, } ublk_fill_io_cmd(io, cmd, buf_addr); + WRITE_ONCE(io->task, get_task_struct(current)); ublk_mark_io_ready(ub, ubq); out: mutex_unlock(&ub->mutex); @@ -2179,6 +2172,7 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, const struct ublksrv_io_cmd *ub_cmd) { struct ublk_device *ub = cmd->file->private_data; + struct task_struct *task; struct ublk_queue *ubq; struct ublk_io *io; u32 cmd_op = cmd->cmd_op; @@ -2193,13 +2187,14 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, goto out; ubq = ublk_get_queue(ub, ub_cmd->q_id); - if (ubq->ubq_daemon && ubq->ubq_daemon != current) - goto out; if (tag >= ubq->q_depth) goto out; io = &ubq->ios[tag]; + task = READ_ONCE(io->task); + if (task && task != current) + goto out; /* there is pending io cmd, something must be wrong */ if (io->flags & UBLK_IO_FLAG_ACTIVE) { @@ -2449,9 +2444,14 @@ static void ublk_deinit_queue(struct ublk_device *ub, int q_id) { int size = ublk_queue_cmd_buf_size(ub, q_id); struct ublk_queue *ubq = ublk_get_queue(ub, q_id); + int i; + + for (i = 0; i < ubq->q_depth; i++) { + struct ublk_io *io = &ubq->ios[i]; + if (io->task) + put_task_struct(io->task); + } - if (ubq->ubq_daemon) - put_task_struct(ubq->ubq_daemon); if (ubq->io_cmd_buf) free_pages((unsigned long)ubq->io_cmd_buf, get_order(size)); } @@ -2923,7 +2923,8 @@ static int ublk_ctrl_add_dev(const struct ublksrv_ctrl_cmd *header) ub->dev_info.flags &= UBLK_F_ALL; ub->dev_info.flags |= UBLK_F_CMD_IOCTL_ENCODE | - UBLK_F_URING_CMD_COMP_IN_TASK; + UBLK_F_URING_CMD_COMP_IN_TASK | + UBLK_F_PER_IO_DAEMON; /* GET_DATA isn't needed any more with USER_COPY or ZERO COPY */ if (ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY | @@ -3188,14 +3189,14 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub, int ublksrv_pid = (int)header->data[0]; int ret = -EINVAL; - pr_devel("%s: Waiting for new ubq_daemons(nr: %d) are ready, dev id %d...\n", - __func__, ub->dev_info.nr_hw_queues, header->dev_id); - /* wait until new ubq_daemon sending all FETCH_REQ */ + pr_devel("%s: Waiting for all FETCH_REQs, dev id %d...\n", __func__, + header->dev_id); + if (wait_for_completion_interruptible(&ub->completion)) return -EINTR; - pr_devel("%s: All new ubq_daemons(nr: %d) are ready, dev id %d\n", - __func__, ub->dev_info.nr_hw_queues, header->dev_id); + pr_devel("%s: All FETCH_REQs received, dev id %d\n", __func__, + header->dev_id); mutex_lock(&ub->mutex); if (ublk_nosrv_should_stop_dev(ub)) diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h index 56c7e3fc666f..77d9d6af46da 100644 --- a/include/uapi/linux/ublk_cmd.h +++ b/include/uapi/linux/ublk_cmd.h @@ -272,6 +272,15 @@ */ #define UBLK_F_QUIESCE (1ULL << 12) +/* + * If this feature is set, ublk_drv supports each (qid,tag) pair having + * its own independent daemon task that is responsible for handling it. + * If it is not set, daemons are per-queue instead, so for two pairs + * (qid1,tag1) and (qid2,tag2), if qid1 == qid2, then the same task must + * be responsible for handling (qid1,tag1) and (qid2,tag2). + */ +#define UBLK_F_PER_IO_DAEMON (1ULL << 13) + /* device state */ #define UBLK_S_DEV_DEAD 0 #define UBLK_S_DEV_LIVE 1 -- cgit v1.2.3 From 69a58ef4fa77759b0e0c2f79834fa51b00a50c0b Mon Sep 17 00:00:00 2001 From: Daniele Ceraolo Spurio Date: Thu, 22 May 2025 15:54:04 -0700 Subject: drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not ready MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The expected flow of operations when using PXP is to query the PXP status and wait for it to transition to "ready" before attempting to create an exec_queue. This flow is followed by the Mesa driver, but there is no guarantee that an incorrectly coded (or malicious) app will not attempt to create the queue first without querying the status. Therefore, we need to clarify what the expected behavior of the queue creation ioctl is in this scenario. Currently, the ioctl always fails with an -EBUSY code no matter the error, but for consistency it is better to distinguish between "failed to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl does. Note that, while this is a change in the return code of an ioctl, the behavior of the ioctl in this particular corner case was not clearly spec'd, so no one should have been relying on it (and we know that Mesa, which is the only known userspace for this, didn't). v2: Minor rework of the doc (Rodrigo) Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio Cc: John Harrison Cc: José Roberto de Souza Reviewed-by: José Roberto de Souza Reviewed-by: John Harrison Acked-by: Rodrigo Vivi Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com (cherry picked from commit 21784ca96025b62d95b670b7639ad70ddafa69b8) Signed-off-by: Thomas Hellström --- drivers/gpu/drm/xe/xe_pxp.c | 8 ++++++-- include/uapi/drm/xe_drm.h | 5 +++++ 2 files changed, 11 insertions(+), 2 deletions(-) (limited to 'include/uapi') diff --git a/drivers/gpu/drm/xe/xe_pxp.c b/drivers/gpu/drm/xe/xe_pxp.c index 454ea7dc08ac..b5bc15f436fa 100644 --- a/drivers/gpu/drm/xe/xe_pxp.c +++ b/drivers/gpu/drm/xe/xe_pxp.c @@ -541,10 +541,14 @@ int xe_pxp_exec_queue_add(struct xe_pxp *pxp, struct xe_exec_queue *q) */ xe_pm_runtime_get(pxp->xe); - if (!pxp_prerequisites_done(pxp)) { - ret = -EBUSY; + /* get_readiness_status() returns 0 for in-progress and 1 for done */ + ret = xe_pxp_get_readiness_status(pxp); + if (ret <= 0) { + if (!ret) + ret = -EBUSY; goto out; } + ret = 0; wait_for_idle: /* diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index 9c08738c3b91..6a702ba7817c 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -1210,6 +1210,11 @@ struct drm_xe_vm_bind { * there is no need to explicitly set that. When a queue of type * %DRM_XE_PXP_TYPE_HWDRM is created, the PXP default HWDRM session * (%XE_PXP_HWDRM_DEFAULT_SESSION) will be started, if isn't already running. + * The user is expected to query the PXP status via the query ioctl (see + * %DRM_XE_DEVICE_QUERY_PXP_STATUS) and to wait for PXP to be ready before + * attempting to create a queue with this property. When a queue is created + * before PXP is ready, the ioctl will return -EBUSY if init is still in + * progress or -EIO if init failed. * Given that going into a power-saving state kills PXP HWDRM sessions, * runtime PM will be blocked while queues of this type are alive. * All PXP queues will be killed if a PXP invalidation event occurs. -- cgit v1.2.3 From 11fcf368506d347088e613edf6cd2604d70c454f Mon Sep 17 00:00:00 2001 From: Thomas Weißschuh Date: Fri, 6 Jun 2025 10:23:57 +0200 Subject: uapi: bitops: use UAPI-safe variant of BITS_PER_LONG again MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 1e7933a575ed ("uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"") did not take in account that the usage of BITS_PER_LONG in __GENMASK() was changed to __BITS_PER_LONG for UAPI-safety in commit 3c7a8e190bc5 ("uapi: introduce uapi-friendly macros for GENMASK"). BITS_PER_LONG can not be used in UAPI headers as it derives from the kernel configuration and not from the current compiler invocation. When building compat userspace code or a compat vDSO its value will be incorrect. Switch back to __BITS_PER_LONG. Fixes: 1e7933a575ed ("uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh Signed-off-by: Yury Norov [NVIDIA] --- include/uapi/linux/bits.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/uapi') diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h index 682b406e1067..a04afef9efca 100644 --- a/include/uapi/linux/bits.h +++ b/include/uapi/linux/bits.h @@ -4,9 +4,9 @@ #ifndef _UAPI_LINUX_BITS_H #define _UAPI_LINUX_BITS_H -#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h)))) +#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (__BITS_PER_LONG - 1 - (h)))) -#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h)))) +#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h)))) #define __GENMASK_U128(h, l) \ ((_BIT128((h)) << 1) - (_BIT128(l))) -- cgit v1.2.3 From fc92099902fbf21000554678a47654b029c15a4d Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Thu, 12 Jun 2025 12:36:06 -0300 Subject: tools headers: Synchronize linux/bits.h with the kernel sources To pick up the changes in this cset: 1e7933a575ed8af4 ("uapi: Revert "bitops: avoid integer overflow in GENMASK(_ULL)"") 5b572e8a9f3dcd6e ("bits: introduce fixed-type BIT_U*()") 19408200c094858d ("bits: introduce fixed-type GENMASK_U*()") 31299a5e02112411 ("bits: add comments and newlines to #if, #else and #endif directives") This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/linux/bits.h include/linux/bits.h Please see tools/include/uapi/README for further details. Acked-by: Vincent Mailhol Cc: I Hsin Cheng Cc: Yury Norov Cc: Adrian Hunter Cc: Ian Rogers Cc: James Clark Cc: Jiri Olsa Cc: Kan Liang Cc: Lucas De Marchi Cc: Namhyung Kim Cc: Yury Norov Link: https://lore.kernel.org/r/aEr0ZJ60EbshEy6p@x1 Signed-off-by: Arnaldo Carvalho de Melo --- include/uapi/linux/bits.h | 4 ++-- tools/include/linux/bits.h | 57 ++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 57 insertions(+), 4 deletions(-) (limited to 'include/uapi') diff --git a/include/uapi/linux/bits.h b/include/uapi/linux/bits.h index a04afef9efca..682b406e1067 100644 --- a/include/uapi/linux/bits.h +++ b/include/uapi/linux/bits.h @@ -4,9 +4,9 @@ #ifndef _UAPI_LINUX_BITS_H #define _UAPI_LINUX_BITS_H -#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (__BITS_PER_LONG - 1 - (h)))) +#define __GENMASK(h, l) (((~_UL(0)) << (l)) & (~_UL(0) >> (BITS_PER_LONG - 1 - (h)))) -#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h)))) +#define __GENMASK_ULL(h, l) (((~_ULL(0)) << (l)) & (~_ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h)))) #define __GENMASK_U128(h, l) \ ((_BIT128((h)) << 1) - (_BIT128(l))) diff --git a/tools/include/linux/bits.h b/tools/include/linux/bits.h index 14fd0ca9a6cd..7ad056219115 100644 --- a/tools/include/linux/bits.h +++ b/tools/include/linux/bits.h @@ -12,6 +12,7 @@ #define BIT_ULL_MASK(nr) (ULL(1) << ((nr) % BITS_PER_LONG_LONG)) #define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG) #define BITS_PER_BYTE 8 +#define BITS_PER_TYPE(type) (sizeof(type) * BITS_PER_BYTE) /* * Create a contiguous bitmask starting at bit position @l and ending at @@ -19,16 +20,68 @@ * GENMASK_ULL(39, 21) gives us the 64bit vector 0x000000ffffe00000. */ #if !defined(__ASSEMBLY__) + +/* + * Missing asm support + * + * GENMASK_U*() and BIT_U*() depend on BITS_PER_TYPE() which relies on sizeof(), + * something not available in asm. Nevertheless, fixed width integers is a C + * concept. Assembly code can rely on the long and long long versions instead. + */ + #include #include +#include + #define GENMASK_INPUT_CHECK(h, l) BUILD_BUG_ON_ZERO(const_true((l) > (h))) -#else + +/* + * Generate a mask for the specified type @t. Additional checks are made to + * guarantee the value returned fits in that type, relying on + * -Wshift-count-overflow compiler check to detect incompatible arguments. + * For example, all these create build errors or warnings: + * + * - GENMASK(15, 20): wrong argument order + * - GENMASK(72, 15): doesn't fit unsigned long + * - GENMASK_U32(33, 15): doesn't fit in a u32 + */ +#define GENMASK_TYPE(t, h, l) \ + ((t)(GENMASK_INPUT_CHECK(h, l) + \ + (type_max(t) << (l) & \ + type_max(t) >> (BITS_PER_TYPE(t) - 1 - (h))))) + +#define GENMASK_U8(h, l) GENMASK_TYPE(u8, h, l) +#define GENMASK_U16(h, l) GENMASK_TYPE(u16, h, l) +#define GENMASK_U32(h, l) GENMASK_TYPE(u32, h, l) +#define GENMASK_U64(h, l) GENMASK_TYPE(u64, h, l) + +/* + * Fixed-type variants of BIT(), with additional checks like GENMASK_TYPE(). The + * following examples generate compiler warnings due to -Wshift-count-overflow: + * + * - BIT_U8(8) + * - BIT_U32(-1) + * - BIT_U32(40) + */ +#define BIT_INPUT_CHECK(type, nr) \ + BUILD_BUG_ON_ZERO(const_true((nr) >= BITS_PER_TYPE(type))) + +#define BIT_TYPE(type, nr) ((type)(BIT_INPUT_CHECK(type, nr) + BIT_ULL(nr))) + +#define BIT_U8(nr) BIT_TYPE(u8, nr) +#define BIT_U16(nr) BIT_TYPE(u16, nr) +#define BIT_U32(nr) BIT_TYPE(u32, nr) +#define BIT_U64(nr) BIT_TYPE(u64, nr) + +#else /* defined(__ASSEMBLY__) */ + /* * BUILD_BUG_ON_ZERO is not available in h files included from asm files, * disable the input check if that is the case. */ #define GENMASK_INPUT_CHECK(h, l) 0 -#endif + +#endif /* !defined(__ASSEMBLY__) */ #define GENMASK(h, l) \ (GENMASK_INPUT_CHECK(h, l) + __GENMASK(h, l)) -- cgit v1.2.3 From c6d732c38f93c4aebd204a5656583142289c3a2e Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 17 Jun 2025 13:22:40 -0700 Subject: net: ethtool: remove duplicate defines for family info Commit under fixes switched to uAPI generation from the YAML spec. A number of custom defines were left behind, mostly for commands very hard to express in YAML spec. Among what was left behind was the name and version of the generic netlink family. Problem is that the codegen always outputs those values so we ended up with a duplicated, differently named set of defines. Provide naming info in YAML and remove the incorrect defines. Fixes: 8d0580c6ebdd ("ethtool: regenerate uapi header from the spec") Acked-by: Stanislav Fomichev Link: https://patch.msgid.link/20250617202240.811179-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- Documentation/netlink/specs/ethtool.yaml | 3 +++ include/uapi/linux/ethtool_netlink.h | 4 ---- include/uapi/linux/ethtool_netlink_generated.h | 4 ++-- 3 files changed, 5 insertions(+), 6 deletions(-) (limited to 'include/uapi') diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml index 9f98715a6512..72a076b0e1b5 100644 --- a/Documentation/netlink/specs/ethtool.yaml +++ b/Documentation/netlink/specs/ethtool.yaml @@ -7,6 +7,9 @@ protocol: genetlink-legacy doc: Partial family for Ethtool Netlink. uapi-header: linux/ethtool_netlink_generated.h +c-family-name: ethtool-genl-name +c-version-name: ethtool-genl-version + definitions: - name: udp-tunnel-type diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h index 9ff72cfb2e98..09a75bdb6560 100644 --- a/include/uapi/linux/ethtool_netlink.h +++ b/include/uapi/linux/ethtool_netlink.h @@ -208,10 +208,6 @@ enum { ETHTOOL_A_STATS_PHY_MAX = (__ETHTOOL_A_STATS_PHY_CNT - 1) }; -/* generic netlink info */ -#define ETHTOOL_GENL_NAME "ethtool" -#define ETHTOOL_GENL_VERSION 1 - #define ETHTOOL_MCGRP_MONITOR_NAME "monitor" #endif /* _UAPI_LINUX_ETHTOOL_NETLINK_H_ */ diff --git a/include/uapi/linux/ethtool_netlink_generated.h b/include/uapi/linux/ethtool_netlink_generated.h index 9a02f579de22..aa8ab5227c1e 100644 --- a/include/uapi/linux/ethtool_netlink_generated.h +++ b/include/uapi/linux/ethtool_netlink_generated.h @@ -6,8 +6,8 @@ #ifndef _UAPI_LINUX_ETHTOOL_NETLINK_GENERATED_H #define _UAPI_LINUX_ETHTOOL_NETLINK_GENERATED_H -#define ETHTOOL_FAMILY_NAME "ethtool" -#define ETHTOOL_FAMILY_VERSION 1 +#define ETHTOOL_GENL_NAME "ethtool" +#define ETHTOOL_GENL_VERSION 1 enum { ETHTOOL_UDP_TUNNEL_TYPE_VXLAN, -- cgit v1.2.3 From cf207eac06f661fb692f405d5ab8230df884ee52 Mon Sep 17 00:00:00 2001 From: Binbin Wu Date: Tue, 10 Jun 2025 10:14:20 +0800 Subject: KVM: TDX: Handle TDG.VP.VMCALL Handle TDVMCALL for GetQuote to generate a TD-Quote. GetQuote is a doorbell-like interface used by TDX guests to request VMM to generate a TD-Quote signed by a service hosting TD-Quoting Enclave operating on the host. A TDX guest passes a TD Report (TDREPORT_STRUCT) in a shared-memory area as parameter. Host VMM can access it and queue the operation for a service hosting TD-Quoting enclave. When completed, the Quote is returned via the same shared-memory area. KVM only checks the GPA from the TDX guest has the shared-bit set and drops the shared-bit before exiting to userspace to avoid bleeding the shared-bit into KVM's exit ABI. KVM forwards the request to userspace VMM (e.g. QEMU) and userspace VMM queues the operation asynchronously. KVM sets the return code according to the 'ret' field set by userspace to notify the TDX guest whether the request has been queued successfully or not. When the request has been queued successfully, the TDX guest can poll the status field in the shared-memory area to check whether the Quote generation is completed or not. When completed, the generated Quote is returned via the same buffer. Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is required to handle the KVM exit reason as the initial support for TDX, by reentering KVM to ensure that the TDVMCALL is complete. While at it, add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN. Signed-off-by: Binbin Wu Tested-by: Mikko Ylinen Acked-by: Kai Huang [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 49 +++++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/vmx/tdx.c | 32 +++++++++++++++++++++++++++ include/uapi/linux/kvm.h | 17 +++++++++++++++ 3 files changed, 97 insertions(+), 1 deletion(-) (limited to 'include/uapi') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 1bd2d42e6424..115ec3c2b641 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6645,7 +6645,8 @@ to the byte array. .. note:: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, - KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding + KVM_EXIT_EPR, KVM_EXIT_HYPERCALL, KVM_EXIT_TDX, + KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. @@ -7174,6 +7175,52 @@ The valid value for 'flags' is: - KVM_NOTIFY_CONTEXT_INVALID -- the VM context is corrupted and not valid in VMCS. It would run into unknown result if resume the target VM. +:: + + /* KVM_EXIT_TDX */ + struct { + __u64 flags; + __u64 nr; + union { + struct { + u64 ret; + u64 data[5]; + } unknown; + struct { + u64 ret; + u64 gpa; + u64 size; + } get_quote; + }; + } tdx; + +Process a TDVMCALL from the guest. KVM forwards select TDVMCALL based +on the Guest-Hypervisor Communication Interface (GHCI) specification; +KVM bridges these requests to the userspace VMM with minimal changes, +placing the inputs in the union and copying them back to the guest +on re-entry. + +Flags are currently always zero, whereas ``nr`` contains the TDVMCALL +number from register R11. The remaining field of the union provide the +inputs and outputs of the TDVMCALL. Currently the following values of +``nr`` are defined: + +* ``TDVMCALL_GET_QUOTE``: the guest has requested to generate a TD-Quote +signed by a service hosting TD-Quoting Enclave operating on the host. +Parameters and return value are in the ``get_quote`` field of the union. +The ``gpa`` field and ``size`` specify the guest physical address +(without the shared bit set) and the size of a shared-memory buffer, in +which the TDX guest passes a TD Report. The ``ret`` field represents +the return value of the GetQuote request. When the request has been +queued successfully, the TDX guest can poll the status field in the +shared-memory area to check whether the Quote generation is completed or +not. When completed, the generated Quote is returned via the same buffer. + +KVM may add support for more values in the future that may cause a userspace +exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case, +it will enter with output fields already valid; in the common case, the +``unknown.ret`` field of the union will be ``TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED``. +Userspace need not do anything if it does not wish to support a TDVMCALL. :: /* Fix the size of the union. */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 5d100c240ab3..b619a3478983 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1465,6 +1465,36 @@ static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu) return 1; } +static int tdx_complete_simple(struct kvm_vcpu *vcpu) +{ + tdvmcall_set_return_code(vcpu, vcpu->run->tdx.unknown.ret); + return 1; +} + +static int tdx_get_quote(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + u64 gpa = tdx->vp_enter_args.r12; + u64 size = tdx->vp_enter_args.r13; + + /* The gpa of buffer must have shared bit set. */ + if (vt_is_tdx_private_gpa(vcpu->kvm, gpa)) { + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + return 1; + } + + vcpu->run->exit_reason = KVM_EXIT_TDX; + vcpu->run->tdx.flags = 0; + vcpu->run->tdx.nr = TDVMCALL_GET_QUOTE; + vcpu->run->tdx.get_quote.ret = TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED; + vcpu->run->tdx.get_quote.gpa = gpa & ~gfn_to_gpa(kvm_gfn_direct_bits(tdx->vcpu.kvm)); + vcpu->run->tdx.get_quote.size = size; + + vcpu->arch.complete_userspace_io = tdx_complete_simple; + + return 0; +} + static int handle_tdvmcall(struct kvm_vcpu *vcpu) { switch (tdvmcall_leaf(vcpu)) { @@ -1474,6 +1504,8 @@ static int handle_tdvmcall(struct kvm_vcpu *vcpu) return tdx_report_fatal_error(vcpu); case TDVMCALL_GET_TD_VM_CALL_INFO: return tdx_get_td_vm_call_info(vcpu); + case TDVMCALL_GET_QUOTE: + return tdx_get_quote(vcpu); default: break; } diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index d00b85cb168c..e23e7286ad1a 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -178,6 +178,7 @@ struct kvm_xen_exit { #define KVM_EXIT_NOTIFY 37 #define KVM_EXIT_LOONGARCH_IOCSR 38 #define KVM_EXIT_MEMORY_FAULT 39 +#define KVM_EXIT_TDX 40 /* For KVM_EXIT_INTERNAL_ERROR */ /* Emulate instruction failed. */ @@ -447,6 +448,22 @@ struct kvm_run { __u64 gpa; __u64 size; } memory_fault; + /* KVM_EXIT_TDX */ + struct { + __u64 flags; + __u64 nr; + union { + struct { + __u64 ret; + __u64 data[5]; + } unknown; + struct { + __u64 ret; + __u64 gpa; + __u64 size; + } get_quote; + }; + } tdx; /* Fix the size of the union. */ char padding[256]; }; -- cgit v1.2.3 From 25e8b1dd4883e6c251c3db5b347f3c8ae4ade921 Mon Sep 17 00:00:00 2001 From: Binbin Wu Date: Tue, 10 Jun 2025 10:14:21 +0800 Subject: KVM: TDX: Exit to userspace for GetTdVmCallInfo Exit to userspace for TDG.VP.VMCALL via KVM_EXIT_TDX, to allow userspace to provide information about the support of TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API. GHCI spec defines the GHCI base TDVMCALLs: , , , , <#VE.RequestMMIO>, , , and . They must be supported by VMM to support TDX guests. For GetTdVmCallInfo - When leaf (r12) to enumerate TDVMCALL functionality is set to 0, successful execution indicates all GHCI base TDVMCALLs listed above are supported. Update the KVM TDX document with the set of the GHCI base APIs. - When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it indicates the TDX guest is querying the supported TDVMCALLs beyond the GHCI base TDVMCALLs. Exit to userspace to let userspace set the TDVMCALL sub-function bit(s) accordingly to the leaf outputs. KVM could set the TDVMCALL bit(s) supported by itself when the TDVMCALLs don't need support from userspace after returning from userspace and before entering guest. Currently, no such TDVMCALLs implemented, KVM just sets the values returned from userspace. Suggested-by: Paolo Bonzini Signed-off-by: Binbin Wu [Adjust userspace API. - Paolo] Signed-off-by: Paolo Bonzini --- Documentation/virt/kvm/api.rst | 10 ++++++++++ arch/x86/kvm/vmx/tdx.c | 43 ++++++++++++++++++++++++++++++++++++++---- include/uapi/linux/kvm.h | 5 +++++ 3 files changed, 54 insertions(+), 4 deletions(-) (limited to 'include/uapi') diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 115ec3c2b641..9abf93ee5f65 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -7191,6 +7191,11 @@ The valid value for 'flags' is: u64 gpa; u64 size; } get_quote; + struct { + u64 ret; + u64 leaf; + u64 r11, r12, r13, r14; + } get_tdvmcall_info; }; } tdx; @@ -7216,6 +7221,11 @@ queued successfully, the TDX guest can poll the status field in the shared-memory area to check whether the Quote generation is completed or not. When completed, the generated Quote is returned via the same buffer. +* ``TDVMCALL_GET_TD_VM_CALL_INFO``: the guest has requested the support +status of TDVMCALLs. The output values for the given leaf should be +placed in fields from ``r11`` to ``r14`` of the ``get_tdvmcall_info`` +field of the union. + KVM may add support for more values in the future that may cause a userspace exit, even without calls to ``KVM_ENABLE_CAP`` or similar. In this case, it will enter with output fields already valid; in the common case, the diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b619a3478983..1ad20c273f3b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1451,18 +1451,53 @@ error: return 1; } +static int tdx_complete_get_td_vm_call_info(struct kvm_vcpu *vcpu) +{ + struct vcpu_tdx *tdx = to_tdx(vcpu); + + tdvmcall_set_return_code(vcpu, vcpu->run->tdx.get_tdvmcall_info.ret); + + /* + * For now, there is no TDVMCALL beyond GHCI base API supported by KVM + * directly without the support from userspace, just set the value + * returned from userspace. + */ + tdx->vp_enter_args.r11 = vcpu->run->tdx.get_tdvmcall_info.r11; + tdx->vp_enter_args.r12 = vcpu->run->tdx.get_tdvmcall_info.r12; + tdx->vp_enter_args.r13 = vcpu->run->tdx.get_tdvmcall_info.r13; + tdx->vp_enter_args.r14 = vcpu->run->tdx.get_tdvmcall_info.r14; + + return 1; +} + static int tdx_get_td_vm_call_info(struct kvm_vcpu *vcpu) { struct vcpu_tdx *tdx = to_tdx(vcpu); - if (tdx->vp_enter_args.r12) - tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); - else { + switch (tdx->vp_enter_args.r12) { + case 0: tdx->vp_enter_args.r11 = 0; + tdx->vp_enter_args.r12 = 0; tdx->vp_enter_args.r13 = 0; tdx->vp_enter_args.r14 = 0; + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_SUCCESS); + return 1; + case 1: + vcpu->run->tdx.get_tdvmcall_info.leaf = tdx->vp_enter_args.r12; + vcpu->run->exit_reason = KVM_EXIT_TDX; + vcpu->run->tdx.flags = 0; + vcpu->run->tdx.nr = TDVMCALL_GET_TD_VM_CALL_INFO; + vcpu->run->tdx.get_tdvmcall_info.ret = TDVMCALL_STATUS_SUCCESS; + vcpu->run->tdx.get_tdvmcall_info.r11 = 0; + vcpu->run->tdx.get_tdvmcall_info.r12 = 0; + vcpu->run->tdx.get_tdvmcall_info.r13 = 0; + vcpu->run->tdx.get_tdvmcall_info.r14 = 0; + vcpu->arch.complete_userspace_io = tdx_complete_get_td_vm_call_info; + return 0; + default: + tdvmcall_set_return_code(vcpu, TDVMCALL_STATUS_INVALID_OPERAND); + return 1; } - return 1; } static int tdx_complete_simple(struct kvm_vcpu *vcpu) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index e23e7286ad1a..37891580d05d 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -462,6 +462,11 @@ struct kvm_run { __u64 gpa; __u64 size; } get_quote; + struct { + __u64 ret; + __u64 leaf; + __u64 r11, r12, r13, r14; + } get_tdvmcall_info; }; } tdx; /* Fix the size of the union. */ -- cgit v1.2.3 From 22bbc1dcd0d6785fb390c41f0dd5b5e218d23bdd Mon Sep 17 00:00:00 2001 From: Stefano Garzarella Date: Mon, 23 Jun 2025 12:00:53 +0200 Subject: vsock/uapi: fix linux/vm_sockets.h userspace compilation errors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If a userspace application just include will fail to build with the following errors: /usr/include/linux/vm_sockets.h:182:39: error: invalid application of ‘sizeof’ to incomplete type ‘struct sockaddr’ 182 | unsigned char svm_zero[sizeof(struct sockaddr) - | ^~~~~~ /usr/include/linux/vm_sockets.h:183:39: error: ‘sa_family_t’ undeclared here (not in a function) 183 | sizeof(sa_family_t) - | Include for userspace (guarded by ifndef __KERNEL__) where `struct sockaddr` and `sa_family_t` are defined. We already do something similar in and . Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Reported-by: Daan De Meyer Signed-off-by: Stefano Garzarella Link: https://patch.msgid.link/20250623100053.40979-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski --- include/uapi/linux/vm_sockets.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/uapi') diff --git a/include/uapi/linux/vm_sockets.h b/include/uapi/linux/vm_sockets.h index ed07181d4eff..e05280e41522 100644 --- a/include/uapi/linux/vm_sockets.h +++ b/include/uapi/linux/vm_sockets.h @@ -17,6 +17,10 @@ #ifndef _UAPI_VM_SOCKETS_H #define _UAPI_VM_SOCKETS_H +#ifndef __KERNEL__ +#include /* for struct sockaddr and sa_family_t */ +#endif + #include #include -- cgit v1.2.3 From 9e6dd4c256d0774701637b958ba682eff4991277 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 24 Jun 2025 14:09:59 -0700 Subject: netlink: specs: mptcp: replace underscores with dashes in names We're trying to add a strict regexp for the name format in the spec. Underscores will not be allowed, dashes should be used instead. This makes no difference to C (codegen, if used, replaces special chars in names) but it gives more uniform naming in Python. Fixes: bc8aeb2045e2 ("Documentation: netlink: add a YAML spec for mptcp") Reviewed-by: Davide Caratti Reviewed-by: Donald Hunter Reviewed-by: Matthieu Baerts (NGI0) Link: https://patch.msgid.link/20250624211002.3475021-8-kuba@kernel.org Signed-off-by: Jakub Kicinski --- Documentation/netlink/specs/mptcp_pm.yaml | 8 ++++---- include/uapi/linux/mptcp_pm.h | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-) (limited to 'include/uapi') diff --git a/Documentation/netlink/specs/mptcp_pm.yaml b/Documentation/netlink/specs/mptcp_pm.yaml index dfd017780d2f..fb57860fe778 100644 --- a/Documentation/netlink/specs/mptcp_pm.yaml +++ b/Documentation/netlink/specs/mptcp_pm.yaml @@ -57,21 +57,21 @@ definitions: doc: >- A new subflow has been established. 'error' should not be set. Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | - daddr6, sport, dport, backup, if_idx [, error]. + daddr6, sport, dport, backup, if-idx [, error]. - name: sub-closed doc: >- A subflow has been closed. An error (copy of sk_err) could be set if an error has been detected for this subflow. Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | - daddr6, sport, dport, backup, if_idx [, error]. + daddr6, sport, dport, backup, if-idx [, error]. - name: sub-priority value: 13 doc: >- The priority of a subflow has changed. 'error' should not be set. Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | - daddr6, sport, dport, backup, if_idx [, error]. + daddr6, sport, dport, backup, if-idx [, error]. - name: listener-created value: 15 @@ -255,7 +255,7 @@ attribute-sets: name: timeout type: u32 - - name: if_idx + name: if-idx type: u32 - name: reset-reason diff --git a/include/uapi/linux/mptcp_pm.h b/include/uapi/linux/mptcp_pm.h index 84fa8a21dfd0..6ac84b2f636c 100644 --- a/include/uapi/linux/mptcp_pm.h +++ b/include/uapi/linux/mptcp_pm.h @@ -27,14 +27,14 @@ * token, rem_id. * @MPTCP_EVENT_SUB_ESTABLISHED: A new subflow has been established. 'error' * should not be set. Attributes: token, family, loc_id, rem_id, saddr4 | - * saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error]. + * saddr6, daddr4 | daddr6, sport, dport, backup, if-idx [, error]. * @MPTCP_EVENT_SUB_CLOSED: A subflow has been closed. An error (copy of * sk_err) could be set if an error has been detected for this subflow. * Attributes: token, family, loc_id, rem_id, saddr4 | saddr6, daddr4 | - * daddr6, sport, dport, backup, if_idx [, error]. + * daddr6, sport, dport, backup, if-idx [, error]. * @MPTCP_EVENT_SUB_PRIORITY: The priority of a subflow has changed. 'error' * should not be set. Attributes: token, family, loc_id, rem_id, saddr4 | - * saddr6, daddr4 | daddr6, sport, dport, backup, if_idx [, error]. + * saddr6, daddr4 | daddr6, sport, dport, backup, if-idx [, error]. * @MPTCP_EVENT_LISTENER_CREATED: A new PM listener is created. Attributes: * family, sport, saddr4 | saddr6. * @MPTCP_EVENT_LISTENER_CLOSED: A PM listener is closed. Attributes: family, -- cgit v1.2.3