kernel - Hosts the 0x221E linux distro kernel.

Age	Commit message (Collapse)	Author
2025-01-17	dm-table: atomic writes support	John Garry
	Support stacking atomic write limits for DM devices. All the pre-existing code in blk_stack_atomic_writes_limits() already takes care of finding the aggregrate limits from the bottom devices. Feature flag DM_TARGET_ATOMIC_WRITES is introduced so that atomic writes can be enabled on personalities selectively. This is to ensure that atomic writes are only enabled when verified to be working properly (for a specific personality). In addition, it just may not make sense to enable atomic writes on some personalities (so this flag also helps there). Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-01-16	net: mdio: add definition for clock stop capable bit	Russell King (Oracle)
	Add a definition for the clock stop capable bit in the PCS MMD. This bit indicates whether the MAC is able to stop the transmit xMII clock while it is signalling LPI. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E1tYADb-0014PV-6T@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16	PCI: Add TLP Prefix reading to pcie_read_tlp_log()	Ilpo Järvinen
	pcie_read_tlp_log() handles only 4 Header Log DWORDs but TLP Prefix Log (PCIe r6.1 secs 7.8.4.12 & 7.9.14.13) may also be present. Generalize pcie_read_tlp_log() and struct pcie_tlp_log to also handle TLP Prefix Log. The relevant registers are formatted identically in AER and DPC Capability, but has these variations: a) The offsets of TLP Prefix Log registers vary. b) DPC RP PIO TLP Prefix Log register can be < 4 DWORDs. c) AER TLP Prefix Log Present (PCIe r6.1 sec 7.8.4.7) can indicate Prefix Log is not present. Therefore callers must pass the offset of the TLP Prefix Log register and the entire length to pcie_read_tlp_log() to be able to read the correct number of TLP Prefix DWORDs from the correct offset. Link: https://lore.kernel.org/r/20250114170840.1633-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: squash ternary fix from https://lore.kernel.org/r/20250116172019.88116-1-colin.i.king@gmail.com] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-01-15	net: ethtool: add support for configuring hds-thresh	Taehee Yoo
	The hds-thresh option configures the threshold value of the header-data-split. If a received packet size is larger than this threshold value, a packet will be split into header and payload. The header indicates TCP and UDP header, but it depends on driver spec. The bnxt_en driver supports HDS(Header-Data-Split) configuration at FW level, affecting TCP and UDP too. So, If hds-thresh is set, it affects UDP and TCP packets. Example: # ethtool -G <interface name> hds-thresh <value> # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256 # ethtool -g enp14s0f0np0 Ring parameters for enp14s0f0np0: Pre-set maximums: ... HDS thresh: 1023 Current hardware settings: ... TCP data split: on HDS thresh: 256 The default/min/max values are not defined in the ethtool so the drivers should define themself. The 0 value means that all TCP/UDP packets' header and payload will be split. Tested-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250114142852.3364986-3-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15	Merge tag 'loongarch-kvm-6.14' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.14 1. Clear LLBCTL if secondary mmu mapping changed. 2. Add hypercall service support for usermode VMM. This is a really small changeset, because the Chinese New Year (Spring Festival) is coming. Happy New Year!
2025-01-15	Input: allocate keycode for phone linking	Illia Ostapyshyn
	The F11 key on the new Lenovo Thinkpad T14 Gen 5, T16 Gen 3, and P14s Gen 5 laptops includes a symbol showing a smartphone and a laptop chained together. According to the user manual, it starts the Microsoft Phone Link software used to connect to Android/iOS devices and relay messages/calls or sync data. As there are no suitable keycodes for this action, introduce a new one. Signed-off-by: Illia Ostapyshyn <illia@yshyn.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://lore.kernel.org/r/20241114173930.44983-2-illia@yshyn.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-14	PCI: Store number of supported End-End TLP Prefixes	Ilpo Järvinen
	eetlp_prefix_path in the struct pci_dev tells if End-End TLP Prefixes are supported by the path or not, and the value is only calculated if CONFIG_PCI_PASID is set. The Max End-End TLP Prefixes field in the Device Capabilities Register 2 also tells how many (1-4) End-End TLP Prefixes are supported (PCIe r6.2 sec 7.5.3.15). The number of supported End-End Prefixes is useful for reading correct number of DWORDs from TLP Prefix Log register in AER capability (PCIe r6.2 sec 7.8.4.12). Replace eetlp_prefix_path with eetlp_prefix_max and determine the number of supported End-End Prefixes regardless of CONFIG_PCI_PASID so that an upcoming commit generalizing TLP Prefix Log register reading does not have to read extra DWORDs for End-End Prefixes that never will be there. The value stored into eetlp_prefix_max is directly derived from device's Max End-End TLP Prefixes and does not consider limitations imposed by bridges or the Root Port beyond supported/not supported flags. This is intentional for two reasons: 1) PCIe r6.2 spec sections 2.2.10.4 & 6.2.4.4 indicate that a TLP is malformed only if the number of prefixes exceed the number of Max End-End TLP Prefixes, which seems to be the case even if the device could never receive that many prefixes due to smaller maximum imposed by a bridge or the Root Port. If TLP parsing is later added, this distinction is significant in interpreting what is logged by the TLP Prefix Log registers and the value matching to the Malformed TLP threshold is going to be more useful. 2) TLP Prefix handling happens autonomously on a low layer and the value in eetlp_prefix_max is not programmed anywhere by the kernel (i.e., there is no limiter OS can control to prevent sending more than N TLP Prefixes). Link: https://lore.kernel.org/r/20250114170840.1633-7-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
2025-01-14	tcp: add LINUX_MIB_PAWS_OLD_ACK SNMP counter	Eric Dumazet
	Prior patch in the series added TCP_RFC7323_PAWS_ACK drop reason. This patch adds the corresponding SNMP counter, for folks using nstat instead of tracing for TCP diagnostics. nstat -az \| grep PAWSOldAck Suggested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Tested-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20250113135558.3180360-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-14	Merge tag 'nf-next-25-01-11' of ↵	Paolo Abeni
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains a small batch of Netfilter/IPVS updates for net-next: 1) Remove unused genmask parameter in nf_tables_addchain() 2) Speed up reads from /proc/net/ip_vs_conn, from Florian Westphal. 3) Skip empty buckets in hashlimit to avoid atomic operations that results in false positive reports by syzbot with lockdep enabled, patch from Eric Dumazet. 4) Add conntrack event timestamps available via ctnetlink, from Florian Westphal. netfilter pull request 25-01-11 * tag 'nf-next-25-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: conntrack: add conntrack event timestamp netfilter: xt_hashlimit: htable_selective_cleanup() optimization ipvs: speed up reads from ip_vs_conn proc file netfilter: nf_tables: remove the genmask parameter ==================== Link: https://patch.msgid.link/20250111230800.67349-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-14	net: ethtool: add support for structured PHY statistics	Jakub Kicinski
	Introduce a new way to report PHY statistics in a structured and standardized format using the netlink API. This new method does not replace the old driver-specific stats, which can still be accessed with `ethtool -S <eth name>`. The structured stats are available with `ethtool -S <eth name> --all-groups`. This new method makes it easier to diagnose problems by organizing stats in a consistent and documented way. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-13	md: reintroduce md-linear	Yu Kuai
	THe md-linear is removed by commit 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") because it has been marked as deprecated for a long time. However, md-linear is used widely for underlying disks with different size, sadly we didn't know this until now, and it's true useful to create partitions and assemble multiple raid and then append one to the other. People have to use dm-linear in this case now, however, they will prefer to minimize the number of involved modules. Fixes: 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") Cc: stable@vger.kernel.org Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20250102112841.1227111-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13	wifi: cfg80211: Add support for controlling EPCS	Ilan Peer
	Add support for configuring Emergency Preparedness Communication Services (EPCS) for station mode. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250102161730.ea54ac94445c.I11d750188bc0871e13e86146a3b5cc048d853e69@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	wifi: cfg80211: Add support for dynamic addition/removal of links	Ilan Peer
	Add support for requesting dynamic addition/removal of links to the current MLO association. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250102161730.cef23352f2a2.I79c849974c494cb1cbf9e1b22a5d2d37395ff5ac@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	wifi: nl80211: permit userspace to pass supported selectors	Benjamin Berg
	Currently the SAE_H2E selector already exists, which needs to be implemented by the SME. As new such selectors might be added in the future, add a feature to permit userspace to report a selector as supported. If not given, the kernel should assume that userspace does support SAE_H2E. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20250101070249.fe67b871cc39.Ieb98390328927e998e612345a58b6dbc00b0e3a2@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-01-13	Merge 6.13-rc4 into char-misc-next	Greg Kroah-Hartman
	We need the IIO fixes in here as well, and it resolves a merge conflict in: drivers/iio/adc/ti-ads1119.c Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-13	Merge 6.13-rc7 into usb-next	Greg Kroah-Hartman
	We need the USB fixes in here as well for testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-12	delayacct: add delay min to record delay peak	Wang Yaxin
	Delay accounting can now calculate the average delay of processes, detect the overall system load, and also record the 'delay max' to identify potential abnormal delays. However, 'delay min' can help us identify another useful delay peak. By comparing the difference between 'delay max' and 'delay min', we can understand the optimization space for latency, providing a reference for the optimization of latency performance. Use case ========= bash-4.4# ./getdelays -d -t 242 print delayacct stats ON TGID 242 CPU count real total virtual total delay total delay average delay max delay min 39 156000000 156576579 2111069 0.054ms 0.212296ms 0.031307ms IO count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms SWAP count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms RECLAIM count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms THRASHING count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms COMPACT count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms WPCOPY count delay total delay average delay max delay min 156 11215873 0.072ms 0.207403ms 0.033913ms IRQ count delay total delay average delay max delay min 0 0 0.000ms 0.000000ms 0.000000ms Link: https://lkml.kernel.org/r/20241220173105906EOdsPhzjMLYNJJBqgz1ga@zte.com.cn Co-developed-by: Wang Yong <wang.yong12@zte.com.cn> Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Co-developed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Co-developed-by: Kun Jiang <jiang.kun2@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: tuqiang <tu.qiang35@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	delayacct: add delay max to record delay peak	Wang Yaxin
	Introduce the use cases of delay max, which can help quickly detect potential abnormal delays in the system and record the types and specific details of delay spikes. Problem ======== Delay accounting can track the average delay of processes to show system workload. However, when a process experiences a significant delay, maybe a delay spike, which adversely affects performance, getdelays can only display the average system delay over a period of time. Yet, average delay is unhelpful for diagnosing delay peak. It is not even possible to determine which type of delay has spiked, as this information might be masked by the average delay. Solution ========= the 'delay max' can display delay peak since the system's startup, which can record potential abnormal delays over time, including the type of delay and the maximum delay. This is helpful for quickly identifying crash caused by delay. Use case ========= bash# ./getdelays -d -p 244 print delayacct stats ON PID 244 CPU count real total virtual total delay total delay average delay max 68 192000000 213676651 705643 0.010ms 0.306381ms IO count delay total delay average delay max 0 0 0.000ms 0.000000ms SWAP count delay total delay average delay max 0 0 0.000ms 0.000000ms RECLAIM count delay total delay average delay max 0 0 0.000ms 0.000000ms THRASHING count delay total delay average delay max 0 0 0.000ms 0.000000ms COMPACT count delay total delay average delay max 0 0 0.000ms 0.000000ms WPCOPY count delay total delay average delay max 235 15648284 0.067ms 0.263842ms IRQ count delay total delay average delay max 0 0 0.000ms 0.000000ms [wang.yaxin@zte.com.cn: update docs and fix some spelling errors] Link: https://lkml.kernel.org/r/20241213192700771XKZ8H30OtHSeziGqRVMs0@zte.com.cn Link: https://lkml.kernel.org/r/20241203164848805CS62CQPQWG9GLdQj2_BxS@zte.com.cn Co-developed-by: Wang Yong <wang.yong12@zte.com.cn> Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Co-developed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Co-developed-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Fan Yu <fan.yu9@zte.com.cn> Cc: Peilin He <he.peilin@zte.com.cn> Cc: tuqiang <tu.qiang35@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: ye xingchen <ye.xingchen@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-10	io_uring: expose read/write attribute capability	Anuj Gupta
	After commit 9a213d3b80c0, we can pass additional attributes along with read/write. However, userspace doesn't know that. Add a new feature flag IORING_FEAT_RW_ATTR, to notify the userspace that the kernel has this ability. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20241205062109.1788-1-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-10	Merge tag 'ipsec-next-2025-01-09' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next-2025-01-09 1) Implement the AGGFRAG protocol and basic IP-TFS (RFC9347) functionality. From Christian Hopps. 2) Support ESN context update to hardware for TX. From Jianbo Liu. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-01-09	fs: add STATX_DIO_READ_ALIGN	Christoph Hellwig
	Add a separate dio read align field, as many out of place write file systems can easily do reads aligned to the device sector size, but require bigger alignment for writes. This is usually papered over by falling back to buffered I/O for smaller writes and doing read-modify-write cycles, but performance for this sucks, so applications benefit from knowing the actual write alignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250109083109.1441561-3-hch@lst.de Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09	fs: reformat the statx definition	Christoph Hellwig
	The comments after the declaration are becoming rather unreadable with long enough comments. Move them into lines of their own. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250109083109.1441561-2-hch@lst.de Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-09	netfilter: conntrack: add conntrack event timestamp	Florian Westphal
	Nadia Pinaeva writes: I am working on a tool that allows collecting network performance metrics by using conntrack events. Start time of a conntrack entry is used to evaluate seen_reply latency, therefore the sooner it is timestamped, the better the precision is. In particular, when using this tool to compare the performance of the same feature implemented using iptables/nftables/OVS it is crucial to have the entry timestamped earlier to see any difference. At this time, conntrack events can only get timestamped at recv time in userspace, so there can be some delay between the event being generated and the userspace process consuming the message. There is sys/net/netfilter/nf_conntrack_timestamp, which adds a 64bit timestamp (ns resolution) that records start and stop times, but its not suited for this either, start time is the 'hashtable insertion time', not 'conntrack allocation time'. There is concern that moving the start-time moment to conntrack allocation will add overhead in case of flooding, where conntrack entries are allocated and released right away without getting inserted into the hashtable. Also, even if this was changed it would not with events other than new (start time) and destroy (stop time). Pablo suggested to add new CTA_TIMESTAMP_EVENT, this adds this feature. The timestamp is recorded in case both events are requested and the sys/net/netfilter/nf_conntrack_timestamp toggle is enabled. Reported-by: Nadia Pinaeva <n.m.pinaeva@gmail.com> Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-09	netlink: add IPv6 anycast join/leave notifications	Yuyang Huang
	This change introduces a mechanism for notifying userspace applications about changes to IPv6 anycast addresses via netlink. It includes: * Addition and deletion of IPv6 anycast addresses are reported using RTM_NEWANYCAST and RTM_DELANYCAST. * A new netlink group (RTNLGRP_IPV6_ACADDR) for subscribing to these notifications. This enables user space applications(e.g. ip monitor) to efficiently track anycast addresses through netlink messages, improving metrics collection and system monitoring. It also unlocks the potential for advanced anycast management in user space, such as hardware offload control and fine grained network control. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250107114355.1766086-1-yuyanghuang@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-08	ntsync: Introduce alertable waits.	Elizabeth Figura
	NT waits can optionally be made "alertable". This is a special channel for thread wakeup that is mildly similar to SIGIO. A thread has an internal single bit of "alerted" state, and if a thread is alerted while an alertable wait, the wait will return a special value, consume the "alerted" state, and will not consume any of its objects. Alerts are implemented using events; the user-space NT emulator is expected to create an internal ntsync event for each thread and pass that event to wait functions. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20241213193511.457338-16-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_EVENT_READ.	Elizabeth Figura
	This corresponds to the NT syscall NtQueryEvent(). This returns the signaled state of the event and whether it is manual-reset. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-15-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_MUTEX_READ.	Elizabeth Figura
	This corresponds to the NT syscall NtQueryMutant(). This returns the recursion count, owner, and abandoned state of the mutex. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-14-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_SEM_READ.	Elizabeth Figura
	This corresponds to the NT syscall NtQuerySemaphore(). This returns the current count and maximum count of the semaphore. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-13-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.	Elizabeth Figura
	This corresponds to the NT syscall NtPulseEvent(). This wakes up any waiters as if the event had been set, but does not set the event, instead resetting it if it had been signalled. Thus, for a manual-reset event, all waiters are woken, whereas for an auto-reset event, at most one waiter is woken. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20241213193511.457338-12-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_EVENT_RESET.	Elizabeth Figura
	This corresponds to the NT syscall NtResetEvent(). This sets the event to the unsignaled state, and returns its previous state. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20241213193511.457338-11-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_EVENT_SET.	Elizabeth Figura
	This corresponds to the NT syscall NtSetEvent(). This sets the event to the signaled state, and returns its previous state. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-10-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.	Elizabeth Figura
	This correspond to the NT syscall NtCreateEvent(). An NT event holds a single bit of state denoting whether it is signaled or unsignaled. There are two types of events: manual-reset and automatic-reset. When an automatic-reset event is acquired via a wait function, its state is reset to unsignaled. Manual-reset events are not affected by wait functions. Whether the event is manual-reset, and its initial state, are specified at creation time. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-9-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.	Elizabeth Figura
	This does not correspond to any NT syscall. Rather, when a thread dies, it should be called by the NT emulator for each mutex, with the TID of the dying thread. NT mutexes are robust (in the pthread sense). When an NT thread dies, any mutexes it owned are immediately released. Acquisition of those mutexes by other threads will return a special value indicating that the mutex was abandoned, like EOWNERDEAD returned from pthread_mutex_lock(), and EOWNERDEAD is indeed used here for that purpose. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-8-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.	Elizabeth Figura
	This corresponds to the NT syscall NtReleaseMutant(). This syscall decrements the mutex's recursion count by one, and returns the previous value. If the mutex is not owned by the current task, the function instead fails and returns -EPERM. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-7-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.	Elizabeth Figura
	This corresponds to the NT syscall NtCreateMutant(). An NT mutex is recursive, with a 32-bit recursion counter. When acquired via NtWaitForMultipleObjects(), the recursion counter is incremented by one. The OS records the thread which acquired it. The OS records the thread which acquired it. However, in order to keep this driver self-contained, the owning thread ID is managed by user-space, and passed as a parameter to all relevant ioctls. The initial owner and recursion count, if any, are specified when the mutex is created. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-6-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_WAIT_ALL.	Elizabeth Figura
	This is similar to NTSYNC_IOC_WAIT_ANY, but waits until all of the objects are simultaneously signaled, and then acquires all of them as a single atomic operation. Because acquisition of multiple objects is atomic, some complex locking is required. We cannot simply spin-lock multiple objects simultaneously, as that may disable preëmption for a problematically long time. Instead, modifying any object which may be involved in a wait-all operation takes a device-wide sleeping mutex, "wait_all_lock", instead of the normal object spinlock. Because wait-for-all is a rare operation, in order to optimize wait-for-any, this lock is only taken when necessary. "all_hint" is used to mark objects which are involved in a wait-for-all operation, and if an object is not, only its spinlock is taken. The locking scheme used here was written by Peter Zijlstra. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-5-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Introduce NTSYNC_IOC_WAIT_ANY.	Elizabeth Figura
	This corresponds to part of the functionality of the NT syscall NtWaitForMultipleObjects(). Specifically, it implements the behaviour where the third argument (wait_any) is TRUE, and it does not handle alertable waits. Those features have been split out into separate patches to ease review. This patch therefore implements the wait/wake infrastructure which comprises the core of ntsync's functionality. NTSYNC_IOC_WAIT_ANY is a vectored wait function similar to poll(). Unlike poll(), it "consumes" objects when they are signaled. For semaphores, this means decreasing one from the internal counter. At most one object can be consumed by this function. This wait/wake model is fundamentally different from that used anywhere else in the kernel, and for that reason ntsync does not use any existing infrastructure, such as futexes, kernel mutexes or semaphores, or wait_event(). Up to 64 objects can be waited on at once. As soon as one is signaled, the object with the lowest index is consumed, and that index is returned via the "index" field. A timeout is supported. The timeout is passed as a u64 nanosecond value, which represents absolute time measured against either the MONOTONIC or REALTIME clock (controlled by the flags argument). If U64_MAX is passed, the ioctl waits indefinitely. This ioctl validates that all objects belong to the relevant device. This is not necessary for any technical reason related to NTSYNC_IOC_WAIT_ANY, but will be necessary for NTSYNC_IOC_WAIT_ALL introduced in the following patch. Some padding fields are added for alignment and for fields which will be added in future patches (split out to ease review). Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-4-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Rename NTSYNC_IOC_SEM_POST to NTSYNC_IOC_SEM_RELEASE.	Elizabeth Figura
	Use the more common "release" terminology, which is also the term used by NT, instead of "post" (which is used by POSIX). Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-3-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	ntsync: Return the fd from NTSYNC_IOC_CREATE_SEM.	Elizabeth Figura
	Simplify the user API a bit by returning the fd as return value from the ioctl instead of through the argument pointer. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-2-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	drivers pps: add PPS generators support	Rodolfo Giometti
	Sometimes one needs to be able not only to catch PPS signals but to produce them also. For example, running a distributed simulation, which requires computers' clock to be synchronized very tightly. This patch adds PPS generators class in order to have a well-defined interface for these devices. Signed-off-by: Rodolfo Giometti <giometti@enneenne.com> Link: https://lore.kernel.org/r/20241108073115.759039-2-giometti@enneenne.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08	vduse: relicense under GPL-2.0 OR BSD-3-Clause	Yongji Xie
	Dual-license the vduse kernel header file to dual GPL-2.0 OR BSD-3-Clause license to make it possible to ship it with DPDK (under BSD-3-Clause) for older distros. Signed-off-by: Yongji Xie <xieyongji@bytedance.com> Message-Id: <20241119074238.38299-1-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-07	Merge tag 'for-netdev' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2025-01-07 We've added 7 non-merge commits during the last 32 day(s) which contain a total of 11 files changed, 190 insertions(+), 103 deletions(-). The main changes are: 1) Migrate the test_xdp_meta.sh BPF selftest into test_progs framework, from Bastien Curutchet. 2) Add ability to configure head/tailroom for netkit devices, from Daniel Borkmann. 3) Fixes and improvements to the xdp_hw_metadata selftest, from Song Yoong Siang. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: selftests/bpf: Extend netkit tests to validate set {head,tail}room netkit: Add add netkit {head,tail}room to rt_link.yaml netkit: Allow for configuring needed_{head,tail}room selftests/bpf: Migrate test_xdp_meta.sh into xdp_context_test_run.c selftests/bpf: test_xdp_meta: Rename BPF sections selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata ==================== Link: https://patch.msgid.link/20250107130908.143644-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-06	netkit: Allow for configuring needed_{head,tail}room	Daniel Borkmann
	Allow the user to configure needed_{head,tail}room for both netkit devices. The idea is similar to 163e529200af ("veth: implement ndo_set_rx_headroom") with the difference that the two parameters can be specified upon device creation. By default the current behavior stays as is which is needed_{head,tail}room is 0. In case of Cilium, for example, the netkit devices are not enslaved into a bridge or openvswitch device (rather, BPF-based redirection is used out of tcx), and as such these parameters are not propagated into the Pod's netns via peer device. Given Cilium can run in vxlan/geneve tunneling mode (needed_headroom) and/or be used in combination with WireGuard (needed_{head,tail}room), allow the Cilium CNI plugin to specify these two upon netkit device creation. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/bpf/20241220234658.490686-1-daniel@iogearbox.net
2025-01-04	Merge branch 'vfs-6.14.uncached_buffered_io'	Christian Brauner
	Bring in the VFS changes for uncached buffered io. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-04	fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag	Jens Axboe
	If a file system supports uncached buffered IO, it may set FOP_DONTCACHE and enable support for RWF_DONTCACHE. If RWF_DONTCACHE is attempted without the file system supporting it, it'll get errored with -EOPNOTSUPP. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20241220154831.1086649-8-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-03	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-6.13-rc6). No conflicts. Adjacent changes: include/linux/if_vlan.h f91a5b808938 ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK") 3f330db30638 ("net: reformat kdoc return statements") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-03	Merge tag 'net-6.13-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireles and netfilter. Nothing major here. Over the last two weeks we gathered only around two-thirds of our normal weekly fix count, but delaying sending these until -rc7 seemed like a really bad idea. AFAIK we have no bugs under investigation. One or two reverts for stuff for which we haven't gotten a proper fix will likely come in the next PR. Current release - fix to a fix: - netfilter: nft_set_hash: unaligned atomic read on struct nft_set_ext - eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup Previous releases - regressions: - net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets - mptcp: - fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust - fix TCP options overflow - prevent excessive coalescing on receive, fix throughput - net: fix memory leak in tcp_conn_request() if map insertion fails - wifi: cw1200: fix potential NULL dereference after conversion to GPIO descriptors - phy: micrel: dynamically control external clock of KSZ PHY, fix suspend behavior Previous releases - always broken: - af_packet: fix VLAN handling with MSG_PEEK - net: restrict SO_REUSEPORT to inet sockets - netdev-genl: avoid empty messages in NAPI get - dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X - eth: - gve: XDP fixes around transmit, queue wakeup etc. - ti: icssg-prueth: fix firmware load sequence to prevent time jump which breaks timesync related operations Misc: - netlink: specs: mptcp: add missing attr and improve documentation" * tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init net: ti: icssg-prueth: Fix firmware load sequence. mptcp: prevent excessive coalescing on receive mptcp: don't always assume copied data in mptcp_cleanup_rbuf() mptcp: fix recvbuffer adjust on sleeping rcvmsg ila: serialize calls to nf_register_net_hooks() af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK af_packet: fix vlan_get_tci() vs MSG_PEEK net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init() net: restrict SO_REUSEPORT to inet sockets net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets net: sfc: Correct key_len for efx_tc_ct_zone_ht_params net: wwan: t7xx: Fix FSM command timeout issue sky2: Add device ID 11ab:4373 for Marvell 88E8075 mptcp: fix TCP options overflow. net: mv643xx_eth: fix an OF node reference leak gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup eth: bcmsysport: fix call balance of priv->clk handling routines net: llc: reset skb->transport_header netlink: specs: mptcp: fix missing doc ...
2024-12-27	netlink: specs: mptcp: clearly mention attributes	Matthieu Baerts (NGI0)
	The rendered version of the MPTCP events [1] looked strange, because the whole content of the 'doc' was displayed in the same block. It was then not clear that the first words, not even ended by a period, were the attributes that are defined when such events are emitted. These attributes have now been moved to the end, prefixed by 'Attributes:' and ended with a period. Note that '>-' has been added after 'doc:' to allow ':' in the text below. The documentation in the UAPI header has been auto-generated by: ./tools/net/ynl/ynl-regen.sh Link: https://docs.kernel.org/networking/netlink_spec/mptcp_pm.html#event-type [1] Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-2-e54f2db3f844@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-27	netlink: specs: mptcp: add missing 'server-side' attr	Matthieu Baerts (NGI0)
	This attribute is added with the 'created' and 'established' events, but the documentation didn't mention it. The documentation in the UAPI header has been auto-generated by: ./tools/net/ynl/ynl-regen.sh Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-1-e54f2db3f844@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-27	Merge tag 'hardening-v6.13-rc5' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fix from Kees Cook: - stddef: make __struct_group() UAPI C++-friendly (Alexander Lobakin) * tag 'hardening-v6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: stddef: make __struct_group() UAPI C++-friendly