summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2021-11-02Merge tag 'net-next-for-5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ...
2021-11-01Merge tag 'audit-pr-20211101' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Add some additional audit logging to capture the openat2() syscall open_how struct info. Previous variations of the open()/openat() syscalls allowed audit admins to inspect the syscall args to get the information contained in the new open_how struct used in openat2()" * tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: return early if the filter rule has a lower priority audit: add OPENAT2 record to list "how" info audit: add support for the openat2 syscall audit: replace magic audit syscall class numbers with macros lsm_audit: avoid overloading the "key" audit field audit: Convert to SPDX identifier audit: rename struct node to struct audit_node to prevent future name collisions
2021-11-01Merge tag 'selinux-pr-20211101' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add LSM/SELinux/Smack controls and auditing for io-uring. As usual, the individual commit descriptions have more detail, but we were basically missing two things which we're adding here: + establishment of a proper audit context so that auditing of io-uring ops works similarly to how it does for syscalls (with some io-uring additions because io-uring ops are *not* syscalls) + additional LSM hooks to enable access control points for some of the more unusual io-uring features, e.g. credential overrides. The additional audit callouts and LSM hooks were done in conjunction with the io-uring folks, based on conversations and RFC patches earlier in the year. - Fixup the binder credential handling so that the proper credentials are used in the LSM hooks; the commit description and the code comment which is removed in these patches are helpful to understand the background and why this is the proper fix. - Enable SELinux genfscon policy support for securityfs, allowing improved SELinux filesystem labeling for other subsystems which make use of securityfs, e.g. IMA. * tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: security: Return xattr name from security_dentry_init_security() selinux: fix a sock regression in selinux_ip_postroute_compat() binder: use cred instead of task for getsecid binder: use cred instead of task for selinux checks binder: use euid from cred instead of using task LSM: Avoid warnings about potentially unused hook variables selinux: fix all of the W=1 build warnings selinux: make better use of the nf_hook_state passed to the NF hooks selinux: fix race condition when computing ocontext SIDs selinux: remove unneeded ipv6 hook wrappers selinux: remove the SELinux lockdown implementation selinux: enable genfscon labeling for securityfs Smack: Brutalist io_uring support selinux: add support for the io_uring access controls lsm,io_uring: add LSM hooks to io_uring io_uring: convert io_uring to the secure anon inode interface fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() audit: add filtering for io_uring records audit,io_uring,io-wq: add some basic audit support to io_uring audit: prepare audit_context for use in calling contexts beyond syscalls
2021-11-01Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-11-01 We've added 181 non-merge commits during the last 28 day(s) which contain a total of 280 files changed, 11791 insertions(+), 5879 deletions(-). The main changes are: 1) Fix bpf verifier propagation of 64-bit bounds, from Alexei. 2) Parallelize bpf test_progs, from Yucong and Andrii. 3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus. 4) Improve bpf selftests on s390, from Ilya. 5) bloomfilter bpf map type, from Joanne. 6) Big improvements to JIT tests especially on Mips, from Johan. 7) Support kernel module function calls from bpf, from Kumar. 8) Support typeless and weak ksym in light skeleton, from Kumar. 9) Disallow unprivileged bpf by default, from Pawan. 10) BTF_KIND_DECL_TAG support, from Yonghong. 11) Various bpftool cleanups, from Quentin. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits) libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose bpf: Factor out helpers for ctx access checking bpf: Factor out a helper to prepare trampoline for struct_ops prog selftests, bpf: Fix broken riscv build riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h tools, build: Add RISC-V to HOSTARCH parsing riscv, bpf: Increase the maximum number of iterations selftests, bpf: Add one test for sockmap with strparser selftests, bpf: Fix test_txmsg_ingress_parser error ... ==================== Link: https://lore.kernel.org/r/20211102013123.9005-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01net: ndisc: introduce ndisc_evict_nocarrier sysctl parameterJames Prestwood
In most situations the neighbor discovery cache should be cleared on a NOCARRIER event which is currently done unconditionally. But for wireless roams the neighbor discovery cache can and should remain intact since the underlying network has not changed. This patch introduces a sysctl option ndisc_evict_nocarrier which can be disabled by a wireless supplicant during a roam. This allows packets to be sent after a roam immediately without having to wait for neighbor discovery. A user reported roughly a 1 second delay after a roam before packets could be sent out (note, on IPv4). This delay was due to the ARP cache being cleared. During testing of this same scenario using IPv6 no delay was noticed, but regardless there is no reason to clear the ndisc cache for wireless roams. Signed-off-by: James Prestwood <prestwoj@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01net: arp: introduce arp_evict_nocarrier sysctl parameterJames Prestwood
This change introduces a new sysctl parameter, arp_evict_nocarrier. When set (default) the ARP cache will be cleared on a NOCARRIER event. This new option has been defaulted to '1' which maintains existing behavior. Clearing the ARP cache on NOCARRIER is relatively new, introduced by: commit 859bd2ef1fc1110a8031b967ee656c53a6260a76 Author: David Ahern <dsahern@gmail.com> Date: Thu Oct 11 20:33:49 2018 -0700 net: Evict neighbor entries on carrier down The reason for this changes is to prevent the ARP cache from being cleared when a wireless device roams. Specifically for wireless roams the ARP cache should not be cleared because the underlying network has not changed. Clearing the ARP cache in this case can introduce significant delays sending out packets after a roam. A user reported such a situation here: https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/ After some investigation it was found that the kernel was holding onto packets until ARP finished which resulted in this 1 second delay. It was also found that the first ARP who-has was never responded to, which is actually what caues the delay. This change is more or less working around this behavior, but again, there is no reason to clear the cache on a roam anyways. As for the unanswered who-has, we know the packet made it OTA since it was seen while monitoring. Why it never received a response is unknown. In any case, since this is a problem on the AP side of things all that can be done is to work around it until it is solved. Some background on testing/reproducing the packet delay: Hardware: - 2 access points configured for Fast BSS Transition (Though I don't see why regular reassociation wouldn't have the same behavior) - Wireless station running IWD as supplicant - A device on network able to respond to pings (I used one of the APs) Procedure: - Connect to first AP - Ping once to establish an ARP entry - Start a tcpdump - Roam to second AP - Wait for operstate UP event, and note the timestamp - Start pinging Results: Below is the tcpdump after UP. It was recorded the interface went UP at 10:42:01.432875. 10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28 10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28 10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46 10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64 10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64 10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64 10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64 10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64 10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64 10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64 10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64 10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64 10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64 10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64 10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64 You can see the first ARP who-has went out very quickly after UP, but was never responded to. Nearly a second later the kernel retries and gets a response. Only then do the ping packets go out. If an ARP entry is manually added prior to UP (after the cache is cleared) it is seen that the first ping is never responded to, so its not only an issue with ARP but with data packets in general. As mentioned prior, the wireless interface was also monitored to verify the ping/ARP packet made it OTA which was observed to be true. Signed-off-by: James Prestwood <prestwoj@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01Merge tag 'for-linus-5.16-1' of https://github.com/cminyard/linux-ipmiLinus Torvalds
Pull IPMI driver updates from Corey Minyard: "A new type of low-level IPMI driver is added for direct communication over the IPMI message bus without a BMC between the driver and the bus. Other than that, lots of little bug fixes and enhancements" * tag 'for-linus-5.16-1' of https://github.com/cminyard/linux-ipmi: ipmi: kcs_bmc: Fix a memory leak in the error handling path of 'kcs_bmc_serio_add_device()' char: ipmi: replace snprintf in show functions with sysfs_emit ipmi: ipmb: fix dependencies to eliminate build error ipmi:ipmb: Add OF support ipmi: bt: Add ast2600 compatible string ipmi: bt-bmc: Use registers directly ipmi: ipmb: Fix off-by-one size check on rcvlen ipmi:ssif: Use depends on, not select, for I2C ipmi: Add docs for the IPMI IPMB driver ipmi: Add docs for IPMB direct addressing ipmi:ipmb: Add initial support for IPMI over IPMB ipmi: Add support for IPMB direct messages ipmi: Export ipmb_checksum() ipmi: Fix a typo ipmi: Check error code before processing BMC response ipmi:devintf: Return a proper error when recv buffer too small ipmi: Disable some operations during a panic ipmi:watchdog: Set panic count to proper value on a panic
2021-11-01Merge tag 'media/v5.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - New driver for SK Hynix Hi-846 8M pixel camera - New driver for the ov13b10 camera - New driver for Renesas R-Car ISP - mtk-vcodec gained support for version 2 of decoder firmware ABI - The legacy sir_ir driver got removed - videobuf2: the vb2_mem_ops kAPI had some improvements - lots of cleanups, fixes and new features at device drivers * tag 'media/v5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (328 commits) media: venus: core: Add sdm660 DT compatible and resource struct media: dt-bindings: media: venus: Add sdm660 dt schema media: venus: vdec: decoded picture buffer handling during reconfig sequence media: venus: Handle fatal errors during encoding and decoding media: venus: helpers: Add helper to mark fatal vb2 error media: venus: hfi: Check for sys error on session hfi functions media: venus: Make sys_error flag an atomic bitops media: venus: venc: Use pmruntime autosuspend media: allegro: write vui parameters for HEVC media: allegro: nal-hevc: implement generator for vui media: allegro: write correct colorspace into SPS media: allegro: extract nal value lookup functions to header media: allegro: correctly scale the bit rate in SPS media: allegro: remove external QP table media: allegro: fix row and column in response message media: allegro: add control to disable encoder buffer media: allegro: add encoder buffer support media: allegro: add pm_runtime support media: allegro: lookup VCU settings media: allegro: fix module removal if initialization failed ...
2021-11-01Merge tag 'overflow-v5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull overflow updates from Kees Cook: "The end goal of the current buffer overflow detection work[0] is to gain full compile-time and run-time coverage of all detectable buffer overflows seen via array indexing or memcpy(), memmove(), and memset(). The str*() family of functions already have full coverage. While much of the work for these changes have been on-going for many releases (i.e. 0-element and 1-element array replacements, as well as avoiding false positives and fixing discovered overflows[1]), this series contains the foundational elements of several related buffer overflow detection improvements by providing new common helpers and FORTIFY_SOURCE changes needed to gain the introspection required for compiler visibility into array sizes. Also included are a handful of already Acked instances using the helpers (or related clean-ups), with many more waiting at the ready to be taken via subsystem-specific trees[2]. The new helpers are: - struct_group() for gaining struct member range introspection - memset_after() and memset_startat() for clearing to the end of structures - DECLARE_FLEX_ARRAY() for using flex arrays in unions or alone in structs Also included is the beginning of the refactoring of FORTIFY_SOURCE to support memcpy() introspection, fix missing and regressed coverage under GCC, and to prepare to fix the currently broken Clang support. Finishing this work is part of the larger series[0], but depends on all the false positives and buffer overflow bug fixes to have landed already and those that depend on this series to land. As part of the FORTIFY_SOURCE refactoring, a set of both a compile-time and run-time tests are added for FORTIFY_SOURCE and the mem*()-family functions respectively. The compile time tests have found a legitimate (though corner-case) bug[6] already. Please note that the appearance of "panic" and "BUG" in the FORTIFY_SOURCE refactoring are the result of relocating existing code, and no new use of those code-paths are expected nor desired. Finally, there are two tree-wide conversions for 0-element arrays and flexible array unions to gain sane compiler introspection coverage that result in no known object code differences. After this series (and the changes that have now landed via netdev and usb), we are very close to finally being able to build with -Warray-bounds and -Wzero-length-bounds. However, due corner cases in GCC[3] and Clang[4], I have not included the last two patches that turn on these options, as I don't want to introduce any known warnings to the build. Hopefully these can be solved soon" Link: https://lore.kernel.org/lkml/20210818060533.3569517-1-keescook@chromium.org/ [0] Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=FORTIFY_SOURCE [1] Link: https://lore.kernel.org/lkml/202108220107.3E26FE6C9C@keescook/ [2] Link: https://lore.kernel.org/lkml/3ab153ec-2798-da4c-f7b1-81b0ac8b0c5b@roeck-us.net/ [3] Link: https://bugs.llvm.org/show_bug.cgi?id=51682 [4] Link: https://lore.kernel.org/lkml/202109051257.29B29745C0@keescook/ [5] Link: https://lore.kernel.org/lkml/20211020200039.170424-1-keescook@chromium.org/ [6] * tag 'overflow-v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) fortify: strlen: Avoid shadowing previous locals compiler-gcc.h: Define __SANITIZE_ADDRESS__ under hwaddress sanitizer treewide: Replace 0-element memcpy() destinations with flexible arrays treewide: Replace open-coded flex arrays in unions stddef: Introduce DECLARE_FLEX_ARRAY() helper btrfs: Use memset_startat() to clear end of struct string.h: Introduce memset_startat() for wiping trailing members and padding xfrm: Use memset_after() to clear padding string.h: Introduce memset_after() for wiping trailing members/padding lib: Introduce CONFIG_MEMCPY_KUNIT_TEST fortify: Add compile-time FORTIFY_SOURCE tests fortify: Allow strlen() and strnlen() to pass compile-time known lengths fortify: Prepare to improve strnlen() and strlen() warnings fortify: Fix dropped strcpy() compile-time write overflow check fortify: Explicitly disable Clang support fortify: Move remaining fortify helpers into fortify-string.h lib/string: Move helper functions out of string.c compiler_types.h: Remove __compiletime_object_size() cm4000_cs: Use struct_group() to zero struct cm4000_dev region can: flexcan: Use struct_group() to zero struct flexcan_regs regions ...
2021-11-01bpf: Add alignment padding for "map_extra" + consolidate holesJoanne Koong
This patch makes 2 changes regarding alignment padding for the "map_extra" field. 1) In the kernel header, "map_extra" and "btf_value_type_id" are rearranged to consolidate the hole. Before: struct bpf_map { ... u32 max_entries; /* 36 4 */ u32 map_flags; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ u64 map_extra; /* 48 8 */ int spin_lock_off; /* 56 4 */ int timer_off; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ u32 id; /* 64 4 */ int numa_node; /* 68 4 */ ... bool frozen; /* 117 1 */ /* XXX 10 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ ... struct work_struct work; /* 144 72 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mutex freeze_mutex; /* 216 144 */ /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */ u64 writecnt; /* 360 8 */ /* size: 384, cachelines: 6, members: 26 */ /* sum members: 354, holes: 2, sum holes: 14 */ /* padding: 16 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 10 */ } __attribute__((__aligned__(64))); After: struct bpf_map { ... u32 max_entries; /* 36 4 */ u64 map_extra; /* 40 8 */ u32 map_flags; /* 48 4 */ int spin_lock_off; /* 52 4 */ int timer_off; /* 56 4 */ u32 id; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ int numa_node; /* 64 4 */ ... bool frozen /* 113 1 */ /* XXX 14 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ ... struct work_struct work; /* 144 72 */ /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */ struct mutex freeze_mutex; /* 216 144 */ /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */ u64 writecnt; /* 360 8 */ /* size: 384, cachelines: 6, members: 26 */ /* sum members: 354, holes: 1, sum holes: 14 */ /* padding: 16 */ /* forced alignments: 2, forced holes: 1, sum forced holes: 14 */ } __attribute__((__aligned__(64))); 2) Add alignment padding to the bpf_map_info struct More details can be found in commit 36f9814a494a ("bpf: fix uapi hole for 32 bit compat applications") Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211029224909.1721024-3-joannekoong@fb.com
2021-11-01Merge tag 'locking-core-2021-10-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: - Move futex code into kernel/futex/ and split up the kitchen sink into seperate files to make integration of sys_futex_waitv() simpler. - Add a new sys_futex_waitv() syscall which allows to wait on multiple futexes. The main use case is emulating Windows' WaitForMultipleObjects which allows Wine to improve the performance of Windows Games. Also native Linux games can benefit from this interface as this is a common wait pattern for this kind of applications. - Add context to ww_mutex_trylock() to provide a path for i915 to rework their eviction code step by step without making lockdep upset until the final steps of rework are completed. It's also useful for regulator and TTM to avoid dropping locks in the non contended path. - Lockdep and might_sleep() cleanups and improvements - A few improvements for the RT substitutions. - The usual small improvements and cleanups. * tag 'locking-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) locking: Remove spin_lock_flags() etc locking/rwsem: Fix comments about reader optimistic lock stealing conditions locking: Remove rcu_read_{,un}lock() for preempt_{dis,en}able() locking/rwsem: Disable preemption for spinning region docs: futex: Fix kernel-doc references futex: Fix PREEMPT_RT build futex2: Documentation: Document sys_futex_waitv() uAPI selftests: futex: Test sys_futex_waitv() wouldblock selftests: futex: Test sys_futex_waitv() timeout selftests: futex: Add sys_futex_waitv() test futex,arm: Wire up sys_futex_waitv() futex,x86: Wire up sys_futex_waitv() futex: Implement sys_futex_waitv() futex: Simplify double_lock_hb() futex: Split out wait/wake futex: Split out requeue futex: Rename mark_wake_futex() futex: Rename: match_futex() futex: Rename: hb_waiter_{inc,dec,pending}() futex: Split out PI futex ...
2021-11-01Merge tag 'perf-core-2021-10-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Thomas Gleixner: "Core: - Allow ftrace to instrument parts of the perf core code - Add a new mem_hops field to perf_mem_data_src which allows to represent intra-node/package or inter-node/off-package details to prepare for next generation systems which have more hieararchy within the node/pacakge level. Tools: - Update for the new mem_hops field in perf_mem_data_src Arch: - A set of constraints fixes for the Intel uncore PMU - The usual set of small fixes and improvements for x86 and PPC" * tag 'perf-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix ICL/SPR INST_RETIRED.PREC_DIST encodings powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses tools/perf: Add mem_hops field in perf_mem_data_src structure perf: Add mem_hops field in perf_mem_data_src structure perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove an extra line perf/core: Allow ftrace for functions in kernel/event/core.c perf/x86: Add new event for AUX output counter index perf/x86: Add compiler barrier after updating BTS perf/x86/intel/uncore: Fix Intel SPR M3UPI event constraints perf/x86/intel/uncore: Fix Intel SPR M2PCIE event constraints perf/x86/intel/uncore: Fix Intel SPR IIO event constraints perf/x86/intel/uncore: Fix Intel SPR CHA event constraints perf/x86/intel/uncore: Fix Intel ICX IIO event constraints perf/x86/intel/uncore: Fix invalid unit check perf/x86/intel/uncore: Support extra IMC channel on Ice Lake server
2021-11-01Merge tag 'for-5.16-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The updates this time are more under the hood and enhancing existing features (subpage with compression and zoned namespaces). Performance related: - misc small inode logging improvements (+3% throughput, -11% latency on sample dbench workload) - more efficient directory logging: bulk item insertion, less tree searches and locking - speed up bulk insertion of items into a b-tree, which is used when logging directories, when running delayed items for directories (fsync and transaction commits) and when running the slow path (full sync) of an fsync (bulk creation run time -4%, deletion -12%) Core: - continued subpage support - make defragmentation work - make compression write work - zoned mode - support ZNS (zoned namespaces), zone capacity is number of usable blocks in each zone - add dedicated block group (zoned) for relocation, to prevent out of order writes in some cases - greedy block group reclaim, pick the ones with least usable space first - preparatory work for send protocol updates - error handling improvements - cleanups and refactoring Fixes: - lockdep warnings - in show_devname callback, on seeding device - device delete on loop device due to conversions to workqueues - fix deadlock between chunk allocation and chunk btree modifications - fix tracking of missing device count and status" * tag 'for-5.16-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (140 commits) btrfs: remove root argument from check_item_in_log() btrfs: remove root argument from add_link() btrfs: remove root argument from btrfs_unlink_inode() btrfs: remove root argument from drop_one_dir_item() btrfs: clear MISSING device status bit in btrfs_close_one_device btrfs: call btrfs_check_rw_degradable only if there is a missing device btrfs: send: prepare for v2 protocol btrfs: fix comment about sector sizes supported in 64K systems btrfs: update device path inode time instead of bd_inode fs: export an inode_update_time helper btrfs: fix deadlock when defragging transparent huge pages btrfs: sysfs: convert scnprintf and snprintf to sysfs_emit btrfs: make btrfs_super_block size match BTRFS_SUPER_INFO_SIZE btrfs: update comments for chunk allocation -ENOSPC cases btrfs: fix deadlock between chunk allocation and chunk btree modifications btrfs: zoned: use greedy gc for auto reclaim btrfs: check-integrity: stop storing the block device name in btrfsic_dev_state btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls btrfs: add a btrfs_get_dev_args_from_path helper btrfs: handle device lookup with btrfs_dev_lookup_args ...
2021-11-01Merge tag 'for-5.16/cdrom-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull CDROM updates from Jens Axboe: "On behalf of Phillip, here are the CDROM updates for the 5.16-rc1 merge window: - Add ioctl for improved media change detection (Lukas) - Reformat some documentation (Phillip) - Redundant variable removal (luo)" * tag 'for-5.16/cdrom-2021-10-29' of git://git.kernel.dk/linux-block: cdrom: Remove redundant variable and its assignment cdrom: docs: reformat table in Documentation/userspace-api/ioctl/cdrom.rst drivers/cdrom: improved ioctl for media change detection
2021-11-01Merge tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Light on new features - basically just the hybrid mode support. Outside of that it's just fixes, cleanups, and performance improvements. In detail: - Add ring related information to the fdinfo output (Hao) - Hybrid async mode (Hao) - Support for batched issue on block (me) - sqe error trace improvement (me) - IOPOLL efficiency improvements (Pavel) - submit state cleanups and improvements (Pavel) - Completion side improvements (Pavel) - Drain improvements (Pavel) - Buffer selection cleanups (Pavel) - Fixed file node improvements (Pavel) - io-wq setup cancelation fix (Pavel) - Various other performance improvements and cleanups (Pavel) - Misc fixes (Arnd, Bixuan, Changcheng, Hao, me, Noah)" * tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-block: (97 commits) io-wq: remove worker to owner tw dependency io_uring: harder fdinfo sq/cq ring iterating io_uring: don't assign write hint in the read path io_uring: clusterise ki_flags access in rw_prep io_uring: kill unused param from io_file_supports_nowait io_uring: clean up timeout async_data allocation io_uring: don't try io-wq polling if not supported io_uring: check if opcode needs poll first on arming io_uring: clean iowq submit work cancellation io_uring: clean io_wq_submit_work()'s main loop io-wq: use helper for worker refcounting io_uring: implement async hybrid mode for pollable requests io_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR()) io_uring: split logic of force_nonblock io_uring: warning about unused-but-set parameter io_uring: inform block layer of how many requests we are submitting io_uring: simplify io_file_supports_nowait() io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags io_uring: arm poll for non-nowait files fs/io_uring: Prioritise checking faster conditions first in io_write ...
2021-11-01Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block driver updates from Jens Axboe: - paride driver cleanups (Christoph) - Remove cryptoloop support (Christoph) - null_blk poll support (me) - Now that add_disk() supports proper error handling, add it to various drivers (Luis) - Make ataflop actually work again (Michael) - s390 dasd fixes (Stefan, Heiko) - nbd fixes (Yu, Ye) - Remove redundant wq flush in mtip32xx (Christophe) - NVMe updates - fix a multipath partition scanning deadlock (Hannes Reinecke) - generate uevent once a multipath namespace is operational again (Hannes Reinecke) - support unique discovery controller NQNs (Hannes Reinecke) - fix use-after-free when a port is removed (Israel Rukshin) - clear shadow doorbell memory on resets (Keith Busch) - use struct_size (Len Baker) - add error handling support for add_disk (Luis Chamberlain) - limit the maximal queue size for RDMA controllers (Max Gurtovoy) - use a few more symbolic names (Max Gurtovoy) - fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy) - add support for ->map_queues on FC (Saurav Kashyap) - support the current discovery subsystem entry (Hannes Reinecke) - use flex_array_size and struct_size (Len Baker) - bcache fixes (Christoph, Coly, Chao, Lin, Qing) - MD updates (Christoph, Guoqing, Xiao) - Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye) * tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits) null_blk: Fix handling of submit_queues and poll_queues attributes block: ataflop: Fix warning comparing pointer to 0 bcache: replace snprintf in show functions with sysfs_emit bcache: move uapi header bcache.h to bcache code directory nvmet: use flex_array_size and struct_size nvmet: register discovery subsystem as 'current' nvmet: switch check for subsystem type nvme: add new discovery log page entry definitions block: ataflop: more blk-mq refactoring fixes block: remove support for cryptoloop and the xor transfer mtd: add add_disk() error handling rnbd: add error handling support for add_disk() um/drivers/ubd_kern: add error handling support for add_disk() m68k/emu/nfblock: add error handling support for add_disk() xen-blkfront: add error handling support for add_disk() bcache: add error handling support for add_disk() dm: add add_disk() error handling block: aoe: fixup coccinelle warnings nvmet: use struct_size over open coded arithmetic nvme: drop scan_lock and always kick requeue list when removing namespaces ...
2021-11-01amt: add control plane of amt interfaceTaehee Yoo
It adds definitions and control plane code for AMT. this is very similar to udp tunneling interfaces such as gtp, vxlan, etc. In the next patch, data plane code will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Use array_size() in ebtables, from Gustavo A. R. Silva. 2) Attach IPS_ASSURED to internal UDP stream state, reported by Maciej Zenczykowski. 3) Add NFT_META_IFTYPE to match on the interface type either from ingress or egress. 4) Generalize pktinfo->tprot_set to flags field. 5) Allow to match on inner headers / payload data. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01netfilter: nft_payload: support for inner header matching / manglingPablo Neira Ayuso
Allow to match and mangle on inner headers / payload data after the transport header. There is a new field in the pktinfo structure that stores the inner header offset which is calculated only when requested. Only TCP and UDP supported at this stage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-01netfilter: nft_meta: add NFT_META_IFTYPEPablo Neira Ayuso
Generalize NFT_META_IIFTYPE to NFT_META_IFTYPE which allows you to match on the interface type of the skb->dev field. This field is used by the netdev family to add an implicit dependency to skip non-ethernet packets when matching on layer 3 and 4 TCP/IP header fields. For backward compatibility, add the NFT_META_IIFTYPE alias to NFT_META_IFTYPE. Add __NFT_META_IIFTYPE, to be used by userspace in the future to match specifically on the iiftype. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-10-29bcache: move uapi header bcache.h to bcache code directoryColy Li
The header file include/uapi/linux/bcache.h is not really a user space API heaer. This file defines the ondisk format of bcache internal meta data but no one includes it from user space, bcache-tools has its own copy of this header with minor modification. Therefore, this patch moves include/uapi/linux/bcache.h to bcache code directory as drivers/md/bcache/bcache_ondisk.h. Suggested-by: Arnd Bergmann <arnd@kernel.org> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211029060930.119923-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29btrfs: send: prepare for v2 protocolDavid Sterba
This is preparatory work for send protocol update to version 2 and higher. We have many pending protocol update requests but still don't have the basic protocol rev in place, the first thing that must happen is to do the actual versioning support. The protocol version is u32 and is a new member in the send ioctl struct. Validity of the version field is backed by a new flag bit. Old kernels would fail when a higher version is requested. Version protocol 0 will pick the highest supported version, BTRFS_SEND_STREAM_VERSION, that's also exported in sysfs. The version is still unchanged and will be increased once we have new incompatible commands or stream updates. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-28bpf: Add bpf_kallsyms_lookup_name helperKumar Kartikeya Dwivedi
This helper allows us to get the address of a kernel symbol from inside a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can relocate typeless ksym vars. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211028063501.2239335-2-memxor@gmail.com
2021-10-28bpf: Add bloom filter map implementationJoanne Koong
This patch adds the kernel-side changes for the implementation of a bpf bloom filter map. The bloom filter map supports peek (determining whether an element is present in the map) and push (adding an element to the map) operations.These operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_UPDATE_ELEM -> push The bloom filter map does not have keys, only values. In light of this, the bloom filter map's API matches that of queue stack maps: user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM which correspond internally to bpf_map_peek_elem/bpf_map_push_elem, and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem APIs to query or add an element to the bloom filter map. When the bloom filter map is created, it must be created with a key_size of 0. For updates, the user will pass in the element to add to the map as the value, with a NULL key. For lookups, the user will pass in the element to query in the map as the value, with a NULL key. In the verifier layer, this requires us to modify the argument type of a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE; as well, in the syscall layer, we need to copy over the user value so that in bpf_map_peek_elem, we know which specific value to query. A few things to please take note of: * If there are any concurrent lookups + updates, the user is responsible for synchronizing this to ensure no false negative lookups occur. * The number of hashes to use for the bloom filter is configurable from userspace. If no number is specified, the default used will be 5 hash functions. The benchmarks later in this patchset can help compare the performance of using different number of hashes on different entry sizes. In general, using more hashes decreases both the false positive rate and the speed of a lookup. * Deleting an element in the bloom filter map is not supported. * The bloom filter map may be used as an inner map. * The "max_entries" size that is specified at map creation time is used to approximate a reasonable bitmap size for the bloom filter, and is not otherwise strictly enforced. If the user wishes to insert more entries into the bloom filter than "max_entries", they may do so but they should be aware that this may lead to a higher false positive rate. Signed-off-by: Joanne Koong <joannekoong@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
2021-10-26mctp: Implement extended addressingJeremy Kerr
This change allows an extended address struct - struct sockaddr_mctp_ext - to be passed to sendmsg/recvmsg. This allows userspace to specify output ifindex and physical address information (for sendmsg) or receive the input ifindex/physaddr for incoming messages (for recvmsg). This is typically used by userspace for MCTP address discovery and assignment operations. The extended addressing facility is conditional on a new sockopt: MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before the kernel will consume/populate the extended address data. Includes a fix for an uninitialised var: Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-25net: phy: add constants for fast retrain related registerLuo Jie
Add the constants for 2.5G fast retrain capability in 10G AN control register, fast retrain status and control register and THP bypass register into mdio.h. Signed-off-by: Luo Jie <luoj@codeaurora.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)Vincent Mailhol
Add the netlink interface for TDC parameters of struct can_tdc_const and can_tdc. Contrary to the can_bittiming(_const) structures for which there is just a single IFLA_CAN(_DATA)_BITTMING(_CONST) entry per structure, here, we create a nested entry IFLA_CAN_TDC. Within this nested entry, additional IFLA_CAN_TDC_TDC* entries are added for each of the TDC parameters of the newly introduced struct can_tdc_const and struct can_tdc. For struct can_tdc_const, these are: IFLA_CAN_TDC_TDCV_MIN IFLA_CAN_TDC_TDCV_MAX IFLA_CAN_TDC_TDCO_MIN IFLA_CAN_TDC_TDCO_MAX IFLA_CAN_TDC_TDCF_MIN IFLA_CAN_TDC_TDCF_MAX For struct can_tdc, these are: IFLA_CAN_TDC_TDCV IFLA_CAN_TDC_TDCO IFLA_CAN_TDC_TDCF This is done so that changes can be applied in the future to the structures without breaking the netlink interface. The TDC netlink logic works as follow: * CAN_CTRLMODE_FD is not provided: - if any TDC parameters are provided: error. - TDC parameters not provided: TDC parameters unchanged. * CAN_CTRLMODE_FD is provided and is false: - TDC is deactivated: both the structure and the CAN_CTRLMODE_TDC_{AUTO,MANUAL} flags are flushed. * CAN_CTRLMODE_FD provided and is true: - CAN_CTRLMODE_TDC_{AUTO,MANUAL} and tdc{v,o,f} not provided: call can_calc_tdco() to automatically decide whether TDC should be activated and, if so, set CAN_CTRLMODE_TDC_AUTO and uses the calculated tdco value. - CAN_CTRLMODE_TDC_AUTO and tdco provided: set CAN_CTRLMODE_TDC_AUTO and use the provided tdco value. Here, tdcv is illegal and tdcf is optional. - CAN_CTRLMODE_TDC_MANUAL and both of tdcv and tdco provided: set CAN_CTRLMODE_TDC_MANUAL and use the provided tdcv and tdco value. Here, tdcf is optional. - CAN_CTRLMODE_TDC_{AUTO,MANUAL} are mutually exclusive. Whenever one flag is turned on, the other will automatically be turned off. Providing both returns an error. - Combination other than the one listed above are illegal and will return an error. N.B. above rules mean that whenever CAN_CTRLMODE_FD is provided, the previous TDC values will be overwritten. The only option to reuse previous TDC value is to not provide CAN_CTRLMODE_FD. All the new parameters are defined as u32. This arbitrary choice is done to mimic the other bittiming values with are also all of type u32. An u16 would have been sufficient to hold the TDC values. This patch completes below series (c.f. [1]): - commit 289ea9e4ae59 ("can: add new CAN FD bittiming parameters: Transmitter Delay Compensation (TDC)") - commit c25cc7993243 ("can: bittiming: add calculation for CAN FD Transmitter Delay Compensation (TDC)") [1] https://lore.kernel.org/linux-can/20210224002008.4158-1-mailhol.vincent@wanadoo.fr/T/#t Link: https://lore.kernel.org/all/20210918095637.20108-5-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-24can: bittiming: allow TDC{V,O} to be zero and add can_tdc_const::tdc{v,o,f}_minVincent Mailhol
ISO 11898-1 specifies in section 11.3.3 "Transmitter delay compensation" that "the configuration range for [the] SSP position shall be at least 0 to 63 minimum time quanta." Because SSP = TDCV + TDCO, it means that we should allow both TDCV and TDCO to hold zero value in order to honor SSP's minimum possible value. However, current implementation assigned special meaning to TDCV and TDCO's zero values: * TDCV = 0 -> TDCV is automatically measured by the transceiver. * TDCO = 0 -> TDC is off. In order to allow for those values to really be zero and to maintain current features, we introduce two new flags: * CAN_CTRLMODE_TDC_AUTO indicates that the controller support automatic measurement of TDCV. * CAN_CTRLMODE_TDC_MANUAL indicates that the controller support manual configuration of TDCV. N.B.: current implementation failed to provide an option for the driver to indicate that only manual mode was supported. TDC is disabled if both CAN_CTRLMODE_TDC_AUTO and CAN_CTRLMODE_TDC_MANUAL flags are off, c.f. the helper function can_tdc_is_enabled() which is also introduced in this patch. Also, this patch adds three fields: tdcv_min, tdco_min and tdcf_min to struct can_tdc_const. While we are not convinced that those three fields could be anything else than zero, we can imagine that some controllers might specify a lower bound on these. Thus, those minimums are really added "just in case". Comments of struct can_tdc and can_tdc_const are updated accordingly. Finally, the changes are applied to the etas_es58x driver. Link: https://lore.kernel.org/all/20210918095637.20108-2-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-10-22Merge tag 'mac80211-next-for-net-next-2021-10-21' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Quite a few changes: * the applicable eth_hw_addr_set() and const hw_addr changes * various code cleanups/refactorings * stack usage reductions across the wireless stack * some unstructured find_ie() -> structured find_element() changes * a few more pieces of multi-BSSID support * some 6 GHz regulatory support * 6 GHz support in hwsim, for testing userspace code * Light Communications (LC, 802.11bb) early band definitions to be able to add a first driver soon * tag 'mac80211-next-for-net-next-2021-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (35 commits) cfg80211: fix kernel-doc for MBSSID EMA mac80211: Prevent AP probing during suspend nl80211: Add LC placeholder band definition to nl80211_band ... ==================== Link: https://lore.kernel.org/r/20211021154953.134849-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-21bpf: Add verified_insns to bpf_prog_info and fdinfoDave Marchevsky
This stat is currently printed in the verifier log and not stored anywhere. To ease consumption of this data, add a field to bpf_prog_aux so it can be exposed via BPF_OBJ_GET_INFO_BY_FD and fdinfo. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211020074818.1017682-2-davemarchevsky@fb.com
2021-10-21bpf: Add bpf_skc_to_unix_sock() helperHengqi Chen
The helper is used in tracing programs to cast a socket pointer to a unix_sock pointer. The return value could be NULL if the casting is illegal. Suggested-by: Yonghong Song <yhs@fb.com> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20211021134752.1223426-2-hengqi.chen@gmail.com
2021-10-21cfg80211: fix kernel-doc for MBSSID EMAJohannes Berg
The struct member ema_max_profile_periodicity was listed with the wrong name in the kernel-doc, fix that. Link: https://lore.kernel.org/r/20211021173038.18ec2030c66b.Iac731bb299525940948adad2c41f514b7dd81c47@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-21nl80211: Add LC placeholder band definition to nl80211_bandSrinivasan Raju
Define LC band which is a draft under IEEE 802.11bb. Current NL80211_BAND_LC is a placeholder band and will be more defined IEEE 802.11bb progresses. Signed-off-by: Srinivasan Raju <srini.raju@purelifi.com> Link: https://lore.kernel.org/r/20211018100143.7565-2-srini.raju@purelifi.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-21nl80211: vendor-cmd: intel: add more details for ↵Emmanuel Grumbach
IWL_MVM_VENDOR_CMD_HOST_GET_OWNERSHIP Explain more the expected flow for this command. Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Link: https://lore.kernel.org/r/20211020051147.29297-1-emmanuel.grumbach@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-10-20fq_codel: generalise ce_threshold marking for subset of trafficToke Høiland-Jørgensen
Commit e72aeb9ee0e3 ("fq_codel: implement L4S style ce_threshold_ect1 marking") expanded the ce_threshold feature of FQ-CoDel so it can be applied to a subset of the traffic, using the ECT(1) bit of the ECN field as the classifier. However, hard-coding ECT(1) as the only classifier for this feature seems limiting, so let's expand it to be more general. To this end, change the parameter from a ce_threshold_ect1 boolean, to a one-byte selector/mask pair (ce_threshold_{selector,mask}) which is applied to the whole diffserv/ECN field in the IP header. This makes it possible to classify packets by any value in either the ECN field or the diffserv field. In particular, setting a selector of INET_ECN_ECT_1 and a mask of INET_ECN_MASK corresponds to the functionality before this patch, and a mask of ~INET_ECN_MASK allows using the selector as a straight-forward match against a diffserv code point: # apply ce_threshold to ECT(1) traffic tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x1/0x3 # apply ce_threshold to ECN-capable traffic marked as diffserv AF22 tc qdisc replace dev eth0 root fq_codel ce_threshold 1ms ce_threshold_selector 0x50/0xfc Regardless of the selector chosen, the normal rules for ECN-marking of packets still apply, i.e., the flow must still declare itself ECN-capable by setting one of the bits in the ECN field to get marked at all. v2: - Add tc usage examples to patch description Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20211019174709.69081-1-toke@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-20media: allegro: add control to disable encoder bufferMichael Tretter
The encoder buffer can have a negative impact on the quality of the encoded video. Add a control to allow user space to disable the encoder buffer per channel if the VCU supports the encoder buffer but the quality is not sufficient. Signed-off-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2021-10-20bcache: reserve never used bits from bkey.highColy Li
There sre 3 bits in member high of struct bkey are never used, and no plan to support them in future, - HEADER_SIZE, start at bit 58, length 2 bits - KEY_PINNED, start at bit 55, length 1 bit No any kernel code, or user space tool references or accesses the three bits. Therefore it is possible and feasible to reserve the valuable bits from bkey.high. They can be used in future for other purpose. Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20211020143812.6403-3-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-19perf: Add mem_hops field in perf_mem_data_src structureKajol Jain
Going forward, future generation systems can have more hierarchy within the node/package level but currently we don't have any data source encoding field in perf, which can be used to represent this level of data. Add a new field called 'mem_hops' in the perf_mem_data_src structure which can be used to represent intra-node/package or inter-node/off-package details. This field is of size 3 bits where PERF_MEM_HOPS_{NA, 0..6} value can be used to present different hop levels data. Also add corresponding macros to define mem_hop field values and shift value. Currently we define macro for HOPS_0 which corresponds to data coming from another core but same node. For ex: Encodings for mem_hops fields with L2 cache: L2 - local L2 L2 | REMOTE | HOPS_0 - remote core, same node L2 Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211006140654.298352-3-kjain@linux.ibm.com
2021-10-19perf: Add comment about current state of PERF_MEM_LVL_* namespace and remove ↵Kajol Jain
an extra line Add a comment about PERF_MEM_LVL_* namespace being depricated to some extent in favour of added PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Remove an extra line present in perf_mem__lvl_scnprintf function. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211006140654.298352-2-kjain@linux.ibm.com
2021-10-19io_uring: add flag to not fail link after timeoutPavel Begunkov
For some reason non-off IORING_OP_TIMEOUT always fails links, it's pretty inconvenient and unnecessary limits chaining after it to hard linking, which is far from ideal, e.g. doesn't pair well with timeout cancellation. Add a flag forcing it to not fail links on -ETIME. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/17c7ec0fb7a6113cc6be8cdaedcada0ba836ac0e.1633199723.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18bpf: Rename BTF_KIND_TAG to BTF_KIND_DECL_TAGYonghong Song
Patch set [1] introduced BTF_KIND_TAG to allow tagging declarations for struct/union, struct/union field, var, func and func arguments and these tags will be encoded into dwarf. They are also encoded to btf by llvm for the bpf target. After BTF_KIND_TAG is introduced, we intended to use it for kernel __user attributes. But kernel __user is actually a type attribute. Upstream and internal discussion showed it is not a good idea to mix declaration attribute and type attribute. So we proposed to introduce btf_type_tag as a type attribute and existing btf_tag renamed to btf_decl_tag ([2]). This patch renamed BTF_KIND_TAG to BTF_KIND_DECL_TAG and some other declarations with *_tag to *_decl_tag to make it clear the tag is for declaration. In the future, BTF_KIND_TYPE_TAG might be introduced per [3]. [1] https://lore.kernel.org/bpf/20210914223004.244411-1-yhs@fb.com/ [2] https://reviews.llvm.org/D111588 [3] https://reviews.llvm.org/D111199 Fixes: b5ea834dde6b ("bpf: Support for new btf kind BTF_KIND_TAG") Fixes: 5b84bd10363e ("libbpf: Add support for BTF_KIND_TAG") Fixes: 5c07f2fec003 ("bpftool: Add support for BTF_KIND_TAG") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211012164838.3345699-1-yhs@fb.com
2021-10-18treewide: Replace 0-element memcpy() destinations with flexible arraysKees Cook
The 0-element arrays that are used as memcpy() destinations are actually flexible arrays. Adjust their structures accordingly so that memcpy() can better reason able their destination size (i.e. they need to be seen as "unknown" length rather than "zero"). In some cases, use of the DECLARE_FLEX_ARRAY() helper is needed when a flexible array is alone in a struct. Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Nilesh Javali <njavali@marvell.com> Cc: Manish Rangankar <mrangankar@marvell.com> Cc: GR-QLogic-Storage-Upstream@marvell.com Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Phillip Potter <phil@philpotter.co.uk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Florian Schilhabel <florian.c.schilhabel@googlemail.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Fabio Aiuto <fabioaiuto83@gmail.com> Cc: Ross Schmidt <ross.schm.dev@gmail.com> Cc: Marco Cesati <marcocesati@gmail.com> Cc: ath10k@lists.infradead.org Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: linux-staging@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18stddef: Introduce DECLARE_FLEX_ARRAY() helperKees Cook
There are many places where kernel code wants to have several different typed trailing flexible arrays. This would normally be done with multiple flexible arrays in a union, but since GCC and Clang don't (on the surface) allow this, there have been many open-coded workarounds, usually involving neighboring 0-element arrays at the end of a structure. For example, instead of something like this: struct thing { ... union { struct type1 foo[]; struct type2 bar[]; }; }; code works around the compiler with: struct thing { ... struct type1 foo[0]; struct type2 bar[]; }; Another case is when a flexible array is wanted as the single member within a struct (which itself is usually in a union). For example, this would be worked around as: union many { ... struct { struct type3 baz[0]; }; }; These kinds of work-arounds cause problems with size checks against such zero-element arrays (for example when building with -Warray-bounds and -Wzero-length-bounds, and with the coming FORTIFY_SOURCE improvements), so they must all be converted to "real" flexible arrays, avoiding warnings like this: fs/hpfs/anode.c: In function 'hpfs_add_sector_to_btree': fs/hpfs/anode.c:209:27: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct bplus_internal_node[0]' [-Wzero-length-bounds] 209 | anode->btree.u.internal[0].down = cpu_to_le32(a); | ~~~~~~~~~~~~~~~~~~~~~~~^~~ In file included from fs/hpfs/hpfs_fn.h:26, from fs/hpfs/anode.c:10: fs/hpfs/hpfs.h:412:32: note: while referencing 'internal' 412 | struct bplus_internal_node internal[0]; /* (internal) 2-word entries giving | ^~~~~~~~ drivers/net/can/usb/etas_es58x/es58x_fd.c: In function 'es58x_fd_tx_can_msg': drivers/net/can/usb/etas_es58x/es58x_fd.c:360:35: warning: array subscript 65535 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[]'} [-Wzero-length-bounds] 360 | tx_can_msg = (typeof(tx_can_msg))&es58x_fd_urb_cmd->raw_msg[msg_len]; | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from drivers/net/can/usb/etas_es58x/es58x_core.h:22, from drivers/net/can/usb/etas_es58x/es58x_fd.c:17: drivers/net/can/usb/etas_es58x/es58x_fd.h:231:6: note: while referencing 'raw_msg' 231 | u8 raw_msg[0]; | ^~~~~~~ However, it _is_ entirely possible to have one or more flexible arrays in a struct or union: it just has to be in another struct. And since it cannot be alone in a struct, such a struct must have at least 1 other named member -- but that member can be zero sized. Wrap all this nonsense into the new DECLARE_FLEX_ARRAY() in support of having flexible arrays in unions (or alone in a struct). As with struct_group(), since this is needed in UAPI headers as well, implement the core there, with a non-UAPI wrapper. Additionally update kernel-doc to understand its existence. https://github.com/KSPP/linux/issues/137 Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org>
2021-10-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS for net-next: 1) Add new run_estimation toggle to IPVS to stop the estimation_timer logic, from Dust Li. 2) Relax superfluous dynset check on NFT_SET_TIMEOUT. 3) Add egress hook, from Lukas Wunner. 4) Nowadays, almost all hook functions in x_table land just call the hook evaluation loop. Remove remaining hook wrappers from iptables and IPVS. From Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18ether: add EtherType for proprietary Realtek protocolsAlvin Šipraga
Add a new EtherType ETH_P_REALTEK to the if_ether.h uapi header. The EtherType 0x8899 is used in a number of different protocols from Realtek Semiconductor Corp [1], so no general assumptions should be made when trying to decode such packets. Observed protocols include: 0x1 - Realtek Remote Control protocol [2] 0x2 - Echo protocol [2] 0x3 - Loop detection protocol [2] 0x4 - RTL8365MB 4- and 8-byte switch CPU tag protocols [3] 0x9 - RTL8306 switch CPU tag protocol [4] 0xA - RTL8366RB switch CPU tag protocol [4] [1] https://lore.kernel.org/netdev/CACRpkdYQthFgjwVzHyK3DeYUOdcYyWmdjDPG=Rf9B3VrJ12Rzg@mail.gmail.com/ [2] https://www.wireshark.org/lists/ethereal-dev/200409/msg00090.html [3] https://lore.kernel.org/netdev/20210822193145.1312668-4-alvin@pqrs.dk/ [4] https://lore.kernel.org/netdev/20200708122537.1341307-2-linus.walleij@linaro.org/ Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18mctp: Be explicit about struct sockaddr_mctp paddingJeremy Kerr
We currently have some implicit padding in struct sockaddr_mctp. This patch makes this padding explicit, and ensures we have consistent layout on platforms with <32bit alignmnent. Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-18mctp: unify sockaddr_mctp typesJeremy Kerr
Use the more precise __kernel_sa_family_t for smctp_family, to match struct sockaddr. Also, use an unsigned int for the network member; negative networks don't make much sense. We're already using unsigned for mctp_dev and mctp_skb_cb, but need to change mctp_sock to suit. Fixes: 60fc63981693 ("mctp: Add sockaddr_mctp to uapi") Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Acked-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-16net/smc: add netlink support for SMC-Rv2Karsten Graul
Implement the netlink support for SMC-Rv2 related attributes that are provided to user space. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-15fq_codel: implement L4S style ce_threshold_ect1 markingEric Dumazet
Add TCA_FQ_CODEL_CE_THRESHOLD_ECT1 boolean option to select Low Latency, Low Loss, Scalable Throughput (L4S) style marking, along with ce_threshold. If enabled, only packets with ECT(1) can be transformed to CE if their sojourn time is above the ce_threshold. Note that this new option does not change rules for codel law. In particular, if TCA_FQ_CODEL_ECN is left enabled (this is the default when fq_codel qdisc is created), ECT(0) packets can still get CE if codel law (as governed by limit/target) decides so. Section 4.3.b of current draft [1] states: b. A scheduler with per-flow queues such as FQ-CoDel or FQ-PIE can be used for L4S. For instance within each queue of an FQ-CoDel system, as well as a CoDel AQM, there is typically also ECN marking at an immediate (unsmoothed) shallow threshold to support use in data centres (see Sec.5.2.7 of [RFC8290]). This can be modified so that the shallow threshold is solely applied to ECT(1) packets. Then if there is a flow of non-ECN or ECT(0) packets in the per-flow-queue, the Classic AQM (e.g. CoDel) is applied; while if there is a flow of ECT(1) packets in the queue, the shallower (typically sub-millisecond) threshold is applied. Tested: tc qd replace dev eth1 root fq_codel ce_threshold_ect1 50usec netperf ... -t TCP_STREAM -- K dctcp tc -s -d qd sh dev eth1 qdisc fq_codel 8022: root refcnt 32 limit 10240p flows 1024 quantum 9212 target 5ms ce_threshold_ect1 49us interval 100ms memory_limit 32Mb ecn drop_batch 64 Sent 14388596616 bytes 9543449 pkt (dropped 0, overlimits 0 requeues 152013) backlog 0b 0p requeues 152013 maxpacket 68130 drop_overlimit 0 new_flow_count 95678 ecn_mark 0 ce_mark 7639 new_flows_len 0 old_flows_len 0 [1] L4S current draft: https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-l4s-arch Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Ingemar Johansson S <ingemar.s.johansson@ericsson.com> Cc: Tom Henderson <tomh@tomh.org> Cc: Bob Briscoe <in@bobbriscoe.net> Signed-off-by: David S. Miller <davem@davemloft.net>