summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2022-11-15bpf: Expand map key argument of bpf_redirect_map to u64Toke Høiland-Jørgensen
For queueing packets in XDP we want to add a new redirect map type with support for 64-bit indexes. To prepare fore this, expand the width of the 'key' argument to the bpf_redirect_map() helper. Since BPF registers are always 64-bit, this should be safe to do after the fact. Acked-by: Song Liu <song@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221108140601.149971-3-toke@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-15Merge tag 'br-v6.2d' of git://linuxtv.org/hverkuil/media_tree into media_stageMauro Carvalho Chehab
Tag branch * tag 'br-v6.2d' of git://linuxtv.org/hverkuil/media_tree: (35 commits) media: saa7164: remove variable cnt atomisp: fix potential NULL pointer dereferences radio-terratec: Remove variable p media: platform: s5p-mfc: Fix spelling mistake "mmaping" -> "mmapping" media: platform: mtk-mdp3: remove unused VIDEO_MEDIATEK_VPU config media: vivid: remove redundant assignment to variable checksum media: cedrus: h264: Optimize mv col buffer allocation media: cedrus: h265: Associate mv col buffers with buffer media: mediatek: vcodec: fix h264 cavlc bitstream fail media: cedrus: hevc: Fix offset adjustments media: imx-jpeg: Fix Coverity issue in probe media: v4l2-ioctl.c: Unify YCbCr/YUV terms in format descriptions media: atomisp: Fix spelling mistake "mis-match" -> "mismatch" media: c8sectpfe: Add missed header(s) media: adv748x: afe: Select input port when initializing AFE media: vimc: Update device configuration in the documentation media: adv748x: Remove dead function declaration media: mxl5005s: Make array RegAddr static const media: atomisp: Fix spelling mistake "modee" -> "mode" media: meson/vdec: always init coef_node_start ...
2022-11-15Merge tag 'br-v6.2e' of git://linuxtv.org/hverkuil/media_tree into media_stageMauro Carvalho Chehab
Tag branch * tag 'br-v6.2e' of git://linuxtv.org/hverkuil/media_tree: (29 commits) media: davinci/vpbe: Fix a typo ("defualt_mode") media: sun6i-csi: Remove unnecessary print function dev_err() media: Documentation: Drop deprecated bytesused == 0 media: platform: exynos4-is: fix return value check in fimc_md_probe() media: dvb-core: remove variable n, turn for-loop to while-loop media: vivid: fix compose size exceed boundary media: rkisp1: make const arrays ae_wnd_num and hist_wnd_num static media: dvb-core: Fix UAF due to refcount races at releasing media: rkvdec: Add required padding media: aspeed: Extend debug message media: aspeed: Support aspeed mode to reduce compressed data media: Documentation: aspeed-video: Add user documentation for the aspeed-video driver media: v4l2-ctrls: Reserve controls for ASPEED media: v4l: Add definition for the Aspeed JPEG format staging: media: tegra-video: fix device_node use after free staging: media: tegra-video: fix chan->mipi value on error media: cedrus: initialize controls a bit later media: cedrus: prefer untiled capture format media: cedrus: Remove cedrus_codec enum media: cedrus: set codec ops immediately ...
2022-11-15netfilter: nf_tables: Introduce NFT_MSG_GETRULE_RESETPhil Sutter
Analogous to NFT_MSG_GETOBJ_RESET, but for rules: Reset stateful expressions like counters or quotas. The latter two are the only consumers, adjust their 'dump' callbacks to respect the parameter introduced earlier. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-11-14bpf: Support bpf_list_head in map valuesKumar Kartikeya Dwivedi
Add the support on the map side to parse, recognize, verify, and build metadata table for a new special field of the type struct bpf_list_head. To parameterize the bpf_list_head for a certain value type and the list_node member it will accept in that value type, we use BTF declaration tags. The definition of bpf_list_head in a map value will be done as follows: struct foo { struct bpf_list_node node; int data; }; struct map_value { struct bpf_list_head head __contains(foo, node); }; Then, the bpf_list_head only allows adding to the list 'head' using the bpf_list_node 'node' for the type struct foo. The 'contains' annotation is a BTF declaration tag composed of four parts, "contains:name:node" where the name is then used to look up the type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the lookup. The node defines name of the member in this type that has the type struct bpf_list_node, which is actually used for linking into the linked list. For now, 'kind' part is hardcoded as struct. This allows building intrusive linked lists in BPF, using container_of to obtain pointer to entry, while being completely type safe from the perspective of the verifier. The verifier knows exactly the type of the nodes, and knows that list helpers return that type at some fixed offset where the bpf_list_node member used for this list exists. The verifier also uses this information to disallow adding types that are not accepted by a certain list. For now, no elements can be added to such lists. Support for that is coming in future patches, hence draining and freeing items is done with a TODO that will be resolved in a future patch. Note that the bpf_list_head_free function moves the list out to a local variable under the lock and releases it, doing the actual draining of the list items outside the lock. While this helps with not holding the lock for too long pessimizing other concurrent list operations, it is also necessary for deadlock prevention: unless every function called in the critical section would be notrace, a fentry/fexit program could attach and call bpf_map_update_elem again on the map, leading to the same lock being acquired if the key matches and lead to a deadlock. While this requires some special effort on part of the BPF programmer to trigger and is highly unlikely to occur in practice, it is always better if we can avoid such a condition. While notrace would prevent this, doing the draining outside the lock has advantages of its own, hence it is used to also fix the deadlock related problem. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221114191547.1694267-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-14vfio: Add an option to get migration data sizeYishai Hadas
Add an option to get migration data size by introducing a new migration feature named VFIO_DEVICE_FEATURE_MIG_DATA_SIZE. Upon VFIO_DEVICE_FEATURE_GET the estimated data length that will be required to complete STOP_COPY is returned. This option may better enable user space to consider before moving to STOP_COPY whether it can meet the downtime SLA based on the returned data. The patch also includes the implementation for mlx5 and hisi for this new option to make it feature complete for the existing drivers in this area. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Link: https://lore.kernel.org/r/20221106174630.25909-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-11-14cxl/doe: Request exclusive DOE accessIra Weiny
The PCIE Data Object Exchange (DOE) mailbox is a protocol run over configuration cycles. It assumes one initiator at a time. While the kernel has control of the mailbox user space writes could interfere with the kernel access. Mark DOE mailbox config space exclusive when iterated by the CXL driver. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20220926215711.2893286-3-ira.weiny@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-11-14dmaengine: idxd: Fix crc_val field for completion recordFenghua Yu
The crc_val in the completion record should be 64 bits and not 32 bits. Fixes: 4ac823e9cd85 ("dmaengine: idxd: fix delta_rec and crc size field for completion record") Reported-by: Nirav N Shah <nirav.n.shah@intel.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221111012715.2031481-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2022-11-11Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Andrii Nakryiko says: ==================== bpf-next 2022-11-11 We've added 49 non-merge commits during the last 9 day(s) which contain a total of 68 files changed, 3592 insertions(+), 1371 deletions(-). The main changes are: 1) Veristat tool improvements to support custom filtering, sorting, and replay of results, from Andrii Nakryiko. 2) BPF verifier precision tracking fixes and improvements, from Andrii Nakryiko. 3) Lots of new BPF documentation for various BPF maps, from Dave Tucker, Donald Hunter, Maryam Tahhan, Bagas Sanjaya. 4) BTF dedup improvements and libbpf's hashmap interface clean ups, from Eduard Zingerman. 5) Fix veth driver panic if XDP program is attached before veth_open, from John Fastabend. 6) BPF verifier clean ups and fixes in preparation for follow up features, from Kumar Kartikeya Dwivedi. 7) Add access to hwtstamp field from BPF sockops programs, from Martin KaFai Lau. 8) Various fixes for BPF selftests and samples, from Artem Savkov, Domenico Cerasuolo, Kang Minchul, Rong Tao, Yang Jihong. 9) Fix redirection to tunneling device logic, preventing skb->len == 0, from Stanislav Fomichev. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits) selftests/bpf: fix veristat's singular file-or-prog filter selftests/bpf: Test skops->skb_hwtstamp selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test bpf: Add hwtstamp field for the sockops prog selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch bpf, docs: Document BPF_MAP_TYPE_ARRAY docs/bpf: Document BPF map types QUEUE and STACK docs/bpf: Document BPF ARRAY_OF_MAPS and HASH_OF_MAPS docs/bpf: Document BPF_MAP_TYPE_CPUMAP map docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map libbpf: Hashmap.h update to fix build issues using LLVM14 bpf: veth driver panics when xdp prog attached before veth_open selftests: Fix test group SKIPPED result selftests/bpf: Tests for btf_dedup_resolve_fwds libbpf: Resolve unambigous forward declarations libbpf: Hashmap interface update to allow both long and void* keys/values samples/bpf: Fix sockex3 error: Missing BPF prog type selftests/bpf: Fix u32 variable compared with less than zero Documentation: bpf: Escape underscore in BPF type name prefix selftests/bpf: Use consistent build-id type for liburandom_read.so ... ==================== Link: https://lore.kernel.org/r/20221111233733.1088228-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-11Merge tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Nothing major, just a few minor tweaks: - Tweak for the TCP zero-copy io_uring self test (Pavel) - Rather than use our internal cached value of number of CQ events available, use what the user can see (Dylan) - Fix a typo in a comment, added in this release (me) - Don't allow wrapping while adding provided buffers (me) - Fix a double poll race, and add a lockdep assertion for it too (Pavel)" * tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux: io_uring/poll: lockdep annote io_poll_req_insert_locked io_uring/poll: fix double poll req->flags races io_uring: check for rollover of buffer ID when providing buffers io_uring: calculate CQEs from the user visible value io_uring: fix typo in io_uring.h comment selftests/net: don't tests batched TCP io_uring zc
2022-11-11bpf: Add hwtstamp field for the sockops progMartin KaFai Lau
The bpf-tc prog has already been able to access the skb_hwtstamps(skb)->hwtstamp. This patch extends the same hwtstamp access to the sockops prog. In sockops, the skb is also available to the bpf prog during the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event. There is a use case that the hwtstamp will be useful to the sockops prog to better measure the one-way-delay when the sender has put the tx timestamp in the tcp header option. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev
2022-11-11Merge tag 'dmaengine-fix-6.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "Misc minor driver fixes and a big pile of at_hdmac driver fixes. More work on this driver is done and sitting in next: - Pile of at_hdmac driver rework which fixes many long standing issues for this driver. - couple of stm32 driver fixes for clearing structure and race fix - idxd fixes for RO device state and batch size - ti driver mem leak fix - apple fix for grabbing channels in xlate - resource leak fix in mv xor" * tag 'dmaengine-fix-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (24 commits) dmaengine: at_hdmac: Check return code of dma_async_device_register dmaengine: at_hdmac: Fix impossible condition dmaengine: at_hdmac: Don't allow CPU to reorder channel enable dmaengine: at_hdmac: Fix completion of unissued descriptor in case of errors dmaengine: at_hdmac: Fix descriptor handling when issuing it to hardware dmaengine: at_hdmac: Fix concurrency over the active list dmaengine: at_hdmac: Free the memset buf without holding the chan lock dmaengine: at_hdmac: Fix concurrency over descriptor dmaengine: at_hdmac: Fix concurrency problems by removing atc_complete_all() dmaengine: at_hdmac: Protect atchan->status with the channel lock dmaengine: at_hdmac: Do not call the complete callback on device_terminate_all dmaengine: at_hdmac: Fix premature completion of desc in issue_pending dmaengine: at_hdmac: Start transfer for cyclic channels in issue_pending dmaengine: at_hdmac: Don't start transactions at tx_submit level dmaengine: at_hdmac: Fix at_lli struct definition dmaengine: stm32-dma: fix potential race between pause and resume dmaengine: ti: k3-udma-glue: fix memory leak when register device fail dmaengine: mv_xor_v2: Fix a resource leak in mv_xor_v2_remove() dmaengine: apple-admac: Fix grabbing of channels in of_xlate dmaengine: idxd: fix RO device state error after been disabled/reset ...
2022-11-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/can/pch_can.c ae64438be192 ("can: dev: fix skb drop check") 1dd1b521be85 ("can: remove obsolete PCH CAN driver") https://lore.kernel.org/all/20221110102509.1f7d63cc@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-10Merge tag 'net-6.1-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wifi, can and bpf. Current release - new code bugs: - can: af_can: can_exit(): add missing dev_remove_pack() of canxl_packet Previous releases - regressions: - bpf, sockmap: fix the sk->sk_forward_alloc warning - wifi: mac80211: fix general-protection-fault in ieee80211_subif_start_xmit() - can: af_can: fix NULL pointer dereference in can_rx_register() - can: dev: fix skb drop check, avoid o-o-b access - nfnetlink: fix potential dead lock in nfnetlink_rcv_msg() Previous releases - always broken: - bpf: fix wrong reg type conversion in release_reference() - gso: fix panic on frag_list with mixed head alloc types - wifi: brcmfmac: fix buffer overflow in brcmf_fweh_event_worker() - wifi: mac80211: set TWT Information Frame Disabled bit as 1 - eth: macsec offload related fixes, make sure to clear the keys from memory - tun: fix memory leaks in the use of napi_get_frags - tun: call napi_schedule_prep() to ensure we own a napi - tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent - ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network - tipc: fix a msg->req tlv length check - sctp: clear out_curr if all frag chunks of current msg are pruned, avoid list corruption - mctp: fix an error handling path in mctp_init(), avoid leaks" * tag 'net-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits) eth: sp7021: drop free_netdev() from spl2sw_init_netdev() MAINTAINERS: Move Vivien to CREDITS net: macvlan: fix memory leaks of macvlan_common_newlink ethernet: tundra: free irq when alloc ring failed in tsi108_open() net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open() ethernet: s2io: disable napi when start nic failed in s2io_card_up() net: atlantic: macsec: clear encryption keys from the stack net: phy: mscc: macsec: clear encryption keys when freeing a flow stmmac: dwmac-loongson: fix missing of_node_put() while module exiting stmmac: dwmac-loongson: fix missing pci_disable_device() in loongson_dwmac_probe() stmmac: dwmac-loongson: fix missing pci_disable_msi() while module exiting cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open() mctp: Fix an error handling path in mctp_init() stmmac: intel: Update PCH PTP clock rate from 200MHz to 204.8MHz net: cxgb3_main: disable napi when bind qsets failed in cxgb_up() net: cpsw: disable napi in cpsw_ndo_open() iavf: Fix VF driver counting VLAN 0 filters ice: Fix spurious interrupt during removal of trusted VF net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions net/mlx5e: E-Switch, Fix comparing termination table instance ...
2022-11-10KVM: Support dirty ring in conjunction with bitmapGavin Shan
ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is enabled. It's conflicting with that ring-based dirty page tracking always requires a running VCPU context. Introduce a new flavor of dirty ring that requires the use of both VCPU dirty rings and a dirty bitmap. The expectation is that for non-VCPU sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to the dirty bitmap. Userspace should scan the dirty bitmap before migrating the VM to the target. Use an additional capability to advertise this behavior. The newly added capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Suggested-by: Marc Zyngier <maz@kernel.org> Suggested-by: Peter Xu <peterx@redhat.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-4-gshan@redhat.com
2022-11-09KVM: x86: Add a VALID_MASK for the MSR exit reason flagsAaron Lewis
Add the mask KVM_MSR_EXIT_REASON_VALID_MASK for the MSR exit reason flags. This simplifies checks that validate these flags, and makes it easier to introduce new flags in the future. No functional change intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220921151525.904162-3-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-09include/uapi/linux/swab: Fix potentially missing __always_inlineMatt Redfearn
Commit bc27fb68aaad ("include/uapi/linux/byteorder, swab: force inlining of some byteswap operations") added __always_inline to swab functions and commit 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to userspace headers") added a definition of __always_inline for use in exported headers when the kernel's compiler.h is not available. However, since swab.h does not include stddef.h, if the header soup does not indirectly include it, the definition of __always_inline is missing, resulting in a compilation failure, which was observed compiling the perf tool using exported headers containing this commit: In file included from /usr/include/linux/byteorder/little_endian.h:12:0, from /usr/include/asm/byteorder.h:14, from tools/include/uapi/linux/perf_event.h:20, from perf.h:8, from builtin-bench.c:18: /usr/include/linux/swab.h:160:8: error: unknown type name `__always_inline' static __always_inline __u16 __swab16p(const __u16 *p) Fix this by replacing the inclusion of linux/compiler.h with linux/stddef.h to ensure that we pick up that definition if required, without relying on it's indirect inclusion. compiler.h is then included indirectly, via stddef.h. Fixes: 283d75737837 ("uapi/linux/stddef.h: Provide __always_inline to userspace headers") Signed-off-by: Matt Redfearn <matt.redfearn@mips.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Petr Vaněk <arkamar@atlas.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-11-08Merge tag 'audit-pr-20221107' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A small audit patch to fix an instance of undefined behavior in a shift operator caused when shifting a signed value too far, the same case as the lsm patch merged previously. While the fix is trivial and I can't imagine it causing a problem in a backport, I'm not explicitly marking it for stable on the off chance that there is some system out there which is relying on some wonky unexpected behavior which this patch could break; *if* it does break, IMO it's better that to happen in a minor or -rcX release and not in a stable backport" * tag 'audit-pr-20221107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix undefined behavior in bit shift for AUDIT_BIT
2022-11-08Merge tag 'lsm-pr-20221107' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm fix from Paul Moore: "A small capability patch to fix an instance of undefined behavior in a shift operator caused when shifting a signed value too far. While the fix is trivial and I can't imagine it causing a problem in a backport, I'm not explicitly marking it for stable on the off chance that there is some system out there which is relying on some wonky unexpected behavior which this patch could break; *if* it does break, IMO it's better that to happen in a minor or -rcX release and not in a stable backport" * tag 'lsm-pr-20221107' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: capabilities: fix undefined behavior in bit shift for CAP_TO_MASK
2022-11-08net: sched: add helper support in act_ctXin Long
This patch is to add helper support in act_ct for OVS actions=ct(alg=xxx) offloading, which is corresponding to Commit cae3a2627520 ("openvswitch: Allow attaching helpers to ct action") in OVS kernel part. The difference is when adding TC actions family and proto cannot be got from the filter/match, other than helper name in tb[TCA_CT_HELPER_NAME], we also need to send the family in tb[TCA_CT_HELPER_FAMILY] and the proto in tb[TCA_CT_HELPER_PROTO] to kernel. Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-08ethtool: linkstate: add a statistic for PHY down eventsJakub Kicinski
The previous attempt to augment carrier_down (see Link) was not met with much enthusiasm so let's do the simple thing of exposing what some devices already maintain. Add a common ethtool statistic for link going down. Currently users have to maintain per-driver mapping to extract the right stat from the vendor-specific ethtool -S stats. carrier_down does not fit the bill because it counts a lot of software related false positives. Add the statistic to the extended link state API to steer vendors towards implementing all of it. Implement for bnxt and all Linux-controlled PHYs. mlx5 and (possibly) enic also have a counter for this but I leave the implementation to their maintainers. Link: https://lore.kernel.org/r/20220520004500.2250674-1-kuba@kernel.org Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20221104190125.684910-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-08Merge git://linuxtv.org/sailus/media_tree into media_stageMauro Carvalho Chehab
* git://linuxtv.org/sailus/media_tree: (47 commits) media: i2c: ov4689: code cleanup media: ov9650: Drop platform data code path media: ov7670: Drop unused include media: ov2640: Drop legacy includes media: tc358746: add Toshiba TC358746 Parallel to CSI-2 bridge driver media: dt-bindings: add bindings for Toshiba TC358746 phy: dphy: add support to calculate the timing based on hs_clk_rate phy: dphy: refactor get_default_config v4l: subdev: Warn if disabling streaming failed, return success dw9768: Enable low-power probe on ACPI media: i2c: imx290: Replace GAIN control with ANALOGUE_GAIN media: i2c: imx290: Add crop selection targets support media: i2c: imx290: Factor out format retrieval to separate function media: i2c: imx290: Move registers with fixed value to init array media: i2c: imx290: Create controls for fwnode properties media: i2c: imx290: Implement HBLANK and VBLANK controls media: i2c: imx290: Split control initialization to separate function media: i2c: imx290: Fix max gain value media: i2c: imx290: Add exposure time control media: i2c: imx290: Define more register macros ...
2022-11-06io_uring: fix typo in io_uring.h commentJens Axboe
Just a basic s/thig/this swap, fixing up a typo introduced by a commit added in the 6.1 release. Fixes: 9cda70f622cd ("io_uring: introduce fixed buffer support for io_uring_cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-05capabilities: fix undefined behavior in bit shift for CAP_TO_MASKGaosheng Cui
Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. The UBSAN warning calltrace like below: UBSAN: shift-out-of-bounds in security/commoncap.c:1252:2 left shift of 1 by 31 places cannot be represented in type 'int' Call Trace: <TASK> dump_stack_lvl+0x7d/0xa5 dump_stack+0x15/0x1b ubsan_epilogue+0xe/0x4e __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c cap_task_prctl+0x561/0x6f0 security_task_prctl+0x5a/0xb0 __x64_sys_prctl+0x61/0x8f0 do_syscall_64+0x58/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Fixes: e338d263a76a ("Add 64-bit capability support to the kernel") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Reviewed-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-11-04media: aspeed: Support aspeed mode to reduce compressed dataJammy Huang
aspeed supports differential jpeg format which only compress the parts which are changed. In this way, it reduces both the amount of data to be transferred by network and those to be decoded on the client side. 2 new ctrls are added: * Aspeed HQ Mode: to control aspeed's high quality(2-pass) compression mode This only works with yuv444 subsampling. * Aspeed HQ Quality: to control the quality of aspeed's HQ mode only useful if Aspeed HQ mode is enabled Aspeed JPEG Format requires an additional buffer, called bcd, to store the information about which macro block in the new frame is different from the previous one. To have bcd correctly working, we need to swap the buffers for src0/1 to make src1 refer to previous frame and src0 to the coming new frame. Signed-off-by: Jammy Huang <jammy_huang@aspeedtech.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> [hverkuil: fix logging dma_addr_t, use %pad for that]
2022-11-04media: v4l2-ctrls: Reserve controls for ASPEEDJammy Huang
Reserve controls for ASPEED video family. Aspeed video engine contains a few features which improve video quality, reduce amount of compressed data, and etc. Hence, 16 controls are reserved for these aspeed proprietary features. Signed-off-by: Jammy Huang <jammy_huang@aspeedtech.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2022-11-04media: v4l: Add definition for the Aspeed JPEG formatJammy Huang
This introduces support for the Aspeed JPEG format, where the new frame can refer to previous frame to reduce the amount of compressed data. The concept is similar to I/P frame of video compression. It will compare the new frame with previous one to decide which macroblock's data is changed, and only the changed macroblocks will be compressed. This Aspeed JPEG format is used by the video engine on Aspeed platforms, which is generally adapted for remote KVM. Signed-off-by: Jammy Huang <jammy_huang@aspeedtech.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2022-11-03net: expose devlink port over rtnetlinkJiri Pirko
Expose devlink port handle related to netdev over rtnetlink. Introduce a new nested IFLA attribute to carry the info. Call into devlink code to fill-up the nest with existing devlink attributes that are used over devlink netlink. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03bridge: Add MAC Authentication Bypass (MAB) supportHans J. Schultz
Hosts that support 802.1X authentication are able to authenticate themselves by exchanging EAPOL frames with an authenticator (Ethernet bridge, in this case) and an authentication server. Access to the network is only granted by the authenticator to successfully authenticated hosts. The above is implemented in the bridge using the "locked" bridge port option. When enabled, link-local frames (e.g., EAPOL) can be locally received by the bridge, but all other frames are dropped unless the host is authenticated. That is, unless the user space control plane installed an FDB entry according to which the source address of the frame is located behind the locked ingress port. The entry can be dynamic, in which case learning needs to be enabled so that the entry will be refreshed by incoming traffic. There are deployments in which not all the devices connected to the authenticator (the bridge) support 802.1X. Such devices can include printers and cameras. One option to support such deployments is to unlock the bridge ports connecting these devices, but a slightly more secure option is to use MAB. When MAB is enabled, the MAC address of the connected device is used as the user name and password for the authentication. For MAB to work, the user space control plane needs to be notified about MAC addresses that are trying to gain access so that they will be compared against an allow list. This can be implemented via the regular learning process with the sole difference that learned FDB entries are installed with a new "locked" flag indicating that the entry cannot be used to authenticate the device. The flag cannot be set by user space, but user space can clear the flag by replacing the entry, thereby authenticating the device. Locked FDB entries implement the following semantics with regards to roaming, aging and forwarding: 1. Roaming: Locked FDB entries can roam to unlocked (authorized) ports, in which case the "locked" flag is cleared. FDB entries cannot roam to locked ports regardless of MAB being enabled or not. Therefore, locked FDB entries are only created if an FDB entry with the given {MAC, VID} does not already exist. This behavior prevents unauthenticated devices from disrupting traffic destined to already authenticated devices. 2. Aging: Locked FDB entries age and refresh by incoming traffic like regular entries. 3. Forwarding: Locked FDB entries forward traffic like regular entries. If user space detects an unauthorized MAC behind a locked port and wishes to prevent traffic with this MAC DA from reaching the host, it can do so using tc or a different mechanism. Enable the above behavior using a new bridge port option called "mab". It can only be enabled on a bridge port that is both locked and has learning enabled. Locked FDB entries are flushed from the port once MAB is disabled. A new option is added because there are pure 802.1X deployments that are not interested in notifications about locked FDB entries. Signed-off-by: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2022-11-04 We've added 8 non-merge commits during the last 3 day(s) which contain a total of 10 files changed, 113 insertions(+), 16 deletions(-). The main changes are: 1) Fix memory leak upon allocation failure in BPF verifier's stack state tracking, from Kees Cook. 2) Fix address leakage when BPF progs release reference to an object, from Youlin Li. 3) Fix BPF CI breakage from buggy in.h uapi header dependency, from Andrii Nakryiko. 4) Fix bpftool pin sub-command's argument parsing, from Pu Lehui. 5) Fix BPF sockmap lockdep warning by cancelling psock work outside of socket lock, from Cong Wang. 6) Follow-up for BPF sockmap to fix sk_forward_alloc accounting, from Wang Yufen. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add verifier test for release_reference() bpf: Fix wrong reg type conversion in release_reference() bpf, sock_map: Move cancel_work_sync() out of sock lock tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI net/ipv4: Fix linux/in.h header dependencies bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues bpf, verifier: Fix memory leak in array reallocation for stack state ==================== Link: https://lore.kernel.org/r/20221104000445.30761-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-03net: dcb: add new apptrust attributeDaniel Machon
Add new apptrust extension attributes to the 8021Qaz APP managed object. Two new attributes, DCB_ATTR_DCB_APP_TRUST_TABLE and DCB_ATTR_DCB_APP_TRUST, has been added. Trusted selectors are passed in the nested attribute DCB_ATTR_DCB_APP_TRUST, in order of precedence. The new attributes are meant to allow drivers, whose hw supports the notion of trust, to be able to set whether a particular app selector is trusted - and in which order. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-03net: dcb: add new pcp selector to app objectDaniel Machon
Add new PCP selector for the 8021Qaz APP managed object. As the PCP selector is not part of the 8021Qaz standard, a new non-std extension attribute DCB_ATTR_DCB_APP has been introduced. Also two helper functions to translate between selector and app attribute type has been added. The new selector has been given a value of 255, to minimize the risk of future overlap of std- and non-std attributes. The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the app table. This means that the dcb_app struct can now both contain std- and non-std app attributes. Currently there is no overlap between the selector values of the two attributes. The purpose of adding the PCP selector, is to be able to offload PCP-based queue classification to the 8021Q Priority Code Point table, see 6.9.3 of IEEE Std 802.1Q-2018. PCP and DEI is encoded in the protocol field as 8*dei+pcp, so that a mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}. Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-11-03net/ipv4: Fix linux/in.h header dependenciesAndrii Nakryiko
__DECLARE_FLEX_ARRAY is defined in include/uapi/linux/stddef.h but doesn't seem to be explicitly included from include/uapi/linux/in.h, which breaks BPF selftests builds (once we sync linux/stddef.h into tools/include directory in the next patch). Fix this by explicitly including linux/stddef.h. Given this affects BPF CI and bpf tree, targeting this for bpf tree. Fixes: 5854a09b4957 ("net/ipv4: Use __DECLARE_FLEX_ARRAY() helper") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/20221102182517.2675301-1-andrii@kernel.org
2022-11-03serial: Convert serial_rs485 to kernel docIlpo Järvinen
Convert struct serial_rs485 comments to kernel doc format and include it into documentation. Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20221019093343.9546-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-11-02Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2022-11-02 We've added 70 non-merge commits during the last 14 day(s) which contain a total of 96 files changed, 3203 insertions(+), 640 deletions(-). The main changes are: 1) Make cgroup local storage available to non-cgroup attached BPF programs such as tc BPF ones, from Yonghong Song. 2) Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers, from Martin KaFai Lau. 3) Add LLVM disassembler as default library for dumping JITed code in bpftool, from Quentin Monnet. 4) Various kprobe_multi_link fixes related to kernel modules, from Jiri Olsa. 5) Optimize x86-64 JIT with emitting BMI2-based shift instructions, from Jie Meng. 6) Improve BPF verifier's memory type compatibility for map key/value arguments, from Dave Marchevsky. 7) Only create mmap-able data section maps in libbpf when data is exposed via skeletons, from Andrii Nakryiko. 8) Add an autoattach option for bpftool to load all object assets, from Wang Yufen. 9) Various memory handling fixes for libbpf and BPF selftests, from Xu Kuohai. 10) Initial support for BPF selftest's vmtest.sh on arm64, from Manu Bretelle. 11) Improve libbpf's BTF handling to dedup identical structs, from Alan Maguire. 12) Add BPF CI and denylist documentation for BPF selftests, from Daniel Müller. 13) Check BPF cpumap max_entries before doing allocation work, from Florian Lehner. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits) samples/bpf: Fix typo in README bpf: Remove the obsolte u64_stats_fetch_*_irq() users. bpf: check max_entries before allocating memory bpf: Fix a typo in comment for DFS algorithm bpftool: Fix spelling mistake "disasembler" -> "disassembler" selftests/bpf: Fix bpftool synctypes checking failure selftests/bpf: Panic on hard/soft lockup docs/bpf: Add documentation for new cgroup local storage selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x selftests/bpf: Add selftests for new cgroup local storage selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str bpftool: Support new cgroup local storage libbpf: Support new cgroup local storage bpf: Implement cgroup storage available to non-cgroup-attached bpf progs bpf: Refactor some inode/task/sk storage functions for reuse bpf: Make struct cgroup btf id global selftests/bpf: Tracing prog can still do lookup under busy lock selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection bpf: Add new bpf_task_storage_delete proto with no deadlock detection bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check ... ==================== Link: https://lore.kernel.org/r/20221102062120.5724-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-31audit: fix undefined behavior in bit shift for AUDIT_BITGaosheng Cui
Shifting signed 32-bit value by 31 bits is undefined, so changing significant bit to unsigned. The UBSAN warning calltrace like below: UBSAN: shift-out-of-bounds in kernel/auditfilter.c:179:23 left shift of 1 by 31 places cannot be represented in type 'int' Call Trace: <TASK> dump_stack_lvl+0x7d/0xa5 dump_stack+0x15/0x1b ubsan_epilogue+0xe/0x4e __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c audit_register_class+0x9d/0x137 audit_classes_init+0x4d/0xb8 do_one_initcall+0x76/0x430 kernel_init_freeable+0x3b3/0x422 kernel_init+0x24/0x1e0 ret_from_fork+0x1f/0x30 </TASK> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> [PM: remove bad 'Fixes' tag as issue predates git, added in v2.6.6-rc1] Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-10-31netlink: split up copies in the ack constructionJakub Kicinski
Clean up the use of unsafe_memcpy() by adding a flexible array at the end of netlink message header and splitting up the header and data copies. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-31Merge 6.1-rc3 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-10-30Merge tag 'perf_urgent_for_v6.1_rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Rename a perf memory level event define to denote it is of CXL type - Add Alder and Raptor Lakes support to RAPL - Make sure raw sample data is output with tracepoints * tag 'perf_urgent_for_v6.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXL perf/x86/rapl: Add support for Intel Raptor Lake perf/x86/rapl: Add support for Intel AlderLake-N perf: Fix missing raw data on tracepoint events
2022-10-28net/packet: add PACKET_FANOUT_FLAG_IGNORE_OUTGOINGWillem de Bruijn
Extend packet socket option PACKET_IGNORE_OUTGOING to fanout groups. The socket option sets ptype.ignore_outgoing, which makes dev_queue_xmit_nit skip the socket. When the socket joins a fanout group, the option is not reflected in the struct ptype of the group. dev_queue_xmit_nit only tests the fanout ptype, so the flag is ignored once a socket joins a fanout group. Inheriting the option from a socket would change established behavior. Different sockets in the group can set different flags, and can also change them at runtime. Testing in packet_rcv_fanout defeats the purpose of the original patch, which is to avoid skb_clone in dev_queue_xmit_nit (esp. for MSG_ZEROCOPY packets). Instead, introduce a new fanout group flag with the same behavior. Tested with https://github.com/wdebruij/kerneltools/blob/master/tests/test_psock_fanout_ignore_outgoing.c Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221027211014.3581513-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28Kalle Valo says:Jakub Kicinski
==================== pull-request: wireless-next-2022-10-28 First set of patches v6.2. mac80211 refactoring continues for Wi-Fi 7. All mac80211 driver are now converted to use internal TX queues, this might cause some regressions so we wanted to do this early in the cycle. Note: wireless tree was merged[1] to wireless-next to avoid some conflicts with mac80211 patches between the trees. Unfortunately there are still two smaller conflicts in net/mac80211/util.c which Stephen also reported[2]. In the first conflict initialise scratch_len to "params->scratch_len ?: 3 * params->len" (note number 3, not 2!) and in the second conflict take the version which uses elems->scratch_pos. [1] https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/commit/?id=dfd2d876b3fda1790bc0239ba4c6967e25d16e91 [2] https://lore.kernel.org/all/20221020032340.5cf101c0@canb.auug.org.au/ mac80211 - preparation for Wi-Fi 7 Multi-Link Operation (MLO) continues - add API to show the link STAs in debugfs - all mac80211 drivers are now using mac80211 internal TX queues (iTXQs) rtw89 - support 8852BE rtl8xxxu - support RTL8188FU brmfmac - support two station interfaces concurrently bcma - support SPROM rev 11 ==================== Link: https://lore.kernel.org/r/20221028132943.304ECC433B5@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28tcp: add rcv_wnd and plb_rehash to TCP_INFOMubashir Adnan Qureshi
rcv_wnd can be useful to diagnose TCP performance where receiver window becomes the bottleneck. rehash reports the PLB and timeout triggered rehash attempts by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28tcp: add u32 counter in tcp_sock and an SNMP counter for PLBMubashir Adnan Qureshi
A u32 counter is added to tcp_sock for counting the number of PLB triggered rehashes for a TCP connection. An SNMP counter is also added to count overall PLB triggered rehash events for a host. These counters are hooked up to PLB implementation for DCTCP. TCP_NLA_REHASH is added to SCM_TIMESTAMPING_OPT_STATS that reports the rehash attempts triggered due to PLB or timeouts. This gives a historical view of sustained congestion or timeouts experienced by the TCP connection. Signed-off-by: Mubashir Adnan Qureshi <mubashirq@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-28crypto: af_alg - Support symmetric encryption via keyring keysFrederick Lawler
We want to leverage keyring to store sensitive keys, and then use those keys for symmetric encryption via the crypto API. Among the key types we wish to support are: user, logon, encrypted, and trusted. User key types are already able to have their data copied to user space, but logon does not support this. Further, trusted and encrypted keys will return their encrypted data back to user space on read, which does not make them ideal for symmetric encryption. To support symmetric encryption for these key types, add a new ALG_SET_KEY_BY_KEY_SERIAL setsockopt() option to the crypto API. This allows users to pass a key_serial_t to the crypto API to perform symmetric encryption. The behavior is the same as ALG_SET_KEY, but the crypto key data is copied in kernel space from a keyring key, which allows for the support of logon, encrypted, and trusted key types. Keyring keys must have the KEY_(POS|USR|GRP|OTH)_SEARCH permission set to leverage this feature. This follows the asymmetric_key type where key lookup calls eventually lead to keyring_search_rcu() without the KEYRING_SEARCH_NO_CHECK_PERM flag set. Signed-off-by: Frederick Lawler <fred@cloudflare.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-10-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-nextJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Move struct nft_payload_set definition to .c file where it is only used. 2) Shrink transport and inner header offset fields in the nft_pktinfo structure to 16-bits, from Florian Westphal. 3) Get rid of nft_objref Kbuild toggle, make it built-in into nf_tables. This expression is used to instantiate conntrack helpers in nftables. After removing the conntrack helper auto-assignment toggle it this feature became more important so move it to the nf_tables core module. Also from Florian. 4) Extend the existing function to calculate payload inner header offset to deal with the GRE and IPIP transport protocols. 6) Add inner expression support for nf_tables. This new expression provides a packet parser for tunneled packets which uses a userspace description of the expected inner headers. The inner expression invokes the payload expression (via direct call) to match on the inner header protocol fields using the inner link, network and transport header offsets. An example of the bytecode generated from userspace to match on IP source encapsulated in a VxLAN packet: # nft --debug=netlink add rule netdev x y udp dport 4789 vxlan ip saddr 1.2.3.4 netdev x y [ meta load l4proto => reg 1 ] [ cmp eq reg 1 0x00000011 ] [ payload load 2b @ transport header + 2 => reg 1 ] [ cmp eq reg 1 0x0000b512 ] [ inner type vxlan hdrsize 8 flags f [ meta load protocol => reg 1 ] ] [ cmp eq reg 1 0x00000008 ] [ inner type vxlan hdrsize 8 flags f [ payload load 4b @ network header + 12 => reg 1 ] ] [ cmp eq reg 1 0x04030201 ] 7) Store inner link, network and transport header offsets in percpu area to parse inner packet header once only. Matching on a different tunnel type invalidates existing offsets in the percpu area and it invokes the inner tunnel parser again. 8) Add support for inner meta matching. This support for NFTA_META_PROTOCOL, which specifies the inner ethertype, and NFT_META_L4PROTO, which specifies the inner transport protocol. 9) Extend nft_inner to parse GENEVE optional fields to calculate the link layer offset. 10) Update inner expression so tunnel offset points to GRE header to normalize tunnel header handling. This also allows to perform different interpretations of the GRE header from userspace. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nft_inner: set tunnel offset to GRE header offset netfilter: nft_inner: add geneve support netfilter: nft_meta: add inner match support netfilter: nft_inner: add percpu inner context netfilter: nft_inner: support for inner tunnel header matching netfilter: nft_payload: access ipip payload for inner offset netfilter: nft_payload: access GRE payload via inner offset netfilter: nft_objref: make it builtin netfilter: nf_tables: reduce nft_pktinfo by 8 bytes netfilter: nft_payload: move struct nft_payload_set definition where it belongs ==================== Link: https://lore.kernel.org/r/20221026132227.3287-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/can/usb/kvaser_usb/kvaser_usb_leaf.c 2871edb32f46 ("can: kvaser_usb: Fix possible completions during init_completion") abb8670938b2 ("can: kvaser_usb_leaf: Ignore stale bus-off after start") 8d21f5927ae6 ("can: kvaser_usb_leaf: Fix improved state not being reported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-27PCI: Add PCI_PTM_CAP_RES macroVidya Sagar
Add macro defining Responder capable bit in Precision Time Measurement capability register. Link: https://lore.kernel.org/r/20220919143340.4527-2-vidyas@nvidia.com Signed-off-by: Vidya Sagar <vidyas@nvidia.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Jingoo Han <jingoohan1@gmail.com>
2022-10-27perf/mem: Rename PERF_MEM_LVLNUM_EXTN_MEM to PERF_MEM_LVLNUM_CXLRavi Bangoria
PERF_MEM_LVLNUM_EXTN_MEM was introduced to cover CXL devices but it's bit ambiguous name and also not generic enough to cover cxl.cache and cxl.io devices. Rename it to PERF_MEM_LVLNUM_CXL to be more specific. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/f6268268-b4e9-9ed6-0453-65792644d953@amd.com
2022-10-25bpf: Implement cgroup storage available to non-cgroup-attached bpf progsYonghong Song
Similar to sk/inode/task storage, implement similar cgroup local storage. There already exists a local storage implementation for cgroup-attached bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper bpf_get_local_storage(). But there are use cases such that non-cgroup attached bpf progs wants to access cgroup local storage data. For example, tc egress prog has access to sk and cgroup. It is possible to use sk local storage to emulate cgroup local storage by storing data in socket. But this is a waste as it could be lots of sockets belonging to a particular cgroup. Alternatively, a separate map can be created with cgroup id as the key. But this will introduce additional overhead to manipulate the new map. A cgroup local storage, similar to existing sk/inode/task storage, should help for this use case. The life-cycle of storage is managed with the life-cycle of the cgroup struct. i.e. the storage is destroyed along with the owning cgroup with a call to bpf_cgrp_storage_free() when cgroup itself is deleted. The userspace map operations can be done by using a cgroup fd as a key passed to the lookup, update and delete operations. Typically, the following code is used to get the current cgroup: struct task_struct *task = bpf_get_current_task_btf(); ... task->cgroups->dfl_cgrp ... and in structure task_struct definition: struct task_struct { .... struct css_set __rcu *cgroups; .... } With sleepable program, accessing task->cgroups is not protected by rcu_read_lock. So the current implementation only supports non-sleepable program and supporting sleepable program will be the next step together with adding rcu_read_lock protection for rcu tagged structures. Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used for cgroup storage available to non-cgroup-attached bpf programs. The old cgroup storage supports bpf_get_local_storage() helper to get the cgroup data. The new cgroup storage helper bpf_cgrp_storage_get() can provide similar functionality. While old cgroup storage pre-allocates storage memory, the new mechanism can also pre-allocate with a user space bpf_map_update_elem() call to avoid potential run-time memory allocation failure. Therefore, the new cgroup storage can provide all functionality w.r.t. the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can be deprecated since the new one can provide the same functionality. Acked-by: David Vernet <void@manifault.com> Signed-off-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>