<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/arch/arm64/lib/csum.c, branch linux-6.2.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.2.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2020-04-15T20:36:41Z</updated>
<entry>
<title>arm64: csum: Disable KASAN for do_csum()</title>
<updated>2020-04-15T20:36:41Z</updated>
<author>
<name>Will Deacon</name>
<email>will@kernel.org</email>
</author>
<published>2020-04-14T21:22:47Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c6a771d932332568df9f46a3b53507c578e8c8e8'/>
<id>urn:sha1:c6a771d932332568df9f46a3b53507c578e8c8e8</id>
<content type='text'>
do_csum() over-reads the source buffer and therefore abuses
READ_ONCE_NOCHECK() to avoid tripping up KASAN. In preparation for
READ_ONCE_NOCHECK() becoming a macro, and therefore losing its
'__no_sanitize_address' annotation, just annotate do_csum() explicitly
and fall back to normal loads.

Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: csum: Optimise IPv6 header checksum</title>
<updated>2020-03-09T18:08:25Z</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2020-01-20T18:52:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e9c7ddbf8b4b6a291bf3b5bfa7c883235164d9be'/>
<id>urn:sha1:e9c7ddbf8b4b6a291bf3b5bfa7c883235164d9be</id>
<content type='text'>
Throwing our __uint128_t idioms at csum_ipv6_magic() makes it
about 1.3x-2x faster across a range of microarchitecture/compiler
combinations. Not much in absolute terms, but every little helps.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>arm64: csum: Fix pathological zero-length calls</title>
<updated>2020-01-17T16:05:50Z</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2020-01-17T15:48:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c2c24edb1d9c308011f5a1328563d8da8c92c849'/>
<id>urn:sha1:c2c24edb1d9c308011f5a1328563d8da8c92c849</id>
<content type='text'>
In validating the checksumming results of the new routine, I sadly
neglected to test its not-checksumming results. Thus it slipped through
that the one case where @buff is already dword-aligned and @len = 0
manages to defeat the tail-masking logic and behave as if @len = 8.
For a zero length it doesn't make much sense to deference @buff anyway,
so just add an early return (which has essentially zero impact on
performance).

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>arm64: Implement optimised checksum routine</title>
<updated>2020-01-16T15:23:29Z</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2020-01-15T16:42:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5777eaed566a1d63e344d3dd8f2b5e33be20643e'/>
<id>urn:sha1:5777eaed566a1d63e344d3dd8f2b5e33be20643e</id>
<content type='text'>
Apparently there exist certain workloads which rely heavily on software
checksumming, for which the generic do_csum() implementation becomes a
significant bottleneck. Therefore let's give arm64 its own optimised
version - for ease of maintenance this foregoes assembly or intrisics,
and is thus not actually arm64-specific, but does rely heavily on C
idioms that translate well to the A64 ISA and the typical load/store
capabilities of most ARMv8 CPU cores.

The resulting increase in checksum throughput scales nicely with buffer
size, tending towards 4x for a small in-order core (Cortex-A53), and up
to 6x or more for an aggressive big core (Ampere eMAG).

Reported-by: Lingyan Huang &lt;huanglingyan2@huawei.com&gt;
Tested-by: Lingyan Huang &lt;huanglingyan2@huawei.com&gt;
Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
</feed>
