<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/arch/arm64/net, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2019-07-03T11:13:43Z</updated>
<entry>
<title>bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd</title>
<updated>2019-07-03T11:13:43Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-04-26T19:48:22Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=272ca3913c8eaaa92086df6f78f108f2546e277d'/>
<id>urn:sha1:272ca3913c8eaaa92086df6f78f108f2546e277d</id>
<content type='text'>
commit 34b8ab091f9ef57a2bb3c8c8359a0a03a8abf2f9 upstream.

Since ARMv8.1 supplement introduced LSE atomic instructions back in 2016,
lets add support for STADD and use that in favor of LDXR / STXR loop for
the XADD mapping if available. STADD is encoded as an alias for LDADD with
XZR as the destination register, therefore add LDADD to the instruction
encoder along with STADD as special case and use it in the JIT for CPUs
that advertise LSE atomics in CPUID register. If immediate offset in the
BPF XADD insn is 0, then use dst register directly instead of temporary
one.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jean-Philippe Brucker &lt;jean-philippe.brucker@arm.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>bpf, arm64: remove prefetch insn in xadd mapping</title>
<updated>2019-05-22T05:39:49Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-04-26T19:48:21Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f413abccd1fee409b82eb3dd693426282041558a'/>
<id>urn:sha1:f413abccd1fee409b82eb3dd693426282041558a</id>
<content type='text'>
commit 8968c67a82ab7501bc3b9439c3624a49b42fe54c upstream.

Prefetch-with-intent-to-write is currently part of the XADD mapping in
the AArch64 JIT and follows the kernel's implementation of atomic_add.
This may interfere with other threads executing the LDXR/STXR loop,
leading to potential starvation and fairness issues. Drop the optional
prefetch instruction.

Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Jean-Philippe Brucker &lt;jean-philippe.brucker@arm.com&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>arm64: bpf: implement jitting of JMP32</title>
<updated>2019-01-26T21:33:02Z</updated>
<author>
<name>Jiong Wang</name>
<email>jiong.wang@netronome.com</email>
</author>
<published>2019-01-26T17:26:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=654b65a04880454efb7d3287f011f2f790963b32'/>
<id>urn:sha1:654b65a04880454efb7d3287f011f2f790963b32</id>
<content type='text'>
This patch implements code-gen for new JMP32 instructions on arm64.

Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Zi Shen Lim &lt;zlim.lnx@gmail.com&gt;

Signed-off-by: Jiong Wang &lt;jiong.wang@netronome.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf: arm64: Enable arm64 jit to provide bpf_line_info</title>
<updated>2018-12-12T01:16:56Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2018-12-12T00:02:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=37ab566c178d79807b2d20b4c999133780355c6c'/>
<id>urn:sha1:37ab566c178d79807b2d20b4c999133780355c6c</id>
<content type='text'>
This patch enables arm64's bpf_int_jit_compile() to provide
bpf_line_info by calling bpf_prog_fill_jited_linfo().

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>arm64/bpf: don't allocate BPF JIT programs in module memory</title>
<updated>2018-12-05T15:36:28Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2018-11-23T22:18:04Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=91fc957c9b1d6c55168df15a397cbd1af16bc00f'/>
<id>urn:sha1:91fc957c9b1d6c55168df15a397cbd1af16bc00f</id>
<content type='text'>
The arm64 module region is a 128 MB region that is kept close to
the core kernel, in order to ensure that relative branches are
always in range. So using the same region for programs that do
not have this restriction is wasteful, and preferably avoided.

Now that the core BPF JIT code permits the alloc/free routines to
be overridden, implement them by vmalloc()/vfree() calls from a
dedicated 128 MB region set aside for BPF programs. This ensures
that BPF programs are still in branching range of each other, which
is something the JIT currently depends upon (and is not guaranteed
when using module_alloc() on KASLR kernels like we do currently).
It also ensures that placement of BPF programs does not correlate
with the placement of the core kernel or modules, making it less
likely that leaking the former will reveal the latter.

This also solves an issue under KASAN, where shadow memory is
needlessly allocated for all BPF programs (which don't require KASAN
shadow pages since they are not KASAN instrumented)

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>arm64/bpf: use movn/movk/movk sequence to generate kernel addresses</title>
<updated>2018-11-30T09:23:25Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2018-11-23T17:29:02Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=cc2b8ed1369592fb84609e920f99a5659a6445f7'/>
<id>urn:sha1:cc2b8ed1369592fb84609e920f99a5659a6445f7</id>
<content type='text'>
On arm64, all executable code is guaranteed to reside in the vmalloc
space (or the module space), and so jump targets will only use 48
bits at most, and the remaining bits are guaranteed to be 0x1.

This means we can generate an immediate jump address using a sequence
of one MOVN (move wide negated) and two MOVK instructions, where the
first one sets the lower 16 bits but also sets all top bits to 0x1.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Acked-by: Will Deacon &lt;will.deacon@arm.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
</entry>
<entry>
<title>bpf, arm64: fix getting subprog addr from aux for calls</title>
<updated>2018-11-27T01:34:24Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-11-26T13:05:39Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8c11ea5ce13da0252fc92f91e90b0cb0c8fe5619'/>
<id>urn:sha1:8c11ea5ce13da0252fc92f91e90b0cb0c8fe5619</id>
<content type='text'>
The arm64 JIT has the same issue as ppc64 JIT in that the relative BPF
to BPF call offset can be too far away from core kernel in that relative
encoding into imm is not sufficient and could potentially be truncated,
see also fd045f6cd98e ("arm64: add support for module PLTs") which adds
spill-over space for module_alloc() and therefore bpf_jit_binary_alloc().
Therefore, use the recently added bpf_jit_get_func_addr() helper for
properly fetching the address through prog-&gt;aux-&gt;func[off]-&gt;bpf_func
instead. This also has the benefit to optimize normal helper calls since
their address can use the optimized emission. Tested on Cavium ThunderX
CN8890.

Fixes: db496944fdaa ("bpf: arm64: add JIT support for multi-function programs")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, arm64: save 4 bytes in prologue when ebpf insns came from cbpf</title>
<updated>2018-05-15T02:11:45Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-05-14T21:22:33Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=56ea6a8b4949c66d550d05e92da8ef4351b90027'/>
<id>urn:sha1:56ea6a8b4949c66d550d05e92da8ef4351b90027</id>
<content type='text'>
We can trivially save 4 bytes in prologue for cBPF since tail calls
can never be used from there. The register push/pop is pairwise,
here, x25 (fp) and x26 (tcc), so no point in changing that, only
reset to zero is not needed.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, arm64: optimize 32/64 immediate emission</title>
<updated>2018-05-15T02:11:45Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-05-14T21:22:32Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6d2eea6fb0791bf191043a68937c9aa59c5ea678'/>
<id>urn:sha1:6d2eea6fb0791bf191043a68937c9aa59c5ea678</id>
<content type='text'>
Improve the JIT to emit 64 and 32 bit immediates, the current
algorithm is not optimal and we often emit more instructions
than actually needed. arm64 has movz, movn, movk variants but
for the current 64 bit immediates we only use movz with a
series of movk when needed.

For example loading ffffffffffffabab emits the following 4
instructions in the JIT today:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: ffff, shift: 16, result: 00000000ffffabab
  * movk: ffff, shift: 32, result: 0000ffffffffabab
  * movk: ffff, shift: 48, result: ffffffffffffabab

Whereas after the patch the same load only needs a single
instruction:

  * movn: 5454, shift:  0, result: ffffffffffffabab

Another example where two extra instructions can be saved:

  * movz: abab, shift:  0, result: 000000000000abab
  * movk: 1f2f, shift: 16, result: 000000001f2fabab
  * movk: ffff, shift: 32, result: 0000ffff1f2fabab
  * movk: ffff, shift: 48, result: ffffffff1f2fabab

After the patch:

  * movn: e0d0, shift: 16, result: ffffffff1f2fffff
  * movk: abab, shift:  0, result: ffffffff1f2fabab

Another example with movz, before:

  * movz: 0000, shift:  0, result: 0000000000000000
  * movk: fea0, shift: 32, result: 0000fea000000000

After:

  * movz: fea0, shift: 32, result: 0000fea000000000

Moreover, reuse emit_a64_mov_i() for 32 bit immediates that
are loaded via emit_a64_mov_i64() which is a similar optimization
as done in 6fe8b9c1f41d ("bpf, x64: save several bytes by using
mov over movabsq when possible"). On arm64, the latter allows to
use a single instruction with movn due to zero extension where
otherwise two would be needed. And last but not least add a
missing optimization in emit_a64_mov_i() where movn is used but
the subsequent movk not needed. With some of the Cilium programs
in use, this shrinks the needed instructions by about three
percent. Tested on Cavium ThunderX CN8890.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
<entry>
<title>bpf, arm64: save 4 bytes of unneeded stack space</title>
<updated>2018-05-15T02:11:45Z</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2018-05-14T21:22:31Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=09ece3d0f2ee524b8d1d3c775d9eeb50c40558d8'/>
<id>urn:sha1:09ece3d0f2ee524b8d1d3c775d9eeb50c40558d8</id>
<content type='text'>
Follow-up to 816d9ef32a8b ("bpf, arm64: remove ld_abs/ld_ind") in
that the extra 4 byte JIT scratchpad is not needed anymore since it
was in ld_abs/ld_ind as stack buffer for bpf_load_pointer().

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
</entry>
</feed>
