<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/samples/bpf/map_perf_test_kern.c, branch linux-5.1.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-5.1.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2017-09-21T18:59:16Z</updated>
<entry>
<title>samples/bpf: Use getppid instead of getpgrp for array map stress</title>
<updated>2017-09-21T18:59:16Z</updated>
<author>
<name>Joel Fernandes</name>
<email>joelaf@google.com</email>
</author>
<published>2017-09-20T16:11:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=95ec66968571bf0af0a22effdc1b9d9e62ea6630'/>
<id>urn:sha1:95ec66968571bf0af0a22effdc1b9d9e62ea6630</id>
<content type='text'>
When cross-compiling the bpf sample map_perf_test for aarch64, I find that
__NR_getpgrp is undefined. This causes build errors. This syscall is deprecated
and requires defining __ARCH_WANT_SYSCALL_DEPRECATED. To avoid having to define
that, just use a different syscall (getppid) for the array map stress test.

Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: Joel Fernandes &lt;joelaf@google.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Add lru_hash_lookup performance test</title>
<updated>2017-09-01T16:57:38Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2017-09-01T06:27:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=637cd8c312d8caf234821fd37238b8f956d9ab13'/>
<id>urn:sha1:637cd8c312d8caf234821fd37238b8f956d9ab13</id>
<content type='text'>
Create a new case to test the LRU lookup performance.

At the beginning, the LRU map is fully loaded (i.e. the number of keys
is equal to map-&gt;max_entries). The lookups iterate over keys 0
to num_map_entries and then repeat from 0 again.
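
A minimal Python sketch of the lookup order described above, not the
sample's actual C code. The helper name lookup_keys and the choice of a
half-open key range (0 up to, but not including, num_map_entries) are
assumptions made for illustration.

```python
from itertools import cycle, islice

def lookup_keys(num_map_entries, num_lookups):
    """Return the keys visited by num_lookups sequential lookups.

    Keys run 0, 1, ..., num_map_entries - 1 and then wrap back to 0.
    The half-open range is an assumption of this sketch.
    """
    return list(islice(cycle(range(num_map_entries)), num_lookups))

print(lookup_keys(4, 10))  # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
```

Because every key is already present in the fully loaded map, a loop of
this shape exercises the lookup path only and should not trigger evictions.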

This patch also creates an anonymous struct to properly
name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Allow numa selection in INNER_LRU_HASH_PREALLOC test of map_perf_test</title>
<updated>2017-08-20T04:35:43Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2017-08-18T18:28:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ad17d0e6c708805bf9e6686eb747cc528b702e67'/>
<id>urn:sha1:ad17d0e6c708805bf9e6686eb747cc528b702e67</id>
<content type='text'>
This patch makes the needed changes to allow each process of
the INNER_LRU_HASH_PREALLOC test to provide its numa node id
when creating the lru map.

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Alexei Starovoitov &lt;ast@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: lru: Add map-in-map LRU example</title>
<updated>2017-04-17T17:55:52Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2017-04-14T17:30:30Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3a5795b83d578cc542a92c94399946258cf1a2af'/>
<id>urn:sha1:3a5795b83d578cc542a92c94399946258cf1a2af</id>
<content type='text'>
This patch adds a map-in-map LRU example.
If we know that only a subset of cores will use the
LRU, we can allocate a common LRU list per targeted core
and store it in an array-of-hashes.

It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map, but without wasting memory
on cores that we know will never access the LRU map.

BPF_F_NO_COMMON_LRU:
&gt; map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9234314 (9.23M/s)

map-in-map LRU:
&gt; map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
9962743 (9.96M/s)

Note that the max_entries for the map-in-map LRU test is 1260000, which
is the max_entries for each inner LRU map. 8 processes have been
started, so 8 * 1260000 = 10080000 (~10M), which is close to what is
used in the BPF_F_NO_COMMON_LRU test.
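
The capacity comparison above can be checked with a few lines of Python.
This is only a worked version of the arithmetic. The function name
total_entries is hypothetical, and reading the third command-line argument
as max_entries is an assumption based on the text of this message.

```python
def total_entries(num_processes, inner_max_entries):
    """Total capacity across all inner LRU maps, one inner map per process."""
    return num_processes * inner_max_entries

map_in_map_total = total_entries(8, 1260000)
# Assumed to be the max_entries used in the BPF_F_NO_COMMON_LRU run.
no_common_lru_total = 10000000

print(map_in_map_total)                        # 10080000
print(map_in_map_total / no_common_lru_total)  # 1.008
```

The two configurations therefore differ by under one percent in total
entries, which is what makes the two throughput numbers comparable.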

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: lru: Refactor LRU map tests in map_perf_test</title>
<updated>2017-04-17T17:55:52Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2017-04-14T17:30:27Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bf8db5d243a103ccd3f6d82a110e2302608e248c'/>
<id>urn:sha1:bf8db5d243a103ccd3f6d82a110e2302608e248c</id>
<content type='text'>
One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) so that the future
LRU test can be added without hunting for another syscall.

One of the map names is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid confusion with percpu_hash_map.

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>samples/bpf: add map_lookup microbenchmark</title>
<updated>2017-03-17T03:44:12Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2017-03-16T01:26:44Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=95ff141e52f84f476fcde50560f42d4f118539c0'/>
<id>urn:sha1:95ff141e52f84f476fcde50560f42d4f118539c0</id>
<content type='text'>
$ map_perf_test 128
speed of HASH bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	46M		58M
after	42M		74M

perf report
before:
    54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
     5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
     1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
     1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial
functions, yet they take a sizeable amount of cpu time.
htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts
htab_map_lookup_elem() into three BPF insns, which causes the cpu time
for bpf_prog_da4fc6a3f41761a2() to increase slightly.

$ map_perf_test 256
speed of ARRAY bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	97M		174M
after	64M		280M

before:
    37.33%  map_perf_test  [kernel.kallsyms]  [k] array_map_lookup_elem
    13.95%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     6.54%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     4.57%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    32.86%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     6.54%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

array_map_gen_lookup() removes calls to array_map_lookup_elem()
and bpf_map_lookup_elem() and replaces them with 7 bpf insns.

The performance without JIT is slower, since executing extra insns
in the interpreter is slower than running native C code,
but with JIT the performance gains are obvious,
since native C-&gt;x86 code is replaced with fewer bpf-&gt;x86 instructions.
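
To put the before/after numbers above on a common scale, the relative
change can be computed as in the short Python sketch below. The throughput
values are copied from the two tables in this message. The dictionary
layout and the percent_change helper are illustrative only.

```python
# Lookups per second, in millions, as (before, after) pairs.
results = {
    "HASH":  {"no_jit": (46, 42), "jit": (58, 74)},
    "ARRAY": {"no_jit": (97, 64), "jit": (174, 280)},
}

def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

for map_type, modes in results.items():
    for mode, (before, after) in modes.items():
        change = percent_change(before, after)
        print(f"{map_type:5} {mode:6} {change:+.1f}%")
```

A negative value marks a slowdown (the interpreter case) and a positive
value a speedup (the JIT case).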

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>samples/bpf: add lpm-trie benchmark</title>
<updated>2017-01-23T21:10:38Z</updated>
<author>
<name>David Herrmann</name>
<email>dh.herrmann@gmail.com</email>
</author>
<published>2017-01-21T16:26:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=b8a943e2942296aad37a8e7adc43db493413e54b'/>
<id>urn:sha1:b8a943e2942296aad37a8e7adc43db493413e54b</id>
<content type='text'>
Extend the map_perf_test_{user,kern}.c infrastructure to stress test
lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
the latency depending on trie size and lookup count.

On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
bpf program takes roughly 6.5us. Lookups in empty tries
take ~1.8us on the first try and ~0.9us on retries. Lookups in tries with 8192
entries take ~7.1us (on the first _and_ any subsequent try).

Signed-off-by: David Herrmann &lt;dh.herrmann@gmail.com&gt;
Reviewed-by: Daniel Mack &lt;daniel@zonque.org&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: Add tests for the LRU bpf_htab</title>
<updated>2016-11-15T16:50:43Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-11-11T18:55:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=5db58faf989f16d1d6a3d661aac616f9ca7932aa'/>
<id>urn:sha1:5db58faf989f16d1d6a3d661aac616f9ca7932aa</id>
<content type='text'>
This patch has some unit tests and a test_lru_dist.

The test_lru_dist reads in the numeric keys from a file.
The files used here are generated by a modified fio-genzipf tool
originating from the fio test suite. The sample data file can be
found here: https://github.com/iamkafai/bpf-lru

The zipf.* data files have 100k numeric keys, and the keys also
range from 1 to 100k.

The test_lru_dist outputs the number of unique keys (nr_unique).
For example, nr_unique:61239 means that 61239 of the 100k keys are unique.
nr_misses counts the keys that could not be found in the LRU map, so nr_misses
must be &gt;= nr_unique. test_lru_dist also simulates a perfect LRU
map as a comparison:

[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a1_01.out 4000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
    task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
    task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
....
test_parallel_lru_dist (map_type:9 map_flags:0x2):
    task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
    task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)

[root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
/root/zipf.100k.a0_01.out 40000 1
...
test_parallel_lru_dist (map_type:9 map_flags:0x0):
    task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
    task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
...
test_parallel_lru_dist (map_type:9 map_flags:0x2):
    task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
    task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
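
The perfect-LRU comparison mentioned above can be sketched in Python as
follows. This is not the code in test_lru_dist. The function name
perfect_lru_stats and the use of collections.OrderedDict are assumptions
for illustration, and capacity must be at least 1.

```python
from collections import OrderedDict

def perfect_lru_stats(keys, capacity):
    """Replay keys through an exact LRU cache with room for capacity keys.

    Returns (nr_unique, nr_misses). Every key misses the first time it
    is seen, so nr_misses can never be smaller than nr_unique.
    """
    cache = OrderedDict()
    seen = set()
    nr_misses = 0
    for key in keys:
        seen.add(key)
        if key in cache:
            cache.move_to_end(key)     # hit: mark as most recently used
            continue
        nr_misses += 1
        if len(cache) == capacity:
            cache.popitem(last=False)  # evict the least recently used key
        cache[key] = True
    return len(seen), nr_misses

print(perfect_lru_stats([1, 2, 1, 3, 2, 1], capacity=2))  # (3, 5)
```

In the toy run, key 1 hits once and keys 2 and 1 each miss a second time
after eviction, giving 5 misses for 3 unique keys.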

LRU map has also been added to map_perf_test:
/* Global LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
 1 cpus: 2934082 updates
 4 cpus: 7391434 updates
 8 cpus: 6500576 updates

/* Percpu LRU */
[root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
  1 cpus: 2896553 updates
  4 cpus: 9766395 updates
  8 cpus: 17460553 updates

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>samples/bpf: add map performance test</title>
<updated>2016-03-09T04:22:03Z</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@fb.com</email>
</author>
<published>2016-03-08T23:07:54Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=26e9093110fb9ceb10093e4914b129b58d49a425'/>
<id>urn:sha1:26e9093110fb9ceb10093e4914b129b58d49a425</id>
<content type='text'>
Add performance tests for the hash map and the per-cpu hash map,
with and without pre-allocation.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
