<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/arch/mips/include/asm/cpu-features.h, branch linux-4.15.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.15.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-4.15.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2017-08-07T22:02:27Z</updated>
<entry>
<title>MIPS: Introduce cpu_tcache_line_size</title>
<updated>2017-08-07T22:02:27Z</updated>
<author>
<name>Matt Redfearn</name>
<email>matt.redfearn@imgtec.com</email>
</author>
<published>2017-07-26T07:41:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=21da5332327b6d183bd93336ecf29c70bc609b7b'/>
<id>urn:sha1:21da5332327b6d183bd93336ecf29c70bc609b7b</id>
<content type='text'>
There exist macros to return the cache line size of the L1 dcache and L2
scache but there is currently no macro for the L3 tcache. Add this macro
which will be used by the following patch "MIPS: PCI: Fix
smp_processor_id() in preemptible"

Signed-off-by: Matt Redfearn &lt;matt.redfearn@imgtec.com&gt;
Cc: Maciej W. Rozycki &lt;macro@imgtec.com&gt;
Cc: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Paul Burton &lt;paul.burton@imgtec.com&gt;
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/16871/
Signed-off-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
</content>
</entry>
<entry>
<title>MIPS: MIPS16e2: Identify ASE presence</title>
<updated>2017-07-05T12:06:44Z</updated>
<author>
<name>Maciej W. Rozycki</name>
<email>macro@imgtec.com</email>
</author>
<published>2017-05-23T12:37:05Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8d1630f13754f1435d3ea7078829121c52f38d15'/>
<id>urn:sha1:8d1630f13754f1435d3ea7078829121c52f38d15</id>
<content type='text'>
Identify the presence of the MIPS16e2 ASE as per the architecture
specification[1], by checking for CP0 Config5.CA2 bit being 1[2].

References:

[1] "MIPS32 Architecture for Programmers: MIPS16e2 Application-Specific
    Extension Technical Reference Manual", Imagination Technologies
    Ltd., Document Number: MD01172, Revision 01.00, April 26, 2016,
    Section 1.2 "Software Detection of the ASE", p. 5

[2] "MIPS32 interAptiv Multiprocessing System Software User's Manual",
    Imagination Technologies Ltd., Document Number: MD00904, Revision
    02.01, June 15, 2016, Section 2.2.1.6 "Device Configuration 5 --
    Config5 (CP0 Register 16, Select 5)", pp. 71-72

Signed-off-by: Maciej W. Rozycki &lt;macro@imgtec.com&gt;
Reviewed-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16094/
Signed-off-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
</content>
</entry>
<entry>
<title>MIPS: Add CPU shared FTLB feature detection</title>
<updated>2017-06-29T00:42:29Z</updated>
<author>
<name>Paul Burton</name>
<email>paul.burton@imgtec.com</email>
</author>
<published>2017-06-02T22:38:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=e7bc8557428f069eaa613b3676ea6931c0f7fe43'/>
<id>urn:sha1:e7bc8557428f069eaa613b3676ea6931c0f7fe43</id>
<content type='text'>
Some systems share FTLB RAMs or entries between sibling CPUs (ie.
hardware threads, or VP(E)s, within a core). These properties require
kernel handling in various places. As a start this patch introduces
cpu_has_shared_ftlb_ram &amp; cpu_has_shared_ftlb_entries feature macros
which we set appropriately for I6400 &amp; I6500 CPUs. Further patches will
make use of these macros as appropriate.

Signed-off-by: Paul Burton &lt;paul.burton@imgtec.com&gt;
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16202/
Signed-off-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
</content>
</entry>
<entry>
<title>MIPS: Probe guest MVH</title>
<updated>2017-03-28T13:49:15Z</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2017-03-14T10:15:11Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a929bdc52be658fd622c2d0f9fdf1d333aea34fd'/>
<id>urn:sha1:a929bdc52be658fd622c2d0f9fdf1d333aea34fd</id>
<content type='text'>
Probe for availablility of M{T,F}HC0 instructions used with e.g. XPA in
the VZ guest context, and make it available via cpu_guest_has_mvh. This
will be helpful in properly emulating the MAAR registers in KVM for MIPS
VZ.

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Acked-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Radim Krčmář" &lt;rkrcmar@redhat.com&gt;
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
</content>
</entry>
<entry>
<title>MIPS: Probe guest CP0_UserLocal</title>
<updated>2017-03-28T13:49:11Z</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2017-03-14T10:15:10Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=a7c7ad6c3e0a2ae88625df4e41c1c7305aa26f04'/>
<id>urn:sha1:a7c7ad6c3e0a2ae88625df4e41c1c7305aa26f04</id>
<content type='text'>
Probe for presence of guest CP0_UserLocal register and expose via
cpu_guest_has_userlocal. This register is optional pre-r6, so this will
allow KVM to only save/restore/expose the guest CP0_UserLocal register
if it exists.

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Acked-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Radim Krčmář" &lt;rkrcmar@redhat.com&gt;
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
</content>
</entry>
<entry>
<title>MIPS: Add defs &amp; probing of UFR</title>
<updated>2017-03-28T13:48:53Z</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2017-03-14T10:15:08Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=4e87580e6d144e44820a0d23e756136b6218f4f0'/>
<id>urn:sha1:4e87580e6d144e44820a0d23e756136b6218f4f0</id>
<content type='text'>
Add definitions and probing of the UFR bit in Config5. This bit allows
user mode control of the FR bit (floating point register mode). It is
present if the UFRP bit is set in the floating point implementation
register.

This is a capability KVM may want to expose to guest kernels, even
though Linux is unlikely to ever use it due to the implications for
multi-threaded programs.

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Acked-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
Cc: Paul Burton &lt;paul.burton@imgtec.com&gt;
Cc: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: "Radim Krčmář" &lt;rkrcmar@redhat.com&gt;
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
</content>
</entry>
<entry>
<title>lib/GCD.c: use binary GCD algorithm instead of Euclidean</title>
<updated>2016-05-21T00:58:30Z</updated>
<author>
<name>Zhaoxiu Zeng</name>
<email>zhaoxiu.zeng@gmail.com</email>
</author>
<published>2016-05-21T00:03:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=fff7fb0b2d908dec779783d8eaf3d7725230f75e'/>
<id>urn:sha1:fff7fb0b2d908dec779783d8eaf3d7725230f75e</id>
<content type='text'>
The binary GCD algorithm is based on the following facts:
	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% the execution time) than the
division-based Euclidian algorithm.

On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.

There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available.  This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.

If fast __ffs is not available, the "even/odd" GCD variant is used.

I use the following code to benchmark:

	#include &lt;stdio.h&gt;
	#include &lt;stdlib.h&gt;
	#include &lt;stdint.h&gt;
	#include &lt;string.h&gt;
	#include &lt;time.h&gt;
	#include &lt;unistd.h&gt;

	#define swap(a, b) \
		do { \
			a ^= b; \
			b ^= a; \
			a ^= b; \
		} while (0)

	unsigned long gcd0(unsigned long a, unsigned long b)
	{
		unsigned long r;

		if (a &lt; b) {
			swap(a, b);
		}

		if (b == 0)
			return a;

		while ((r = a % b) != 0) {
			a = b;
			b = r;
		}

		return b;
	}

	unsigned long gcd1(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b &gt;&gt;= __builtin_ctzl(b);

		for (;;) {
			a &gt;&gt;= __builtin_ctzl(a);
			if (a == b)
				return a &lt;&lt; __builtin_ctzl(r);

			if (a &lt; b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd2(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &amp;= -r;

		while (!(b &amp; r))
			b &gt;&gt;= 1;

		for (;;) {
			while (!(a &amp; r))
				a &gt;&gt;= 1;
			if (a == b)
				return a;

			if (a &lt; b)
				swap(a, b);
			a -= b;
			a &gt;&gt;= 1;
			if (a &amp; r)
				a += b;
			a &gt;&gt;= 1;
		}
	}

	unsigned long gcd3(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b &gt;&gt;= __builtin_ctzl(b);
		if (b == 1)
			return r &amp; -r;

		for (;;) {
			a &gt;&gt;= __builtin_ctzl(a);
			if (a == 1)
				return r &amp; -r;
			if (a == b)
				return a &lt;&lt; __builtin_ctzl(r);

			if (a &lt; b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd4(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &amp;= -r;

		while (!(b &amp; r))
			b &gt;&gt;= 1;
		if (b == r)
			return r;

		for (;;) {
			while (!(a &amp; r))
				a &gt;&gt;= 1;
			if (a == r)
				return r;
			if (a == b)
				return a;

			if (a &lt; b)
				swap(a, b);
			a -= b;
			a &gt;&gt;= 1;
			if (a &amp; r)
				a += b;
			a &gt;&gt;= 1;
		}
	}

	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
		gcd0, gcd1, gcd2, gcd3, gcd4,
	};

	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

	#if defined(__x86_64__)

	#define rdtscll(val) do { \
		unsigned long __a,__d; \
		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)&lt;&lt;32); \
	} while(0)

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		unsigned long long start, end;
		unsigned long long ret;
		unsigned long gcd_res;

		rdtscll(start);
		gcd_res = gcd(a, b);
		rdtscll(end);

		if (end &gt;= start)
			ret = end - start;
		else
			ret = ~0ULL - start + 1 + end;

		*res = gcd_res;
		return ret;
	}

	#else

	static inline struct timespec read_time(void)
	{
		struct timespec time;
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &amp;time);
		return time;
	}

	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
	{
		struct timespec temp;

		if ((end.tv_nsec - start.tv_nsec) &lt; 0) {
			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
		} else {
			temp.tv_sec = end.tv_sec - start.tv_sec;
			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
		}

		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
	}

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		struct timespec start, end;
		unsigned long gcd_res;

		start = read_time();
		gcd_res = gcd(a, b);
		end = read_time();

		*res = gcd_res;
		return diff_time(start, end);
	}

	#endif

	static inline unsigned long get_rand()
	{
		if (sizeof(long) == 8)
			return (unsigned long)rand() &lt;&lt; 32 | rand();
		else
			return rand();
	}

	int main(int argc, char **argv)
	{
		unsigned int seed = time(0);
		int loops = 100;
		int repeats = 1000;
		unsigned long (*res)[TEST_ENTRIES];
		unsigned long long elapsed[TEST_ENTRIES];
		int i, j, k;

		for (;;) {
			int opt = getopt(argc, argv, "n:r:s:");
			/* End condition always first */
			if (opt == -1)
				break;

			switch (opt) {
			case 'n':
				loops = atoi(optarg);
				break;
			case 'r':
				repeats = atoi(optarg);
				break;
			case 's':
				seed = strtoul(optarg, NULL, 10);
				break;
			default:
				/* You won't actually get here. */
				break;
			}
		}

		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
		memset(elapsed, 0, sizeof(elapsed));

		srand(seed);
		for (j = 0; j &lt; loops; j++) {
			unsigned long a = get_rand();
			/* Do we have args? */
			unsigned long b = argc &gt; optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			unsigned long long min_elapsed[TEST_ENTRIES];
			for (k = 0; k &lt; repeats; k++) {
				for (i = 0; i &lt; TEST_ENTRIES; i++) {
					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &amp;res[j][i]);
					if (k == 0 || min_elapsed[i] &gt; tmp)
						min_elapsed[i] = tmp;
				}
			}
			for (i = 0; i &lt; TEST_ENTRIES; i++)
				elapsed[i] += min_elapsed[i];
		}

		for (i = 0; i &lt; TEST_ENTRIES; i++)
			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

		k = 0;
		srand(seed);
		for (j = 0; j &lt; loops; j++) {
			unsigned long a = get_rand();
			unsigned long b = argc &gt; optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			for (i = 1; i &lt; TEST_ENTRIES; i++) {
				if (res[j][i] != res[j][0])
					break;
			}
			if (i &lt; TEST_ENTRIES) {
				if (k == 0) {
					k = 1;
					fprintf(stderr, "Error:\n");
				}
				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
				for (i = 0; i &lt; TEST_ENTRIES; i++)
					fprintf(stderr, "%ld%s", res[j][i], i &lt; TEST_ENTRIES - 1 ? ", " : "\n");
			}
		}

		if (k == 0)
			fprintf(stderr, "PASS\n");

		free(res);

		return 0;
	}

Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 10174
  gcd1: elapsed 2120
  gcd2: elapsed 2902
  gcd3: elapsed 2039
  gcd4: elapsed 2812
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9309
  gcd1: elapsed 2280
  gcd2: elapsed 2822
  gcd3: elapsed 2217
  gcd4: elapsed 2710
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9589
  gcd1: elapsed 2098
  gcd2: elapsed 2815
  gcd3: elapsed 2030
  gcd4: elapsed 2718
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9914
  gcd1: elapsed 2309
  gcd2: elapsed 2779
  gcd3: elapsed 2228
  gcd4: elapsed 2709
  PASS

[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng &lt;zhaoxiu.zeng@gmail.com&gt;
Signed-off-by: George Spelvin &lt;linux@horizon.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>MIPS: Add probing &amp; defs for VZ &amp; guest features</title>
<updated>2016-05-13T13:30:25Z</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2016-05-11T14:50:30Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=6ad816e77ed77538fe729050cf6631328c6113f7'/>
<id>urn:sha1:6ad816e77ed77538fe729050cf6631328c6113f7</id>
<content type='text'>
Add a few new cpu-features.h definitions for VZ sub-features, namely the
existence of the CP0_GuestCtl0Ext, CP0_GuestCtl1, and CP0_GuestCtl2
registers, and support for GuestID to dialias TLB entries belonging to
different guests.

Also add certain features present in the guest, with the naming scheme
cpu_guest_has_*. These are added separately to the main options bitfield
since they generally parallel similar features in the root context. A
few of these (FPU, MSA, watchpoints, perf counters, CP0_[X]ContextConfig
registers, MAAR registers, and probably others in future) can be
dynamically configured in the guest context, for which the
cpu_guest_has_dyn_* macros are added.

[ralf@linux-mips.org: Resolve merge conflict.]

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13231/
Signed-off-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
</content>
</entry>
<entry>
<title>MIPS: Add perf counter feature</title>
<updated>2016-05-13T13:30:25Z</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2016-05-11T12:50:53Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=30228c40f0c6f0a93910325f120904505a4c39cc'/>
<id>urn:sha1:30228c40f0c6f0a93910325f120904505a4c39cc</id>
<content type='text'>
Add CPU feature for standard MIPS r2 performance counters, as determined
by the Config1.PC bit. Both perf_events and oprofile probe this bit, so
lets combine the probing and change both to use cpu_has_perf.

This will also be used for VZ support in KVM to know whether performance
counters exist which can be exposed to guests.

[ralf@linux-mips.org: resolve conflict.]

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Robert Richter &lt;rric@kernel.org&gt;
Cc: linux-mips@linux-mips.org
Cc: oprofile-list@lists.sf.net
Patchwork: https://patchwork.linux-mips.org/patch/13226/
Signed-off-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
</content>
</entry>
<entry>
<title>MIPS: Add defs &amp; probing of [X]ContextConfig</title>
<updated>2016-05-13T13:30:25Z</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2016-05-11T12:50:52Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=f18bdfa191df3947879f93519ce4a4985e5f5fce'/>
<id>urn:sha1:f18bdfa191df3947879f93519ce4a4985e5f5fce</id>
<content type='text'>
The CP0_[X]ContextConfig registers are present if CP0_Config3.CTXTC or
CP0_Config3.SM are set, and provide more control over which bits of
CP0_[X]Context are set to the faulting virtual address on a TLB
exception.

KVM/VZ will need to be able to save and restore these registers in the
guest context, so add the relevant definitions and probing of the
ContextConfig feature in the root context first.

[ralf@linux-mips.org: resolve merge conflict.]

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13225/
Signed-off-by: Ralf Baechle &lt;ralf@linux-mips.org&gt;
</content>
</entry>
</feed>
