kernel/arch/mips/include/asm/cpu-features.h, branch linux-4.15.y

MIPS: Introduce cpu_tcache_line_size

2017-08-07T22:02:27Z

There exist macros to return the cache line size of the L1 dcache and L2 scache but there is currently no macro for the L3 tcache. Add this macro which will be used by the following patch "MIPS: PCI: Fix smp_processor_id() in preemptible" Signed-off-by: Matt Redfearn Cc: Maciej W. Rozycki Cc: James Hogan Cc: Paul Burton Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/16871/ Signed-off-by: Ralf Baechle

MIPS: MIPS16e2: Identify ASE presence

2017-07-05T12:06:44Z

Identify the presence of the MIPS16e2 ASE as per the architecture specification[1], by checking for CP0 Config5.CA2 bit being 1[2]. References: [1] "MIPS32 Architecture for Programmers: MIPS16e2 Application-Specific Extension Technical Reference Manual", Imagination Technologies Ltd., Document Number: MD01172, Revision 01.00, April 26, 2016, Section 1.2 "Software Detection of the ASE", p. 5 [2] "MIPS32 interAptiv Multiprocessing System Software User's Manual", Imagination Technologies Ltd., Document Number: MD00904, Revision 02.01, June 15, 2016, Section 2.2.1.6 "Device Configuration 5 -- Config5 (CP0 Register 16, Select 5)", pp. 71-72 Signed-off-by: Maciej W. Rozycki Reviewed-by: James Hogan Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16094/ Signed-off-by: Ralf Baechle

MIPS: Add CPU shared FTLB feature detection

2017-06-29T00:42:29Z

Some systems share FTLB RAMs or entries between sibling CPUs (ie. hardware threads, or VP(E)s, within a core). These properties require kernel handling in various places. As a start this patch introduces cpu_has_shared_ftlb_ram & cpu_has_shared_ftlb_entries feature macros which we set appropriately for I6400 & I6500 CPUs. Further patches will make use of these macros as appropriate. Signed-off-by: Paul Burton Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/16202/ Signed-off-by: Ralf Baechle

MIPS: Probe guest MVH

2017-03-28T13:49:15Z

Probe for availablility of M{T,F}HC0 instructions used with e.g. XPA in the VZ guest context, and make it available via cpu_guest_has_mvh. This will be helpful in properly emulating the MAAR registers in KVM for MIPS VZ. Signed-off-by: James Hogan Acked-by: Ralf Baechle Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org

MIPS: Probe guest CP0_UserLocal

2017-03-28T13:49:11Z

Probe for presence of guest CP0_UserLocal register and expose via cpu_guest_has_userlocal. This register is optional pre-r6, so this will allow KVM to only save/restore/expose the guest CP0_UserLocal register if it exists. Signed-off-by: James Hogan Acked-by: Ralf Baechle Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org

MIPS: Add defs & probing of UFR

2017-03-28T13:48:53Z

Add definitions and probing of the UFR bit in Config5. This bit allows user mode control of the FR bit (floating point register mode). It is present if the UFRP bit is set in the floating point implementation register. This is a capability KVM may want to expose to guest kernels, even though Linux is unlikely to ever use it due to the implications for multi-threaded programs. Signed-off-by: James Hogan Acked-by: Ralf Baechle Cc: Paul Burton Cc: Paolo Bonzini Cc: "Radim Krčmář" Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org

lib/GCD.c: use binary GCD algorithm instead of Euclidean

2016-05-21T00:58:30Z

The binary GCD algorithm is based on the following facts: 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2) 2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b) 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) Even on x86 machines with reasonable division hardware, the binary algorithm runs about 25% faster (80% the execution time) than the division-based Euclidian algorithm. On platforms like Alpha and ARMv6 where division is a function call to emulation code, it's even more significant. There are two variants of the code here, depending on whether a fast __ffs (find least significant set bit) instruction is available. This allows the unpredictable branches in the bit-at-a-time shifting loop to be eliminated. If fast __ffs is not available, the "even/odd" GCD variant is used. I use the following code to benchmark: #include #include #include #include #include #include #define swap(a, b) \ do { \ a ^= b; \ b ^= a; \ a ^= b; \ } while (0) unsigned long gcd0(unsigned long a, unsigned long b) { unsigned long r; if (a < b) { swap(a, b); } if (b == 0) return a; while ((r = a % b) != 0) { a = b; b = r; } return b; } unsigned long gcd1(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); for (;;) { a >>= __builtin_ctzl(a); if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd2(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; for (;;) { while (!(a & r)) a >>= 1; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } unsigned long gcd3(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); if (b == 1) return r & -r; for (;;) { a >>= __builtin_ctzl(a); if (a == 1) return r & -r; if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd4(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; if (b == r) return r; for (;;) { while (!(a & r)) a >>= 1; if (a == r) return r; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = { gcd0, gcd1, gcd2, gcd3, gcd4, }; #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0])) #if defined(__x86_64__) #define rdtscll(val) do { \ unsigned long __a,__d; \ __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \ (val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \ } while(0) static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { unsigned long long start, end; unsigned long long ret; unsigned long gcd_res; rdtscll(start); gcd_res = gcd(a, b); rdtscll(end); if (end >= start) ret = end - start; else ret = ~0ULL - start + 1 + end; *res = gcd_res; return ret; } #else static inline struct timespec read_time(void) { struct timespec time; clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time); return time; } static inline unsigned long long diff_time(struct timespec start, struct timespec end) { struct timespec temp; if ((end.tv_nsec - start.tv_nsec) < 0) { temp.tv_sec = end.tv_sec - start.tv_sec - 1; temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec; } else { temp.tv_sec = end.tv_sec - start.tv_sec; temp.tv_nsec = end.tv_nsec - start.tv_nsec; } return temp.tv_sec * 1000000000ULL + temp.tv_nsec; } static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { struct timespec start, end; unsigned long gcd_res; start = read_time(); gcd_res = gcd(a, b); end = read_time(); *res = gcd_res; return diff_time(start, end); } #endif static inline unsigned long get_rand() { if (sizeof(long) == 8) return (unsigned long)rand() << 32 | rand(); else return rand(); } int main(int argc, char **argv) { unsigned int seed = time(0); int loops = 100; int repeats = 1000; unsigned long (*res)[TEST_ENTRIES]; unsigned long long elapsed[TEST_ENTRIES]; int i, j, k; for (;;) { int opt = getopt(argc, argv, "n:r:s:"); /* End condition always first */ if (opt == -1) break; switch (opt) { case 'n': loops = atoi(optarg); break; case 'r': repeats = atoi(optarg); break; case 's': seed = strtoul(optarg, NULL, 10); break; default: /* You won't actually get here. */ break; } } res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops); memset(elapsed, 0, sizeof(elapsed)); srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); /* Do we have args? */ unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); unsigned long long min_elapsed[TEST_ENTRIES]; for (k = 0; k < repeats; k++) { for (i = 0; i < TEST_ENTRIES; i++) { unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]); if (k == 0 || min_elapsed[i] > tmp) min_elapsed[i] = tmp; } } for (i = 0; i < TEST_ENTRIES; i++) elapsed[i] += min_elapsed[i]; } for (i = 0; i < TEST_ENTRIES; i++) printf("gcd%d: elapsed %llu\n", i, elapsed[i]); k = 0; srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); for (i = 1; i < TEST_ENTRIES; i++) { if (res[j][i] != res[j][0]) break; } if (i < TEST_ENTRIES) { if (k == 0) { k = 1; fprintf(stderr, "Error:\n"); } fprintf(stderr, "gcd(%lu, %lu): ", a, b); for (i = 0; i < TEST_ENTRIES; i++) fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n"); } } if (k == 0) fprintf(stderr, "PASS\n"); free(res); return 0; } Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got: zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 10174 gcd1: elapsed 2120 gcd2: elapsed 2902 gcd3: elapsed 2039 gcd4: elapsed 2812 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9309 gcd1: elapsed 2280 gcd2: elapsed 2822 gcd3: elapsed 2217 gcd4: elapsed 2710 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9589 gcd1: elapsed 2098 gcd2: elapsed 2815 gcd3: elapsed 2030 gcd4: elapsed 2718 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9914 gcd1: elapsed 2309 gcd2: elapsed 2779 gcd3: elapsed 2228 gcd4: elapsed 2709 PASS [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable] Signed-off-by: Zhaoxiu Zeng Signed-off-by: George Spelvin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds

MIPS: Add probing & defs for VZ & guest features

2016-05-13T13:30:25Z

Add a few new cpu-features.h definitions for VZ sub-features, namely the existence of the CP0_GuestCtl0Ext, CP0_GuestCtl1, and CP0_GuestCtl2 registers, and support for GuestID to dialias TLB entries belonging to different guests. Also add certain features present in the guest, with the naming scheme cpu_guest_has_*. These are added separately to the main options bitfield since they generally parallel similar features in the root context. A few of these (FPU, MSA, watchpoints, perf counters, CP0_[X]ContextConfig registers, MAAR registers, and probably others in future) can be dynamically configured in the guest context, for which the cpu_guest_has_dyn_* macros are added. [ralf@linux-mips.org: Resolve merge conflict.] Signed-off-by: James Hogan Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13231/ Signed-off-by: Ralf Baechle

MIPS: Add perf counter feature

2016-05-13T13:30:25Z

Add CPU feature for standard MIPS r2 performance counters, as determined by the Config1.PC bit. Both perf_events and oprofile probe this bit, so lets combine the probing and change both to use cpu_has_perf. This will also be used for VZ support in KVM to know whether performance counters exist which can be exposed to guests. [ralf@linux-mips.org: resolve conflict.] Signed-off-by: James Hogan Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Robert Richter Cc: linux-mips@linux-mips.org Cc: oprofile-list@lists.sf.net Patchwork: https://patchwork.linux-mips.org/patch/13226/ Signed-off-by: Ralf Baechle

MIPS: Add defs & probing of [X]ContextConfig

2016-05-13T13:30:25Z

The CP0_[X]ContextConfig registers are present if CP0_Config3.CTXTC or CP0_Config3.SM are set, and provide more control over which bits of CP0_[X]Context are set to the faulting virtual address on a TLB exception. KVM/VZ will need to be able to save and restore these registers in the guest context, so add the relevant definitions and probing of the ContextConfig feature in the root context first. [ralf@linux-mips.org: resolve merge conflict.] Signed-off-by: James Hogan Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13225/ Signed-off-by: Ralf Baechle