<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/drivers/thermal/intel/intel_hfi.c, branch linux-6.9.y</title>
<subtitle>Hosts the 0x221E linux distro kernel.</subtitle>
<id>https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y</id>
<link rel='self' href='https://universe.0xinfinity.dev/distro/kernel/atom?h=linux-6.9.y'/>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/'/>
<updated>2024-02-15T21:07:45Z</updated>
<entry>
<title>x86/cpu/topology: Rename topology_max_die_per_package()</title>
<updated>2024-02-15T21:07:45Z</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2024-02-13T21:06:13Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=bd745d1c41e7fa56242889eb5dc6df2d7dd5df32'/>
<id>urn:sha1:bd745d1c41e7fa56242889eb5dc6df2d7dd5df32</id>
<content type='text'>
The plural of die is dies.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Tested-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Tested-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Link: https://lore.kernel.org/r/20240213210253.065874205@linutronix.de
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Add syscore callbacks for system-wide PM</title>
<updated>2024-01-12T14:44:42Z</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2024-01-10T03:07:04Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=97566d09fd02d2ab329774bb89a2cdf2267e86d9'/>
<id>urn:sha1:97566d09fd02d2ab329774bb89a2cdf2267e86d9</id>
<content type='text'>
The kernel allocates a memory buffer and provides its location to the
hardware, which uses it to update the HFI table. This allocation occurs
during boot and remains constant throughout runtime.

When resuming from hibernation, the restore kernel allocates a second
memory buffer and reprograms the HFI hardware with the new location as
part of a normal boot. The location of the second memory buffer may
differ from the one allocated by the image kernel.

When the restore kernel transfers control to the image kernel, the restore
kernel's HFI buffer becomes invalid, potentially leading to memory corruption
if the hardware writes to it (the hardware continues to use the buffer from
the restore kernel).

It is also possible that the hardware "forgets" the address of the memory
buffer when resuming from "deep" suspend. Memory corruption may also occur
in such a scenario.

To prevent the described memory corruption, disable HFI when preparing to
suspend or hibernate. Enable it when resuming.

Add syscore callbacks to handle the package of the boot CPU (packages of
non-boot CPUs are handled via CPU offline). Syscore ops always run on the
boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend
and hibernation. Syscore ops only run in these cases.

Cc: 6.1+ &lt;stable@vger.kernel.org&gt; # 6.1+
Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
[ rjw: Comment adjustment, subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Disable an HFI instance when all its CPUs go offline</title>
<updated>2024-01-03T13:06:40Z</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2024-01-03T04:14:58Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=1c53081d773c2cb4461636559b0d55b46559ceec'/>
<id>urn:sha1:1c53081d773c2cb4461636559b0d55b46559ceec</id>
<content type='text'>
In preparation to support hibernation, add functionality to disable an HFI
instance during CPU offline. The last CPU of an instance that goes offline
disables that instance.

The Intel Software Development Manual states that the operating system must
wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after
disabling an HFI instance to ensure that it will no longer write to the HFI
memory. Some processors, however, never set this bit. Wait a minimum
of 2ms to give the hardware time to complete any pending memory writes.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Enable an HFI instance from its first online CPU</title>
<updated>2024-01-03T13:06:40Z</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2024-01-03T04:14:57Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=ac1f9230d92a04619331c600dbcead0e32b3e80e'/>
<id>urn:sha1:ac1f9230d92a04619331c600dbcead0e32b3e80e</id>
<content type='text'>
Previously, HFI instances were never disabled once enabled. A CPU in an
instance only had to check during boot whether another CPU had previously
initialized the instance and its corresponding data structure.

A subsequent changeset will add functionality to disable instances
to support hibernation. Such a change will also make it possible to disable
an HFI instance during runtime via CPU hotplug.

Enable an HFI instance from the first of its CPUs that comes online. This
covers the boot, CPU hotplug, and resume-from-suspend cases. It also covers
systems with one or more HFI instances (i.e., packages).
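
As a rough illustration of the first-online-CPU rule, the following Python
model tracks which CPUs of one instance are online. It is only a sketch:
HfiInstanceModel and its methods are hypothetical names, not identifiers from
intel_hfi.c, and the real driver works with cpumasks and MSR writes rather
than a Python set. The offline path is an assumption based on the subsequent
changeset mentioned above.

```python
class HfiInstanceModel:
    """Toy model of one HFI instance (one package); hypothetical names."""

    def __init__(self):
        self.online_cpus = set()   # CPUs of this instance that are online
        self.enabled = False       # whether the instance is currently enabled
        self.enable_count = 0      # how many times the instance was enabled

    def cpu_online(self, cpu):
        # Only the first CPU of the instance to come online enables it.
        if not self.online_cpus:
            self.enabled = True
            self.enable_count += 1
        self.online_cpus.add(cpu)

    def cpu_offline(self, cpu):
        # Assumption: the last CPU to go offline disables the instance.
        self.online_cpus.discard(cpu)
        if not self.online_cpus:
            self.enabled = False


inst = HfiInstanceModel()
for cpu in (0, 1, 2, 3):
    inst.cpu_online(cpu)
```

In this model, bringing four CPUs online enables the instance exactly once;
taking all of them offline and bringing one back enables it a second time,
which mirrors the boot, hotplug, and resume cases.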

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Refactor enabling code into helper functions</title>
<updated>2024-01-03T13:06:40Z</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2024-01-03T04:14:56Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=8a8b6bb93c704776c4b05cb517c3fa8baffb72f5'/>
<id>urn:sha1:8a8b6bb93c704776c4b05cb517c3fa8baffb72f5</id>
<content type='text'>
In preparation for the addition of a suspend notifier, wrap the logic to
enable HFI and program its memory buffer into helper functions. Both the
CPU hotplug callback and the suspend notifier will use them.

This refactoring does not introduce functional changes.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: Remove core header inclusion from drivers</title>
<updated>2023-02-15T16:29:48Z</updated>
<author>
<name>Daniel Lezcano</name>
<email>daniel.lezcano@linaro.org</email>
</author>
<published>2023-02-06T15:34:29Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=9272d2d43b6e532d0c0b6d3a597cf75c9ca1e183'/>
<id>urn:sha1:9272d2d43b6e532d0c0b6d3a597cf75c9ca1e183</id>
<content type='text'>
As the name states "thermal_core.h" is the header file for the core
components of the thermal framework.

Too many drivers are including it. Hopefully the recent cleanups
helped to self-encapsulate the code a bit more and removed the need
for drivers to include this header.

Remove this inclusion in every place where it is possible.

Some other drivers confused the core header with the one
exported in linux/thermal.h. They include the former instead of the
latter. The changes also fix this.

The tegra/soctherm driver still includes the header because it uses an
internal function which needs to be replaced.

The Intel HFI driver uses the internal netlink framework of the core and
should be changed to avoid dealing with the internals.

No functional changes intended.

Signed-off-by: Daniel Lezcano &lt;daniel.lezcano@linaro.org&gt;
Reviewed-by: Miquel Raynal &lt;miquel.raynal@bootlin.com&gt; # armada_thermal.c
Reviewed-by: Kunihiko Hayashi &lt;hayashi.kunihiko@socionext.com&gt; # uniphier_thermal.c
Reviewed-by: Niklas Söderlund &lt;niklas.soderlund+renesas@ragnatech.se&gt; # rcar_gen3_thermal.c
Reviewed-by: Neil Armstrong &lt;neil.armstrong@linaro.org&gt; # amlogic_thermal.c
Acked-by: Florian Fainelli &lt;f.fainelli@gmail.com&gt; # bcm2835_thermal.c
Acked-by: Thierry Reding &lt;treding@nvidia.com&gt; # tegra30-tsensor.c
Link: https://lore.kernel.org/r/20230206153432.1017282-1-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Remove a pointless die_id check</title>
<updated>2022-12-02T19:47:52Z</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2022-11-28T16:20:01Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=3a3073b69c76a8909374c5f9d610ea2f02ba3402'/>
<id>urn:sha1:3a3073b69c76a8909374c5f9d610ea2f02ba3402</id>
<content type='text'>
die_id is a u16 quantity. On single-die systems the default value of
die_id is 0. No need to check for negative values.

Plus, removing this check makes Coverity happy.

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: intel: hfi: ACK HFI for the same timestamp</title>
<updated>2022-11-23T19:13:22Z</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2022-11-16T23:14:59Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=c0e3acdcdeb14099765de38224dfe0ad019c8482'/>
<id>urn:sha1:c0e3acdcdeb14099765de38224dfe0ad019c8482</id>
<content type='text'>
Some processors issue more than one HFI interrupt with the same
timestamp. Each interrupt must be acknowledged to let the hardware issue
new HFI interrupts. But this can't be done without some additional flow
modification in the existing interrupt handling.

For background, the HFI interrupt is a package-level thermal interrupt
delivered via an LVT. This LVT is common to both the CPU and package
level interrupts. Hence, all CPUs receive the HFI interrupts. But only
one CPU should process the interrupt; the others simply exit by issuing
an EOI to the LAPIC.

The current HFI interrupt processing flow:

  1. Receive Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spinlock, one CPU will enter spinlock and others
     will simply return from here to issue EOI.
    (Let's assume CPU 4 is processing interrupt)
  4. Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.      ignore interrupt, unlock and return
  7. Copy the HFI message to local buffer
  8. unlock spinlock
  9. ACK HFI interrupt
 10. Queue the message for processing in a work-queue

It is tempting to simply acknowledge all the interrupts even if they
have the same timestamp. This may cause some interrupts to not be
processed.

Let's say CPU5 is slightly late and reaches step 4 while CPU4 is
between steps 8 and 9.

Currently we simply ignore interrupts with the same timestamp. No
issue here for CPU5. When CPU4 acknowledges the interrupt, the next
HFI interrupt can be delivered.

If we acknowledge interrupts with the same timestamp (at step 6), there
is a race condition. Under the same scenario, CPU 5 will acknowledge
the HFI interrupt. This lets the hardware generate another HFI interrupt
before CPU 4 starts executing step 9. Once CPU 4 completes step 9, it
will have acknowledged the newly arrived HFI interrupt without actually
processing it.

Acknowledge the interrupt when holding the spinlock. This avoids
contention of the interrupt acknowledgment.

Updated flow:

  1. Receive HFI Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spin-lock
     (Let's assume CPU 4 is processing interrupt)
  4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit
  4.2      If hfi status is 0
  4.3              unlock spinlock
  4.4              return
  4.5 Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.1      ACK HFI Interrupt,
  6.2      unlock spinlock
  6.3      return
  7. Copy the HFI message to local buffer
  8. ACK HFI interrupt
  9. unlock spinlock
 10. Queue the message for processing in a work-queue

To avoid taking the lock unnecessarily, intel_hfi_process_event() checks
the status of the HFI interrupt before taking the lock. If CPU5 is late,
when it starts processing the interrupt there are two scenarios:

 a) CPU4 acknowledged the HFI interrupt before CPU5 read
    MSR_IA32_THERM_STATUS. CPU5 exits.

 b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the
    interrupt. CPU5 will take the lock if CPU4 has released it. It then
    re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt,
    the HFI status bit is clear and CPU5 exits. If a new HFI interrupt
    was generated it will find that the status bit is set and it will
    continue to process the interrupt. In this case, even if the timestamp
    has not changed, the ACK can be issued as this is a new interrupt.
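
The updated flow can be sketched as a small Python model. This is an
illustration under stated assumptions, not the driver code: HfiModel,
hw_raise() and the returned strings are hypothetical, a boolean stands in for
the HFI status bit, and copying and queueing the table (steps 7 and 10) are
collapsed into a single list append.

```python
import threading


class HfiModel:
    """Toy model of the updated interrupt flow; hypothetical names."""

    def __init__(self):
        self.lock = threading.Lock()
        self.status_bit = False     # stands in for the HFI status bit
        self.hw_timestamp = 0       # timestamp in the HFI memory
        self.seen_timestamp = 0     # timestamp of the last processed table
        self.queued = []            # tables handed to the work-queue

    def hw_raise(self, timestamp):
        # Hardware updates the table and sets the status bit.
        self.hw_timestamp = timestamp
        self.status_bit = True

    def process_event(self):
        # Step 2: cheap status check before taking the lock.
        if not self.status_bit:
            return "no-status"
        # Step 3: try-lock; CPUs that lose simply return.
        if not self.lock.acquire(blocking=False):
            return "lock-busy"
        try:
            # Steps 4.1-4.4: re-check the status bit under the lock.
            if not self.status_bit:
                return "already-acked"
            # Steps 4.5-6.3: same timestamp, so ACK and return.
            if self.hw_timestamp == self.seen_timestamp:
                self.status_bit = False
                return "acked-duplicate"
            # Steps 7, 8 and 10: copy and queue the table, then ACK while
            # still holding the lock.
            self.seen_timestamp = self.hw_timestamp
            self.queued.append(self.hw_timestamp)
            self.status_bit = False
            return "processed"
        finally:
            # Step 9 (and 4.3, 6.2): unlock.
            self.lock.release()
```

Because the ACK happens before the lock is released, a late CPU either sees a
cleared status bit and exits, or sees a status bit that can only belong to a
new interrupt.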

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Tested-by: Arshad, Adeel &lt;adeel.arshad@intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: intel: Protect clearing of thermal status bits</title>
<updated>2022-11-23T19:09:06Z</updated>
<author>
<name>Srinivas Pandruvada</name>
<email>srinivas.pandruvada@linux.intel.com</email>
</author>
<published>2022-11-16T02:54:17Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=930d06bf071aa746db11d68d2d75660b449deff3'/>
<id>urn:sha1:930d06bf071aa746db11d68d2d75660b449deff3</id>
<content type='text'>
The clearing of the package thermal status is done by a read-modify-write
operation. This may result in clearing some new status bits which are
being or about to be processed.

For example, while clearing the HFI status, the hardware may set a new
thermal status bit after the thermal status register has been read. During
the write back, the newly set status bit is then written as 0, i.e. cleared.
So, it is not safe to do read-modify-write.

Since software can only write 0, not 1, to the thermal status read-write
bits, it is safe to write 1 to all bits which are not being cleared.

Create a common interface for clearing package thermal status bits. Use
this interface to replace existing code to clear thermal package status
bits.

It is safe to call from different CPUs without protection as there is no
read-modify-write. Also, wrmsrl results in just a single instruction. For
example, suppose CPU 0 and CPU 3 are clearing bits 1 and 3 respectively. If
CPU 3 wins the race, it will write 0x4000aa2, then CPU 0 will write
0x4000aa8. The bits which are not part of the clear are set to 1. The default
mask for bits which can be written here is 0x4000aaa.
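
The two write values quoted above can be reproduced with a short Python
sketch. clear_value() is a hypothetical helper used only for illustration; it
is not the interface added by this change. It uses XOR on a bit known to be
set in the mask, which has the same effect as clearing that bit.

```python
# Writable bits of the package thermal status register, per the text above.
WRITABLE_MASK = 0x4000AAA


def clear_value(bit):
    """Value to write so that only `bit` is cleared (hypothetical helper)."""
    # Reject bits that are not part of the writable mask.
    if (WRITABLE_MASK // 2 ** bit) % 2 != 1:
        raise ValueError("bit is not writable")
    # Every other writable bit is written as 1, which leaves it unchanged.
    # No prior read of the register is needed.
    return WRITABLE_MASK ^ (2 ** bit)


print(hex(clear_value(1)))
print(hex(clear_value(3)))
```

Each value depends only on the bit being cleared, so the order in which the
two CPUs write does not matter.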

Signed-off-by: Srinivas Pandruvada &lt;srinivas.pandruvada@linux.intel.com&gt;
Reviewed-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
<entry>
<title>thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages</title>
<updated>2022-10-28T18:11:48Z</updated>
<author>
<name>Ricardo Neri</name>
<email>ricardo.neri-calderon@linux.intel.com</email>
</author>
<published>2022-10-18T11:22:40Z</published>
<link rel='alternate' type='text/html' href='https://universe.0xinfinity.dev/distro/kernel/commit/?id=54d9135cf223f221546bd51b0f5e4a73e99891f4'/>
<id>urn:sha1:54d9135cf223f221546bd51b0f5e4a73e99891f4</id>
<content type='text'>
A Coverity static code scan raised a potential overflow_before_widen
warning when hfi_features::nr_table_pages is used as an argument to
memcpy in intel_hfi_process_event().

Even though the overflow can never happen (the maximum number of pages of
the HFI table is 0x10 and 0x10 &lt;&lt; PAGE_SHIFT = 0x10000), using size_t as
the data type of hfi_features::nr_table_pages makes Coverity happy and
matches the data type of the argument 'size' of memcpy().
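
The size bound quoted above can be checked with a few lines of Python. This
is only a worked calculation: PAGE_SHIFT is assumed to be 12 (4 KiB pages),
and the shift is written as a multiplication by a power of two.

```python
PAGE_SHIFT = 12            # assumption: 4 KiB pages
MAX_TABLE_PAGES = 0x10     # maximum number of HFI table pages, from the text

# Shifting left by PAGE_SHIFT is the same as multiplying by 2**PAGE_SHIFT.
max_table_bytes = MAX_TABLE_PAGES * 2 ** PAGE_SHIFT
U32_MAX = 2 ** 32 - 1

print(hex(max_table_bytes))
```

The result is far below U32_MAX, which is why the overflow cannot happen in
practice.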

Signed-off-by: Ricardo Neri &lt;ricardo.neri-calderon@linux.intel.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
</entry>
</feed>
