linux

Author	SHA1	Message	Date
Ian Rogers	91c1949d76	perf test metrics: Update all metrics for possibly failing default metrics Default metrics may use unsupported events and be ignored. These metrics shouldn't cause metric testing to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	083ae6c1fb	perf test stat: Update std_output testing metric expectations Make the expectations match json metrics rather than the previous hard coded ones. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	b1cb2b76bd	perf test stat: Ignore failures in Default[234] metricgroups The Default[234] metric groups may contain unsupported legacy events. Allow those metric groups to fail. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	2c240484cf	perf test stat+json: Improve metric-only testing When testing metric-only, pass a metric to perf rather than expecting a hard coded metric value to be generated. Remove keys that were really metric-only units and instead don't expect metric only to have a matching json key as it encodes metrics as {"metric_name", "metric_value"}. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	1bcd627165	perf stat: Remove "unit" workarounds for metric-only Remove code that tested the "unit" as in KB/sec for certain hard coded metric values and did workarounds. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	a745c0831c	perf stat: Sort default events/metrics To improve the readability of default events/metrics, sort the evsels after the Default metric groups have be parsed. Before: ``` $ perf stat -a sleep 1 Performance counter stats for 'system wide': 22,087 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 10.3 % tma_bad_speculation # 25.8 % tma_frontend_bound # 34.5 % tma_backend_bound # 29.3 % tma_retiring 7,829 page-faults # nan faults/sec page_faults_per_second 880,144,270 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.10%) 1,693,081,235 cpu_core/cpu-cycles/ # nan GHz cycles_frequency TopdownL1 (cpu_atom) # 20.5 % tma_bad_speculation # 13.8 % tma_retiring (50.26%) # 34.6 % tma_frontend_bound (50.23%) 89,326,916 cpu_atom/branches/ # nan M/sec branch_frequency (60.19%) 538,123,088 cpu_core/branches/ # nan M/sec branch_frequency 1,368 cpu-migrations # nan migrations/sec migrations_per_second # 31.1 % tma_backend_bound (60.19%) 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 485,744,856 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (59.87%) 3,093,112,283 cpu_core/instructions/ # 1.8 instructions insn_per_cycle 4,939,427 cpu_atom/branch-misses/ # 5.0 % branch_miss_rate (49.77%) 7,632,248 cpu_core/branch-misses/ # 1.4 % branch_miss_rate 1.005084693 seconds time elapsed ``` After: ``` $ perf stat -a sleep 1 Performance counter stats for 'system wide': 22,165 context-switches # nan cs/sec cs_per_second 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 2,260 cpu-migrations # nan migrations/sec migrations_per_second 20,476 page-faults # nan faults/sec page_faults_per_second 17,052,357 cpu_core/branch-misses/ # 1.5 % branch_miss_rate 1,120,090,590 cpu_core/branches/ # nan M/sec branch_frequency 3,402,892,275 cpu_core/cpu-cycles/ # nan GHz cycles_frequency 6,129,236,701 cpu_core/instructions/ # 1.8 instructions insn_per_cycle 6,159,523 cpu_atom/branch-misses/ # 3.1 % branch_miss_rate (49.86%) 222,158,812 cpu_atom/branches/ # nan M/sec branch_frequency (50.25%) 1,547,610,244 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.40%) 1,304,901,260 cpu_atom/instructions/ # 0.8 instructions insn_per_cycle (50.41%) TopdownL1 (cpu_core) # 13.7 % tma_bad_speculation # 23.5 % tma_frontend_bound # 33.3 % tma_backend_bound # 29.6 % tma_retiring TopdownL1 (cpu_atom) # 32.1 % tma_backend_bound (59.65%) # 30.1 % tma_frontend_bound (59.51%) # 22.3 % tma_bad_speculation # 15.5 % tma_retiring (59.53%) 1.008405429 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	19df87d9ed	perf stat: Fix default metricgroup display on hybrid The logic to skip output of a default metric line was firing on Alderlake and not displaying 'TopdownL1 (cpu_atom)'. Remove the need_full_name check as it is equivalent to the different PMU test in the cases we care about, merge the 'if's and flip the evsel of the PMU test. The 'if' is now basically saying, if the output matches the last printed output then skip the output. Before: ``` TopdownL1 (cpu_core) # 11.3 % tma_bad_speculation # 24.3 % tma_frontend_bound TopdownL1 (cpu_core) # 33.9 % tma_backend_bound # 30.6 % tma_retiring # 42.2 % tma_backend_bound # 25.0 % tma_frontend_bound (49.81%) # 12.8 % tma_bad_speculation # 20.0 % tma_retiring (59.46%) ``` After: ``` TopdownL1 (cpu_core) # 8.3 % tma_bad_speculation # 43.7 % tma_frontend_bound # 30.7 % tma_backend_bound # 17.2 % tma_retiring TopdownL1 (cpu_atom) # 31.9 % tma_backend_bound # 37.6 % tma_frontend_bound (49.66%) # 18.0 % tma_bad_speculation # 12.6 % tma_retiring (59.58%) ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	b71f46a6a7	perf stat: Remove hard coded shadow metrics Now that the metrics are encoded in common json the hard coded printing means the metrics are shown twice. Remove the hard coded version. This means that when specifying events, and those events correspond to a hard coded metric, the metric will no longer be displayed. The metric will be displayed if the metric is requested. Due to the adhoc printing in the previous approach it was often found frustrating, the new approach avoids this. The default perf stat output on an alderlake now looks like: ``` $ perf stat -a -- sleep 1 Performance counter stats for 'system wide': 19,697 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 10.7 % tma_bad_speculation # 24.9 % tma_frontend_bound TopdownL1 (cpu_core) # 34.3 % tma_backend_bound # 30.1 % tma_retiring 6,593 page-faults # nan faults/sec page_faults_per_second 729,065,658 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.79%) 1,605,131,101 cpu_core/cpu-cycles/ # nan GHz cycles_frequency # 19.7 % tma_bad_speculation # 14.2 % tma_retiring (50.14%) # 37.3 % tma_frontend_bound (50.31%) 87,302,268 cpu_atom/branches/ # nan M/sec branch_frequency (60.27%) 512,046,956 cpu_core/branches/ # nan M/sec branch_frequency 1,111 cpu-migrations # nan migrations/sec migrations_per_second # 28.8 % tma_backend_bound (60.26%) 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 392,509,323 cpu_atom/instructions/ # 0.6 instructions insn_per_cycle (60.19%) 2,990,369,310 cpu_core/instructions/ # 1.9 instructions insn_per_cycle 3,493,478 cpu_atom/branch-misses/ # 5.9 % branch_miss_rate (49.69%) 7,297,531 cpu_core/branch-misses/ # 1.4 % branch_miss_rate 1.006621701 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	3622990efa	perf script: Change metric format to use json metrics The metric format option isn't properly supported. This change improves that by making the sample events update the counts of an evsel, where the shadow metric code expects to read the values. To support printing metrics, metrics need to be found. This is done on the first attempt to print a metric. Every metric is parsed and then the evsels in the metric's evlist compared to those in perf script using the perf_event_attr type and config. If the metric matches then it is added for printing. As an event in the perf script's evlist may have >1 metric id, or different leader for aggregation, the first metric matched will be displayed in those cases. An example use is: ``` $ perf record -a -e '{instructions,cpu-cycles}:S' -a -- sleep 1 $ perf script -F period,metric ... 867817 metric: 0.30 insn per cycle 125394 metric: 0.04 insn per cycle 313516 metric: 0.11 insn per cycle metric: 1.00 insn per cycle ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	2dfc0cab3d	perf stat: Add detail -d,-dd,-ddd metrics Add metrics for the stat-shadow -d, -dd and -ddd events and hard coded metrics. Remove the events as these now come from the metrics. Following this change a detailed perf stat output looks like: ``` $ perf stat -a -ddd -- sleep 1 Performance counter stats for 'system wide': 21,089 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 14.1 % tma_bad_speculation # 27.3 % tma_frontend_bound (30.56%) TopdownL1 (cpu_core) # 31.5 % tma_backend_bound # 27.2 % tma_retiring (30.56%) 6,302 page-faults # nan faults/sec page_faults_per_second 928,495,163 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (28.41%) 1,841,409,834 cpu_core/cpu-cycles/ # nan GHz cycles_frequency (38.51%) # 14.5 % tma_bad_speculation # 16.0 % tma_retiring (28.41%) # 36.8 % tma_frontend_bound (35.57%) 100,859,118 cpu_atom/branches/ # nan M/sec branch_frequency (42.73%) 572,657,734 cpu_core/branches/ # nan M/sec branch_frequency (54.43%) 1,527 cpu-migrations # nan migrations/sec migrations_per_second # 32.7 % tma_backend_bound (42.73%) 0.00 msec cpu-clock # 0.000 CPUs utilized # 0.0 CPUs CPUs_utilized 498,668,509 cpu_atom/instructions/ # 0.57 insn per cycle # 0.6 instructions insn_per_cycle (42.97%) 3,281,762,225 cpu_core/instructions/ # 1.84 insn per cycle # 1.8 instructions insn_per_cycle (62.20%) 4,919,511 cpu_atom/branch-misses/ # 5.43% of all branches # 5.4 % branch_miss_rate (35.80%) 7,431,776 cpu_core/branch-misses/ # 1.39% of all branches # 1.4 % branch_miss_rate (62.20%) 2,517,007 cpu_atom/LLC-loads/ # 0.1 % llc_miss_rate (28.62%) 3,931,318 cpu_core/LLC-loads/ # 40.4 % llc_miss_rate (45.98%) 14,918,674 cpu_core/L1-dcache-load-misses/ # 2.25% of all L1-dcache accesses # nan % l1d_miss_rate (37.80%) 27,067,264 cpu_atom/L1-icache-load-misses/ # 15.92% of all L1-icache accesses # 15.9 % l1i_miss_rate (21.47%) 116,848,994 cpu_atom/dTLB-loads/ # 0.8 % dtlb_miss_rate (21.47%) 764,870,407 cpu_core/dTLB-loads/ # 0.1 % dtlb_miss_rate (15.12%) 1.006181526 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	a3248b5b54	perf jevents: Add metric DefaultShowEvents Some Default group metrics require their events showing for consistency with perf's previous behavior. Add a flag to indicate when this is the case and use it in stat-display. As events are coming from Default metrics remove that default hardware and software events from perf stat. Following this change the default perf stat output on an alderlake looks like: ``` $ perf stat -a -- sleep 1 Performance counter stats for 'system wide': 20,550 context-switches # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 9.0 % tma_bad_speculation # 28.1 % tma_frontend_bound TopdownL1 (cpu_core) # 29.2 % tma_backend_bound # 33.7 % tma_retiring 6,685 page-faults # nan faults/sec page_faults_per_second 790,091,064 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (49.83%) 2,563,918,366 cpu_core/cpu-cycles/ # nan GHz cycles_frequency # 12.3 % tma_bad_speculation # 14.5 % tma_retiring (50.20%) # 33.8 % tma_frontend_bound (50.24%) 76,390,322 cpu_atom/branches/ # nan M/sec branch_frequency (60.20%) 1,015,173,047 cpu_core/branches/ # nan M/sec branch_frequency 1,325 cpu-migrations # nan migrations/sec migrations_per_second # 39.3 % tma_backend_bound (60.17%) 0.00 msec cpu-clock # 0.000 CPUs utilized # 0.0 CPUs CPUs_utilized 554,347,072 cpu_atom/instructions/ # 0.64 insn per cycle # 0.6 instructions insn_per_cycle (60.14%) 5,228,931,991 cpu_core/instructions/ # 2.04 insn per cycle # 2.0 instructions insn_per_cycle 4,308,874 cpu_atom/branch-misses/ # 5.65% of all branches # 5.6 % branch_miss_rate (49.76%) 9,890,606 cpu_core/branch-misses/ # 0.97% of all branches # 1.0 % branch_miss_rate 1.005477803 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:35 -08:00
Ian Rogers	c7adeb0974	perf jevents: Add set of common metrics based on default ones Add support to getting a common set of metrics from a default table. It simplifies the generation to add json metrics at the same time. The metrics added are CPUs_utilized, cs_per_second, migrations_per_second, page_faults_per_second, insn_per_cycle, stalled_cycles_per_instruction, frontend_cycles_idle, backend_cycles_idle, cycles_frequency, branch_frequency and branch_miss_rate based on the shadow metric definitions. Following this change the default perf stat output on an alderlake looks like: ``` $ perf stat -a -- sleep 2 Performance counter stats for 'system wide': 0.00 msec cpu-clock # 0.000 CPUs utilized 77,739 context-switches 15,033 cpu-migrations 321,313 page-faults 14,355,634,225 cpu_atom/instructions/ # 1.40 insn per cycle (35.37%) 134,561,560,583 cpu_core/instructions/ # 3.44 insn per cycle (57.85%) 10,263,836,145 cpu_atom/cycles/ (35.42%) 39,138,632,894 cpu_core/cycles/ (57.60%) 2,989,658,777 cpu_atom/branches/ (42.60%) 32,170,570,388 cpu_core/branches/ (57.39%) 29,789,870 cpu_atom/branch-misses/ # 1.00% of all branches (42.69%) 165,991,152 cpu_core/branch-misses/ # 0.52% of all branches (57.19%) (software) # nan cs/sec cs_per_second TopdownL1 (cpu_core) # 11.9 % tma_bad_speculation # 19.6 % tma_frontend_bound (63.97%) TopdownL1 (cpu_core) # 18.8 % tma_backend_bound # 49.7 % tma_retiring (63.97%) (software) # nan faults/sec page_faults_per_second # nan GHz cycles_frequency (42.88%) # nan GHz cycles_frequency (69.88%) TopdownL1 (cpu_atom) # 11.7 % tma_bad_speculation # 29.9 % tma_retiring (50.07%) TopdownL1 (cpu_atom) # 31.3 % tma_frontend_bound (43.09%) (cpu_atom) # nan M/sec branch_frequency (43.09%) # nan M/sec branch_frequency (70.07%) # nan migrations/sec migrations_per_second TopdownL1 (cpu_atom) # 27.1 % tma_backend_bound (43.08%) (software) # 0.0 CPUs CPUs_utilized # 1.4 instructions insn_per_cycle (43.04%) # 3.5 instructions insn_per_cycle (69.99%) # 1.0 % branch_miss_rate (35.46%) # 0.5 % branch_miss_rate (65.02%) 2.005626564 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:34 -08:00
Ian Rogers	2e5140849b	perf expr: Add #target_cpu literal For CPU nanoseconds a lot of the stat-shadow metrics use either task-clock or cpu-clock, the latter being used when target__has_cpu. Add a #target_cpu literal so that json metrics can perform the same test. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:34 -08:00
Ian Rogers	c8035a4961	perf metricgroup: Add care to picking the evsel for displaying a metric Rather than using the first evsel in the matched events, try to find the least shared non-tool evsel. The aim is to pick the first evsel that typifies the metric within the list of metrics. This addresses an issue where Default metric group metrics may lose their counter value due to how the stat displaying hides counters for default event/metric output. For a metricgroup like TopdownL1 on an Intel Alderlake the change is, before there are 4 events with metrics: ``` $ perf stat -M topdownL1 -a sleep 1 Performance counter stats for 'system wide': 7,782,334,296 cpu_core/TOPDOWN.SLOTS/ # 10.4 % tma_bad_speculation # 19.7 % tma_frontend_bound 2,668,927,977 cpu_core/topdown-retiring/ # 35.7 % tma_backend_bound # 34.1 % tma_retiring 803,623,987 cpu_core/topdown-bad-spec/ 167,514,386 cpu_core/topdown-heavy-ops/ 1,555,265,776 cpu_core/topdown-fe-bound/ 2,792,733,013 cpu_core/topdown-be-bound/ 279,769,310 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.2 % tma_retiring # 15.1 % tma_bad_speculation 457,917,232 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 38.4 % tma_backend_bound # 34.2 % tma_frontend_bound 783,519,226 cpu_atom/TOPDOWN_FE_BOUND.ALL/ 10,790,192 cpu_core/INT_MISC.UOP_DROPPING/ 879,845,633 cpu_atom/TOPDOWN_BE_BOUND.ALL/ ``` After there are 6 events with metrics: ``` $ perf stat -M topdownL1 -a sleep 1 Performance counter stats for 'system wide': 2,377,551,258 cpu_core/TOPDOWN.SLOTS/ # 7.9 % tma_bad_speculation # 36.4 % tma_frontend_bound 480,791,142 cpu_core/topdown-retiring/ # 35.5 % tma_backend_bound 186,323,991 cpu_core/topdown-bad-spec/ 65,070,590 cpu_core/topdown-heavy-ops/ # 20.1 % tma_retiring 871,733,444 cpu_core/topdown-fe-bound/ 848,286,598 cpu_core/topdown-be-bound/ 260,936,456 cpu_atom/TOPDOWN_RETIRING.ALL/ # 12.4 % tma_retiring # 17.6 % tma_bad_speculation 419,576,513 cpu_atom/CPU_CLK_UNHALTED.CORE/ 797,132,597 cpu_atom/TOPDOWN_FE_BOUND.ALL/ # 38.0 % tma_frontend_bound 3,055,447 cpu_core/INT_MISC.UOP_DROPPING/ 671,014,164 cpu_atom/TOPDOWN_BE_BOUND.ALL/ # 32.0 % tma_backend_bound ``` Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:48:34 -08:00
Namhyung Kim	367377f45c	perf tools: Fix missing feature check for inherit + SAMPLE_READ It should also have PERF_SAMPLE_TID to enable inherit and PERF_SAMPLE_READ on recent kernels. Not having _TID makes the feature check wrongly detect the inherit and _READ support. It was reported that the following command failed due to the error in the missing feature check on Intel SPR machines. $ perf record -e '{cpu/mem-loads-aux/S,cpu/mem-loads,ldlat=3/PS}' -- ls Error: Failure to open event 'cpu/mem-loads,ldlat=3/PS' on PMU 'cpu' which will be removed. Invalid event (cpu/mem-loads,ldlat=3/PS) in per-thread mode, enable system wide with '-a'. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: `3b193a57ba` ("perf tools: Detect missing kernel features properly") Reported-and-tested-by: Chen, Zide <zide.chen@intel.com> Closes: https://lore.kernel.org/lkml/20251022220802.1335131-1-zide.chen@intel.com/ Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-11 16:43:37 -08:00
Chen Ni	e279039c3e	perf symbol: Remove unneeded semicolon Remove unnecessary semicolons reported by Coccinelle/coccicheck and the semantic patch at scripts/coccinelle/misc/semicolon.cocci. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-10 22:22:16 -08:00
Ian Rogers	081006b7c8	perf test: Add test that command line period overrides sysfs/json values The behavior of weak terms is subtle, add a test that they aren't accidentally broken. The test finds an event with a weak 'period' and then overrides it. In no such event is present then the test skips. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-09 23:07:57 -08:00
Ian Rogers	0e9b51a432	perf pmu: Make pmu_alias_terms weak again The terms for a json event should be weak so they don't override command line options. Before: ``` $ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 \|grep "{ sample_period, sample_freq }" { sample_period, sample_freq } 200003 { sample_period, sample_freq } 2000003 { sample_period, sample_freq } 1000 ``` After: ``` $ perf record -vv -c 1000 -e uops_issued.any -o /dev/null true 2>&1 \|grep "{ sample_period, sample_freq }" { sample_period, sample_freq } 1000 { sample_period, sample_freq } 1000 { sample_period, sample_freq } 1000 ``` Fixes: `84bae3af20` ("perf pmu: Don't eagerly parse event terms") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-09 23:07:57 -08:00
Ian Rogers	6331b26693	perf tool: Add a delegate_tool that just delegates actions to another tool Add an ability to be able to compose perf_tools, by having one perform an action and then calling a delegate. Currently the perf_tools have if-then-elses setting the callback and then if-then-elses within the callback. Understanding the behavior is complex as it is in two places and logic for numerous operations, within things like perf inject, is interwoven. By chaining perf_tools together based on command line options this kind of code can be avoided. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-07 13:25:05 -08:00
Ian Rogers	71062e282d	perf tool: Add the perf_tool argument to all callbacks Getting context for what a tool is doing, such as the perf_inject instance, using container_of the tool is a common pattern in the code. This isn't possible event_op2, event_op3 and event_op4 callbacks as the tool isn't passed. Add the argument and then fix function signatures to match. As tools maybe reading a tool from somewhere else, change that code to use the passed in tool. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-07 13:25:05 -08:00
Xu Yang	fa4a527af5	perf vendor events arm64:: Add i.MX94 DDR Performance Monitor metrics Add JSON metrics for i.MX94 DDR Performance Monitor. Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-06 17:46:16 -08:00
Namhyung Kim	268a31a9f8	perf stat: Add ScaleUnit to {cpu,task}-clock JSON description This changes the output of the event like below. In fact, that's the output it used to have before the JSON conversion. Before: $ perf stat -e task-clock true Performance counter stats for 'true': 313,848 task-clock # 0.290 CPUs utilized 0.001081223 seconds time elapsed 0.001122000 seconds user 0.000000000 seconds sys After: $ perf stat -e task-clock true Performance counter stats for 'true': 0.36 msec task-clock # 0.297 CPUs utilized 0.001225435 seconds time elapsed 0.001268000 seconds user 0.000000000 seconds sys Reviewed-by: Ian Rogers <irogers@google.com> Fixes: `9957d8c801` ("perf jevents: Add common software event json") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-06 17:43:29 -08:00
Namhyung Kim	6bd89ae7d1	perf record: Make sure to update build-ID cache Recent change on enabling --buildid-mmap by default brought an issue with build-id handling. With build-ID in MMAP2 records, we don't need to save the build-ID table in the header of a perf data file. But the actual file contents still need to be cached in the debug directory for annotation etc. Split the build-ID header processing and caching and make sure perf record to save hit DSOs in the build-ID cache by moving perf_session__cache_build_ids() to the end of the record__ finish_output(). Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-06 17:29:16 -08:00
Ian Rogers	4df4370937	perf jevents: Make all tables static The tables created by jevents.py are only used within the pmu-events.c file. Change the declarations of those global variables to be static to encapsulate this. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-03 20:57:21 -08:00
Ian Rogers	3f02cebe13	perf metricgroup: When copy metrics copy default information When copy metrics into a group also copy default information from the original metrics. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-03 20:57:21 -08:00
Ian Rogers	3bae9228a5	perf metricgroup: Missed free on error path If an out-of-memory occurs the expr also needs freeing. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-03 20:57:21 -08:00
Ian Rogers	5faa23cdab	perf metricgroup: Update comment on location of metric_event list Update comment as the stat_config no longer holds all metrics. Signed-off-by: Ian Rogers <irogers@google.com> Fixes: `faebee18d7` ("perf stat: Move metric list from config to evlist") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-03 20:57:21 -08:00
Ian Rogers	371d32394e	perf evsel: Remove unused metric_events variable The metric_events exist in the metric_expr list and so this variable has been unused for a while. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-03 20:57:21 -08:00
Ian Rogers	01bc5d2f0d	perf tools: Cache counter names for raw samples on s390 Searching all event names is slower now that legacy names are included. Add a cache to avoid long iterative searches. Note, the cache isn't cleaned up and is as such a memory leak, however, globally reachable leaks like this aren't treated as leaks by leak sanitizer. Reported-by: Thomas Richter <tmricht@linux.ibm.com> Closes: https://lore.kernel.org/linux-perf-users/09943f4f-516c-4b93-877c-e4a64ed61d38@linux.ibm.com/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-31 12:46:19 -07:00
Namhyung Kim	915c31f0e6	perf trace: Increase syscall handler map size to 1024 The syscalls_sys_{enter,exit} map in augmented_raw_syscalls.bpf.c has max entries of 512. Usually syscall numbers are smaller than this but x86 has x32 ABI where syscalls start from 512. That makes trace__init_syscalls_bpf_prog_array_maps() fail in the middle of the loop when it accesses those keys. As the loop iteration is not ordered by syscall numbers anymore, the failure can affect non-x32 syscalls. Let's increase the map size to 1024 so that it can handle those ABIs too. While most systems won't need this, increasing the size will be safer for potential future changes. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-31 12:29:44 -07:00
Chu Guangqing	0d1e63183d	perf vendor events AmpereOneX: Fix spelling typo in the metrics file The json file incorrectly used "acceses" instead of "accesses". Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-31 12:28:17 -07:00
Chu Guangqing	1d7f783809	perf vendor events arm64: Fix typo in Ampere eMag json file Correct instruction spelling errors. Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-31 12:27:56 -07:00
Shuai Xue	163e5f2b96	perf record: skip synthesize event when open evsel failed When using perf record with the `--overwrite` option, a segmentation fault occurs if an event fails to open. For example: perf record -e cycles-ct -F 1000 -a --overwrite Error: cycles-ct:H: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat' perf: Segmentation fault #0 0x6466b6 in dump_stack debug.c:366 #1 0x646729 in sighandler_dump_stack debug.c:378 #2 0x453fd1 in sigsegv_handler builtin-record.c:722 #3 0x7f8454e65090 in __restore_rt libc-2.32.so[54090] #4 0x6c5671 in __perf_event__synthesize_id_index synthetic-events.c:1862 #5 0x6c5ac0 in perf_event__synthesize_id_index synthetic-events.c:1943 #6 0x458090 in record__synthesize builtin-record.c:2075 #7 0x45a85a in __cmd_record builtin-record.c:2888 #8 0x45deb6 in cmd_record builtin-record.c:4374 #9 0x4e5e33 in run_builtin perf.c:349 #10 0x4e60bf in handle_internal_command perf.c:401 #11 0x4e6215 in run_argv perf.c:448 #12 0x4e653a in main perf.c:555 #13 0x7f8454e4fa72 in __libc_start_main libc-2.32.so[3ea72] #14 0x43a3ee in _start ??:0 The --overwrite option implies --tail-synthesize, which collects non-sample events reflecting the system status when recording finishes. However, when evsel opening fails (e.g., unsupported event 'cycles-ct'), session->evlist is not initialized and remains NULL. The code unconditionally calls record__synthesize() in the error path, which iterates through the NULL evlist pointer and causes a segfault. To fix it, move the record__synthesize() call inside the error check block, so it's only called when there was no error during recording, ensuring that evlist is properly initialized. Fixes: `4ea648aec0` ("perf record: Add --tail-synthesize option") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-31 12:01:02 -07:00
Namhyung Kim	553d18c98a	perf lock contention: Load kernel map before lookup On some machines, it caused troubles when it tried to find kernel symbols. I think it's because kernel modules and kallsyms are messed up during load and split. Basically we want to make sure the kernel map is loaded and the code has it in the lock_contention_read(). But recently we added more lookups in the lock_contention_prepare() which is called before _read(). Also the kernel map (kallsyms) may not be the first one in the group like on ARM. Let's use machine__kernel_map() rather than just loading the first map. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: `688d2e8de2` ("perf lock contention: Add -l/--lock-addr option") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-31 11:32:51 -07:00
Ian Rogers	3528647874	perf test workload: Add thread count argument to thloop Allow the number of threads for the thloop workload to be increased beyond the normal 2. Add error checking to the parsed time and thread count values. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-28 16:59:58 -07:00
Michal Suchanek	2fee899c06	perf hwmon_pmu: Fix uninitialized variable warning The line_len is only set on success. Check the return value instead. util/hwmon_pmu.c: In function ‘perf_pmus__read_hwmon_pmus’: util/hwmon_pmu.c:742:20: warning: ‘line_len’ may be used uninitialized [-Wmaybe-uninitialized] 742 \| if (line_len > 0 && line[line_len - 1] == '\n') \| ^ util/hwmon_pmu.c:719:24: note: ‘line_len’ was declared here 719 \| size_t line_len; Fixes: `53cc0b351e` ("perf hwmon_pmu: Add a tool PMU exposing events from hwmon in sysfs") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-27 11:30:15 -07:00
tanze	ab29ff9f6f	perf auxtrace: Add auxtrace_synth_id_range_start() helper To avoid hardcoding the offset value for synthetic event IDs in multiple auxtrace modules (arm-spe, cs-etm, intel-pt, etc.), and to improve code reusability, this patch unifies the handling of the ID offset via a dedicated helper function. Signed-off-by: tanze <tanze@kylinos.cn> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-25 17:44:57 -07:00
Ian Rogers	be806f06ad	perf stat: Add/fix bperf cgroup max events workarounds Commit `b8308511f6` bumped the max events to 1024 but this results in BPF verifier issues if the number of command line events is too large. Workaround this by: 1) moving the constants to a header file to share between BPF and perf C code, 2) testing that the maximum number of events doesn't cause BPF verifier issues in debug builds, 3) lower the max events from 1024 to 128, 4) in perf stat, if there are more events than the BPF counters can support then disable BPF counter usage. The rodata setup is factored into its own function to avoid duplicating it in the testing code. Signed-off-by: Ian Rogers <irogers@google.com> Fixes: `b8308511f6` ("perf stat bperf cgroup: Increase MAX_EVENTS from 32 to 1024") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-25 16:44:21 -07:00
Leo Yan	3e98f0203e	perf cs-etm: Mute enumeration value warning When the OpenCSD library introduces a new enumeration value (for example, in the v1.7.1 release), the perf build fails with an error: util/cs-etm-decoder/cs-etm-decoder.c:600:10: error: enumeration value 'OCSD_GEN_TRC_ELEM_ITMTRACE' not explicitly handled in switch [-Werror, -Wswitch-enum] 600 \| switch (elem->elem_type) { \| ^~~~~~~~~~~~~~~ 1 error generated. Convert to if-else sentences to mute the enumeration value warning, which can avoid build failures whenever the lib is updated. No functional change. Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-23 18:59:13 -07:00
Kuninori Morimoto	9960889b32	tools: arm64: Add Cortex-A720AE definitions Add cputype definitions for Cortex-A720AE. These will be used for errata detection in subsequent patches. These values can be found in the Cortex-A720AE TRM: https://developer.arm.com/documentation/102828/0001/ ... in Table A-187 Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-23 14:48:15 -07:00
James Clark	f2195c5b43	perf annotate: Fix Clang build by adding block in switch case Clang and GCC disagree with what constitutes a "declaration after statement". GCC allows declarations in switch cases without an extra block, as long as it's immediately after the label. Clang does not. Unfortunately this is the case even in the latest versions of both compilers. The only option that makes them behave in the same way is -Wpedantic, which can't be enabled in Perf because of the number of warnings it generates. Add a block to fix the Clang build, which is the only thing we can do. Fixes the build error: ui/browsers/annotate.c:999:4: error: expected expression struct annotation_line *al = NULL; ui/browsers/annotate.c:1008:4: error: use of undeclared identifier 'al' al = annotated_source__get_line(notes->src, offset); ui/browsers/annotate.c:1009:24: error: use of undeclared identifier 'al' browser->curr_hot = al ? &al->rb_node : NULL; ui/browsers/annotate.c:1009:30: error: use of undeclared identifier 'al' browser->curr_hot = al ? &al->rb_node : NULL; ui/browsers/annotate.c:1000:8: error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement] s64 offset = annotate_browser__curr_hot_offset(browser); Fixes: `ad83f3b715` ("perf c2c annotate: Start from the contention line") Signed-off-by: James Clark <james.clark@linaro.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-23 14:28:58 -07:00
Zecheng Li	a1d8548c23	perf annotate: Invalidate register states for untracked instructions When tracking variable types, instructions that modify a pointer value in an untracked way can lead to incorrect type propagation. To prevent this, invalidate the register state when encountering such instructions. This change invalidates pointer types for various arithmetic and bitwise operations that current pointer offset tracking doesn't support, like imul, shl, and, inc, etc. A special case is added for 'xor reg, reg', which is a common idiom for zeroing a register. For this, the register state is updated to be a constant with a value of 0. This could introduce slight regressions if a variable is zeroed and then reused. This can be addressed in the future by using all DWARF locations for instruction tracking instead of only the first one. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 06:02:49 -07:00
Zecheng Li	109218718d	perf annotate: Save pointer offset in stack state The tracked pointer offset was not being preserved in the stack state, which could lead to incorrect type analysis. This change adds a ptr_offset field to the type_state_stack struct and passes it to set_stack_state and findnew_stack_state to ensure the offset is preserved after the pointer is loaded from a stack location. It improves the type annotation coverage and quality. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 06:02:49 -07:00
Zecheng Li	1f4cc4ae3f	perf annotate: Track arithmetic instructions on pointers Track the arithmetic operations on registers with pointer types. We handle only add, sub and lea instructions. The original pointer information needs to be preserved for getting outermost struct types. For example, reg0 points to a struct cfs_rq, when we add 0x10 to reg0, it should preserve the information of struct cfs_rq + 0x10 in the register instead of a pointer type to the child field at 0x10. Details: 1. struct type_state_reg now includes an offset, indicating if the register points to the start or an internal part of its associated type. This offset is used in mem to reg and reg to stack mem transfers, and also applied to the final type offset. 2. lea offset(%sp/%fp), reg is now treated as taking the address of a stack variable. It worked fine in most cases, but an issue with this approach is the pointer type may not exist. 3. lea offset(%base), reg is handled by moving the type from %base and adding an offset, similar to an add operation followed by a mov reg to reg. 4. Non-stack variables from DWARF with non-zero offsets in their location expressions are now accepted with register offset tracking. Multi-register addressing modes in LEA are not supported. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 06:02:49 -07:00
Zecheng Li	24a30ce9b1	perf annotate: Track address registers via TSR_KIND_POINTER Introduce TSR_KIND_POINTER to improve the data type profiler's ability to track pointer-based memory accesses and address register variables. TSR_KIND_POINTER represents that the location holds a pointer type to the type in the type state. The semantics match the `breg` registers that describe a memory location. This change implements handling for this new kind in mov instructions and in the check_matching_type() function. When a TSR_KIND_POINTER is moved to the stack, the stack state size is set to the architecture's pointer size. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 06:02:49 -07:00
Zecheng Li	068b6a4524	perf annotate: Skip annotating data types to lea instructions Introduce a helper function is_address_gen_insn() to check arch-dependent address generation instructions like lea in x86. Remove type annotation on these instructions since they are not accessing memory. It should be counted as `no_mem_ops`. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 06:02:49 -07:00
Tianyou Li	f1204e5846	perf annotate: Check return value of evsel__get_arch() properly Check the error code of evsel__get_arch() in the symbol__annotate(). Previously it checked non-zero value but after the refactoring it does only for negative values. Fixes: `0669729eb0` ("perf annotate: Factor out evsel__get_arch()") Suggested-by: James Clark <james.clark@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Tianyou Li <tianyou.li@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 05:42:34 -07:00
Tianyou Li	262c61435c	perf annotate: fix a crash when annotate the same symbol with 's' and 'T' When perf report with annotation for a symbol, press 's' and 'T', then exit the annotate browser. Once annotate the same symbol, the annotate browser will crash. The browser.arch was required to be correctly updated when data type feature was enabled by 'T'. Usually it was initialized by symbol__annotate2 function. If a symbol has already been correctly annotated at the first time, it should not call the symbol__annotate2 function again, thus the browser.arch will not get initialized. Then at the second time to show the annotate browser, the data type needs to be displayed but the browser.arch is empty. Stack trace as below: Perf: Segmentation fault -------- backtrace -------- #0 0x55d365 in ui__signal_backtrace setup.c:0 #1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930] #2 0x570f08 in arch__is perf[570f08] #3 0x562186 in annotate_get_insn_location perf[562186] #4 0x562626 in __hist_entry__get_data_type annotate.c:0 #5 0x56476d in annotation_line__write perf[56476d] #6 0x54e2db in annotate_browser__write annotate.c:0 #7 0x54d061 in ui_browser__list_head_refresh perf[54d061] #8 0x54dc9e in annotate_browser__refresh annotate.c:0 #9 0x54c03d in __ui_browser__refresh browser.c:0 #10 0x54ccf8 in ui_browser__run perf[54ccf8] #11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92] #12 0x552293 in do_annotate hists.c:0 #13 0x55941c in evsel__hists_browse hists.c:0 #14 0x55b00f in evlist__tui_browse_hists perf[55b00f] #15 0x42ff02 in cmd_report perf[42ff02] #16 0x494008 in run_builtin perf.c:0 #17 0x494305 in handle_internal_command perf.c:0 #18 0x410547 in main perf[410547] #19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0] #20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680] #21 0x410b75 in _start perf[410b75] Fixes: `1d4374afd0` ("perf annotate: Add 'T' hot key to toggle data type display") Reviewed-by: James Clark <james.clark@linaro.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Tianyou Li <tianyou.li@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 05:42:01 -07:00
Namhyung Kim	0e6c07a3c3	perf annotate: Fix build with NO_SLANG=1 The recent change for perf c2c annotate broke build without slang support like below. builtin-annotate.c: In function 'hists__find_annotations': builtin-annotate.c:522:73: error: 'NO_ADDR' undeclared (first use in this function); did you mean 'NR_ADDR'? 522 \| key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR); \| ^~~~~~~ \| NR_ADDR builtin-annotate.c:522:73: note: each undeclared identifier is reported only once for each function it appears in builtin-annotate.c:522:31: error: too many arguments to function 'hist_entry__tui_annotate' 522 \| key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR); \| ^~~~~~~~~~~~~~~~~~~~~~~~ In file included from util/sort.h:6, from builtin-annotate.c:28: util/hist.h:756:19: note: declared here 756 \| static inline int hist_entry__tui_annotate(struct hist_entry *he __maybe_unused, \| ^~~~~~~~~~~~~~~~~~~~~~~~ And I noticed that it missed to update the other side of #ifdef HAVE_SLANG_SUPPORT. Let's fix it. Cc: Tianyou Li <tianyou.li@intel.com> Fixes: `cd3466cd26` ("perf c2c: Add annotation support to perf c2c report") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-21 05:36:02 -07:00
James Clark	29166bd0a4	perf jevents: Suppress circular dependency warnings When doing an in source build, $(OUTPUT) is empty so the rule has the same input and output file. Suppress the warning by only adding the rule when doing an out of source build. The same condition already exists for the clean rule for json files. This fixes the following warnings: make[3]: Circular pmu-events/arch/nds32/mapfile.csv <- pmu-events/arch/nds32/mapfile.csv dependency dropped. make[3]: Circular pmu-events/arch/powerpc/mapfile.csv <- pmu-events/arch/powerpc/mapfile.csv dependency dropped. ... Signed-off-by: James Clark <james.clark@linaro.org> Tested-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-10-20 21:11:49 -07:00

1 2 3 4 5 ...

1396739 Commits