linux

Author	SHA1	Message	Date
Besar Wicaksono	d852b838eb	perf arm-spe: Add NVIDIA Olympus to neoverse list Add NVIDIA Olympus MIDR to neoverse_spe range list. Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com> Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-23 10:15:50 -08:00
Ian Rogers	5c5f6fe32d	perf symbol: Fix ENOENT case for filename__read_build_id Some callers of filename__read_build_id assume the error value must be -1, fix by making them handle all < 0 values. If is_regular_file fails in filename__read_build_id then it could be the file is missing (ENOENT) and it would be wrong to return -EWOULDBLOCK in that case. Fix the logic so -EWOULDBLOCK is only reported if other errors with stat haven't occurred. Fixes: `834ebb5678` ("perf tools: Don't read build-ids from non-regular files") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-17 07:30:51 -08:00
Linus Torvalds	9e906a9dea	Merge tag 'perf-tools-for-v6.19-2025-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "Perf event/metric description: Unify all event and metric descriptions in JSON format. Now event parsing and handling is greatly simplified by that. From users point of view, perf list will provide richer information about hardware events like the following. $ perf list hw List of pre-defined events (to be used in -e or -M): legacy hardware: branch-instructions [Retired branch instructions [This event is an alias of branches]. Unit: cpu] branch-misses [Mispredicted branch instructions. Unit: cpu] branches [Retired branch instructions [This event is an alias of branch-instructions]. Unit: cpu] bus-cycles [Bus cycles,which can be different from total cycles. Unit: cpu] cache-misses [Cache misses. Usually this indicates Last Level Cache misses; this is intended to be used in conjunction with the PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates. Unit: cpu] cache-references [Cache accesses. Usually this indicates Last Level Cache accesses but this may vary depending on your CPU. This may include prefetches and coherency messages; again this depends on the design of your CPU. Unit: cpu] cpu-cycles [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cycles]. Unit: cpu] cycles [Total cycles. Be wary of what happens during CPU frequency scaling [This event is an alias of cpu-cycles]. Unit: cpu] instructions [Retired instructions. Be careful,these can be affected by various issues,most notably hardware interrupt counts. Unit: cpu] ref-cycles [Total cycles; not affected by CPU frequency scaling. Unit: cpu] But most notable changes would be in the perf stat. On the right side, the default metrics are better named and aligned. :) $ perf stat -- perf test -w noploop Performance counter stats for 'perf test -w noploop': 11 context-switches # 10.8 cs/sec cs_per_second 0 cpu-migrations # 0.0 migrations/sec migrations_per_second 3,612 page-faults # 3532.5 faults/sec page_faults_per_second 1,022.51 msec task-clock # 1.0 CPUs CPUs_utilized 110,466 branch-misses # 0.0 % branch_miss_rate (88.66%) 6,934,452,104 branches # 6781.8 M/sec branch_frequency (88.66%) 4,657,032,590 cpu-cycles # 4.6 GHz cycles_frequency (88.65%) 27,755,874,218 instructions # 6.0 instructions insn_per_cycle (89.03%) TopdownL1 # 0.3 % tma_backend_bound # 9.3 % tma_bad_speculation (89.05%) # 9.7 % tma_frontend_bound (77.86%) # 80.7 % tma_retiring (88.81%) 1.025318171 seconds time elapsed 1.013248000 seconds user 0.012014000 seconds sys Deferred unwinding support: With the kernel support (commit `c69993ecdd`: "perf: Support deferred user unwind"), perf can use deferred callchains for userspace stack trace with frame pointers like below: $ perf record --call-graph fp,defer ... This will be transparent to users when it comes to other commands like perf report and perf script. They will merge the deferred callchains to the previous samples as if they were collected together. ARM SPE updates - Extensive enhancements to support various kinds of memory operations including GCS, MTE allocation tags, memcpy/memset, register access, and SIMD operations. - Add inverted data source filter (inv_data_src_filter) support to exclude certain data sources. - Improve documentation. Vendor event updates: - Intel: Updated event files for Sierra Forest, Panther Lake, Meteor Lake, Lunar Lake, Granite Rapids, and others. - Arm64: Added metrics for i.MX94 DDR PMU and Cortex-A720AE definitions. - RISC-V: Added JSON support for T-HEAD C920V2. Misc: - Improve pointer tracking in data type profiling. It'd give better output when the variable is using container_of() to convert type. - Annotation support for perf c2c report in TUI. Press 'a' key to enter annotation view from cacheline browser window. This will show which instruction is causing the cacheline contention. - Lots of fixes and test coverage improvements!" * tag 'perf-tools-for-v6.19-2025-12-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (214 commits) libperf: Use 'extern' in LIBPERF_API visibility macro perf stat: Improve handling of termination by signal perf tests stat: Add test for error for an offline CPU perf stat: When no events, don't report an error if there is none perf tests stat: Add "--null" coverage perf cpumap: Add "any" CPU handling to cpu_map__snprint_mask libperf cpumap: Fix perf_cpu_map__max for an empty/NULL map perf stat: Allow no events to open if this is a "--null" run perf test kvm: Add some basic perf kvm test coverage perf tests evlist: Add basic evlist test perf tests script dlfilter: Add a dlfilter test perf tests kallsyms: Add basic kallsyms test perf tests timechart: Add a perf timechart test perf tests top: Add basic perf top coverage test perf tests buildid: Add purge and remove testing perf tests c2c: Add a basic c2c perf c2c: Clean up some defensive gets and make asan clean perf jitdump: Fix missed dso__put perf mem-events: Don't leak online CPU map perf hist: In init, ensure mem_info is put on error paths ...	2025-12-07 07:07:02 -08:00
Ian Rogers	e2de90bdc9	perf cpumap: Add "any" CPU handling to cpu_map__snprint_mask If the perf_cpu_map is empty or is just the any CPU value, then early return. Don't process the "any" CPU when creating the bitmap. Tested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-04 00:36:14 -08:00
Ian Rogers	1da7c10b2e	perf jitdump: Fix missed dso__put Reference count checking caught a missing dso__put following a machine__findnew_dso_id. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:07:46 -08:00
Ian Rogers	69d247295a	perf mem-events: Don't leak online CPU map Reference count checking found the online CPU map was being gotten but not put. Add in the missing put. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:07:46 -08:00
Ian Rogers	f60efb4454	perf hist: In init, ensure mem_info is put on error paths Rather than exit the internal map_symbols directly, put the mem-info that does this and also lowers the reference count on the mem-info itself otherwise the mem-info is being leaked. Fixes: `56e144fe98` ("perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit") Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:07:40 -08:00
Ian Rogers	dc4d16543e	perf probe-event: Ensure probe event nsinfo is always cleared Move nsinfo__zput from cleanup_perf_probe_events to clear_perf_probe_event so it is always executed. Clean up clear_perf_probe_events to not call nsinfo__zput and use the pev variable to avoid repeated array accesses. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:07:23 -08:00
Ian Rogers	b4e44399eb	perf symbol: Add missed dso__put Add missing dso__put for the dso created in maps__split_kallsyms. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:07:23 -08:00
Ian Rogers	b3ea721b80	perf symbol-elf: Add missing puts on error path In dso__process_kernel_symbol if inserting a map fails, probably ENOMEM, then the reference count puts were missing on the dso and map. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:07:23 -08:00
Leo Yan	c4fe074b61	perf arm_spe: Add CPU variants supporting common data source packet Add the following CPU variants to the list for data source decoding: - Cortex-A715 [1] - Cortex-A78C [2] - Cortex-X1 [3] - Cortex-X4 [4] - Neoverse V3 [5] [1] https://developer.arm.com/documentation/101590/0103/Statistical-Profiling-Extension-Support/Statistical-Profiling-Extension-data-source-packet [2] https://developer.arm.com/documentation/102226/0002/Debug-descriptions/Statistical-Profiling-Extension/implementation-defined-features-of-SPE [3] https://developer.arm.com/documentation/101433/0102/Debug-descriptions/Statistical-Profiling-Extension/implementation-defined-features-of-SPE [4] https://developer.arm.com/documentation/102484/0003/Statistical-Profiling-Extension-support/Statistical-Profiling-Extension-data-source-packet [5] https://developer.arm.com/documentation/107734/0002/Statistical-Profiling-Extension-support/Statistical-Profiling-Extension-data-source-packet Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:01:12 -08:00
Arnaldo Carvalho de Melo	e28f834f57	perf auxtrace: Include sys/types.h for pid_t In `754187ad73` ("perf build: Remove NO_AUXTRACE build option") sys/types.h was removed, which broke the build in all Alpine Linux releases, as musl libc has pid_t defined via sys/types.h, add it back. Fixes: `754187ad73` ("perf build: Remove NO_AUXTRACE build option") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-03 11:01:05 -08:00
Namhyung Kim	4fba95fc38	perf tools: Use machine->root_dir to find /proc/kallsyms This is for test functions to find the kallsyms correctly. It can find the machine from the kernel maps and use its root_dir. This is helpful to setup fake /proc directory for testing. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	295d8a03ca	perf tools: Fallback to initial kernel map properly In maps__split_kallsyms(), it assumes new kernel map when it finds a symbol without module after any module and the initial kernel map has some symbols. Because it expects modules are out of the kernel map so modules should not have symbols in the kernel map. For example, the following memory map shows symbols and maps. Any symbols in the module 1 area will go to the module 1. The main kernel map starts at 0xffffffffbc200000. But if any symbol has a module between the symbols in that area, next symbols after 0xffffffffbd008000 will generate new kernel maps like [kernel].1. kernel address \| \| \| \| 0xffffffffc0000000 \|---------------------\| \| (symbols) \| \| ... \| <--- [kernel].N 0xffffffffbc400000 \|---------------------\| \| (symbols) \| \| module 2 \| <--- bad? 0xffffffffbc380000 \|---------------------\| \| ... \| \| (symbols) \| \| [kernel.kallsyms] \| <--- initial map 0xffffffffbc200000 \|---------------------\| \| \| \| \| 0xffffffffabcde000 \|---------------------\| \| (symbols) \| \| module 1 \| 0xffffffffabcd0000 \|---------------------\| This is very fragile when the module has a symbol that falls into the main kernel map for some reason. My system has a livepatch module with such symbols. And it created a lot of new kernel maps after those symbols. But the symbol may have broken addresses and the later symbols can still be found in the initial kernel map. Let's check the symbol address in the initial map and use it if found. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	ad0b9c4865	perf tools: Fix split kallsyms DSO counting It's counted twice as it's increased after calling maps__insert(). I guess we want to increase it only after it's added properly. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: `2e538c4a18` ("perf tools: Improve kernel/modules symbol lookup") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	7da4d60db3	perf tools: Mark split kallsyms DSOs as loaded The maps__split_kallsyms() will split symbols to module DSOs if it comes from a module. It also handled some unusual kernel symbols after modules by creating new kernel maps like "[kernel].0". But they are pseudo DSOs to have those unexpected symbols. They should not be considered as unloaded kernel DSOs. Otherwise the dso__load() for them will end up calling dso__load_kallsyms() and then maps__split_kallsyms() again and again. Reviewed-by: Ian Rogers <irogers@google.com> Fixes: `2e538c4a18` ("perf tools: Improve kernel/modules symbol lookup") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	405f5756bb	perf tools: Flush remaining samples w/o deferred callchains It's possible that some kernel samples don't have matching deferred callchain records when the profiling session was ended before the threads came back to userspace. Let's flush the samples before finish the session. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	9b4525fd08	perf tools: Merge deferred user callchains Save samples with deferred callchains in a separate list and deliver them after merging the user callchains. If users don't want to merge they can set tool->merge_deferred_callchains to false to prevent the behavior. With previous result, now perf script will show the merged callchains. $ perf script ... pwd 2312 121.163435: 249113 cpu/cycles/P: ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms]) ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms]) ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms]) ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms]) ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms]) ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms]) ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms]) 7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) 7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) 7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) ... The old output can be get using --no-merge-callchain option. Also perf report can get the user callchain entry at the end. $ perf report --no-children --stdio -q -S __build_id_parse.isra.0 # symbol: __build_id_parse.isra.0 8.40% pwd [kernel.kallsyms] \| ---__build_id_parse.isra.0 perf_event_mmap mprotect_fixup do_mprotect_pkey __x64_sys_mprotect do_syscall_64 entry_SYSCALL_64_after_hwframe mprotect _dl_sysdep_start _dl_start_user Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	25a9dd56cf	perf script: Display PERF_RECORD_CALLCHAIN_DEFERRED Handle the deferred callchains in the script output. $ perf script ... pwd 2312 121.163435: 249113 cpu/cycles/P: ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms]) ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms]) ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms]) ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms]) ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms]) ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms]) ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms]) b00000006 (cookie) ([unknown]) pwd 2312 121.163447: DEFERRED CALLCHAIN [cookie: b00000006] 7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) 7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) 7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2) Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:14 -08:00
Namhyung Kim	27ddc1d7a6	perf record: Add --call-graph fp,defer option for deferred callchains Add a new callchain record mode option for deferred callchains. For now it only works with FP (frame-pointer) mode. And add the missing feature detection logic to clear the flag on old kernels. $ perf record --call-graph fp,defer -vv true ... ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP\|TID\|TIME\|CALLCHAIN\|PERIOD read_format ID\|LOST disabled 1 inherit 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 defer_callchain 1 defer_output 1 ------------------------------------------------------------ sys_perf_event_open: pid 162755 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open failed, error -22 switching off deferred callchain support Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 21:59:13 -08:00
Namhyung Kim	f4e3381648	perf tools: Minimal DEFERRED_CALLCHAIN support Add a new event type for deferred callchains and a new callback for the struct perf_tool. For now it doesn't actually handle the deferred callchains but it just marks the sample if it has the PERF_CONTEXT_ USER_DEFFERED in the callchain array. At least, perf report can dump the raw data with this change. Actually this requires the next commit to enable attr.defer_callchain, but if you already have a data file, it'll show the following result. $ perf report -D ... 0x2158@perf.data [0x40]: event: 22 . . ... raw event: size 64 bytes . 0000: 16 00 00 00 02 00 40 00 06 00 00 00 0b 00 00 00 ......@......... . 0010: 03 00 00 00 00 00 00 00 a7 7f 33 fe 18 7f 00 00 ..........3..... . 0020: 0f 0e 33 fe 18 7f 00 00 48 14 33 fe 18 7f 00 00 ..3.....H.3..... . 0030: 08 09 00 00 08 09 00 00 e6 7a e7 35 1c 00 00 00 .........z.5.... 121163447014 0x2158 [0x40]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 2312/2312: 0xb00000006 ... FP chain: nr:3 ..... 0: 00007f18fe337fa7 ..... 1: 00007f18fe330e0f ..... 2: 00007f18fe331448 : unhandled! Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 16:13:32 -08:00
Ian Rogers	6603c3c1fe	perf python: Correct copying of metric_leader in an evsel Ensure the metric_leader is copied and set up correctly. In compute_metric determine the correct metric_leader event to match the requested CPU. Fixes the handling of metrics particularly on hybrid machines. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 16:12:49 -08:00
Namhyung Kim	25d498e636	perf jitdump: Add sym/str-tables to build-ID generation It was reported that python backtrace with JIT dump was broken after the change to built-in SHA-1 implementation. It seems python generates the same JIT code for each function. They will become separate DSOs but the contents are the same. Only difference is in the symbol name. But this caused a problem that every JIT'ed DSOs will have the same build-ID which makes perf confused. And it resulted in no python symbols (from JIT) in the output. Looking back at the original code before the conversion, it used the load_addr as well as the code section to distinguish each DSO. But it'd be better to use contents of symtab and strtab instead as it aligns with some linker behaviors. This patch adds a buffer to save all the contents in a single place for SHA-1 calculation. Probably we need to add sha1_update() or similar to update the existing hash value with different contents and use it here. But it's out of scope for this change and I'd like something that can be backported to the stable trees easily. Reviewed-by: Ian Rogers <irogers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Pablo Galindo <pablogsal@gmail.com> Cc: Fangrui Song <maskray@sourceware.org> Link: https://github.com/python/cpython/issues/139544 Fixes: `e3f612c1d8` ("perf genelf: Remove libcrypto dependency and use built-in sha1()") Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 16:06:57 -08:00
Namhyung Kim	441863ae3d	perf tools: Remove a trailing newline in the event terms So that it can show the correct encoding info in the JSON output. $ perf list -j hw [ { "Unit": "cpu", "Topic": "legacy hardware", "EventName": "branch-instructions", "EventType": "Kernel PMU event", "BriefDescription": "Retired branch instructions [This event is an alias of branches]", "Encoding": "cpu/event=0xc4/" }, ... Reviewed-by: Ian Rogers <irogers@google.com> Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-12-02 16:02:33 -08:00
James Clark	834ebb5678	perf tools: Don't read build-ids from non-regular files Simplify the build ID reading code by removing the non-blocking option. Having to pass the correct option to this function was fragile and a mistake would result in a hang, see the linked fix. Furthermore, compressed files are always opened blocking anyway, ignoring the non-blocking option. We also don't expect to read build IDs from non-regular files. The only hits to this function that are non-regular are devices that won't be elf files with build IDs, for example "/dev/dri/renderD129". Now instead of opening these as non-blocking and failing to read, we skip them. Even if something like a pipe or character device did have a build ID, I don't think it would have worked because you need to call read() in a loop, check for -EAGAIN and handle timeouts to make non-blocking reads work. Link: https://lore.kernel.org/linux-perf-users/20251022-james-perf-fix-dso-block-v1-1-c4faab150546@linaro.org/ Signed-off-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-26 10:13:38 -08:00
Anubhav Shelat	87c75fa755	perf pmu: fix duplicate conditional statement Remove duplicate check for PERF_PMU_TYPE_DRM_END in perf_pmu__kind. Fixes: `f0feb21e0a` ("perf pmu: Add PMU kind to simplify differentiating") Signed-off-by: Anubhav Shelat <ashelat@redhat.com> Reviewed-by: Ian Rogers <irogers@google.com> Closes: https://lore.kernel.org/linux-perf-users/CA+G8Dh+wLx+FvjjoEkypqvXhbzWEQVpykovzrsHi2_eQjHkzQA@mail.gmail.com/ Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-26 09:57:13 -08:00
James Clark	14a84c708e	perf tools: Add support for perf_event_attr::config4 perf_event_attr has gained a new field, config4, so add support for it extending the existing configN support. Reviewed-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Tested-by: Leo Yan <leo.yan@arm.com> Signed-off-by: James Clark <james.clark@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-24 12:20:06 -08:00
Hrishikesh Suresh	9bef5cead6	perf: replace strcpy() with strncpy() in util/jitdump.c Usage of strcpy() can lead to buffer overflows. Therefore, it has been replaced with strncpy(). The output file path is provided as a parameter and might be restricted by command-line by default. But this defensive patch will prevent any potential overflow, making the code more robust against future changes in input handling. Testing: - ran perf test from tools/perf and did not observe any regression with the earlier code Signed-off-by: Hrishikesh Suresh <hrishikesh123s@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-20 11:13:04 -08:00
Ian Rogers	d53b499658	perf evsel: Skip store_evsel_ids for non-perf-event PMUs The IDs are associated with perf events and not applicable to non-perf event PMUs. The failure to generate the ids was causing perf stat record to fail. ``` $ perf stat record -a sleep 1 Performance counter stats for 'system wide': 47,941 context-switches # nan cs/sec cs_per_second 0.00 msec cpu-clock # 0.0 CPUs CPUs_utilized 3,261 cpu-migrations # nan migrations/sec migrations_per_second 516 page-faults # nan faults/sec page_faults_per_second 7,525,483 cpu_core/branch-misses/ # 2.3 % branch_miss_rate 322,069,004 cpu_core/branches/ # nan M/sec branch_frequency 1,895,684,291 cpu_core/cpu-cycles/ # nan GHz cycles_frequency 2,789,777,426 cpu_core/instructions/ # 1.5 instructions insn_per_cycle 7,074,765 cpu_atom/branch-misses/ # 3.2 % branch_miss_rate (49.89%) 224,225,412 cpu_atom/branches/ # nan M/sec branch_frequency (50.29%) 2,061,679,981 cpu_atom/cpu-cycles/ # nan GHz cycles_frequency (50.33%) 2,011,242,533 cpu_atom/instructions/ # 1.0 instructions insn_per_cycle (50.33%) TopdownL1 (cpu_core) # 9.0 % tma_bad_speculation # 28.3 % tma_frontend_bound # 35.2 % tma_backend_bound # 27.5 % tma_retiring TopdownL1 (cpu_atom) # 36.8 % tma_backend_bound (59.65%) # 22.8 % tma_frontend_bound (59.60%) # 11.6 % tma_bad_speculation # 28.8 % tma_retiring (59.59%) 1.006777519 seconds time elapsed $ perf stat report Performance counter stats for 'perf': 1,013,376,154 duration_time <not counted> duration_time <not counted> duration_time <not counted> duration_time <not counted> duration_time <not counted> duration_time 47,941 context-switches 0.00 msec cpu-clock 3,261 cpu-migrations 516 page-faults 7,525,483 cpu_core/branch-misses/ 322,069,814 cpu_core/branches/ 322,069,004 cpu_core/branches/ 1,895,684,291 cpu_core/cpu-cycles/ 1,895,679,209 cpu_core/cpu-cycles/ 2,789,777,426 cpu_core/instructions/ <not counted> cpu_core/cpu-cycles/ <not counted> cpu_core/stalled-cycles-frontend/ <not counted> cpu_core/cpu-cycles/ <not counted> cpu_core/stalled-cycles-backend/ <not counted> cpu_core/stalled-cycles-backend/ <not counted> cpu_core/instructions/ <not counted> cpu_core/stalled-cycles-frontend/ 7,074,765 cpu_atom/branch-misses/ (49.89%) 221,679,088 cpu_atom/branches/ (49.89%) 224,225,412 cpu_atom/branches/ (50.29%) 2,061,679,981 cpu_atom/cpu-cycles/ (50.33%) 2,016,259,567 cpu_atom/cpu-cycles/ (50.33%) 2,011,242,533 cpu_atom/instructions/ (50.33%) <not counted> cpu_atom/cpu-cycles/ <not counted> cpu_atom/stalled-cycles-frontend/ <not counted> cpu_atom/cpu-cycles/ <not counted> cpu_atom/stalled-cycles-backend/ <not counted> cpu_atom/stalled-cycles-backend/ <not counted> cpu_atom/instructions/ <not counted> cpu_atom/stalled-cycles-frontend/ 17,145,113 cpu_core/INT_MISC.UOP_DROPPING/ 10,594,226,100 cpu_core/TOPDOWN.SLOTS/ 2,919,021,401 cpu_core/topdown-retiring/ 943,101,838 cpu_core/topdown-bad-spec/ 3,031,152,533 cpu_core/topdown-fe-bound/ 3,739,756,791 cpu_core/topdown-be-bound/ 1,909,501,648 cpu_atom/CPU_CLK_UNHALTED.CORE/ (60.04%) 3,516,608,359 cpu_atom/TOPDOWN_BE_BOUND.ALL/ (59.65%) 2,179,403,876 cpu_atom/TOPDOWN_FE_BOUND.ALL/ (59.60%) 2,745,732,458 cpu_atom/TOPDOWN_RETIRING.ALL/ (59.59%) 1.006777519 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog ``` Reported-by: James Clark <james.clark@linaro.org> Closes: https://lore.kernel.org/lkml/ca0f0cd3-7335-48f9-8737-2f70a75b019a@linaro.org/ Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Thomas Falcon <thomas.falcon@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-19 17:49:45 -08:00
Ian Rogers	f0feb21e0a	perf pmu: Add PMU kind to simplify differentiating Rather than perf_pmu__is_xxx calls, and a notion of kind so that a single call can be used. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-19 17:49:45 -08:00
Ian Rogers	1a6b0deb2b	perf header: Switch "cpu" for find_core_pmu in caps feature writing Writing currently fails on non-x86 and hybrid CPUs. Switch to the more regular find_core_pmu that is normally used in this case. Tested on hybrid alderlake system. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-19 17:49:45 -08:00
Ian Rogers	245cfbcd3d	perf maps: Avoid RC_CHK use after free The case of __maps__fixup_overlap_and_insert where the "new" maps covers existing mappings can create a use-after-free with reference count checking enabled. The issue is that "pos" holds a map pointer from maps_by_address that is put from maps_by_address but then used to look for a map in maps_by_name (the compared map is now a use-after-free). The issue stems from using maps__remove which redoes some of the searches already done by __maps__fixup_overlap_and_insert, so optimize the code (by avoiding repeated searches) and avoid the use-after-free by inlining the appropriate removal code. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202511141407.f9edcfa6-lkp@intel.com Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-19 16:20:15 -08:00
Leo Yan	87cc0b44fc	perf arm_spe: Synthesize memory samples for SIMD operations Synthesize memory samples for SIMD operations (including Advanced SIMD, SVE, and SME). To provide complete information, also generate data source entries for SIMD operations. Since memory operations are not limited to load and store, set PERF_MEM_OP_STORE if the operation does not fall into these cases. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:30 -08:00
Leo Yan	b70aa41078	perf arm_spe: Expose SIMD information in other operations The other operations contain SME data processing, ASE (Advanced SIMD) and floating-point operations. Expose these info in the records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:30 -08:00
Leo Yan	d67835cd5d	perf arm_spe: Report GCS in record Report GCS related info in records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:30 -08:00
Leo Yan	d4b61de44f	perf arm_spe: Report memset and memcpy in records Expose memset and memcpy related info in records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:30 -08:00
Leo Yan	6d47c32ccb	perf arm_spe: Report associated info for SVE / SME operations SVE / SME operations can be predicated or Gather load / scatter store, save the relevant info into record. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	f3b9bed72e	perf arm_spe: Report extended memory operations in records Extended memory operations include atomic (AT), acquire/release (AR), and exclusive (EXCL) operations. Save the relevant information in the records. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	c462dc70b1	perf arm_spe: Report MTE allocation tag in record Save MTE tag info in memory record. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	77e4291eaf	perf arm_spe: Report register access in record Record register access info for load / store operations. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	cdc1aff17f	perf arm_spe: Introduce data processing macro for SVE operations Introduce the ARM_SPE_OP_DP (data processing) macro as associated information for SVE operations. For SVE register access, only ARM_SPE_OP_SVE is set; for SVE data processing, both ARM_SPE_OP_SVE and ARM_SPE_OP_DP are set together. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	b64bf913b3	perf arm_spe: Consolidate operation types Consolidate operation types in a way: (a) Extract the second-level types into separate enums. (b) The second-level types for memory and SIMD operations are classified by modules. E.g., an operation may relate to general register, SIMD/FP, SVE, etc. (c) The associated information tells details. E.g., an operation is load or store, whether it is atomic operation, etc. Start the enum items for the second-level types from 8 to accommodate more entries within a 32-bit integer. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	c7c198b3ed	perf arm_spe: Remove unused operation types Remove unused SVE operation types. These operations will be reintroduced in subsequent refactoring, but with a different format. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	c4cfe1bceb	perf arm_spe: Decode SME data processing packet For SME data processing, decode its Effective vector length or Tile Size (ETS), and print out if a floating-point operation. After: . 00000000: 49 00 SME-OTHER ETS 1024 FP . 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18 . 0000000b: 9a 00 00 LAT 0 XLAT . `0000000e`: 43 00 DATA-SOURCE 0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	876294a645	perf arm_spe: Decode ASE and FP fields in other operation Add a check for other operation, which prevents any incorrectly classifying. Parse the ASE and FP fields. After: . 0000002f: 48 06 OTHER ASE FP INSN-OTHER . 00000031: b2 08 80 48 01 08 00 ff ff VA 0xffff000801488008 . 0000003a: 9a 00 00 LAT 0 XLAT . 0000003d: 42 16 EV RETIRED L1D-ACCESS TLB-ACCESS Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	c8bf2a05df	perf arm_spe: Rename SPE_OP_PKT_IS_OTHER_SVE_OP macro Rename the macro to SPE_OP_PKT_OTHER_SUBCLASS_SVE to unify naming. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	b4eaece3d9	perf arm_spe: Decode GCS operation Decode a load or store from a GCS operation and the associated "common" field. After: . 00000000: 49 44 LD GCS COMM . 00000002: b2 18 3c d7 83 00 80 ff ff VA 0xffff800083d73c18 . 0000000b: 9a 00 00 LAT 0 XLAT . `0000000e`: 43 00 DATA-SOURCE 0 Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	b61ca7219d	perf arm_spe: Unify operation naming Rename extended subclass and SVE/SME register access subclass, so that the naming can be consistent cross all sub classes. Add an log "SVE-SME-REG" for the SVE/SME register access, this is easier for parsing. Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Leo Yan	33e1fffea4	perf arm_spe: Fix memset subclass in operation The operation subclass is extracted from bits [7..1] of the payload. Since bit [0] is not parsed, there is no chance to match the memset type (0x25). As a result, the memset payload is never parsed successfully. Instead of extracting a unified bit field, change to extract the specific bits for each operation subclass. Fixes: `34fb60400e` ("perf arm-spe: Add raw decoding for SPEv1.3 MTE and MOPS load/store") Signed-off-by: Leo Yan <leo.yan@arm.com> Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-18 20:31:29 -08:00
Ian Rogers	d8d8a0b360	perf tool_pmu: More accurately set the cpus for tool events The user and system time events can record on different CPUs, but for all other events a single CPU map of just CPU 0 makes sense. In parse-events detect a tool PMU and then pass the perf_event_attr so that the tool_pmu can return CPUs specific for the event. This avoids a CPU map of all online CPUs being used for events like duration_time. Avoiding this avoids the evlist CPUs containing CPUs for which duration_time just gives 0. Minimizing the evlist CPUs can remove unnecessary sched_setaffinity syscalls that delay metric calculations. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org>	2025-11-17 18:43:09 -08:00

1 2 3 4 5 ...

10053 Commits