perf docs: arm-spe: Document new SPE filtering features
FEAT_SPE_EFT and FEAT_SPE_FDS etc have new user facing format attributes
so document them. Also document existing 'event_filter' bits that were
missing from the doc and the fact that latency values are stored in the
weight field.

Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
committed by Namhyung Kim
parent 14a84c708e
commit 5accdaec52
@@ -141,27 +141,65 @@ Config parameters
 These are placed between the // in the event and comma separated. For example '-e
 arm_spe/load_filter=1,min_latency=10/'
 
-branch_filter=1 - collect branches only (PMSFCR.B)
-event_filter=<mask> - filter on specific events (PMSEVFR) - see bitfield description below
+event_filter=<mask> - logical AND filter on specific events (PMSEVFR) - see bitfield description below
+inv_event_filter=<mask> - logical OR to filter out specific events (PMSNEVFR, FEAT_SPEv1p2) - see bitfield description below
 jitter=1 - use jitter to avoid resonance when sampling (PMSIRR.RND)
-load_filter=1 - collect loads only (PMSFCR.LD)
 min_latency=<n> - collect only samples with this latency or higher* (PMSLATFR)
 pa_enable=1 - collect physical address (as well as VA) of loads/stores (PMSCR.PA) - requires privilege
 pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
-store_filter=1 - collect stores only (PMSFCR.ST)
 ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
 discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
+inv_data_src_filter=<mask> - mask to filter from 0-63 possible data sources (PMSDSFR, FEAT_SPE_FDS) - See 'Data source filtering'
 
 +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
 than only the execution latency.
 
-Only some events can be filtered on; these include:
+Only some events can be filtered on using 'event_filter' bits. The overall
+filter is the logical AND of these bits, for example if bits 3 and 5 are set
+only samples that have both 'L1D cache refill' AND 'TLB walk' are recorded. When
+FEAT_SPEv1p2 is implemented 'inv_event_filter' can also be used to exclude
+events that have any (OR) of the filter's bits set. For example setting bits 3
+and 5 in 'inv_event_filter' will exclude any events that are either L1D cache
+refill OR TLB walk. If the same bit is set in both filters it's UNPREDICTABLE
+whether the sample is included or excluded. Filter bits for both event_filter
+and inv_event_filter are:
 
-bit 1 - instruction retired (i.e. omit speculative instructions)
+bit 1 - Instruction retired (i.e. omit speculative instructions)
+bit 2 - L1D access (FEAT_SPEv1p4)
 bit 3 - L1D refill
+bit 4 - TLB access (FEAT_SPEv1p4)
 bit 5 - TLB refill
-bit 7 - mispredict
-bit 11 - misaligned access
+bit 6 - Not taken event (FEAT_SPEv1p2)
+bit 7 - Mispredict
+bit 8 - Last level cache access (FEAT_SPEv1p4)
+bit 9 - Last level cache miss (FEAT_SPEv1p4)
+bit 10 - Remote access (FEAT_SPEv1p4)
+bit 11 - Misaligned access (FEAT_SPEv1p1)
+bit 12-15 - IMPLEMENTATION DEFINED events (when implemented)
+bit 16 - Transaction (FEAT_TME)
+bit 17 - Partial or empty SME or SVE predicate (FEAT_SPEv1p1)
+bit 18 - Empty SME or SVE predicate (FEAT_SPEv1p1)
+bit 19 - L2D access (FEAT_SPEv1p4)
+bit 20 - L2D miss (FEAT_SPEv1p4)
+bit 21 - Cache data modified (FEAT_SPEv1p4)
+bit 22 - Recently fetched (FEAT_SPEv1p4)
+bit 23 - Data snooped (FEAT_SPEv1p4)
+bit 24 - Streaming SVE mode event (when FEAT_SPE_SME is implemented), or
+         IMPLEMENTATION DEFINED event 24 (when implemented, only versions
+         less than FEAT_SPEv1p4)
+bit 25 - SMCU or external coprocessor operation event when FEAT_SPE_SME is
+         implemented, or IMPLEMENTATION DEFINED event 25 (when implemented,
+         only versions less than FEAT_SPEv1p4)
+bit 26-31 - IMPLEMENTATION DEFINED events (only versions less than FEAT_SPEv1p4)
+bit 48-63 - IMPLEMENTATION DEFINED events (when implemented)
+
+For IMPLEMENTATION DEFINED bits, refer to the CPU TRM if these bits are
+implemented.
+
+The driver will reject events if requested filter bits require unimplemented SPE
+versions, but will not reject filter bits for unimplemented IMPDEF bits or when
+their related feature is not present (e.g. SME). For example, if FEAT_SPEv1p2 is
+not implemented, filtering on "Not taken event" (bit 6) will be rejected.
 
 So to sample just retired instructions:
 
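The AND/OR semantics described for 'event_filter' and 'inv_event_filter' in the hunk above can be sketched in a few lines of Python. This is only a model of the documented behaviour, not driver or hardware code; the helper names `bits_to_mask` and `sample_recorded` are hypothetical, and the case where the same bit is set in both filters (UNPREDICTABLE) is deliberately not modelled.

```python
# Bit positions copied from the event filter bit list in the documentation.
L1D_REFILL = 3
TLB_REFILL = 5
MISPREDICT = 7


def bits_to_mask(*bits):
    """OR together one-bit values for the given bit positions."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def sample_recorded(sample_events, event_filter=0, inv_event_filter=0):
    """Hypothetical model of the documented filter semantics.

    event_filter: the sample must have ALL of these event bits (AND).
    inv_event_filter: the sample is dropped if it has ANY of them (OR).
    Setting the same bit in both filters is UNPREDICTABLE and not modelled.
    """
    has_all_required = (sample_events & event_filter) == event_filter
    has_any_excluded = (sample_events & inv_event_filter) != 0
    return has_all_required and not has_any_excluded


event_filter = bits_to_mask(L1D_REFILL, TLB_REFILL)
print(f"perf record -e arm_spe/event_filter={event_filter:#x}/ -- ./mybench")
```

With bits 3 and 5 set the mask is 0x28, and `bits_to_mask(MISPREDICT)` gives the 0x80 used for the mispredicted-branch example in the next hunk.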
@@ -171,6 +209,31 @@ or just mispredicted branches:
 
 perf record -e arm_spe/event_filter=0x80/ -- ./mybench
 
+When set, the following filters can be used to select samples that match any of
+the operation types (OR filtering). If only one is set then only samples of that
+type are collected:
+
+branch_filter=1 - Collect branches (PMSFCR.B)
+load_filter=1 - Collect loads (PMSFCR.LD)
+store_filter=1 - Collect stores (PMSFCR.ST)
+
+When extended filtering is supported (FEAT_SPE_EFT), SIMD and floating
+point operations can also be selected:
+
+simd_filter=1 - Collect SIMD loads, stores and operations (PMSFCR.SIMD)
+float_filter=1 - Collect floating point loads, stores and operations (PMSFCR.FP)
+
+When extended filtering is supported (FEAT_SPE_EFT), operation type filters can
+be changed to AND using _mask fields. For example samples could be selected if
+they are store AND SIMD by setting 'store_filter=1,simd_filter=1,
+store_filter_mask=1,simd_filter_mask=1'. The new masks are as follows:
+
+branch_filter_mask=1 - Change branch filter behavior from OR to AND (PMSFCR.Bm)
+load_filter_mask=1 - Change load filter behavior from OR to AND (PMSFCR.LDm)
+store_filter_mask=1 - Change store filter behavior from OR to AND (PMSFCR.STm)
+simd_filter_mask=1 - Change SIMD filter behavior from OR to AND (PMSFCR.SIMDm)
+float_filter_mask=1 - Change floating point filter behavior from OR to AND (PMSFCR.FPm)
+
 Viewing the data
 ~~~~~~~~~~~~~~~~~
 
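As an illustration of the operation type filters in the hunk above, the following Python sketch builds the format terms for the two cases the text describes: plain OR filtering, and AND filtering with every selected type also given its _mask term. The helpers `op_filter_terms` and `op_matches` are hypothetical names, and mixing masked and unmasked filters is left out because the text does not describe that case.

```python
# Operation types that have a <type>_filter / <type>_filter_mask format term.
# simd and float need FEAT_SPE_EFT, as do all of the _mask terms.
OP_TYPES = ("branch", "load", "store", "simd", "float")


def op_filter_terms(types, combine="or"):
    """Build the comma separated terms that go between the // of arm_spe//.

    combine="or":  only <type>_filter=1 terms (match any selected type).
    combine="and": additionally <type>_filter_mask=1 for each selected type.
    """
    for op in types:
        if op not in OP_TYPES:
            raise ValueError(f"unknown operation type: {op}")
    if combine not in ("or", "and"):
        raise ValueError("combine must be 'or' or 'and'")
    terms = [f"{op}_filter=1" for op in types]
    if combine == "and":
        terms += [f"{op}_filter_mask=1" for op in types]
    return ",".join(terms)


def op_matches(sample_types, types, combine="or"):
    """Hypothetical model: does a sample with these types pass the filter?"""
    wanted, have = set(types), set(sample_types)
    if combine == "and":
        return wanted <= have
    return bool(wanted & have)


print(f"arm_spe/{op_filter_terms(['store', 'simd'], combine='and')}/")
```

For store AND SIMD this reproduces the term string quoted in the documentation text; with the default `combine="or"` a plain load or a plain store would both pass a `['load', 'store']` filter.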
@@ -210,6 +273,10 @@ Memory access details are also stored on the samples and this can be viewed with
 
 perf report --mem-mode
 
+The latency value from the SPE sample is stored in the 'weight' field of the
+Perf samples and can be displayed in Perf script and report outputs by enabling
+its display from the command line.
+
 Common errors
 ~~~~~~~~~~~~~
 
@@ -253,6 +320,25 @@ to minimize output. Then run perf stat:
 perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
 perf stat -e SAMPLE_FEED_LD
 
+Data source filtering
+~~~~~~~~~~~~~~~~~~~~~
+
+When FEAT_SPE_FDS is present, 'inv_data_src_filter' can be used as a mask to
+filter on a subset (0 - 63) of possible data source IDs. The full range of data
+sources is 0 - 65535 although these are unlikely to be used in practice. Data
+sources are IMPDEF so refer to the TRM for the mappings. Each bit N of the
+filter maps to data source N. The filter is an OR of all the bits, and the value
+provided in 'inv_data_src_filter' is inverted before writing to PMSDSFR_EL1 so
+that set bits exclude that data source and cleared bits include that data source.
+Therefore the default value of 0 is equivalent to no filtering (all data sources
+included).
+
+For example, to include only data sources 0 and 3, clear bits 0 and 3
+(0xFFFFFFFFFFFFFFF6)
+
+When 'inv_data_src_filter' is set to 0xFFFFFFFFFFFFFFFF, any samples with any
+data source set are excluded.
+
 SEE ALSO
 --------
 
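The inversion described under 'Data source filtering' is easy to get backwards, so here is a small Python sketch of the arithmetic. It assumes a 64-bit filter value as in the examples in the text; the helper names `inv_data_src_filter` and `pmsdsfr_value` are hypothetical and only model the inversion that the text says happens before the PMSDSFR_EL1 write.

```python
MASK64 = (1 << 64) - 1  # the filter covers data sources 0-63


def inv_data_src_filter(include):
    """Value to pass as inv_data_src_filter so only `include` sources are kept.

    Set bits exclude a data source, cleared bits include it, so the bits for
    the wanted sources are cleared and all other bits are set.
    """
    keep = 0
    for src in include:
        if not 0 <= src <= 63:
            raise ValueError(f"data source {src} is outside the 0-63 range")
        keep |= 1 << src
    return ~keep & MASK64


def pmsdsfr_value(inv_filter):
    """Model of the inversion applied before writing PMSDSFR_EL1."""
    return ~inv_filter & MASK64


value = inv_data_src_filter([0, 3])
print(f"arm_spe/inv_data_src_filter={value:#x}/")
```

Including only data sources 0 and 3 gives 0xfffffffffffffff6, matching the example in the text, and including all 64 sources gives 0, the default that disables filtering.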