1413527 Commits

Author SHA1 Message Date
Nirjhar Roy (IBM)
a65fd81207 xfs: Fix xfs_grow_last_rtg()
The last rtg should be able to grow when the size of the last is less
than (and not equal to) sb_rgextents. xfs_growfs with realtime groups
fails without this patch. The reason is that, xfs_growfs_rtg() tries
to grow the last rt group even when the last rt group is at its
maximal size i.e, sb_rgextents. It fails with the following messages:

XFS (loop0): Internal error block >= mp->m_rsumblocks at line 253 of file fs/xfs/libxfs/xfs_rtbitmap.c.  Caller xfs_rtsummary_read_buf+0x20/0x80
XFS (loop0): Corruption detected. Unmount and run xfs_repair
XFS (loop0): Internal error xfs_trans_cancel at line 976 of file fs/xfs/xfs_trans.c.  Caller xfs_growfs_rt_bmblock+0x402/0x450
XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x10a/0x1f0 (fs/xfs/xfs_trans.c:977).  Shutting down filesystem.
XFS (loop0): Please unmount the filesystem and rectify the problem(s)

Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13 10:40:45 +01:00
Christoph Hellwig
df7ec7226f xfs: improve the assert at the top of xfs_log_cover
Move each condition into a separate assert so that we can see which
on triggered.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13 10:36:23 +01:00
Christoph Hellwig
baed03efe2 xfs: fix an overly long line in xfs_rtgroup_calc_geometry
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13 10:34:29 +01:00
Christoph Hellwig
e0aea42a32 xfs: mark __xfs_rtgroup_extents static
__xfs_rtgroup_extents is not used outside of xfs_rtgroup.c, so mark it
static.  Move it and xfs_rtgroup_extents up in the file to avoid forward
declarations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13 10:34:29 +01:00
Nirjhar Roy (IBM)
6b2d155366 xfs: Fix the return value of xfs_rtcopy_summary()
xfs_rtcopy_summary() should return the appropriate error code
instead of always returning 0. The caller of this function which is
xfs_growfs_rt_bmblock() is already handling the error.

Fixes: e94b53ff69 ("xfs: cache last bitmap block in realtime allocator")
Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # v6.7
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13 10:32:12 +01:00
Eric Dumazet
ffe4ccd359 net: add net.core.qdisc_max_burst
In blamed commit, I added a check against the temporary queue
built in __dev_xmit_skb(). Idea was to drop packets early,
before any spinlock was acquired.

if (unlikely(defer_count > READ_ONCE(q->limit))) {
	kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
	return NET_XMIT_DROP;
}

It turned out that HTB Qdisc has a zero q->limit.
HTB limits packets on a per-class basis.
Some of our tests became flaky.

Add a new sysctl : net.core.qdisc_max_burst to control
how many packets can be stored in the temporary lockless queue.

Also add a new QDISC_BURST_DROP drop reason to better diagnose
future issues.

Thanks Neal !

Fixes: 100dfa74ca ("net: dev_queue_xmit() llist adoption")
Reported-and-bisected-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-13 10:12:11 +01:00
Ludovic Desroches
9380dc33cd drm/panel: simple: restore connector_type fallback
The switch from devm_kzalloc() + drm_panel_init() to
devm_drm_panel_alloc() introduced a regression.

Several panel descriptors do not set connector_type. For those panels,
panel_simple_probe() used to compute a connector type (currently DPI as a
fallback) and pass that value to drm_panel_init(). After the conversion
to devm_drm_panel_alloc(), the call unconditionally used
desc->connector_type instead, ignoring the computed fallback and
potentially passing DRM_MODE_CONNECTOR_Unknown, which
drm_panel_bridge_add() does not allow.

Move the connector_type validation / fallback logic before the
devm_drm_panel_alloc() call and pass the computed connector_type to
devm_drm_panel_alloc(), so panels without an explicit connector_type
once again get the DPI default.

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Fixes: de04bb0089 ("drm/panel/panel-simple: Use the new allocation in place of devm_kzalloc()")
Cc: stable@vger.kernel.org
Reviewed-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Link: https://lore.kernel.org/stable/20251126-lcd_panel_connector_type_fix-v2-1-c15835d1f7cb%40microchip.com
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251218-lcd_panel_connector_type_fix-v3-1-ddcea6d8d7ef@microchip.com
2026-01-13 10:07:40 +01:00
Marek Vasut
6ab3d4353b drm/panel-simple: fix connector type for DataImage SCF0700C48GGU18 panel
The connector type for the DataImage SCF0700C48GGU18 panel is missing and
devm_drm_panel_bridge_add() requires connector type to be set. This leads
to a warning and a backtrace in the kernel log and panel does not work:
"
WARNING: CPU: 3 PID: 38 at drivers/gpu/drm/bridge/panel.c:379 devm_drm_of_get_bridge+0xac/0xb8
"
The warning is triggered by a check for valid connector type in
devm_drm_panel_bridge_add(). If there is no valid connector type
set for a panel, the warning is printed and panel is not added.
Fill in the missing connector type to fix the warning and make
the panel operational once again.

Cc: stable@vger.kernel.org
Fixes: 97ceb1fb08 ("drm/panel: simple: Add support for DataImage SCF0700C48GGU18")
Signed-off-by: Marek Vasut <marex@nabladev.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260110152750.73848-1-marex@nabladev.com
2026-01-13 10:06:37 +01:00
Luo Haiyang
f2edf797da irqchip/riscv-imsic: Revert "Remove redundant irq_data lookups"
Commit c475c0b71314("irqchip/riscv-imsic: Remove redundant irq_data
lookups") leads to a NULL pointer deference in imsic_msi_update_msg():

 virtio_blk virtio1: 8/0/0 default/read/poll queues
 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
 Current kworker/u32:2 pgtable: 4K pagesize, 48-bit VAs, pgdp=0x0000000081c33000
 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
 CPU: 5 UID: 0 PID: 75 Comm: kworker/u32:2 Not tainted 6.19.0-rc4-next-20260109 #1 NONE
 epc : 0x0
  ra : imsic_irq_set_affinity+0x110/0x130

The irq_data argument of imsic_irq_set_affinity() is associated with the
imsic domain and not with the top-level MSI domain. As a consequence the
code dereferences the wrong interrupt chip, which has the
irq_write_msi_msg() callback not populated.

Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260113111930821RrC26avITHWSFCN0bYbgI@zte.com.cn
2026-01-13 09:51:46 +01:00
Lorenzo Bianconi
dfdf774656 net: airoha: Fix typo in airoha_ppe_setup_tc_block_cb definition
Fix Typo in airoha_ppe_dev_setup_tc_block_cb routine definition when
CONFIG_NET_AIROHA is not enabled.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601090517.Fj6v501r-lkp@intel.com/
Fixes: f45fc18b6d ("net: airoha: Add airoha_ppe_dev struct definition")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20260109-airoha_ppe_dev_setup_tc_block_cb-typo-v1-1-282e8834a9f9@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-12 19:17:24 -08:00
Jijie Shao
e02f2a0f1f net: phy: motorcomm: fix duplex setting error for phy leds
fix duplex setting error for phy leds

Fixes: 355b82c54c ("net: phy: motorcomm: Add support for PHY LEDs on YT8521")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260108071409.2750607-1-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-12 18:01:09 -08:00
Tetsuo Handa
ec69daabe4 bpf: Fix reference count leak in bpf_prog_test_run_xdp()
syzbot is reporting

  unregister_netdevice: waiting for sit0 to become free. Usage count = 2

problem. A debug printk() patch found that a refcount is obtained at
xdp_convert_md_to_buff() from bpf_prog_test_run_xdp().

According to commit ec94670fcb ("bpf: Support specifying ingress via
xdp_md context in BPF_PROG_TEST_RUN"), the refcount obtained by
xdp_convert_md_to_buff() will be released by xdp_convert_buff_to_md().

Therefore, we can consider that the error handling path introduced by
commit 1c19499825 ("bpf: introduce frags support to
bpf_prog_test_run_xdp()") forgot to call xdp_convert_buff_to_md().

Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Fixes: 1c19499825 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/af090e53-9d9b-4412-8acb-957733b3975c@I-love.SAKURA.ne.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-12 16:37:40 -08:00
sheetal
f34b32745e ASoC: tegra: Revert fix for uninitialized flat cache warning in tegra210_ahub
Commit 4d4021b0bb ("ASoC: tegra: Fix uninitialized flat cache warning
in tegra210_ahub") attempted to fix the uninitialized flat cache warning
that is observed for the Tegra210 AHUB driver. However, the change broke
various audio tests because an -EBUSY error is returned when accessing
registers from cache before they are read from hardware. Revert this
change for now, until a proper fix is available.

Fixes: 4d4021b0bb ("ASoC: tegra: Fix uninitialized flat cache warning in tegra210_ahub")
Signed-off-by: sheetal <sheetal@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20251217132524.2844499-1-sheetal@nvidia.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-12 20:32:03 +00:00
Linus Torvalds
b71e635fee Merge tag 'cgroup-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:

 - Fix -Wflex-array-member-not-at-end warnings in cgroup_root

* tag 'cgroup-for-6.19-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: Eliminate cgrp_ancestor_storage in cgroup_root
2026-01-12 09:56:17 -10:00
Rafael J. Wysocki
8f334e3522 ACPI: PM: s2idle: Add missing checks to acpi_s2idle_begin_lps0()
Commit 32ece31db4 ("ACPI: PM: s2idle: Only retrieve constraints
when needed"), that attempted to avoid useless evaluation of LPS0 _DSM
Function 1 in lps0_device_attach(), forgot to add checks for
lps0_device_handle and sleep_no_lps0 to acpi_s2idle_begin_lps0()
where they should be done before calling lpi_device_get_constraints()
or lpi_device_get_constraints_amd().

Add the missing checks.

Fixes: 32ece31db4 ("ACPI: PM: s2idle: Only retrieve constraints when needed")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/2818730.mvXUDI8C0e@rafael.j.wysocki
2026-01-12 19:33:29 +01:00
Kery Qi
f93fc5d12d net: octeon_ep_vf: fix free_irq dev_id mismatch in IRQ rollback
octep_vf_request_irqs() requests MSI-X queue IRQs with dev_id set to
ioq_vector. If request_irq() fails part-way, the rollback loop calls
free_irq() with dev_id set to 'oct', which does not match the original
dev_id and may leave the irqaction registered.

This can keep IRQ handlers alive while ioq_vector is later freed during
unwind/teardown, leading to a use-after-free or crash when an interrupt
fires.

Fix the error path to free IRQs with the same ioq_vector dev_id used
during request_irq().

Fixes: 1cd3b40797 ("octeon_ep_vf: add Tx/Rx processing and interrupt support")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Link: https://patch.msgid.link/20260108164256.1749-2-qikeyu2017@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-12 09:04:52 -08:00
Anna Schumaker
803e18641f NFS: Don't immediately return directory delegations when disabled
The function nfs_inode_evict_delegation() immediately and synchronously
returns a delegation when called. This means we can't call it from
nfs4_have_delegation(), since that function could be called under a
lock. Instead we should mark the delegation for return and let the state
manager handle it for us.

Fixes: b6d2a520f4 ("NFS: Add a module option to disable directory delegations")
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-12 11:50:22 -05:00
Boqun Feng
05f66cf5e7 PCI: Provide pci_free_irq_vectors() stub
473b9f3317 ("rust: pci: fix build failure when CONFIG_PCI_MSI is
disabled") fixed a build error by providing Rust helpers when
CONFIG_PCI_MSI is not set. However the Rust helpers rely on
pci_free_irq_vectors(), which is only available when CONFIG_PCI=y.

When CONFIG_PCI is not set, there is already a stub for
pci_alloc_irq_vectors().  Add a similar stub for pci_free_irq_vectors().

Fixes: 473b9f3317 ("rust: pci: fix build failure when CONFIG_PCI_MSI is disabled")
Reported-by: FUJITA Tomonori <fujita.tomonori@gmail.com>
Closes: https://lore.kernel.org/rust-for-linux/20251209014312.575940-1-fujita.tomonori@gmail.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512220740.4Kexm4dW-lkp@intel.com/
Reported-by: Liang Jie <liangjie@lixiang.com>
Closes: https://lore.kernel.org/rust-for-linux/20251222034415.1384223-1-buaajxlj@163.com/
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20251226113938.52145-1-boqun.feng@gmail.com
2026-01-12 10:45:31 -06:00
Günther Noack
6abbb8703a landlock: Clarify documentation for the IOCTL access right
Move the description of the LANDLOCK_ACCESS_FS_IOCTL_DEV access right
together with the file access rights.

This group of access rights applies to files (in this case device
files), and they can be added to file or directory inodes using
landlock_add_rule(2).  The check for that works the same for all file
access rights, including LANDLOCK_ACCESS_FS_IOCTL_DEV.

Invoking ioctl(2) on directory FDs can not currently be restricted
with Landlock.  Having it grouped separately in the documentation is a
remnant from earlier revisions of the LANDLOCK_ACCESS_FS_IOCTL_DEV
patch set.

Link: https://lore.kernel.org/all/20260108.Thaex5ruach2@digikod.net/
Signed-off-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20260111175203.6545-2-gnoack3000@gmail.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
2026-01-12 17:07:21 +01:00
Li Ming
d4026a4462 cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve()
In __cxl_dpa_reserve(), it will check if the new resource range is
included in one of paritions of the cxl memory device.
cxlds->nr_paritions is used to represent how many partitions information
the cxl memory device has. In the loop, if driver cannot find a
partition including the new resource range, it will be an infinite loop.

[ dj: Removed incorrect fixes tag ]

Fixes: 991d98f17d ("cxl: Make cxl_dpa_alloc() DPA partition number agnostic")
Signed-off-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20260112120526.530232-1-ming.li@zohomail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-12 08:59:16 -07:00
Jiasheng Jiang
a11224a016 btrfs: fix memory leaks in create_space_info() error paths
In create_space_info(), the 'space_info' object is allocated at the
beginning of the function. However, there are two error paths where the
function returns an error code without freeing the allocated memory:

1. When create_space_info_sub_group() fails in zoned mode.
2. When btrfs_sysfs_add_space_info_type() fails.

In both cases, 'space_info' has not yet been added to the
fs_info->space_info list, resulting in a memory leak. Fix this by
adding an error handling label to kfree(space_info) before returning.

Fixes: 2be12ef79f ("btrfs: Separate space_info create/update")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12 16:21:55 +01:00
Filipe Manana
8826807749 btrfs: invalidate pages instead of truncate after reflinking
Qu reported that generic/164 often fails because the read operations get
zeroes when it expects to either get all bytes with a value of 0x61 or
0x62. The issue stems from truncating the pages from the page cache
instead of invalidating, as truncating can zero page contents. This
zeroing is not just in case the range is not page sized (as it's commented
in truncate_inode_pages_range()) but also in case we are using large
folios, they need to be split and the splitting fails. Stealing Qu's
comment in the thread linked below:

  "We can have the following case:

	0          4K         8K         12K          16K
        |          |          |          |            |
        |<---- Extent A ----->|<----- Extent B ------>|

   The page size is still 4K, but the folio we got is 16K.

   Then if we remap the range for [8K, 16K), then
   truncate_inode_pages_range() will get the large folio 0 sized 16K,
   then call truncate_inode_partial_folio().

   Which later calls folio_zero_range() for the [8K, 16K) range first,
   then tries to split the folio into smaller ones to properly drop them
   from the cache.

   But if splitting failed (e.g. racing with other operations holding the
   filemap lock), the partially zeroed large folio will be kept, resulting
   the range [8K, 16K) being zeroed meanwhile the folio is still a 16K
   sized large one."

So instead of truncating, invalidate the page cache range with a call to
filemap_invalidate_inode(), which besides not doing any zeroing also
ensures that while it's invalidating folios, no new folios are added.

This helps ensure that buffered reads that happen while a reflink
operation is in progress always get either the whole old data (the one
before the reflink) or the whole new data, which is what generic/164
expects.

Link: https://lore.kernel.org/linux-btrfs/7fb9b44f-9680-4c22-a47f-6648cb109ddf@suse.com/
Reported-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12 16:21:55 +01:00
Qu Wenruo
64dd1caf88 btrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTAL
The following new features are missing:

- Async checksum

- Shutdown ioctl and auto-degradation

- Larger block size support
  Which is dependent on larger folios.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12 16:21:55 +01:00
Andreas Gruenbacher
469d71512d Revert "gfs2: Fix use of bio_chain"
This reverts commit 8a157e0a0a.

That commit incorrectly assumed that the bio_chain() arguments were
swapped in gfs2.  However, gfs2 intentionally constructs bio chains so
that the first bio's bi_end_io callback is invoked when all bios in the
chain have completed, unlike bio chains where the last bio's callback is
invoked.

Fixes: 8a157e0a0a ("gfs2: Fix use of bio_chain")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-12 14:58:32 +01:00
Rob Herring (Arm)
70d95c5d20 ASoC: dt-bindings: rockchip-spdif: Allow "port" node
Add a "port" node entry for Rockchip S/PDIF binding. It's already in use
and a common property for DAIs.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260108224938.1320809-1-robh@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-12 11:20:06 +00:00
Rob Herring (Arm)
f66e7da2a6 ASoC: dt-bindings: realtek,rt5640: Allow 7 for realtek,jack-detect-source
The driver accepts and uses a value of 7 for realtek,jack-detect-source.
What exactly it means isn't clear though.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260108215307.1138515-2-robh@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-12 11:20:05 +00:00
Rob Herring (Arm)
101b982654 ASoC: dt-bindings: realtek,rt5640: Add missing properties/node
The RT5640 has an MCLK pin and several users already define a clocks
entry. A 'port' node is also in use and a common node for codecs.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260108215307.1138515-1-robh@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
2026-01-12 11:20:04 +00:00
Ben Dooks
81d0223832 drm/i915/guc: make 'guc_hw_reg_state' static as it isn't exported
The guc_hw_reg_state array is not exported, so make it static.
Fixes the following sparse warning:
drivers/gpu/drm/i915/i915_gpu_error.c:692:3: warning: symbol 'guc_hw_reg_state' was not declared. Should it be static?

Fixes: ba391a102e ("drm/i915/guc: Include the GuC registers in the error state")
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260108201202.59250-2-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 701c47493328a8173996e7590733be3493af572f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2026-01-12 13:10:36 +02:00
Bartosz Golaszewski
471e998c0e gpiolib: remove redundant callback check
The presence of the .get_direction() callback is already checked in
gpiochip_get_direction(). Remove the duplicated check which also returns
the wrong error code to user-space.

Fixes: e623c4303e ("gpiolib: sanitize the return value of gpio_chip::get_direction()")
Reported-by: Michael Walle <mwalle@kernel.org>
Closes: https://lore.kernel.org/all/DFJAFK3DTBOZ.3G2P3A5IH34GF@kernel.org/
Link: https://lore.kernel.org/r/20260109105557.20024-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2026-01-12 09:35:04 +01:00
Bartosz Golaszewski
c187900187 gpio: davinci: implement .get_direction()
It's strongly recommended for GPIO drivers to always implement the
.get_direction() callback - even for fixed-direction controllers.

GPIO core will even emit a warning if the callback is missing, when
users try to read the direction of a pin.

Implement .get_direction() for gpio-davinci.

Reported-by: Michael Walle <mwalle@kernel.org>
Closes: https://lore.kernel.org/all/DFJAFK3DTBOZ.3G2P3A5IH34GF@kernel.org/
Reviewed-by: Linus Walleij <linusw@kernel.org>
Fixes: a060b8c511 ("gpiolib: implement low-level, shared GPIO support")
Tested-by: Michael Walle <mwalle@kernel.org> # on sa67
Link: https://lore.kernel.org/r/20260109130832.27326-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2026-01-12 09:34:26 +01:00
Linus Torvalds
0f61b1860c Linux 6.19-rc5 v6.19-rc5 2026-01-11 17:03:14 -10:00
Linus Torvalds
7143203341 Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fixes from Eric Biggers:

 - A couple more fixes for the lib/crypto KUnit tests

 - Fix missing MMU protection for the AES S-box

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
  lib/crypto: aes: Fix missing MMU protection for AES S-box
  MAINTAINERS: add test vector generation scripts to "CRYPTO LIBRARY"
  lib/crypto: tests: Fix syntax error for old python versions
  lib/crypto: tests: polyval_kunit: Increase iterations for preparekey in IRQs
2026-01-11 15:07:56 -10:00
Linus Torvalds
9c7ef209cd Merge tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
 "Here are some small char/misc driver fixes for some reported issues.
  Included in here is:

   - much reported rust_binder fix

   - counter driver fixes

   - new device ids for the mei driver

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  rust_binder: remove spin_lock() in rust_shrink_free_page()
  mei: me: add nova lake point S DID
  counter: 104-quad-8: Fix incorrect return value in IRQ handler
  counter: interrupt-cnt: Drop IRQF_NO_THREAD flag
2026-01-11 07:27:44 -10:00
Linus Torvalds
316a94cb63 Merge tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
 "Disable GCOV instrumentation in the SEV noinstr.c collection of SEV
  noinstr methods, to further robustify the code"

* tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Disable GCOV on noinstr object
2026-01-11 07:19:43 -10:00
Linus Torvalds
fac4bdbaca Merge tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix a crash in sched_mm_cid_after_execve()"

* tag 'sched-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/mm_cid: Prevent NULL mm dereference in sched_mm_cid_after_execve()
2026-01-11 07:11:53 -10:00
Linus Torvalds
fe948326e9 Merge tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
 "Fix perf swevent hrtimer deinit regression"

* tag 'perf-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Ensure swevent hrtimer is properly destroyed
2026-01-11 06:55:27 -10:00
Janne Grunau
76cba1e60b dmaengine: apple-admac: Add "apple,t8103-admac" compatible
After discussion with the devicetree maintainers we agreed to not extend
lists with the generic compatible "apple,admac" anymore [1]. Use
"apple,t8103-admac" as base compatible as it is the SoC the driver and
bindings were written for.

[1]: https://lore.kernel.org/asahi/12ab93b7-1fc2-4ce0-926e-c8141cfe81bf@kernel.org/

Fixes: b127315d9a ("dmaengine: apple-admac: Add Apple ADMAC driver")
Cc: stable@vger.kernel.org
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Janne Grunau <j@jannau.net>
Link: https://patch.msgid.link/20251231-apple-admac-t8103-base-compat-v1-1-ec24a3708f76@jannau.net
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-01-11 22:12:49 +05:30
Haotian Zhang
2e1136acf8 dmaengine: omap-dma: fix dma_pool resource leak in error paths
The dma_pool created by dma_pool_create() is not destroyed when
dma_async_device_register() or of_dma_controller_register() fails,
causing a resource leak in the probe error paths.

Add dma_pool_destroy() in both error paths to properly release the
allocated dma_pool resource.

Fixes: 7bedaa5537 ("dmaengine: add OMAP DMA engine driver")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20251103073018.643-1-vulab@iscas.ac.cn
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-01-11 22:12:44 +05:30
Miaoqian Lin
3f747004bb dmaengine: qcom: gpi: Fix memory leak in gpi_peripheral_config()
Fix a memory leak in gpi_peripheral_config() where the original memory
pointed to by gchan->config could be lost if krealloc() fails.

The issue occurs when:
1. gchan->config points to previously allocated memory
2. krealloc() fails and returns NULL
3. The function directly assigns NULL to gchan->config, losing the
   reference to the original memory
4. The original memory becomes unreachable and cannot be freed

Fix this by using a temporary variable to hold the krealloc() result
and only updating gchan->config when the allocation succeeds.

Found via static analysis and code review.

Fixes: 5d0c3533a1 ("dmaengine: qcom: Add GPI dma driver")
Cc: stable@vger.kernel.org
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Link: https://patch.msgid.link/20251029123421.91973-1-linmq006@gmail.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-01-11 22:12:38 +05:30
Linus Torvalds
88730166f3 Merge tag 'irq-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc irqchip fixes from Ingo Molnar:

 - Fix an endianness bug in the gic-v5 irqchip driver

 - Revert a broken commit from the riscv-imsic irqchip driver

* tag 'irq-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "irqchip/riscv-imsic: Embed the vector array in lpriv"
  irqchip/gic-v5: Fix gicv5_its_map_event() ITTE read endianness
2026-01-11 06:36:20 -10:00
Thomas Gleixner
2e4b28c48f treewide: Update email address
In a vain attempt to consolidate the email zoo switch everything to the
kernel.org account.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-11 06:09:11 -10:00
Cristian Ciocaltea
db8061bbb9 drm/rockchip: dw_hdmi_qp: Switch to gpiod_set_value_cansleep()
Since commit 20cf2aed89 ("gpio: rockchip: mark the GPIO controller as
sleeping"), the Rockchip GPIO chip operations potentially sleep, hence
the kernel complains when trying to make use of the non-sleeping API:

[   16.653343] WARNING: drivers/gpio/gpiolib.c:3902 at gpiod_set_value+0xd0/0x108, CPU#5: kworker/5:1/93
...
[   16.678470] Hardware name: Radxa ROCK 5B (DT)
[   16.682374] Workqueue: events dw_hdmi_qp_rk3588_hpd_work [rockchipdrm]
...
[   16.729314] Call trace:
[   16.731846]  gpiod_set_value+0xd0/0x108 (P)
[   16.734548]  dw_hdmi_qp_rockchip_encoder_enable+0xbc/0x3a8 [rockchipdrm]
[   16.737487]  drm_atomic_helper_commit_encoder_bridge_enable+0x314/0x380 [drm_kms_helper]
[   16.740555]  drm_atomic_helper_commit_tail_rpm+0xa4/0x100 [drm_kms_helper]
[   16.743501]  commit_tail+0x1e0/0x2c0 [drm_kms_helper]
[   16.746290]  drm_atomic_helper_commit+0x274/0x2b8 [drm_kms_helper]
[   16.749178]  drm_atomic_commit+0x1f0/0x248 [drm]
[   16.752000]  drm_client_modeset_commit_atomic+0x490/0x5d0 [drm]
[   16.754954]  drm_client_modeset_commit_locked+0xf4/0x400 [drm]
[   16.757911]  drm_client_modeset_commit+0x50/0x80 [drm]
[   16.760791]  __drm_fb_helper_restore_fbdev_mode_unlocked+0x9c/0x170 [drm_kms_helper]
[   16.763843]  drm_fb_helper_hotplug_event+0x340/0x368 [drm_kms_helper]
[   16.766780]  drm_fbdev_client_hotplug+0x64/0x1d0 [drm_client_lib]
[   16.769634]  drm_client_hotplug+0x178/0x240 [drm]
[   16.772455]  drm_client_dev_hotplug+0x170/0x1c0 [drm]
[   16.775303]  drm_connector_helper_hpd_irq_event+0xa4/0x178 [drm_kms_helper]
[   16.778248]  dw_hdmi_qp_rk3588_hpd_work+0x44/0xb8 [rockchipdrm]
[   16.781080]  process_one_work+0xc3c/0x1658
[   16.783719]  worker_thread+0xa24/0xc40
[   16.786333]  kthread+0x3b4/0x3d8
[   16.788889]  ret_from_fork+0x10/0x20

Since gpiod_get_value() is called from a context that can sleep, switch
to its *_cansleep() variant and get rid of the issue.

Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Link: https://patch.msgid.link/20260110-dw-hdmi-qp-cansleep-v1-1-1ce937c5b201@collabora.com
2026-01-11 14:36:21 +01:00
Linus Torvalds
755bc1335e Merge tag 'riscv-for-linus-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
 "Notable changes include a fix to close one common microarchitectural
  attack vector for out-of-order cores. Another patch exposed an
  omission in my boot test coverage, which is currently missing
  relocatable kernels. Otherwise, the fixes seem to be settling down for
  us.

   - Fix CONFIG_RELOCATABLE=y boots by building Image files from
     vmlinux, rather than vmlinux.unstripped, now that the .modinfo
     section is included in vmlinux.unstripped

   - Prevent branch predictor poisoning microarchitectural attacks that
     use the syscall index as a vector by using array_index_nospec() to
     clamp the index after the bounds check (as x86 and ARM64 already
     do)

   - Fix a crash in test_kprobes when building with Clang

   - Fix a deadlock possible when tracing is enabled for SBI ecalls

   - Fix the definition of the Zk standard RISC-V ISA extension bundle,
     which was missing the Zknh extension

   - A few other miscellaneous non-functional cleanups, removing unused
     macros, fixing an out-of-date path in code comments, resolving a
     compile-time warning for a type mismatch in a pr_crit(), and
     removing an unnecessary header file inclusion"

* tag 'riscv-for-linus-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: trace: fix snapshot deadlock with sbi ecall
  riscv: remove irqflags.h inclusion in asm/bitops.h
  riscv: cpu_ops_sbi: smp_processor_id() returns int, not unsigned int
  riscv: configs: Clean up references to non-existing configs
  riscv: kexec_image: Fix dead link to boot-image-header.rst
  riscv: pgtable: Cleanup useless VA_USER_XXX definitions
  riscv: cpufeature: Fix Zk bundled extension missing Zknh
  riscv: fix KUnit test_kprobes crash when building with Clang
  riscv: Sanitize syscall table indexing under speculation
  riscv: boot: Always make Image from vmlinux, not vmlinux.unstripped
2026-01-10 15:54:41 -10:00
Linus Torvalds
0fa27899e0 Merge tag 'driver-core-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core
Pull driver core fixes from Danilo Krummrich:

 - Fix swapped example values for the `family` and `machine` attributes
   in the sysfs SoC bus ABI documentation

 - Fix Rust build and intra-doc issues when optional subsystems
   (CONFIG_PCI, CONFIG_AUXILIARY_BUS, CONFIG_PRINTK) are disabled

 - Fix typos and incorrect safety comments in Rust PCI, DMA, and
   device ID documentation

* tag 'driver-core-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  rust: device: Remove explicit import of CStrExt
  rust: pci: fix typos in Bar struct's comments
  rust: device: fix broken intra-doc links
  rust: dma: fix broken intra-doc links
  rust: driver: fix broken intra-doc links to example driver types
  rust: device_id: replace incorrect word in safety documentation
  rust: dma: remove incorrect safety documentation
  docs: ABI: sysfs-devices-soc: Fix swapped sample values
2026-01-10 15:04:04 -10:00
Linus Torvalds
b061fcffe3 Merge tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fix from Shuah Khan:
 "Fix tracing test_multiple_writes stalls when buffer_size_kb is less
  than 12KB"

* tag 'linux_kselftest-fixes-6.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/tracing: Fix test_multiple_writes stall
2026-01-10 14:57:55 -10:00
Jakub Kicinski
16ce6e6fa9 Merge branch 'mlx5e-profile-change-fix'
Saeed Mahameed says:

====================
mlx5e profile change fix

This series fixes a crash in mlx5e due to profile change error flow.
====================

Link: https://patch.msgid.link/20260108212657.25090-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:21:14 -08:00
Saeed Mahameed
5629f8859d net/mlx5e: Restore destroying state bit after profile cleanup
Profile rollback can fail in mlx5e_netdev_change_profile() and we will
end up with invalid mlx5e_priv memset to 0, we must maintain the
'destroying' bit in order to gracefully shutdown even if the
profile/priv are not valid.

This patch maintains the previous state of the 'destroying' state of
mlx5e_priv after priv cleanup, to allow the remove flow to cleanup
common resources from mlx5_core to avoid FW fatal errors as seen below:

$ devlink dev eswitch set pci/0000:00:03.0 mode switchdev
    Error: mlx5_core: Failed setting eswitch to offloads.
dmesg: mlx5_core 0000:00:03.0 enp0s3np0: failed to rollback to orig profile, ...

$ devlink dev reload pci/0000:00:03.0

mlx5_core 0000:00:03.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
mlx5_core 0000:00:03.0: poll_health:803:(pid 519): Fatal error 3 detected
mlx5_core 0000:00:03.0: firmware version: 28.41.1000
mlx5_core 0000:00:03.0: 0.000 Gb/s available PCIe bandwidth (Unknown x255 link)
mlx5_core 0000:00:03.0: mlx5_function_enable:1200:(pid 519): enable hca failed
mlx5_core 0000:00:03.0: mlx5_function_enable:1200:(pid 519): enable hca failed
mlx5_core 0000:00:03.0: mlx5_health_try_recover:340:(pid 141): handling bad device here
mlx5_core 0000:00:03.0: mlx5_handle_bad_state:285:(pid 141): Expected to see disabled NIC but it is full driver
mlx5_core 0000:00:03.0: mlx5_error_sw_reset:236:(pid 141): start
mlx5_core 0000:00:03.0: NIC IFC still 0 after 4000ms.

Fixes: c4d7eb5768 ("net/mxl5e: Add change profile method")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260108212657.25090-5-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:21:11 -08:00
Saeed Mahameed
4ef8512e14 net/mlx5e: Pass netdev to mlx5e_destroy_netdev instead of priv
mlx5e_priv is an unstable structure that can be memset(0) if profile
attaching fails.

Pass netdev to mlx5e_destroy_netdev() to guarantee it will work on a
valid netdev.

On mlx5e_remove: Check validity of priv->profile, before attempting
to cleanup any resources that might be not there.

This fixes a kernel oops in mlx5e_remove when switchdev mode fails due
to change profile failure.

$ devlink dev eswitch set pci/0000:00:03.0 mode switchdev
Error: mlx5_core: Failed setting eswitch to offloads.
dmesg:
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12

$ devlink dev reload pci/0000:00:03.0 ==> oops

BUG: kernel NULL pointer dereference, address: 0000000000000370
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 15 UID: 0 PID: 520 Comm: devlink Not tainted 6.18.0-rc5+ #115 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100
RSP: 0018:ffffc9000083f8b8 EFLAGS: 00010286
RAX: ffff8881126fc380 RBX: ffff8881015ac400 RCX: ffffffff826ffc45
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881035109c0
RBP: ffff8881035109c0 R08: ffff888101e3e838 R09: ffff888100264e10
R10: ffffc9000083f898 R11: ffffc9000083f8a0 R12: ffff888101b921a0
R13: ffff888101b921a0 R14: ffff8881015ac9a0 R15: ffff8881015ac400
FS:  00007f789a3c8740(0000) GS:ffff88856aa59000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000370 CR3: 000000010b6c0001 CR4: 0000000000370ef0
Call Trace:
 <TASK>
 mlx5e_remove+0x57/0x110
 device_release_driver_internal+0x19c/0x200
 bus_remove_device+0xc6/0x130
 device_del+0x160/0x3d0
 ? devl_param_driverinit_value_get+0x2d/0x90
 mlx5_detach_device+0x89/0xe0
 mlx5_unload_one_devl_locked+0x3a/0x70
 mlx5_devlink_reload_down+0xc8/0x220
 devlink_reload+0x7d/0x260
 devlink_nl_reload_doit+0x45b/0x5a0
 genl_family_rcv_msg_doit+0xe8/0x140

Fixes: c4d7eb5768 ("net/mxl5e: Add change profile method")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260108212657.25090-4-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:21:10 -08:00
Saeed Mahameed
123eda2e5b net/mlx5e: Don't store mlx5e_priv in mlx5e_dev devlink priv
mlx5e_priv is an unstable structure that can be memset(0) if profile
attaching fails, mlx5e_priv in mlx5e_dev devlink private is used to
reference the netdev and mdev associated with that struct. Instead,
store netdev directly into mlx5e_dev and get mdev from the containing
mlx5_adev aux device structure.

This fixes a kernel oops in mlx5e_remove when switchdev mode fails due
to change profile failure.

$ devlink dev eswitch set pci/0000:00:03.0 mode switchdev
Error: mlx5_core: Failed setting eswitch to offloads.
dmesg:
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12

$ devlink dev reload pci/0000:00:03.0 ==> oops

BUG: kernel NULL pointer dereference, address: 0000000000000520
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 3 UID: 0 PID: 521 Comm: devlink Not tainted 6.18.0-rc5+ #117 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:mlx5e_remove+0x68/0x130
RSP: 0018:ffffc900034838f0 EFLAGS: 00010246
RAX: ffff88810283c380 RBX: ffff888101874400 RCX: ffffffff826ffc45
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff888102d789c0 R08: ffff8881007137f0 R09: ffff888100264e10
R10: ffffc90003483898 R11: ffffc900034838a0 R12: ffff888100d261a0
R13: ffff888100d261a0 R14: ffff8881018749a0 R15: ffff888101874400
FS:  00007f8565fea740(0000) GS:ffff88856a759000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000520 CR3: 000000010b11a004 CR4: 0000000000370ef0
Call Trace:
 <TASK>
 device_release_driver_internal+0x19c/0x200
 bus_remove_device+0xc6/0x130
 device_del+0x160/0x3d0
 ? devl_param_driverinit_value_get+0x2d/0x90
 mlx5_detach_device+0x89/0xe0
 mlx5_unload_one_devl_locked+0x3a/0x70
 mlx5_devlink_reload_down+0xc8/0x220
 devlink_reload+0x7d/0x260
 devlink_nl_reload_doit+0x45b/0x5a0
 genl_family_rcv_msg_doit+0xe8/0x140

Fixes: ee75f1fc44 ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device")
Fixes: c4d7eb5768 ("net/mxl5e: Add change profile method")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://patch.msgid.link/20260108212657.25090-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:21:10 -08:00
Saeed Mahameed
4dadc4077e net/mlx5e: Fix crash on profile change rollback failure
mlx5e_netdev_change_profile can fail to attach a new profile and can
fail to rollback to old profile, in such case, we could end up with a
dangling netdev with a fully reset netdev_priv. A retry to change
profile, e.g. another attempt to call mlx5e_netdev_change_profile via
switchdev mode change, will crash trying to access the now NULL
priv->mdev.

This fix allows mlx5e_netdev_change_profile() to handle previous
failures and an empty priv, by not assuming priv is valid.

Pass netdev and mdev to all flows requiring
mlx5e_netdev_change_profile() and avoid passing priv.
In mlx5e_netdev_change_profile() check if current priv is valid, and if
not, just attach the new profile without trying to access the old one.

This fixes the following oops, when enabling switchdev mode for the 2nd
time after first time failure:

 ## Enabling switchdev mode first time:

mlx5_core 0012:03:00.1: E-Switch: Supported tc chains and prios offload
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12
workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12
mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
                                                                         ^^^^^^^^
mlx5_core 0000:00:03.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)

 ## retry: Enabling switchdev mode 2nd time:

mlx5_core 0000:00:03.0: E-Switch: Supported tc chains and prios offload
BUG: kernel NULL pointer dereference, address: 0000000000000038
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 13 UID: 0 PID: 520 Comm: devlink Not tainted 6.18.0-rc4+ #91 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:mlx5e_detach_netdev+0x3c/0x90
Code: 50 00 00 f0 80 4f 78 02 48 8b bf e8 07 00 00 48 85 ff 74 16 48 8b 73 78 48 d1 ee 83 e6 01 83 f6 01 40 0f b6 f6 e8 c4 42 00 00 <48> 8b 45 38 48 85 c0 74 08 48 89 df e8 cc 47 40 1e 48 8b bb f0 07
RSP: 0018:ffffc90000673890 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8881036a89c0 RCX: 0000000000000000
RDX: ffff888113f63800 RSI: ffffffff822fe720 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000002dcd R09: 0000000000000000
R10: ffffc900006738e8 R11: 00000000ffffffff R12: 0000000000000000
R13: 0000000000000000 R14: ffff8881036a89c0 R15: 0000000000000000
FS:  00007fdfb8384740(0000) GS:ffff88856a9d6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000038 CR3: 0000000112ae0005 CR4: 0000000000370ef0
Call Trace:
 <TASK>
 mlx5e_netdev_change_profile+0x45/0xb0
 mlx5e_vport_rep_load+0x27b/0x2d0
 mlx5_esw_offloads_rep_load+0x72/0xf0
 esw_offloads_enable+0x5d0/0x970
 mlx5_eswitch_enable_locked+0x349/0x430
 ? is_mp_supported+0x57/0xb0
 mlx5_devlink_eswitch_mode_set+0x26b/0x430
 devlink_nl_eswitch_set_doit+0x6f/0xf0
 genl_family_rcv_msg_doit+0xe8/0x140
 genl_rcv_msg+0x18b/0x290
 ? __pfx_devlink_nl_pre_doit+0x10/0x10
 ? __pfx_devlink_nl_eswitch_set_doit+0x10/0x10
 ? __pfx_devlink_nl_post_doit+0x10/0x10
 ? __pfx_genl_rcv_msg+0x10/0x10
 netlink_rcv_skb+0x52/0x100
 genl_rcv+0x28/0x40
 netlink_unicast+0x282/0x3e0
 ? __alloc_skb+0xd6/0x190
 netlink_sendmsg+0x1f7/0x430
 __sys_sendto+0x213/0x220
 ? __sys_recvmsg+0x6a/0xd0
 __x64_sys_sendto+0x24/0x30
 do_syscall_64+0x50/0x1f0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fdfb8495047

Fixes: c4d7eb5768 ("net/mxl5e: Add change profile method")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260108212657.25090-2-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-10 15:21:09 -08:00