submitted 6 months ago* (last edited 6 months ago) by server_paul@lemmy.world to c/proxmox@lemmy.world

I've been using a Proxmox home server for quite some time now without many problems. Recently I got an AMD Navi 10 RX 5700 XT and tried to pass it through to a Windows VM. I mainly followed the official Proxmox guide, but got it running with the help of some other tutorials too. For now, it works once after I reboot the host: the first start of the VM is no problem, but after the VM is restarted it no longer starts and shows this error:

swtpm_setup: Not overwriting existing state file.
kvm: ../hw/pci/pci.c:1637: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 98348) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code -1

I tried fixing it using this, but it didn't change much.

EDIT: link was not shown

top 11 comments

Maybe this?
https://github.com/gnif/vendor-reset
Although I've been passing through a Vega 64 without needing this.

[-] server_paul@lemmy.world 2 points 6 months ago

Yeah, I tried that; the link was just not shown in the original post. It didn't really fix it.

Try journalctl to get more details from when it fails?

[-] server_paul@lemmy.world 2 points 6 months ago* (last edited 6 months ago)

This is the output from journalctl, starting from when I stopped and restarted the VM. The main error seems to occur at 16:41:43:

Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK

Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting

Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:

Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:

Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting

Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up

Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible

Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible

Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.

Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.

Dec 19 16:41:41 pve systemd[1]: Started 195.scope.

Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode

Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state

Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state

Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode

Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode

Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state

Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state

Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state

Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state

Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode

Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode

Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state

Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state

Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state

Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state

Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode

Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state

Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state

Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s

Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running

Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed

Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s

Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed

Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting

Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting

Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting

Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting

Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting

Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting

Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up

Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible

Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state

Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode

Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state

Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error

Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1

Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>

Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.

Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.

[-] server_paul@lemmy.world 2 points 6 months ago* (last edited 6 months ago)

dmesg also reported `vendor_reset: module verification failed: signature and/or required key missing - tainting kernel`. However, according to https://github.com/gnif/vendor-reset/issues/46#issuecomment-983087796, this error is not that important...

[-] server_paul@lemmy.world 2 points 6 months ago* (last edited 6 months ago)

To everyone else encountering this error, I finally fixed it this way: This forum entry sent me here, which then helped me resolve the issue. Huge thanks to you, InEnduringGrowStrong, for pushing me in the right direction.

Ah nice you got it working.
Once it works it's great.
I've been running mine for a while now, but have purposefully avoided kernel upgrades so far.

[-] server_paul@lemmy.world 1 points 6 months ago

Haha, I already started worrying about that :) But you're right, it's great.

Formatted with a code block so it's more readable:

Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK
Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.
Dec 19 16:41:41 pve systemd[1]: Started 195.scope.
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state
Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state
Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running
Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error
Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1
Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>
Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.
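
In a log like this, the lines worth looking for are the repeated `not ready ... after bus reset` messages that end in `giving up`. As a rough sketch (the function name `reset_failures` is made up for illustration, and it assumes the log has the same layout as the lines above), something like this counts how often the kernel gave up per PCI address:

```shell
# Count "giving up" events per PCI address in journal or dmesg text read from stdin.
# Assumes the line layout shown above: "... kernel: vfio-pci <address>: not ready ...".
reset_failures() {
    awk '/not ready [0-9]+ms after bus reset; giving up/ {
             # The PCI address is the field right after "vfio-pci"
             for (i = 1; i < NF; i++) {
                 if ($i == "vfio-pci") {
                     addr = $(i + 1)
                     sub(/:$/, "", addr)   # drop the trailing colon
                     count[addr]++
                 }
             }
         }
         END { for (addr in count) print addr, count[addr] }' | sort
}
```

On a live host it could be fed with `journalctl -k -b | reset_failures`. For the log above it would list `0000:03:00.0` with two failed resets.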

It does seem a lot like the reset bug, but then you already tried that. :/ Kernel modules aren't that easy to install, and if your kernel is missing the required config flags the module might just do nothing.

grep -E '(CONFIG_FTRACE|CONFIG_KPROBES|CONFIG_PCI_QUIRKS|CONFIG_KALLSYMS|CONFIG_KALLSYMS_ALL|CONFIG_FUNCTION_TRACER)\b' "/boot/config-$(uname -r)"

This should show all 6 flags set to =y.
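
If reading the grep output by eye is error-prone, a small helper can list only the flags that are not set to `=y`. This is just a sketch: the function name `missing_flags` is invented here, and it assumes the usual kernel config format with one `CONFIG_...=y` entry per line.

```shell
# Print every one of the six flags that is NOT set to "=y" in the given config file.
# No output means all six are built in.
missing_flags() {
    local config_file="$1" flag
    for flag in CONFIG_FTRACE CONFIG_KPROBES CONFIG_PCI_QUIRKS \
                CONFIG_KALLSYMS CONFIG_KALLSYMS_ALL CONFIG_FUNCTION_TRACER; do
        # -x matches whole lines only, so "# CONFIG_FOO is not set" and "=m" do not count
        grep -qx "${flag}=y" "$config_file" || echo "$flag"
    done
}
```

It would be called as `missing_flags "/boot/config-$(uname -r)"`, using the same config file as the grep.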

Or maybe some variation of manual reset...
https://forum.proxmox.com/threads/issues-with-intel-arc-a770m-gpu-passthrough-on-nuc12snki72-vfio-pci-not-ready-after-flr-or-bus-reset.130667/
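
One generic variation of a manual reset (not necessarily what the linked thread describes) is to remove the device from the PCI bus and rescan, using the standard sysfs files `remove` and `rescan`. A minimal sketch follows; the function name, the `SYSFS_ROOT` override, and the `RESCAN_DELAY` pause are all assumptions made for illustration, and there is no guarantee this recovers a card that is already reported as inaccessible.

```shell
# Remove a PCI device and rescan the bus through sysfs.
# Needs root on a real host, and the VM using the device should be stopped.
# SYSFS_ROOT is a hypothetical override for dry runs against a scratch
# directory; it defaults to /sys.
pci_remove_rescan() {
    local addr="$1" root="${SYSFS_ROOT:-/sys}"
    local dev="$root/bus/pci/devices/$addr"
    [ -e "$dev/remove" ] || { echo "no such device: $addr" >&2; return 1; }
    echo 1 > "$dev/remove"            # detach the device from the kernel
    sleep "${RESCAN_DELAY:-1}"        # short pause before rescanning (arbitrary value)
    echo 1 > "$root/bus/pci/rescan"   # ask the kernel to rediscover devices
}
```

For the GPU in this thread the call would be `pci_remove_rescan 0000:03:00.0`, with the audio function `0000:03:00.1` handled the same way.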

[-] server_paul@lemmy.world 1 points 6 months ago

It was intended to be a code block, but somehow it ended up as just a bunch of text without newlines.

this post was submitted on 19 Dec 2023
7 points (88.9% liked)
