Understanding the Message [watchdog: BUG: soft lockup - CPU#X stuck for XXs!]

September 25, 2021


This 'soft lockup' can happen if the kernel is busy working through a huge number of objects that need to be scanned, freed, or allocated.


In my experience, the kernel message watchdog: BUG: soft lockup - CPU#X stuck for XXs! does not always lead to a problem (hang). In some cases the CPU is stuck for only a few seconds and nothing hangs. If the reported stuck time keeps growing and the message appears frequently, however, it indicates that kernel space is continuously busy.
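To judge whether the message is an occasional blip or a growing problem, it helps to count the warnings per CPU and look at the longest reported stuck time. The following is a minimal sketch, not a definitive tool: the function name summarize_soft_lockups is made up for this example, and it assumes the log lines follow the "soft lockup - CPU#X stuck for XXs" wording shown above.

```shell
#!/bin/sh
# Summarize soft lockup warnings read from stdin.
# Prints one line per CPU: number of warnings and longest stuck time.
# summarize_soft_lockups is a hypothetical helper name for this sketch.
summarize_soft_lockups() {
    awk '
        /soft lockup - CPU#[0-9]+ stuck for [0-9]+s/ {
            # Extract the CPU number that follows "CPU#".
            cpu = $0; sub(/.*CPU#/, "", cpu); sub(/ .*/, "", cpu)
            # Extract the seconds that follow "stuck for ".
            secs = $0; sub(/.*stuck for /, "", secs); sub(/s.*/, "", secs)
            count[cpu]++
            if (secs + 0 > max[cpu] + 0) max[cpu] = secs + 0
        }
        END {
            for (cpu in count)
                printf "CPU#%s warnings=%d max_stuck=%ds\n", cpu, count[cpu], max[cpu]
        }
    ' | sort
}
```

The function reads from stdin, so it can be fed from the kernel log, for example with journalctl -k | summarize_soft_lockups or dmesg | summarize_soft_lockups.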

A soft lockup is a situation, usually caused by a bug, in which a task executes in kernel space on a CPU without rescheduling. The task also prevents any other task from executing on that particular CPU. As a result, the kernel prints a warning on the system console: watchdog: BUG: soft lockup - CPU#X stuck for XXs! This problem is also referred to as the soft lockup firing.

For a deeper diagnosis, the host running the VM needs to be checked.
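When there is no direct access to the host, one rough indicator that can be read from inside the guest is CPU steal time, the eighth counter on the aggregate cpu line of /proc/stat. The sketch below is only an illustration: the function names steal_percent and sample_steal are made up here, and it assumes the hypervisor actually reports steal time to the guest, which not every platform does.

```shell
#!/bin/sh
# Print the share of CPU time stolen by the hypervisor between two samples
# of the aggregate "cpu" line from /proc/stat, as a percentage.
# steal_percent and sample_steal are hypothetical helper names.
steal_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9; i++) total += $i
            steal = $9
            if (NR == 1) { t0 = total; s0 = steal }
            else         { t1 = total; s1 = steal }
        }
        END {
            dt = t1 - t0
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (s1 - s0) / dt
        }'
}

# Sample the local /proc/stat twice, INTERVAL seconds apart (default 5).
sample_steal() {
    interval=${1:-5}
    first=$(grep '^cpu ' /proc/stat)
    sleep "$interval"
    second=$(grep '^cpu ' /proc/stat)
    steal_percent "$first" "$second"
}
```

Running sample_steal 10 inside the VM prints the steal percentage over ten seconds. A value that stays high while soft lockups are being logged points toward contention on the host rather than a problem inside the guest, but it is not proof on its own.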

Some options that can be tried to deal with this:

  1. Disable kernel panic on soft lockup. RHEL 8 already does this by default. This parameter prevents a kernel panic when a soft lockup occurs, so the VM does not hang when one happens.
     echo "kernel.softlockup_panic=0" >> /etc/sysctl.conf
     sysctl --system
  2. Increase watchdog_thresh. The soft lockup threshold is twice watchdog_thresh, so with a value of 30 the message appears only when a CPU is stuck for more than 60s.
     echo "kernel.watchdog_thresh=30" >> /etc/sysctl.conf
     sysctl --system
  3. Upgrade the kernel to the latest version. There is not much else that can be done about a kernel bug; the best option is to patch it.
  4. Check the VM host. This kind of issue is common on virtualization technologies such as VMware and KVM, and it is quite possible that the host is already exhausted.
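Running the echo commands above more than once appends duplicate lines to /etc/sysctl.conf. A small helper can set a key without duplicating it, and another can show how long a CPU must be stuck before the warning fires for a given watchdog_thresh. This is a sketch under assumptions: the function names set_sysctl_conf and soft_lockup_threshold and the file name 99-softlockup.conf are made up for this example.

```shell
#!/bin/sh
# Set KEY=VALUE in a sysctl configuration file, replacing an existing entry
# for KEY instead of appending a duplicate line.
# set_sysctl_conf is a hypothetical helper name for this sketch.
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line that does not assign KEY (comments are preserved).
    if [ -f "$file" ]; then
        awk -v key="$key" '
            { line = $0; sub(/^[ \t]*/, "", line) }
            index(line, key) == 1 && substr(line, length(key) + 1) ~ /^[ \t]*=/ { next }
            { print }
        ' "$file" > "$tmp"
    fi
    # Add the new assignment at the end.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Print the soft lockup threshold in seconds for a given watchdog_thresh.
# The kernel uses 2 * watchdog_thresh.
soft_lockup_threshold() {
    echo $(( 2 * $1 ))
}
```

As root, set_sysctl_conf /etc/sysctl.d/99-softlockup.conf kernel.watchdog_thresh 30 followed by sysctl --system applies the setting, and soft_lockup_threshold "$(cat /proc/sys/kernel/watchdog_thresh)" prints the resulting threshold in seconds. A drop-in file under /etc/sysctl.d keeps these settings separate from the rest of the system configuration, which makes them easier to revert.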

Reference

  • What are all these "Bug: soft lockup" messages about? | Support | SUSE

https://www.suse.com/support/kb/doc/?id=000018705

  • kernel - watchdog: BUG: soft lockup - CPU#6 stuck for 23s - Ask Ubuntu

https://askubuntu.com/questions/1264859/watchdog-bug-soft-lockup-cpu6-stuck-for-23s

  • Error message: "NMI watchdog: BUG: soft lockup - CPU#X stuck for XXs!" on console (67623)

https://kb.vmware.com/s/article/67623

  • VMware virtual machine guest soft lockups in mpt_put_msg_frame - Red Hat Customer Portal

https://access.redhat.com/solutions/3176741

  • Virtualization lags and hypervisor overcommitment - Red Hat Customer Portal

https://access.redhat.com/articles/5008811

  • Chapter 7. Keeping kernel panic parameters disabled in virtualized environments Red Hat Enterprise Linux 8 | Red Hat Customer Portal

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/keeping-kernel-panic-parameters-disabled-in-virtualized-environments_managing-monitoring-and-updating-the-kernel

  • linux - What does "kernel:NMI watchdog: BUG: soft lockup" followed by other errors mean? - Unix & Linux Stack Exchange

https://unix.stackexchange.com/questions/216959/what-does-kernelnmi-watchdog-bug-soft-lockup-followed-by-other-errors-mean

  • "NMI watchdog: BUG: soft lockup - CPU # stuck for #s! [X:3005]" in message log file after a Flame application crash | Flame Products 2020 | Autodesk Knowledge Network

https://knowledge.autodesk.com/support/flame-products/troubleshooting/caas/sfdcarticles/sfdcarticles/NMI-watchdog-BUG-soft-lockup-CPU-stuck-for-s-X-3005-in-message-log-file-after-a-Flame-application-crash.html



Written by Nicolas Julian, someone who is trying to create things. Chat with me on Twitter.