Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions data/templates/system/watchdog.conf.j2
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
### Autogenerated by system_watchdog.py ###
[Manager]
RuntimeWatchdogSec={{ timeout }}
ShutdownWatchdogSec={{ shutdown_timeout }}
RebootWatchdogSec={{ reboot_timeout }}
70 changes: 70 additions & 0 deletions interface-definitions/system_watchdog.xml.in
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
<?xml version="1.0"?>
<interfaceDefinition>
<node name="system">
<children>
<node name="watchdog" owner="${vyos_conf_scripts_dir}/system_watchdog.py">
<properties>
<help>Hardware watchdog configuration</help>
<priority>9999</priority>
</properties>
<children>
<leafNode name="module">
<properties>
<help>Kernel module to load for watchdog device (optional)</help>
<valueHelp>
<format>txt</format>
<description>Module name (e.g. 'softdog', 'iTCO_wdt', 'sp5100_tco')</description>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally prefer set-time validation over commit-time validation whenever possible, since it provides a faster feedback and the user doesn't have to commit to discover that some options are wrong. Two questions from this perspective:

  1. How likely is it that the module for the hardware watchdog actually present in the system will not be loaded? Can we fully automate that and either load the module for the watchdog that the system has or fail the commit if there's no module matching the watchdog?
  2. If we can't fully automate it, can we add an explicit list of available watchdog module for set-time validation?

</valueHelp>
<constraint>
<regex>[a-zA-Z0-9_\-]+</regex>
</constraint>
<constraintErrorMessage>Module name must be alphanumeric/underscore/hyphen</constraintErrorMessage>
</properties>
</leafNode>
<leafNode name="timeout">
<properties>
<help>Watchdog timeout for runtime in seconds (1-86400)</help>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 86400 the upper limit for those timeouts in the kernel or systemd? If it's not, I believe we shouldn't introduce an artificial limit — one person's constant is another person's variable, as Alan Perlis quipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The max value of watchdog timeout varies per watchdog. Eg: bcm2835_wdt has a max timeout of 16 seconds. The iTCO_wdt module on v1 hardware is around 30 seconds and as high as 10mins on V2 hardware.

I will investigate further, it is likely that the max value can be read from some modules once it has been loaded.

<valueHelp>
<format>u32:1-86400</format>
<description>Seconds</description>
</valueHelp>
<constraint>
<validator name="numeric" argument="--range 1-86400"/>
</constraint>
<constraintErrorMessage>Timeout must be between 1 and 86400 seconds</constraintErrorMessage>
</properties>
<defaultValue>10</defaultValue>
</leafNode>
<leafNode name="shutdown-timeout">
<properties>
<help>Watchdog timeout during shutdown in seconds (1-86400)</help>
<valueHelp>
<format>u32:60-86400</format>
<description>Seconds</description>
</valueHelp>
<constraint>
<validator name="numeric" argument="--range 60-86400"/>
</constraint>
<constraintErrorMessage>Shutdown timeout must be between 60 and 86400 seconds</constraintErrorMessage>
</properties>
<defaultValue>120</defaultValue>
</leafNode>
<leafNode name="reboot-timeout">
<properties>
<help>Watchdog timeout during reboot in seconds (60-86400)</help>
<valueHelp>
<format>u32:60-86400</format>
<description>Seconds</description>
</valueHelp>
<constraint>
<validator name="numeric" argument="--range 60-86400"/>
</constraint>
<constraintErrorMessage>Reboot timeout must be between 60 and 86400 seconds</constraintErrorMessage>
</properties>
<defaultValue>120</defaultValue>
</leafNode>
</children>
</node>
</children>
</node>
</interfaceDefinition>
62 changes: 62 additions & 0 deletions smoketest/scripts/cli/test_system_watchdog.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
#!/usr/bin/env python3
# Copyright VyOS maintainers and contributors <[email protected]>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 or later as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

import os
import unittest

from base_vyostest_shim import VyOSUnitTestSHIM
from vyos.utils.process import cmd

base_path = ['system', 'watchdog']


class TestSystemWatchdog(VyOSUnitTestSHIM.TestCase):
def tearDown(self):
self.cli_delete(base_path)
self.cli_commit()
super().tearDown()

def test_enable_watchdog_softdog(self):
"""Configure watchdog (presence enables) with softdog and check state"""
# Presence of 'system watchdog' enables watchdog; set module to softdog
self.cli_set(base_path)
self.cli_set(base_path + ['module', 'softdog'])
self.cli_commit()
# Check if softdog module is loaded
lsmod = cmd('lsmod')
self.assertIn('softdog', lsmod)
# Check /dev/watchdog0 exists
self.assertTrue(
os.path.exists('/dev/watchdog0'), '/dev/watchdog0 does not exist'
)
# Check systemd config file exists
config_path = '/run/systemd/system.conf.d/watchdog.conf'
self.assertTrue(
os.path.exists(config_path), f"Systemd config file not found: {config_path}"
)

def test_invalid_module_rejected(self):
"""Verify that a non-existent watchdog module causes commit failure"""
# Choose a module name unlikely to exist; include a prefix to avoid collision with real names
bogus_module = 'zzzx_watchdog_unit_test_fake'
self.cli_set(base_path)
self.cli_set(base_path + ['module', bogus_module])
# Commit should fail due to verify() raising ConfigError on modprobe dry-run failure
with self.assertRaisesRegex(Exception, r"Watchdog module '.*' was not found"):
self.cli_commit()


if __name__ == '__main__':
unittest.main(verbosity=2, failfast=VyOSUnitTestSHIM.TestCase.debug_on())
138 changes: 138 additions & 0 deletions src/conf_mode/system_watchdog.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
#!/usr/bin/env python3
#
# Copyright VyOS maintainers and contributors <[email protected]>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 or later as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

from sys import exit
from pathlib import Path

from vyos.config import Config
from vyos.template import render
from vyos.utils.process import call, run
from vyos import ConfigError
from vyos import airbag

airbag.enable()

watchdog_config_dir = Path('/run/systemd/system.conf.d')
watchdog_config_file = Path(watchdog_config_dir / 'watchdog.conf')
modules_load_directory = Path('/run/modules-load.d')
modules_load_file = Path(modules_load_directory / 'watchdog.conf')
Comment on lines +28 to +31
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
watchdog_config_dir = Path('/run/systemd/system.conf.d')
watchdog_config_file = Path(watchdog_config_dir / 'watchdog.conf')
modules_load_directory = Path('/run/modules-load.d')
modules_load_file = Path(modules_load_directory / 'watchdog.conf')
watchdog_config_dir = Path('/run/systemd/system.conf.d')
watchdog_config_file = watchdog_config_dir / 'watchdog.conf'
modules_load_directory = Path('/run/modules-load.d')
modules_load_file = modules_load_directory / 'watchdog.conf'

Don't need to encapsulate the file paths in another Path()

WATCHDOG_DEV = Path('/dev/watchdog0')


def get_config(config=None):
if config:
conf = config
else:
conf = Config()
base = ['system', 'watchdog']

if not conf.exists(base):
return None

watchdog = conf.get_config_dict(
base, key_mangling=('-', '_'), get_first_key=True, with_recursive_defaults=True
)

return watchdog


def verify(watchdog):
if watchdog is None:
return None

module = watchdog.get('module')
device_exists = WATCHDOG_DEV.exists()

# Require a usable watchdog: either device already present or a module provided
if not module and not device_exists:
raise ConfigError(
"No watchdog device found at /dev/watchdog0 and no module configured. "
"Either configure 'system watchdog module <name>' or ensure the kernel auto-loads a watchdog driver."
)

# If a module is provided, verify it resolves via modprobe (dry-run)
if module:
# Dry-run modprobe (-n) in quiet mode (-q) verifies availability without loading
rc = run(f'modprobe -n -q {module}')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only issue here is that it doesn't verify the specified module is a valid watchdog driver.

As per @dmbaturin's comment, probably best to build a validator or list of watchdog modules enabled in the VyOS kernel.

if rc != 0:
raise ConfigError(
f"Watchdog module '{module}' was not found or cannot be loaded"
)

return None


def generate(watchdog):
# If watchdog node removed entirely, clean up everything
if watchdog is None:
watchdog_config_file.unlink(missing_ok=True)
modules_load_file.unlink(missing_ok=True)
return None

# Persist kernel module autoload on boot if specified (even if not enabled)
module = watchdog.get('module')
if module:
try:
modules_load_directory.mkdir(exist_ok=True)
modules_load_file.write_text(f"{module}\n")
except Exception as e:
print(f"Warning: Failed writing modules-load configuration: {e}")
Comment on lines +88 to +92
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using bare Exception in except clauses is too broad and could mask unexpected errors. Consider catching more specific exceptions (e.g., OSError, IOError) or letting unexpected exceptions propagate to be caught by the airbag error handler.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this to VyOS team to decide.

else:
# If module option removed, drop persisted autoload file
modules_load_file.unlink(missing_ok=True)

# Try to load kernel module if specified and /dev/watchdog0 is missing
if not WATCHDOG_DEV.exists():
if module:
# Try to load the module using vyos call wrapper for logging/airbag integration
try:
call(f'modprobe {module}')
except Exception as e:
print(f"Warning: Could not load watchdog module '{module}': {e}")
Comment on lines +101 to +104
Copy link

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using bare Exception in except clauses is too broad and could mask unexpected errors. Consider catching more specific exceptions related to process execution (e.g., OSError) or letting unexpected exceptions propagate to be caught by the airbag error handler.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this to VyOS team to decide.

# Re-check for device
if not WATCHDOG_DEV.exists():
print(
"Warning: /dev/watchdog0 not found. Systemd watchdog will not be enabled."
)
watchdog_config_file.unlink(missing_ok=True)
return None

# Ensure the directory exists
watchdog_config_dir.mkdir(parents=True, exist_ok=True)

# Pass through configured time values directly as seconds
render(str(watchdog_config_file), 'system/watchdog.conf.j2', watchdog)

return None


def apply(watchdog):
# Reload systemd daemon to apply/unload the watchdog configuration
# The watchdog settings take immediate effect after systemd is reloaded
call('systemctl daemon-reload')

return None


if __name__ == '__main__':
try:
c = get_config()
verify(c)
generate(c)
apply(c)
except ConfigError as e:
print(e)
exit(1)
Loading