-
Notifications
You must be signed in to change notification settings - Fork 395
T7101: Add hardware watchdog support via systemd #4843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: current
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| ### Autogenerated by system_watchdog.py ### | ||
| [Manager] | ||
| RuntimeWatchdogSec={{ timeout }} | ||
| ShutdownWatchdogSec={{ shutdown_timeout }} | ||
| RebootWatchdogSec={{ reboot_timeout }} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| <?xml version="1.0"?> | ||
| <interfaceDefinition> | ||
| <node name="system"> | ||
| <children> | ||
| <node name="watchdog" owner="${vyos_conf_scripts_dir}/system_watchdog.py"> | ||
| <properties> | ||
| <help>Hardware watchdog configuration</help> | ||
| <priority>9999</priority> | ||
| </properties> | ||
| <children> | ||
| <leafNode name="module"> | ||
| <properties> | ||
| <help>Kernel module to load for watchdog device (optional)</help> | ||
| <valueHelp> | ||
| <format>txt</format> | ||
| <description>Module name (e.g. 'softdog', 'iTCO_wdt', 'sp5100_tco')</description> | ||
| </valueHelp> | ||
| <constraint> | ||
| <regex>[a-zA-Z0-9_\-]+</regex> | ||
Firefishy marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| </constraint> | ||
| <constraintErrorMessage>Module name must be alphanumeric/underscore/hyphen</constraintErrorMessage> | ||
| </properties> | ||
| </leafNode> | ||
| <leafNode name="timeout"> | ||
| <properties> | ||
| <help>Watchdog timeout for runtime in seconds (1-86400)</help> | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is 86400 the upper limit for those timeouts in the kernel or systemd? If it's not, I believe we shouldn't introduce an artificial limit — one person's constant is another person's variable, as Alan Perlis quipped.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The max value of watchdog timeout varies per watchdog. Eg: bcm2835_wdt has a max timeout of 16 seconds. The iTCO_wdt module on v1 hardware is around 30 seconds and as high as 10mins on V2 hardware. I will investigate further, it is likely that the max value can be read from some modules once it has been loaded. |
||
| <valueHelp> | ||
| <format>u32:1-86400</format> | ||
| <description>Seconds</description> | ||
| </valueHelp> | ||
| <constraint> | ||
| <validator name="numeric" argument="--range 1-86400"/> | ||
| </constraint> | ||
| <constraintErrorMessage>Timeout must be between 1 and 86400 seconds</constraintErrorMessage> | ||
Firefishy marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| </properties> | ||
| <defaultValue>10</defaultValue> | ||
| </leafNode> | ||
| <leafNode name="shutdown-timeout"> | ||
| <properties> | ||
| <help>Watchdog timeout during shutdown in seconds (1-86400)</help> | ||
Firefishy marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| <valueHelp> | ||
| <format>u32:60-86400</format> | ||
| <description>Seconds</description> | ||
| </valueHelp> | ||
| <constraint> | ||
| <validator name="numeric" argument="--range 60-86400"/> | ||
| </constraint> | ||
| <constraintErrorMessage>Shutdown timeout must be between 60 and 86400 seconds</constraintErrorMessage> | ||
| </properties> | ||
| <defaultValue>120</defaultValue> | ||
| </leafNode> | ||
| <leafNode name="reboot-timeout"> | ||
| <properties> | ||
| <help>Watchdog timeout during reboot in seconds (60-86400)</help> | ||
| <valueHelp> | ||
| <format>u32:60-86400</format> | ||
| <description>Seconds</description> | ||
| </valueHelp> | ||
| <constraint> | ||
| <validator name="numeric" argument="--range 60-86400"/> | ||
| </constraint> | ||
| <constraintErrorMessage>Reboot timeout must be between 60 and 86400 seconds</constraintErrorMessage> | ||
| </properties> | ||
| <defaultValue>120</defaultValue> | ||
| </leafNode> | ||
| </children> | ||
| </node> | ||
| </children> | ||
| </node> | ||
| </interfaceDefinition> | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| #!/usr/bin/env python3 | ||
| # Copyright VyOS maintainers and contributors <[email protected]> | ||
| # | ||
| # This program is free software; you can redistribute it and/or modify | ||
| # it under the terms of the GNU General Public License version 2 or later as | ||
| # published by the Free Software Foundation. | ||
| # | ||
| # This program is distributed in the hope that it will be useful, | ||
| # but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
| # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
| # GNU General Public License for more details. | ||
| # | ||
| # You should have received a copy of the GNU General Public License | ||
| # along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
|
|
||
| import os | ||
| import unittest | ||
|
|
||
| from base_vyostest_shim import VyOSUnitTestSHIM | ||
| from vyos.utils.process import cmd | ||
|
|
||
| base_path = ['system', 'watchdog'] | ||
|
|
||
|
|
||
| class TestSystemWatchdog(VyOSUnitTestSHIM.TestCase): | ||
| def tearDown(self): | ||
| self.cli_delete(base_path) | ||
| self.cli_commit() | ||
| super().tearDown() | ||
|
|
||
| def test_enable_watchdog_softdog(self): | ||
| """Configure watchdog (presence enables) with softdog and check state""" | ||
| # Presence of 'system watchdog' enables watchdog; set module to softdog | ||
| self.cli_set(base_path) | ||
| self.cli_set(base_path + ['module', 'softdog']) | ||
| self.cli_commit() | ||
| # Check if softdog module is loaded | ||
| lsmod = cmd('lsmod') | ||
| self.assertIn('softdog', lsmod) | ||
| # Check /dev/watchdog0 exists | ||
| self.assertTrue( | ||
| os.path.exists('/dev/watchdog0'), '/dev/watchdog0 does not exist' | ||
| ) | ||
| # Check systemd config file exists | ||
| config_path = '/run/systemd/system.conf.d/watchdog.conf' | ||
| self.assertTrue( | ||
| os.path.exists(config_path), f"Systemd config file not found: {config_path}" | ||
| ) | ||
|
|
||
| def test_invalid_module_rejected(self): | ||
| """Verify that a non-existent watchdog module causes commit failure""" | ||
| # Choose a module name unlikely to exist; include a prefix to avoid collision with real names | ||
| bogus_module = 'zzzx_watchdog_unit_test_fake' | ||
| self.cli_set(base_path) | ||
| self.cli_set(base_path + ['module', bogus_module]) | ||
| # Commit should fail due to verify() raising ConfigError on modprobe dry-run failure | ||
| with self.assertRaisesRegex(Exception, r"Watchdog module '.*' was not found"): | ||
Firefishy marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| self.cli_commit() | ||
|
|
||
|
|
||
| if __name__ == '__main__': | ||
| unittest.main(verbosity=2, failfast=VyOSUnitTestSHIM.TestCase.debug_on()) | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,138 @@ | ||||||||||||||||||
| #!/usr/bin/env python3 | ||||||||||||||||||
| # | ||||||||||||||||||
| # Copyright VyOS maintainers and contributors <[email protected]> | ||||||||||||||||||
| # | ||||||||||||||||||
| # This program is free software; you can redistribute it and/or modify | ||||||||||||||||||
| # it under the terms of the GNU General Public License version 2 or later as | ||||||||||||||||||
| # published by the Free Software Foundation. | ||||||||||||||||||
| # | ||||||||||||||||||
| # This program is distributed in the hope that it will be useful, | ||||||||||||||||||
| # but WITHOUT ANY WARRANTY; without even the implied warranty of | ||||||||||||||||||
| # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||||||||||||||||||
| # GNU General Public License for more details. | ||||||||||||||||||
| # | ||||||||||||||||||
| # You should have received a copy of the GNU General Public License | ||||||||||||||||||
| # along with this program. If not, see <http://www.gnu.org/licenses/>. | ||||||||||||||||||
|
|
||||||||||||||||||
| from sys import exit | ||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||
|
|
||||||||||||||||||
| from vyos.config import Config | ||||||||||||||||||
| from vyos.template import render | ||||||||||||||||||
| from vyos.utils.process import call, run | ||||||||||||||||||
| from vyos import ConfigError | ||||||||||||||||||
| from vyos import airbag | ||||||||||||||||||
|
|
||||||||||||||||||
| airbag.enable() | ||||||||||||||||||
|
|
||||||||||||||||||
| watchdog_config_dir = Path('/run/systemd/system.conf.d') | ||||||||||||||||||
| watchdog_config_file = Path(watchdog_config_dir / 'watchdog.conf') | ||||||||||||||||||
| modules_load_directory = Path('/run/modules-load.d') | ||||||||||||||||||
| modules_load_file = Path(modules_load_directory / 'watchdog.conf') | ||||||||||||||||||
|
Comment on lines
+28
to
+31
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
Don't need to encapsulate the file paths in another |
||||||||||||||||||
| WATCHDOG_DEV = Path('/dev/watchdog0') | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def get_config(config=None): | ||||||||||||||||||
| if config: | ||||||||||||||||||
| conf = config | ||||||||||||||||||
| else: | ||||||||||||||||||
| conf = Config() | ||||||||||||||||||
| base = ['system', 'watchdog'] | ||||||||||||||||||
|
|
||||||||||||||||||
| if not conf.exists(base): | ||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
| watchdog = conf.get_config_dict( | ||||||||||||||||||
| base, key_mangling=('-', '_'), get_first_key=True, with_recursive_defaults=True | ||||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
| return watchdog | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def verify(watchdog): | ||||||||||||||||||
| if watchdog is None: | ||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
| module = watchdog.get('module') | ||||||||||||||||||
| device_exists = WATCHDOG_DEV.exists() | ||||||||||||||||||
|
|
||||||||||||||||||
| # Require a usable watchdog: either device already present or a module provided | ||||||||||||||||||
| if not module and not device_exists: | ||||||||||||||||||
| raise ConfigError( | ||||||||||||||||||
| "No watchdog device found at /dev/watchdog0 and no module configured. " | ||||||||||||||||||
| "Either configure 'system watchdog module <name>' or ensure the kernel auto-loads a watchdog driver." | ||||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
| # If a module is provided, verify it resolves via modprobe (dry-run) | ||||||||||||||||||
| if module: | ||||||||||||||||||
| # Dry-run modprobe (-n) in quiet mode (-q) verifies availability without loading | ||||||||||||||||||
| rc = run(f'modprobe -n -q {module}') | ||||||||||||||||||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Only issue here is that it doesn't verify the specified module is a valid watchdog driver. As per @dmbaturin's comment, probably best to build a validator or list of watchdog modules enabled in the VyOS kernel. |
||||||||||||||||||
| if rc != 0: | ||||||||||||||||||
| raise ConfigError( | ||||||||||||||||||
| f"Watchdog module '{module}' was not found or cannot be loaded" | ||||||||||||||||||
| ) | ||||||||||||||||||
|
|
||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def generate(watchdog): | ||||||||||||||||||
| # If watchdog node removed entirely, clean up everything | ||||||||||||||||||
| if watchdog is None: | ||||||||||||||||||
| watchdog_config_file.unlink(missing_ok=True) | ||||||||||||||||||
| modules_load_file.unlink(missing_ok=True) | ||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
| # Persist kernel module autoload on boot if specified (even if not enabled) | ||||||||||||||||||
| module = watchdog.get('module') | ||||||||||||||||||
| if module: | ||||||||||||||||||
| try: | ||||||||||||||||||
Firefishy marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
| modules_load_directory.mkdir(exist_ok=True) | ||||||||||||||||||
| modules_load_file.write_text(f"{module}\n") | ||||||||||||||||||
| except Exception as e: | ||||||||||||||||||
| print(f"Warning: Failed writing modules-load configuration: {e}") | ||||||||||||||||||
|
Comment on lines
+88
to
+92
|
||||||||||||||||||
| else: | ||||||||||||||||||
| # If module option removed, drop persisted autoload file | ||||||||||||||||||
| modules_load_file.unlink(missing_ok=True) | ||||||||||||||||||
|
|
||||||||||||||||||
| # Try to load kernel module if specified and /dev/watchdog0 is missing | ||||||||||||||||||
| if not WATCHDOG_DEV.exists(): | ||||||||||||||||||
| if module: | ||||||||||||||||||
| # Try to load the module using vyos call wrapper for logging/airbag integration | ||||||||||||||||||
| try: | ||||||||||||||||||
| call(f'modprobe {module}') | ||||||||||||||||||
| except Exception as e: | ||||||||||||||||||
| print(f"Warning: Could not load watchdog module '{module}': {e}") | ||||||||||||||||||
|
Comment on lines
+101
to
+104
|
||||||||||||||||||
| # Re-check for device | ||||||||||||||||||
| if not WATCHDOG_DEV.exists(): | ||||||||||||||||||
| print( | ||||||||||||||||||
| "Warning: /dev/watchdog0 not found. Systemd watchdog will not be enabled." | ||||||||||||||||||
| ) | ||||||||||||||||||
| watchdog_config_file.unlink(missing_ok=True) | ||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
| # Ensure the directory exists | ||||||||||||||||||
| watchdog_config_dir.mkdir(parents=True, exist_ok=True) | ||||||||||||||||||
|
|
||||||||||||||||||
| # Pass through configured time values directly as seconds | ||||||||||||||||||
| render(str(watchdog_config_file), 'system/watchdog.conf.j2', watchdog) | ||||||||||||||||||
|
|
||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| def apply(watchdog): | ||||||||||||||||||
Firefishy marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||||||
| # Reload systemd daemon to apply/unload the watchdog configuration | ||||||||||||||||||
| # The watchdog settings take immediate effect after systemd is reloaded | ||||||||||||||||||
| call('systemctl daemon-reload') | ||||||||||||||||||
|
|
||||||||||||||||||
| return None | ||||||||||||||||||
|
|
||||||||||||||||||
|
|
||||||||||||||||||
| if __name__ == '__main__': | ||||||||||||||||||
| try: | ||||||||||||||||||
| c = get_config() | ||||||||||||||||||
| verify(c) | ||||||||||||||||||
| generate(c) | ||||||||||||||||||
| apply(c) | ||||||||||||||||||
| except ConfigError as e: | ||||||||||||||||||
| print(e) | ||||||||||||||||||
| exit(1) | ||||||||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally prefer set-time validation over commit-time validation whenever possible, since it provides a faster feedback and the user doesn't have to commit to discover that some options are wrong. Two questions from this perspective: