Skip to content

Commit dd0a1c9

Browse files
committed
First pass at troubleshooting guide
1 parent 53a6421 commit dd0a1c9

File tree

1 file changed

+349
-0
lines changed

1 file changed

+349
-0
lines changed

doc/dev/troubleshooting.md

Lines changed: 349 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,349 @@
1+
# Troubleshooting
2+
3+
## Where is ...?
4+
5+
### Instrument-specific bluesky code
6+
7+
This is located in each instrument's `inst` script area, with their other instrument-specific scripts.
8+
9+
`c:\Instrument\Settings\config\NDX<inst>\configurations\Python\inst\bluesky`
10+
11+
### This library (`ibex_bluesky_core`)
12+
13+
In a clean installation of IBEX, `ibex_bluesky_core` is installed as a python module via pip into the
14+
[uktena](https://github.com/IsisComputingGroup/uktena) environment which is in `c:\instrument\apps\python3`.
15+
16+
However, for instruments which have been testing bluesky and may have versions of `ibex_bluesky_core` with local
17+
modifications, a version of `ibex_bluesky_core` may be checked out into `c:\instrument\dev\ibex_bluesky_core` on the
18+
instrument, and this will have been editable-installed into the `uktena` environment.
19+
20+
### Other bluesky libraries (`ophyd_async`, `bluesky`, ...)
21+
22+
`ophyd_async` and `bluesky` are installed as python modules. via pip, into the
23+
[uktena](https://github.com/IsisComputingGroup/uktena) environment which is in `c:\instrument\apps\python3` because
24+
they are specified in `pyproject.toml` of `ibex_bluesky_core`, which itself is a dependency of `uktena`.
25+
26+
### Logs
27+
28+
Log files are stored in `C:\Instrument\Var\logs\bluesky`.
29+
30+
```{note}
31+
Older log files will be moved by the log rotation script to
32+
`\\isis\inst$\Backups$\stage-deleted\ndx<instrument>\Instrument\Var\logs\bluesky`
33+
```
34+
35+
The default log level is `INFO` and all messages from `ibex_bluesky_core`, `bluesky` and `ophyd_async` are captured.
36+
37+
If you need to increase this to `DEBUG` to isolate an issue, you can do so using
38+
{py:obj}`ibex_bluesky_core.log.set_bluesky_log_levels`.
39+
40+
### Scientist-facing data
41+
42+
Scientist-facing output files are written to `\\isis\inst$\NDX<inst>\user\test\scans\<current rb number>` by default.
43+
44+
The path can be customized by passing extra arguments to
45+
{py:obj}`ibex_bluesky_core.callbacks.file_logger.HumanReadableFileCallback` for the "human-readable" files, or
46+
{py:obj}`ibex_bluesky_core.callbacks.fitting.livefit_logger.LiveFitLogger` for the fit output files.
47+
48+
### Raw diagnostic data
49+
50+
Raw documents emitted by bluesky are stored in `C:\Instrument\Var\logs\bluesky\raw_documents` - these show the raw
51+
data emitted by bluesky scans. The filenames in this directory correspond to bluesky's scan ID, which is printed to
52+
the console at the end of each scan, and is also included as metadata in the scientist-facing output files.
53+
54+
These files are written by {py:obj}`ibex_bluesky_core.callbacks.document_logger.DocLoggingCallback`, which is subscribed
55+
to the run engine by default in {py:obj}`ibex_bluesky_core.run_engine.get_run_engine`.
56+
57+
```{note}
58+
Older raw documents will be moved by the log rotation script to
59+
`\\isis\inst$\Backups$\stage-deleted\ndx<instrument>\Instrument\Var\logs\bluesky\raw_documents`
60+
```
61+
62+
---
63+
64+
## How do I ...?
65+
66+
### Run a plan
67+
68+
To run a plan, you need to pass it to the `RE` object, which is available by default in the top-level python shell in
69+
the GUI.
70+
71+
For example:
72+
73+
```
74+
>>> RE(some_clever_plan(which, takes, some, arguments))
75+
```
76+
77+
Note that without the `RE(...)` call, the plan **does nothing**:
78+
79+
```
80+
>>> some_clever_plan(which, takes, some, arguments)
81+
<generator object scan at 0x000002286311D080>
82+
```
83+
84+
### Pause/stop/resume/abort a plan
85+
86+
While a plan is running, a single ctrl-c will stop gracefully after the next point has finished counting. A message
87+
like the following will be printed:
88+
89+
> A 'deferred pause' has been requested. The RunEngine will pause at the next checkpoint.
90+
> To pause immediately, hit Ctrl+C again in the next 10 seconds.
91+
> Deferred pause acknowledged. Continuing to checkpoint.
92+
93+
Two ctrl-c keystrokes within 10 seconds of each other will stop the plan immediately.
94+
95+
When a plan is paused, you will get a short exception message with a summary of your options:
96+
97+
```
98+
Pausing...
99+
Traceback (most recent call last):
100+
File "C:\Instrument\Apps\Python3\Lib\site-packages\IPython\core\interactiveshell.py", line 3579, in run_code
101+
exec(code_obj, self.user_global_ns, self.user_ns)
102+
File "<ipython-input-27-288a69b9f38e>", line 1, in <module>
103+
RE(bps.sleep(999))
104+
File "C:\Instrument\Apps\Python3\Lib\site-packages\bluesky\run_engine.py", line 976, in __call__
105+
raise RunEngineInterrupted(self.pause_msg) from None
106+
bluesky.utils.RunEngineInterrupted:
107+
Your RunEngine is entering a paused state. These are your options for changing
108+
the state of the RunEngine:
109+
110+
RE.resume() Resume the plan.
111+
RE.abort() Perform cleanup, then kill plan. Mark exit_stats='aborted'.
112+
RE.stop() Perform cleanup, then kill plan. Mark exit_status='success'.
113+
RE.halt() Emergency Stop: Do not perform cleanup --- just stop.
114+
```
115+
116+
At this point, you will be returned to an interactive shell. You must choose one of these options above before
117+
attempting further operations with the `RunEngine`.
118+
119+
- `RE.resume()` - resumes the scan from the point at which it was interrupted. The plan will complete as usual.
120+
- `RE.stop()` and `RE.abort()` are functionally identical - the plan will gracefully terminate, including running any
121+
cleanup actions and calling registered callbacks.
122+
- `RE.halt()` tells the RunEngine to drop dead and not do **anything** on the way out. This includes cleanup handlers,
123+
which for example may return motors to sensible positions or return DAE configuration to a state in which data can be
124+
taken.
125+
126+
If you are unsure exactly what state your `RunEngine` is in, `RE.state` can be used to check this. If you attempt to
127+
run another plan before choosing one of the options above, you will get an exception like:
128+
129+
```
130+
Traceback (most recent call last):
131+
File "C:\Instrument\Apps\Python3\Lib\site-packages\IPython\core\interactiveshell.py", line 3579, in run_code
132+
exec(code_obj, self.user_global_ns, self.user_ns)
133+
File "<ipython-input-29-288a69b9f38e>", line 1, in <module>
134+
RE(bps.sleep(999))
135+
File "C:\Instrument\Apps\Python3\Lib\site-packages\bluesky\run_engine.py", line 920, in __call__
136+
raise RuntimeError(f"The RunEngine is in a {self._state} state")
137+
RuntimeError: The RunEngine is in a paused state
138+
```
139+
140+
This means that you still need to choose how to continue from the previous interruption.
141+
142+
### Get the result of a plan
143+
144+
A plan will return a `RunEngineResult` object, which has a `plan_result` attribute:
145+
146+
```
147+
result = RE(some_plan())
148+
result.plan_result
149+
```
150+
151+
If a plan was interrupted and resumed later, the result is returned by the `RE.resume()` call:
152+
153+
```
154+
RE(some_plan())
155+
<KeyboardInterrupt>
156+
result = RE.resume()
157+
result.plan_result
158+
```
159+
160+
```{tip}
161+
If the result wasn't captured, it's still possible to access it in an IPython interactive shell using the special
162+
underscore variables, for example `result = _`.
163+
164+
This is not specific to bluesky - it's an IPython feature where `_` is bound to the return value of the last expression,
165+
`__` is bound to the result of the second-to-last expression, and so on.
166+
```
167+
168+
[Do not try to "hide" the `RunEngine` in a script/function](https://blueskyproject.io/bluesky/main/tutorial.html#plans-in-series).
169+
The `RE(...)` call should always be typed by the user, at the terminal.
170+
171+
### Connect a device
172+
173+
Bluesky plans need devices to be connected up-front. This is so that errors (such as PVs not existing) are detected
174+
as soon as possible, not mid-way through a scan.
175+
176+
Top-level plan wrappers should use the `ensure_connected` plan stub from `ophyd_async`:
177+
178+
```python
179+
from ophyd_async.plan_stubs import ensure_connected
180+
181+
182+
def top_level_plan(dae, block):
183+
yield from ensure_connected(dae, block)
184+
yield from scan(...)
185+
```
186+
187+
If you forget to do this, you will get a stack trace containing:
188+
189+
```
190+
NotImplementedError: No PV has been set as connect() has not been called
191+
```
192+
193+
### Debug `NotConnected` errors
194+
195+
If `ophyd_async` cannot connect to a PV, you will get an error that looks like:
196+
197+
```
198+
sample_changer2: NotConnected:
199+
readback: NotConnected: ca://TE:NDW2922:CS:SB:sample_changer2
200+
```
201+
202+
In the first instance, check that the relevant PV exists if you get it via `caget`.
203+
204+
If the device is a {py:obj}`block_rw<ibex_bluesky_core.devices.block.BlockRw>`, but the `:SP` record does not exist,
205+
you may use a `block_w` instead. This will both read and write to the same PV. Similarly, if `blockname:SP:RBV` does
206+
not exist, a {py:obj}`block_rw_rbv<ibex_bluesky_core.devices.block.BlockRwRbv>` cannot be used.
207+
208+
If the device is a {py:obj}`ibex_bluesky_core.devices.reflectometry.ReflParameter` and `redefine` fails to connect,
209+
pass `has_redefine=False` when constructing that parameter (this means you won't be able to redefine the position
210+
of this refl parameter).
211+
212+
If instead you get a `TypeError` that looks like:
213+
214+
```
215+
sample_changer: NotConnected:
216+
readback: TypeError: TE:NDW2922:CS:SB:sample_changer with inferred datatype float cannot be coerced to str
217+
```
218+
219+
This is because the **datatype of the underlying PV** does not match the **declared type in bluesky**. `ophyd_async`
220+
will not allow you to connect a `block_r(str, "some_block")` if `"some_block"` is a float-type PV. Every signal in
221+
`ophyd_async` is strongly typed.
222+
223+
### Change how `set` on a device behaves
224+
225+
For a {py:obj}`writable block<ibex_bluesky_core.devices.block.BlockRw>`, a `write_config` argument can be specified.
226+
See the options on {py:obj}`BlockWriteConfig<ibex_bluesky_core.devices.block.BlockWriteConfig>` for detailed options. These are specified
227+
when first creating the `block` object, which is likely to be in
228+
`\Instrument\Settings\config\NDX<inst>\configurations\Python\inst\bluesky\devices.py`, or dynamically created as part
229+
of a wrapper plan.
230+
231+
For complex cases, where a `set` being complete depends on multiple conditions, a custom `ophyd_async` device is usually
232+
the cleanest way to accomplish this. These can be defined in an instrument's `inst` scripts area if they are specific
233+
to one instrument, or can be promoted to `ibex_bluesky_core` if generally useful.
234+
235+
### Add extra sleeps into a plan
236+
237+
Generally you should not need to insert sleeps at the *plan* level - prefer instead to modify *devices* so that they
238+
accurately report when they have finished setting (including any necessary settle times, for example). Writable blocks
239+
have a {py:obj}`BlockWriteConfig<ibex_bluesky_core.devices.block.BlockWriteConfig>` which has a `settle_time_s`
240+
parameter for this purpose.
241+
242+
If you *really* need to sleep in a plan, rather than at the device level, use `yield from bps.sleep(duration_seconds)`.
243+
Do **not** just call `time.sleep()` in a plan - this will stall the event loop and interfere with keyboard interrupts,
244+
for example.
245+
246+
### Add exception-handling to a plan
247+
248+
In plans, because they are generators, you cannot just use `try-finally` or `try-except`.
249+
250+
Instead, bluesky provides wrappers that implement error handling:
251+
[`finalize_wrapper`](https://blueskyproject.io/bluesky/main/generated/bluesky.preprocessors.contingency_wrapper.html#bluesky.preprocessors.contingency_wrapper)
252+
and
253+
[`contingency_wrapper`](https://blueskyproject.io/bluesky/main/generated/bluesky.preprocessors.finalize_wrapper.html#bluesky.preprocessors.finalize_wrapper).
254+
255+
### Debug DAE failing to begin
256+
257+
Check the DAE can begin a run without using bluesky.
258+
259+
If you are counting using periods, and using the spectrum-data map, this imposes a limit of {math}`5000000` on number of
260+
`periods * (spectra+1) * (time channels+1)`. This means that scans with high numbers of points may fail, where those
261+
with fewer points will work.
262+
263+
### Debug a failed fit
264+
265+
Fitting will only ever start after at least as many points have been collected, as there are free parameters in the
266+
fitting model. For example, a guassian fit will not be attempted with less than 4 data points. This can occur if a
267+
scan is interrupted early, for example.
268+
269+
Fits can also fail if the data is particularly poor.
270+
271+
### Debug a bad fit
272+
273+
Consider whether you can use a different fitting model. Some models are more prone to getting stuck in local minima than
274+
others.
275+
276+
For peak-like data, `"com"` or `"max"` from bluesky's
277+
[`PeakStats callback`](https://blueskyproject.io/bluesky/main/callbacks.html#peakstats) can robustly give *an* answer,
278+
although often this will not be as good as a full fit and may not always be appropriate.
279+
280+
### Debug keyboard-interrupt problems
281+
282+
If a double ctrl-c does nothing, this is probably because the event loop is stalled - which is likely the result of a
283+
plan or device doing I/O or other synchronous operations directly, rather than via `asyncio` or `yield from`.
284+
285+
Find the offending function and replace it with it's bluesky-native equivalent. Python's `asyncio` module has a debug
286+
mode which warns on long-running tasks - this can help identify the offending call.
287+
288+
---
289+
290+
## What is this syntax ...?
291+
292+
### `async def` / `await` / `asyncio`
293+
294+
Functions declared as `async def` are [python **coroutines**](https://docs.python.org/3/library/asyncio-task.html#).
295+
296+
In a bluesky context, if you see an `async def` function, it is likely that you are looking at either
297+
an `ophyd_async` device class, or a `RunEngine` message handler.
298+
299+
In order to get the result of a coroutine, it needs to be `await`ed:
300+
301+
```python
302+
async def foo():
303+
result = await some_coroutine()
304+
```
305+
306+
In an `async def` function, do not perform blocking operations. For example, always use `await asyncio.sleep()`,
307+
and not `time.sleep()`. Similarly, do not use synchronous functions which perform I/O from a coroutine (e.g. calling
308+
`genie` APIs) - use the `async` APIs provided by `ophyd_async` instead.
309+
310+
As a general rule of thumb, any function that could reasonably take more than a few tens of milliseconds, should not be
311+
called directly from a coroutine.
312+
313+
### `yield` / `yield from` / `Generator`
314+
315+
Functions containing `yield` and/or `yield from` are [python **generators**](https://wiki.python.org/moin/Generators).
316+
317+
If you see `yield from` in a function, you are likely to be looking at a
318+
[bluesky plan](https://blueskyproject.io/bluesky/main/plans.html#).
319+
320+
`yield from` is the syntax used to delegate to subplans:
321+
322+
```python
323+
def some_plan():
324+
yield from move(...)
325+
yield from scan(...)
326+
```
327+
328+
Will perform an initial move followed by a scan.
329+
330+
Return values from plans can be captured:
331+
332+
```python
333+
def plan_that_returns():
334+
yield from something()
335+
return "hello, world"
336+
337+
def plan():
338+
returned_value = yield from plan_that_returns()
339+
```
340+
341+
Like `coroutines` above, long-running functions or functions which perform I/O should not be used in plans. Doing so
342+
can stall the `RunEngine`'s internal event loop, break keyboard-interrupts, and break rewindability. Use bluesky-native
343+
functionality instead - for example:
344+
- [`yield from bps.sleep()`](https://blueskyproject.io/bluesky/main/generated/bluesky.plan_stubs.sleep.html#bluesky.plan_stubs.sleep) instead of `time.sleep()`
345+
- [`yield from bps.mv(block_object, value)`](https://blueskyproject.io/bluesky/main/generated/bluesky.plan_stubs.mv.html#bluesky.plan_stubs.mv) instead of `g.cset(block_name, value)`
346+
- [`value = yield from bps.rd(block_object)`](https://blueskyproject.io/bluesky/main/generated/bluesky.plan_stubs.rd.html#bluesky.plan_stubs.rd) instead of `g.cget(block_name)["value"]`
347+
348+
If you must call an external function which may be long running or do I/O from a plan, review
349+
[call_sync](../plan_stubs/external_code).

0 commit comments

Comments
 (0)