68 changes: 68 additions & 0 deletions spec/formatting.md
@@ -18,3 +18,71 @@ the local variable takes precedence.

It is an error for a local variable definition to
refer to a local variable that's defined after it in the message.

## Error Handling

During the formatting of a message,
various errors may be encountered.
These are divided into the following categories:
Collaborator

Nitpick / editorial:
How do you feel about having a small list with titles only, almost like a TOC,
followed by the details, with examples?

It would help with the "big picture", vs something spread over many paragraphs.

These are divided into the following categories:
* Syntax errors
* Data Model errors
    * Variant Key Mismatch errors
    * Missing Fallback Variant errors
* Resolution errors
    * Unresolved Variable errors
    * Unknown Function errors
* Selection errors
    * Selector errors
* Formatting errors

Collaborator Author

That's how this started out, before the examples got added...

Let's revisit this when we've more of the spec gathered up? The solution for this section is likely to look very similar to other sections as well.


- **Syntax errors** occur when the syntax representation of a message is invalid.
- **Resolution errors** occur when the runtime value of a part of a message
Collaborator

It is unclear what this means.

Can we say "function resolution"?
There are very few things that can go wrong:

  • local variable definitions => similar to placeholders
  • plain text parts
  • placeholders => have a variable / literal part + function name + bag of options
  • selectors => have again variable(s) / literal part + function name + bag of options

We already have unresolved as a class of errors a bit below.
So, what else can go wrong in the (already parsed) parts above?
I think only function names?

So how about we call this section "Function resolution" instead of "resolution errors"?
"Resolution" is not explained in the spec, and we don't even agree it is needed. We might agree that it is an implementation detail, in which case there is no need to mention it in the spec.

Collaborator Author

The difference between the "resolution" and "formatting" errors that's proposed here is perhaps best seen by considering how a Placeholder containing an Expression is handled. Let's say that we have something like

{(foo) :message}

or

{(user_age) :global}

where :message and/or :global is a custom Function that uses its argument to look up a value from elsewhere, and that this value is then formatted as a part of the final message.

During the formatting of this, the custom code could then emit two different sorts of errors:

  1. Resolution Error if there's a failure in getting the value that is to be formatted, e.g. if no foo message is available or if the user_age global is not set.
  2. Formatting Error if the found value can't be formatted, e.g. because the foo message includes a variable reference that can't be resolved, or the user_age value turns out to have some unexpected shape.

Would you agree that these are different error categories, and that this categorical split could lead to different error handling in user code?
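
To make the proposed split concrete, here is a minimal sketch of how a custom :message function might report the two kinds of error. Everything in it is an assumption made for illustration: the `FunctionResult` shape, the `messages` store, and the `messageFunction` name are hypothetical and are not defined by the spec or by any implementation.

```typescript
// Hypothetical sketch: none of these names come from the spec.
type FunctionError =
  | { type: "resolution"; message: string }
  | { type: "formatting"; message: string };

type FunctionResult =
  | { ok: true; value: string }
  | { ok: false; error: FunctionError };

// Stand-in for an external message store. A `null` entry marks a message
// that exists but cannot be formatted (an assumption for this sketch).
const messages: { [id: string]: string | null } = {
  greeting: "Hello!",
  broken: null,
};

function messageFunction(id: string): FunctionResult {
  // Failure to find the value to be formatted: Resolution error.
  if (!Object.prototype.hasOwnProperty.call(messages, id)) {
    return {
      ok: false,
      error: { type: "resolution", message: "No message with id " + id },
    };
  }
  const value = messages[id];
  // The value was found but cannot be formatted: Formatting error.
  if (value === null) {
    return {
      ok: false,
      error: { type: "formatting", message: "Message " + id + " cannot be formatted" },
    };
  }
  return { ok: true, value: value };
}
```

With this shape, {(foo) :message} would produce a Resolution error because no foo entry exists, while a lookup that succeeds but yields an unusable value would produce a Formatting error.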

Member

If user_age isn't set, isn't that an unresolved variable error?

If no foo message is available, that would be an internal error of the :message formatting function, right? Does that mean "resolution error" is really "function internal error"?

In any case, I think I would move this item below some of the others here (perhaps to the bottom of the list), since I find myself thinking that this could also mean unresolved or formatting error when in fact this error would probably only occur later (only when the pattern is syntactically correct and all of the variables and functions have been resolved but there is still a problem).

Or am I still not understanding?

Collaborator Author

Both foo and user_age are literal values here, so the errors are coming from the :message and :global custom functions; from the PoV of the core implementation, there are no variables here to resolve.

The intent here is to allow for a custom formatting function to emit two different kinds of errors: Resolution and Formatting. This is meant to enable something like :message or :global to work from an end-user PoV as much as possible like core features such as $var.

Collaborator

Nitpick: I don't think what I am saying here affects the spec, only the answer above.

I think it is contradictory:

  • Resolution ... failure in getting the value that is to be formatted
  • Formatting ... includes a variable reference that can't be resolved ...

It is unclear what the difference is between the two: "get the value" and "resolve variable reference".
The "publicly visible" operation is probably "I have a variable named foo, I want to get the value".
There is no "reference" except deep in the implementation.

But that implementation detail should not "leak" in the kind of error I am getting.

Collaborator Author

Please note that the syntax example for the above discussion is this placeholder:

{(foo) :message}

The foo here is not a variable reference, it's a literal value that a custom :message function is interpreting as a message identifier.

cannot be determined.

- **Unresolved Variable errors** occur when a variable reference cannot be resolved.

- **Selection errors** occur when message selection fails.
Collaborator

This is one bullet (Selection errors) with only one sub-bullet (Selector errors).
Is there a difference? Are there any other kinds of "Selection errors" other than "Selector errors"?

Otherwise feels redundant, like having:

* Syntax errors
   * Invalid syntax errors

Collaborator Author

Let's leave the selection + selector error categories as is for now, but adjust them later after #322 is resolved.


- **Selector errors** are failures in the matching of a key to a specific selector.
- **Missing Fallback errors** occur when no Variant is selected
due to the message not including a Variant with only catch-all keys.

- **Formatting errors** occur during the formatting of a resolved value,
for example when encountering a value with an unsupported type
or an internally inconsistent set of options.

During selection, an expression handler must only emit Resolution and Selection errors.
During formatting, an expression handler must only emit Resolution and Formatting errors.

In all cases, when encountering an error,
a message formatter must provide some representation of the message.
An informative error or errors must also be separately provided.
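
As a rough illustration of the rules above, the sketch below models the categories as a flat union, shows a result shape that carries both a string and the separately reported errors, and encodes which categories an expression handler may emit in each phase. All names (`ErrorCategory`, `FormatResult`, `handlerMayEmit`) are hypothetical, and flattening the sub-categories into one union is a simplification made only for this example.

```typescript
// Hypothetical names throughout; the spec text does not define an API.
type ErrorCategory =
  | "syntax"
  | "resolution"
  | "unresolved-variable"
  | "selection"
  | "selector"
  | "missing-fallback"
  | "formatting";

interface MessageError {
  category: ErrorCategory;
  message: string;
}

// A formatter always returns some string representation,
// with the errors provided separately alongside it.
interface FormatResult {
  text: string;
  errors: MessageError[];
}

type Phase = "selection" | "formatting";

// Categories that an expression handler may emit in each phase.
const allowedFromHandler: { [P in Phase]: ErrorCategory[] } = {
  selection: ["resolution", "selection"],
  formatting: ["resolution", "formatting"],
};

function handlerMayEmit(phase: Phase, category: ErrorCategory): boolean {
  return allowedFromHandler[phase].indexOf(category) !== -1;
}

// Example result for a placeholder whose variable could not be resolved.
const exampleResult: FormatResult = {
  text: "{$foo}",
  errors: [{ category: "unresolved-variable", message: "foo is not defined" }],
};
```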

When an error occurs in the syntax or resolution of an Expression or MarkupStart Option,
the Expression or MarkupStart in question is processed as if the option were not defined.
Collaborator

Should not mention MarkupStart, as there is no agreement on markup.

Collaborator Author

@eemeli, Jan 12, 2023

Are you objecting here out of principle, or do you specifically think that failures in Expression and Markup options should be handled differently?

This may allow for the fallback handling described below to be avoided,
though an error must still be emitted.

When an error occurs within a Selector,
the selector must not match any VariantKey other than the catch-all `*`
and a Selector error is emitted.
When selection fails to match any Variant,
an empty string is used as the formatted string representation of the message
and a Missing Fallback error is emitted.
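
The selection fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the spec's algorithm: the `Variant`, `ResolvedSelector`, and `SelectionResult` shapes are hypothetical, errors are reduced to plain strings, and the first variant whose keys all match is taken, which glosses over any ordering or best-match rules.

```typescript
// Hypothetical data shapes; not defined by the spec.
interface Variant {
  keys: string[];
  value: string;
}

interface ResolvedSelector {
  value: string;
  failed: boolean; // true if an error occurred within this Selector
}

interface SelectionResult {
  text: string;
  errors: string[];
}

function keyMatches(selector: ResolvedSelector, key: string): boolean {
  if (key === "*") return true; // the catch-all always matches
  if (selector.failed) return false; // a failed selector matches nothing else
  return selector.value === key;
}

function selectVariant(
  selectors: ResolvedSelector[],
  variants: Variant[]
): SelectionResult {
  const errors: string[] = [];
  for (const selector of selectors) {
    if (selector.failed) errors.push("selector");
  }
  // Assumption: take the first variant whose keys all match.
  for (const variant of variants) {
    let allMatch = variant.keys.length === selectors.length;
    for (let i = 0; allMatch && i < selectors.length; i++) {
      allMatch = keyMatches(selectors[i], variant.keys[i]);
    }
    if (allMatch) return { text: variant.value, errors: errors };
  }
  // No variant matched: empty string plus a Missing Fallback error.
  errors.push("missing-fallback");
  return { text: "", errors: errors };
}
```

A failed selector therefore still lets a message with a catch-all variant format to something useful, while a message without one yields the empty string.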

When an error occurs in a Placeholder that is being formatted,
the fallback string representation of the Placeholder
always starts with U+007B LEFT CURLY BRACKET `{`
and ends with U+007D RIGHT CURLY BRACKET `}`.
Comment on lines +210 to +211
Member

We might have a bidi consideration here? If the string being formatted is in Arabic, we might emit an FSI before the { and a PDI after the }. Format patterns use lots of neutrals (such as $ and :) and look like gibberish in a bidi context.
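
For illustration only, wrapping a fallback string in isolates could look like the sketch below. The `isolate` helper is hypothetical; the code points are U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE.

```typescript
const FSI = "\u2068"; // U+2068 FIRST STRONG ISOLATE
const PDI = "\u2069"; // U+2069 POP DIRECTIONAL ISOLATE

// Hypothetical helper: isolates a fallback string from surrounding bidi text.
function isolate(fallback: string): string {
  return FSI + fallback + PDI;
}
```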

Collaborator Author

I think this should be covered by the text in PR #315? The intent there is to define a prefix and a suffix to a part's formatted string representation, which here would be something like {$foo}.

Between the brackets, the following contents are used:

- Expression with Literal Operand: U+0028 LEFT PARENTHESIS `(`
followed by the value of the Literal,
and then by U+0029 RIGHT PARENTHESIS `)`
- Expression with Variable Operand: U+0024 DOLLAR SIGN `$`
followed by the Variable Name of the Operand
- Expression with no Operand: U+003A COLON `:` followed by the Expression Name
- Markup start: U+002B PLUS SIGN `+` followed by the MarkupStart Name
- Markup end: U+002D HYPHEN-MINUS `-` followed by the MarkupEnd Name
- Otherwise: The U+FFFD REPLACEMENT CHARACTER `�` character
Collaborator

Proposal: change this to "the string between { and }, as is".
I would assume we get here if none of the cases before was encountered.

So probably a syntax error (bad sigil, for example, or bad format for parameters):
"Encountered {?count} penguins" is more useful than "Encountered {�} penguins"
Or
"Encountered {$count :number invalid options ?? format} penguins"

Collaborator Author

Yes, this is catching anything not covered by the above. I think that's currently an empty set, as syntax errors use {�} or {fallback string} for the whole message. So this should never happen, but let's still include it to make sure that we don't end up with undefined behaviour.

I would strongly prefer not including any explicit reference to syntax source here, as that would require keeping a reference to it in all implementations, and might not apply at all if the source is not an MF2 syntax message.


For example, the formatted string representation of the expression `{$foo :bar}`
would be `{$foo}` if the variable could not be resolved.
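
The fallback rules for a Placeholder can be sketched as a single function. The `Placeholder` union below is a hypothetical data model introduced only for this example; the strings it produces follow the list above.

```typescript
// Hypothetical data model for the parts that can appear in a placeholder.
type Placeholder =
  | { type: "literal"; value: string } // Expression with Literal Operand
  | { type: "variable"; name: string } // Expression with Variable Operand
  | { type: "function"; name: string } // Expression with no Operand
  | { type: "markup-start"; name: string }
  | { type: "markup-end"; name: string }
  | { type: "unknown" };

function fallbackString(part: Placeholder): string {
  switch (part.type) {
    case "literal":
      return "{(" + part.value + ")}";
    case "variable":
      return "{$" + part.name + "}";
    case "function":
      return "{:" + part.name + "}";
    case "markup-start":
      return "{+" + part.name + "}";
    case "markup-end":
      return "{-" + part.name + "}";
    default:
      // Anything else: U+FFFD REPLACEMENT CHARACTER
      return "{\uFFFD}";
  }
}
```

Under this model, `{$foo :bar}` has a Variable Operand, so an unresolved `foo` gives `{$foo}`, and `{(foo) :message}` gives `{(foo)}`.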

The formatted string representation of a message with an unrecoverable syntax error
is the concatenation of U+007B LEFT CURLY BRACKET `{`,
a string identifier for the message,
and U+007D RIGHT CURLY BRACKET `}`.
If an identifier is not available,
it is replaced with the U+FFFD REPLACEMENT CHARACTER `�` character,
resulting in the string `{�}`.
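
A minimal sketch of this whole-message fallback, assuming a hypothetical `syntaxErrorFallback` helper that treats an `undefined` identifier as "not available":

```typescript
// Hypothetical helper; `id` is the message's string identifier, if any.
function syntaxErrorFallback(id?: string): string {
  // U+FFFD REPLACEMENT CHARACTER stands in for a missing identifier.
  return "{" + (id !== undefined ? id : "\uFFFD") + "}";
}
```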