-
-
Couldn't load subscription status.
- Fork 599
Harmonize annotations with python/typeshed#8608 #1027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/typeshed#8608 introduced annotations for `create` which are not fully reflected here. In order to reflect that state into `jsonschema`, a new module, `jsonschema._typing` is introduced. The purpose of `_typing` is to be a singular place for the library to define type aliases and any typing-related utilities which may be needed. This will let us use aliases like `_typing.JsonValue` in many locations where any valid JSON datatype is accepted. The definitions can be refined over time as necessary. Initial definitions in `_typing` are: - Schema (any JSON object) - JsonObject (any JSON object) - JsonValue (any JSON value, including objects or sequences) `Schema` is just another name for `JsonObject`. Perhaps it is not needed, but the name may help clarify things to a reader. It is not obvious at present whether or not it is a good or bad idea to notate it as such, but a similar Schema alias is defined in typeshed and seems to be working there to annotate things accurately. These types are using `Mapping` and `Sequence` rather than `dict` or `list`. The rationale is that `jsonschema`'s logic does not dictate that the objects used *must* be defined in stdlib types or subtypes thereof. For example, a `collections.UserDict` could be used and should be accepted by the library (`UserDict` wraps but does not inherit from `dict`.)
|
Thanks! No worries on the time shortage, I hear you! Very much appreciate the PR, this is a great plan overall. Will leave some minor comments. (There's also #1022 from @DanielNoord outstanding to look at, which is driven by typing-related concerns, just in case you haven't seen it). |
| @@ -0,0 +1,14 @@ | |||
| """ | |||
| Type aliases and utilities to help manage `typing` usage and type annotations. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Definitely support adding this, thanks!)
| from collections.abc import Mapping, Sequence | ||
| import typing | ||
|
|
||
| Schema = Mapping[str, typing.Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schema is tricky to nail down for multiple reasons. Its precise type depends on the specific dialect of JSON Schema involved, and (speaking off the top of my head without double checking, which counts for lots of the comments I'm about to make --)
applicable_validatorsmeans that technically its type is literallyAny(or some dependent type), because someone can decide they're inventing a JSON Schema dialect where the number17is a schema and means something, andapplicable_validatorswill give the library what it needs to deal with that as a schema. I'm honestly not sure what to do about that.- Ignoring the above, the "internal" type to all JSON Schema "official" drafts is -- for later drafts,
bool | Mapping[str, typing.Any], and for earlier onesMapping[str, typing.Any]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely going to have to circle back to try to better grasp what's sensible here.
In line with my other comment about maybe accepting some technical inaccuracy in the annotations as a pragmatic approach, I'm instinctively inclined towards bool | Mapping[...].
|
|
||
| JsonObject = Mapping[str, typing.Any] | ||
|
|
||
| JsonValue = typing.Union[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is incomplete -- someone can come along and invent validator callbacks which only work if someone deserialized into Decimal, and as long as they ensure they've done so in their loading of JSON, their code works. In other words, I'm somewhat afraid this is really just Any in "real life".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might have a tension here between annotations which are useful vs ones which are completely accurate.
If we annotate things as Any, we ensure that nothing which is correct at runtime would be flagged by a type checker. But I'm concerned that making this accurate with Any will also mean that we can't catch common errors.
My example would be some flask-like framework, and checking validate(request) vs validate(request.json).
Is it okay to suggest that anyone using type checking needs to use typing.cast or type: ignore in these cases?
The type really is Any for a lot of methods like json.load, but I hope my desire to provide a more tightly scoped type is understandable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to think about what I think honestly.
To me, all I am "convinced" is useful of at the minute (and I'm quite convinced it's very useful indeed) is that typing forces documentation to have correctly documented types. That's the reason I'm quite picky about making sure that things which work "accidentally" because we abuse some type internally do not in fact get represented as the type -- because later down the line I will not support those internal abuses if we need to change them for whatever reason. They're not part of the public API, they don't have stability guarantees.
Here, the opposite is true -- not documenting the type here actually means "regressing" something that is indeed public API, and which I very much do intend folks to rely on (by which I don't mean I have no reservations that it will bite me in the ass, but I do mean it's something I think people should consider public API at this point, type annotation or not, and I don't intend to break it).
And I know we don't actually regress, since I'm pretty sure there are tests covering the Any behavior at least in part -- but if the goal (at least the goal in my head) is "be able to say every object in jsonschema has its type documented explicitly", it complicates that goal if the documentation again needs tweaking to broaden types into their real public API.
(So... "I don't know yet" :D)
| def create( | ||
| meta_schema, | ||
| validators=(), | ||
| meta_schema: _typing.Schema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to double check (which I will be lazy and not do here) but Unlike _typing.Schema, meta-schemas (in the library, but possibly also according to the spec, I forget) must be objects I think, not bool | thing, so you'll need two types I suspect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes sense. I need to think harder about what to do with schemas more broadly, but we now have a clean place where we can define _typing.MetaSchema. I'll make that tweak.
| meta_schema, | ||
| validators=(), | ||
| meta_schema: _typing.Schema, | ||
| validators: Mapping[str, _ValidatorCallback] | tuple[()] = (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The public API is just Mapping[str, _ValidatorCallback], not tuple. I know mypy or whatever is unhappy with that, so if that means just adding type: ignore to get it to shut up about the default arg let's do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'm cool with that way of handling these!
I had some other ideas which I like less, but another option just occured to me.
With _typing we could do this:
# in the definition
validators: Mapping[str, _ValidatorCallback] = _typing.EmptySequence
# in _typing
EmptySequence = typing.cast(typing.Any, ())Because Any is a subtype of everything, the rules get wonky, and mypy and other checkers should be okay with this.
That way, _typing.EmptySequence can be used in any location where the empty tuple is used today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems also fine with me (although I think the fact that () works is more a quirk of dict than a generic empty sequence type, but I'll leave pedantry out until/unless it comes back to rear its head I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it help that I also considered
EmptyMap = pyrsistent.freeze({})
?
😁
I just know that this pattern occurs several times in the code, with () as an empty default. So it seems like it might be nice to sweep any messiness associated with it into _typing.
| format_checker: _format.FormatChecker = _format.draft202012_format_checker, | ||
| id_of: typing.Callable[[_typing.Schema], str] = _id_of, | ||
| applicable_validators: typing.Callable[ | ||
| [_typing.Schema], typing.Iterable[tuple[str, _ValidatorCallback]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This has to do with the above, the type for applicable_validators is really Any -> tuple(...))
|
Nice to see you have found some more time this! One question though, why did you choose to make the module private? A lot of the typing ends up in the public API as well, so users would then need to import from a private module to make their own annotations be correct. |
I appreciate the question -- and it's a bit of a balancing act, IMO. For example, we have in this PR being used in Is that particular type alias a public property of the library? I would say no -- and I think my inclination not to expose that particular type is agreeable to most developers. It's an internal convenience for authors here to be able to write things more clearly and succinctly, just like any other What about validate(typing.cast(jsonschema.typing.JsonValue, my_object))But that grows the scope of this changeset! Suddenly, I'm not just trying to make some annotations accurate, but I'm also proposing a new public API which I hope that makes sense -- and I do intend to get you a good reply on some other threads (notably #1017)! But until I do, thanks for the interest in and contributions towards a well-typed |
|
I'm going to close this, since I don't think there's any chance we would merge it (with how the code has evolved, TBH, I can't remember what my full plan was here, but it probably didn't involve my past couple of super-busy years (hey, have you heard of that PEP I wrote? Didn't plan on that! 😂 ). |
I haven't had time to focus on
jsonschematyping much recently, but I want to get things back on track. The first thing to do is to correct any drift betweenjsonschemaandtypeshed.I'm going to try to produce a steady stream of very small PRs which get things back in sync.
This is therefore hopefully the first of a few changes.
python/typeshed#8608 introduced annotations for
createwhich are not fully reflected here.In order to reflect that state into
jsonschema, a new module,jsonschema._typingis introduced. The purpose of_typingis to be a singular place for the library to define type aliases and any typing-related utilities which may be needed. This will let us use aliases like_typing.JsonValuein many locations where any valid JSON datatype is accepted.The definitions can be refined over time as necessary.
Initial definitions in
_typingare:Schemais just another name forJsonObject. Perhaps it is not needed, but the name may help clarify things to a reader. It is not obvious at present whether or not it is a good or bad idea to notate it as such, but a similar Schema alias is defined in typeshed and seems to be working there to annotate things accurately.These types are using
MappingandSequencerather thandictorlist. The rationale is thatjsonschema's logic does not dictate that the objects used must be defined in stdlib types or subtypes thereof. For example, acollections.UserDictcould be used and should be accepted by the library (UserDictwraps but does not inherit fromdict.)📚 Documentation preview 📚: https://python-jsonschema--1027.org.readthedocs.build/en/1027/