-
Notifications
You must be signed in to change notification settings - Fork 7
Make control over response parsing more fine-grained #125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Make control over response parsing more fine-grained #125
Conversation
Wow, Adam, I see you spent a lot of time on this. I have some concerns if the solution may not be too complex… But I’ll check the details first and get back to you. |
So yes it does allow some complexity... if you want it. But you don't have to use these new bits! |
I’ll start with some practical use cases I get in my mind for this feature, and I’ll compare the old and new proposal (feel free to blame me if I ignored other obvious use cases!). Two notes in advance:
1. Retrieve status of responseOriginal Version: http:get($URL, map { 'status-only': true() }) ? headers ? status New Solution: http:get($URL, map {
'parse-response': map { 'entity': $http:parse-mode-status }
} ? headers ? status 2. Parse result as CSVOriginal Version: http:get($URL, map { 'parse-bodies': false() }) ! csv:parse(?body) New Solution: http:get($URL, map {
'parse-response': map { 'entity': $http:parse-entity-raw }
}) ! csv:parse(?body) 3. Parse multipart response, return results with media-type text/html as XML:Original Version: http:get('http://expath.org/xyz', map { 'parse-bodies': false() }) ? body
[?headers?Content-Type = 'text/html'] ! html:parse(.) New Solution: http:get($URL, map {
'parse-response': map { 'mode': $http:parse-mode-multipart-raw } (: not sure? :)
}) ? body
[?headers?Content-Type = 'text/html'] ! html:parse(.) To be honest, I’m still not sure if I chose the correct options. And there may be combinations that create identical results. An example (but I may be wrong?): map { 'parse-response': map { 'entity': $http:parse-entity-none } }
map { 'parse-response': map { 'mode': $http:parse-mode-headers } } To summarize my feeling: I am not sure if we really need all the flexibility, and if we don’t lose our users here? Maybe we can find a simpler solution again? And maybe we provide some options for which there will hardly be any use cases in the wild (but which we’d all need to check through in an implementation)? Some thoughts on the status quo:
So maybe one helpful requirement could be that we can get results as binaries or as strings. I will choose the other extreme now and provide another minimal suggestion for the parsing option:
There are various things that are not covered by this proposal. Examples:
Maybe it’s not so bad:
Talking about multipart bodies, one frequent use case is that the returned bodies have different media-types and, thus, need to be parsed differently. In that case, if the implicit processing is not good enough, the relevant bodies would need to be parsed in a subsequent step anyway. This is something (I believe) that is not really covered by any of our current proposals right now. It’s not covered by my last proposal either: Some results might be strings, others might be binary, but we’ll need to treat all of them identically. The EXPath Binary Module allows us to convert binary items to strings (and probably most implementations have their own custom functions for doing that). But it’s something people might ask for if we provide them with a more complex solution. – On the other hand, the big majority of processed responses won’t be multipart, or can hopefully be processed implicitly, so maybe that’s something we shouldn’t lose too much time on. Many thoughts… @adamretter, what do you think? |
I’ll create a new issue soon in which I’ll summarize some requirements for a more flexible response handling as is available in 1.0. |
@ChristianGruen Let me first make some small corrections to your examples. It's better I do so in a separate comment rather than editing yours, so that we can see the differences. 1. Retrieve status of responseOriginal Version: http:get($URL, map { 'status-only': true() }) ? headers ? status (Christian's example) New Solution: http:get($URL, map {
'parse-response': map { 'entity': $http:parse-mode-status }
} ? headers ? status New Solution: http:get($URL, map {
'parse-response': map { 'mode': $http:parse-mode-status }
} ? headers ? status Note you need to use 2. Parse result as CSVOriginal Version: http:get($URL, map { 'parse-bodies': false() }) ! csv:parse(?body) (Christian's example) New Solution: http:get($URL, map {
'parse-response': map { 'entity': $http:parse-entity-raw }
}) ! csv:parse(?body) New Solution: http:get($URL, map {
'parse-response': map { 'mode': $http:parse-entity-text }
}) ! csv:parse(?body) Note I don't know the signature of the 3. Parse multipart response, return results with media-type text/html as XML:Original Version: http:get('http://expath.org/xyz', map { 'parse-bodies': false() }) ? body
[?headers?Content-Type = 'text/html'] ! html:parse(.) (Christian's example) New Solution: http:get($URL, map {
'parse-response': map { 'mode': $http:parse-mode-multipart-raw } (: not sure? :)
}) ? body
[?headers?Content-Type = 'text/html'] ! html:parse(.) New Solution: http:get($URL, map {
'parse-response': map { 'entity': $http:parse-entity-raw }
}) ? body
[?headers?Content-Type = 'text/html'] ! html:parse(.) Note: You need |
So to clarify, my approach was two pronged:
For example, prior to this PR, the HTTP Client Module had the ability to implicitly parse the HTTP response entity(s) to either XML However, as one example, prior to this PR if the HTTP response was returned with a bad media-type, but the user knew that the content was actually XML or JSON, they were left on their own to parse this into XML or JSON somehow. This is amplified when you consider that the HTTP Client 1.0 module also supported a HTML to XML parsing mechanism. Consider the use-case 3 from above: http:get($URL, map {
'parse-response': map { 'entity': $http:parse-entity-raw }
}) ? body
[?headers?Content-Type = 'text/html'] ! html:parse(.) This sadly replies on the user to handle the hard parsing bit. We should empower the user to do that if they wish, but we should not force them to. IMHO - If we have already had to implement such XML/JSON/HTML->XML parsing in our module implementations anyway, we should likely make it available to the user. As such, with this PR, use-case 3 from above (assuming a multipart response, with two parts, the first text and the second HTML) can simply be rewritten to: http:get($URL, map {
'parse-response': map { 'entity': ($http:parse-entity-text, $http:parse-entity-html-to-xml) }
}) NOTE we no longer need the Actually if the Media types in the HTTP response are correct, we could just have used ...but wait, `$http:parse-entity-auto happens to be the default, so we can further simplify our use-case 3 to just: http:get($URL) ...and yes, that gives us the HTML parsing to XML where expected. |
To address some of @ChristianGruen's comments:
So I am glad that you wrote some code examples, it is always good to see real syntax. My feeling is that the map nesting, e.g.
map {
'parse-response': map { 'entity': $http:parse-entity-raw }
} you could just write: map {
'parse-response': $http:parse-response-entity-raw
}
Exactly! I wanted to avoid this where easily possible.
Got you covered! That is already possible with this PR :-) If you take a look at the definition for the parse response option Where a multipart response is expected, a sequence of values may be specified. If there are less values than multipart parts, then the last value will apply to the remaining multipart parts. |
@adamretter: I see our ideas on response handling still differ. Maybe we could include some other people and see how they judge the different approaches? But I promise I will start another attempt to understand your proposal in full detail. Did you already have a look at my latest proposal in #127? We could incorporate the idea to also allow sequences for multipart responses (although we should keep in mind that such a solution requires that the order of multipart sections won’t change). And we should definitely open a separate issue to clarify how we want to handle HTML to XML conversion. In the current specification, there is no parsing rule for HTML data, so |
Yes, I thought that this virtual And I mixed up the usage of And (see above)… map { 'parse-response': map { 'entity': $http:parse-entity-none } }
map { 'parse-response': map { 'mode': $http:parse-mode-headers } } …won’t these two queries yield the same response? I’m still wondering if there won’t be less confusion if we only had one option? |
This gives the user much more control over the response parsing, including the ability to choose the parser for the response when they know better than the response's content type.
Probably needs some editorial to make the language more human friendly, but assuming people are happy with the approach I think the editorial can come in a later PR.
Closes #108