-
Notifications
You must be signed in to change notification settings - Fork 17
Fix line ending bug #90
Conversation
|
@jonasfj
|
|
I'm down for taking a look at this. I saw you had nice comments explaining some of the methods. That's awesome. Feel free to give some review hints in the PR description :) Also if any of these changes could be done independently, please do consider submitting smaller independent PRs. It's much easier to review, which often leads to faster turnaround time. |
2555b32 to
c7aec85
Compare
|
I've tried to split the 22 commits over 3 PRs based on the changes I made. |
lib/src/utils.dart
Outdated
| /// [YamlList] or a top-level [YamlScalar] of the [yaml] string provided. | ||
| /// | ||
| /// [currentEndOffset] represents the end offset of [YamlScalar] or [YamlList] | ||
| /// or [YamlMap] being replaced, that is, `end + 1`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably better illustrated with an example.
/// ```dart
/// final yaml = 'my_key: "replaced-value"\n';
/// // ^--- currentEndOffset points to the newline
/// fimal currentEndOffset = 24;
/// ```ofcourse, I might have counted wrong here 🤣
Also does this mean that currentEndOffset can be greater than the length of the yaml document? (currentEndOffset == yaml.length, so yaml[currentEndOffset] might throw)
Example: k: v will have currentEndOffset: 4 (if I'm replacing v, right).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, if I'm asking dumb questions here, but it might be worth making some examples.
And I think we should make sure we're doing it right, if we get he wrong invariants we'll have a lot of bugs to fix :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also does this mean that
currentEndOffsetcan be greater than the length of theyamldocument? (currentEndOffset == yaml.length, soyaml[currentEndOffset]might throw)
currentEndOffset shouldn't be greater than yaml.length when passed in as an argument. May be I should throw if it is?
Example:
k: vwill havecurrentEndOffset: 4(if I'm replacingv, right).
I think 3 since it's always zero-based.
Sorry, if I'm asking dumb questions here, but it might be worth making some examples.
And I think we should make sure we're doing it right, if we get he wrong invariants we'll have a lot of bugs to fix :D
Sure, will add examples.
lib/src/utils.dart
Outdated
| /// Normalizes an encoded [YamlNode] encoded as a string by pruning any | ||
| /// dangling line-breaks. | ||
| /// | ||
| /// This function checks the last `YamlNode` of the [update] that is a | ||
| /// `YamlScalar` and removes any unwanted line-break within the | ||
| /// [updateAsString]. | ||
| /// | ||
| /// This is achieved by obtaining the chunk of the [yaml] that is after the | ||
| /// current node being replaced using its [nodeToReplaceEndOffset]. If: | ||
| /// 1. The chunk has any trailing line-break then the it is left untouched. | ||
| /// 2. The node being replaced with [update] is not the last node, then it | ||
| /// is left untouched. | ||
| /// 3. The terminal node in [update] is a `YamlScalar`, that is, | ||
| /// the last [YamlNode] within the [update] that is not a collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider adding some example.
Also why is it that we don't do this normalization when we create the updateAsString? Wouldn't that be more correct -- I don't know -- I probably don't understand all the context here, that's why I'm asking :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I’ll add an example.
I do that because yamlEncodeBlock has no context of which method called it or the current structure of the yaml document. It could be:
updateInListor_appendToBlockListor_insertInBlockListfor lists_replaceInBlockMapor_addToBlockMapfor block mapsYaml.updatefor top levelYamlNode.
Thus, the callers should make a call to the function to prune the trailing line break because they have an idea of the yaml's structure.
Additionally, I found it quite advantageous on our part to try and include line breaks for an existing YamlNode being replaced to edits. This reduces bugs and removes the need to try and guess where the next line break is.
It’s better to include it and check later if it was there just before suggesting an edit using a definite endOffset
|
Feel free to mark comments as "Resolved" once you've:
I think it would be useful to add some examples in the code. Especially for |
> Update function doc and inline comments > Make function return named record > Prefer named parameters > Refactor existing code referencing function
|
I've added commits I had moved to the other PRs. Without the changes, the The existing code was only scanning for the first instance of # Existing code works great here for edits
- valueToRemove # Comment with hyphen (-)
- nextValue
# Existing code fails horribly here for edits
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
- nextValue
I realized we can do it
|
| /// Returns the `endOffset` of the last comment extracted that is `end + 1` | ||
| /// and a `List<String> comments`. It is recommended (but not necessary) that | ||
| /// the caller checks the `endOffset` is still within the bounds of the [yaml]. | ||
| ({int endOffset, List<String> comments}) skipAndExtractCommentsInBlock( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you give me a few examples of how this works.
Or maybe a few unit tests, even if we don't keep them, I'd like to understand a bit more what the intention here is.
It seems like it takes a string slice, and returns:
- Comments from within the slice (not sure why)
- Offset where the next non-comment thing in the slice begins, or ends? (or did I misunderstand that?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it takes a string slice, and returns:
- Comments from within the slice (not sure why)
- Offset where the next non-comment thing in the slice begins, or ends? (or did I misunderstand that?)
Yes. It returns where the non-comment starts but with a nuanced approach based on the bool argument you pass into the greedy parameter.
Could you give me a few examples of how this works.
Yeah, sure. See the examples below.
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
- nextValuegreedyastruereturns theoffsetof the-indicating the start of the next value in the list.greedyasfalsereturns theoffsetof the last\n(line-break) + 1 which is just after the last closing bracket)
Or maybe a few unit tests, even if we don't keep them, I'd like to understand a bit more what the intention here is.
A nice example even without the unit tests would be an element in a list or map with multi-line comments being removed. For example:
Performing YamlEditor(yaml).remove([0]); on the string below:
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
- nextValue
# Expected output without this PR
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
- nextValue
# Expected output with PR
- nextValueThe offending code can be found here which is similar to _removeFromBlockMap:
https://github.com/dart-lang/yaml_edit/blob/35f4248c7bbba289b3899fa55486e2f31ef1a8c5/lib/src/list_mutations.dart#L343-L374
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disclaimer: These thoughts are not fully conceived, I'm still trying to understand.
So if the input for YamlEditor(yaml).remove([0]) is:
- valueToRemove # Comment with hyphen (-)
# but spanning (oops, hyphen "-")
#
# multiple lines (another hyphen "-")
# nextValue is awesome:
- nextValueThen with this PR we'd get:
- nextValueRight?
I'm just highlighting this example, because detecting that a comment is spanning multiple lines is difficult.
It's kind of obvious to us humans that the comment # nextValue is awesome: is associated with - nextValue and that all the comments with equal indentation are related to - valueToRemove.
But if we tried to make a heuristic to detect this, such heuristic would fail if comments aren't perfectly aligned 🙈 I can see no end of corner cases 🤣
So maybe we have to ask, when faced with YamlEditor(yaml).remove([0]) on:
# A
# B
- valueToRemove # C
# D
# E
# F
# G
- nextValue # H
# IWhat comments do we wish to remove / preserve?
At the top of my head, I'm thinking:
HandIshould definition be preserved.- Removing
Cseems reasonable. - Do we want to treat
AandBthe same way?- Would we want to detect that
AandBare different based on indentation?
- Would we want to detect that
- Do we want to treat
D,E,FandGthe same way?- Would we want to detect that
DandEare not the same based on indentation? - Would we want to detect that
EandFare not the same based on empty line? - Would we want to detect that
FandGare not the same based on indentation?
- Would we want to detect that
I'm suspecting that maybe it's simplest to say that:
AandBare treated the same way: we preserve them.D,E,FandGare treated the same way: we preserve them.
It could be that it's best to say retain comments, unless they are inside the value we're removing (or on the same line). We risk leaving some half gabled comments, but which is worse:
- Removing too many comments?
- Retaining half a comment related to a value that's been removed?
If we make heuristics for all this to be more accurate, don't we risk things becoming extremely complicated? 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right?
Yes.
Funnily enough, I totally understand your point of view 😄. That’s how my head was spinning before I wrote skipAndExtractComments.
However, I reduced this PR’s scope to actually do what YamlEditor currently WANTS to do while ensuring we can have folded/literal strings. This is still a PoC.
From the PR, you can see I didn’t change how YamlEditor works just nudged it to do what existing code wants to do.
Personally, I don’t see it as complexity but rather as a challenge to live up to the package’s description the RIGHT WAY since if we provide a way to preserve comments, wouldn’t it be great to at least provide a subjective way to remove them?
If you are open to it, I can come up with an issue by the end of the week with my view and where we can both float ideas on how to tackle this and once we settle on something I will submit a PR. If not, then please indicate what we can keep from this PR and what not to keep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not, then please indicate what we can keep from this PR and what not to keep.
I'll have to admit that I'm not entirely sure what this PR does 🙈
Also that we have things like skipAndExtractCommentsInBlock but we only use endOffset from it, scares me a bit. It might actually makes sense, if I understood what the future plans for skipAndExtractCommentsInBlock was 🤣
If you are open to it, I can come up with an issue by the end of the week with my view and where we can both float ideas on how to tackle this and once we settle on something I will submit a PR.
I think that's a good idea!
Let's outline:
- What problems we're trying to solve.
- What the limitations to our approach will be.
- What will the downsides be? (if we're doing heuristics, there will be scenarios where it works less than perfect, what do those scenarios look like).
once we settle on something I will submit a PR.
Small PRs are MUCH easier to land!
If we can't do it all in one PR, maybe we should create an intermediate branch, try to land small PRs there and then eventually merge the branch into main.
There isn't a whole lot of development on main, so it's not like there'll be a lot of merge conflicts 🤣 (benefits of working in a small repository).
Of course, there is a non-trival risk that we get stuck somewhere along the way don't land anything.
But if we try this route we could perhaps consider writing a test cases upfront.
Maybe we could extend the setup in test/testdata to support more kinds of tests, and maybe also support tests that are known to fail now, but that we want to fix.
Sometimes it's a lot easier to discuss things through examples.
Personally, I don’t see it as complexity but rather as a challenge to live up to the package’s description the RIGHT WAY since if we provide a way to preserve comments, wouldn’t it be great to at least provide a subjective way to remove them?
Yes and no. Also scratching my head and wondering how ambitious a package description we wrote 🙈
Some background, this package started as part of GSoC, the original aim was to enable us to create a dart pub add <package> command.
Since then it's also be used for dart pub upgrade --major-versions, dart pub upgrade --tighten, and a few other things left and right.
We needed some way to preserve comments, because while JSON is a subset of YAML, I don't think anyone would like to use dart pub add if it parsed your pubspec.yaml added the package and then saved pubspec.yaml formatted as JSON. As people frequently have comments in their pubspec.yaml.
But we also don't really care to be perfectionist. Ideally, we just want to not cause comments and whitespace from parts of the YAML file unrelated to the modification we're making from being changed. But if a comment next to the thing we're changing becomes garbled or whitespace changes a bit, then perhaps that's okay'ish.
In theory I wouldn't object to this package having support for reading, stripping, modifying and removing comments. In theory I also wouldn't mind this package using heuristics to guess whether a comment belongs to a value being removed or not.
In practice this package is used to make small changes to YAML configuration files. If we don't limit the scope of the package, then it'll have an endless stream of bugs and feature requests.
It might be better declare that some behavior or feature is out of scope, than it is to have a buggy implementation. It's not like we have lot resources to invest in the maintenance of this package.
IMO, we should aim for (in order of priority):
- (1) Reliably produce:
- Valid YAML
- A semantically correct mutation
(_performEdit ensures this, by parsing and comparing the result, throwing if there are errors)
- (2) Avoid internal errors
- Bugs
- Inconsistencies caught by
_performEdit
- (3) Stable API and predictable behavior.
- (4) Preserve comments and whitespace in parts of the YAML document unrelated to the mutation.
- (5) Reasonable heuristics for preservation or removal of comments and whitespace in part of the YAML document affected by the mutation.
I think that when adding features or tweaking (5) we should consider what implication it might have for maintainability and remember that the objective is to modify small configuration files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to admit that I'm not entirely sure what this PR does 🙈
-
We can now encode strings with a trailing line-break as
ScalarStyle.FOLDEDorScalarStyle.LITERALinstead of falling back to double quotes. See: -
Because we can do
1above, it comes with a cost. The edit we suggest may have a line-break already since we ensure we preserve any dangling line-breaks after callingnormalizeEncodedBlockwhich:- Preserves line-breaks for
ScalarStyle.FOLDEDorScalarStyle.LITERALor if the value of theYamlScalaritself has a line-break - Checks if the node being replaced had a line-break and preserves it in the new encoded block
- Preserves line-breaks for
The cost is to ensure we include any existing line-break in the edit without trying to introduce any breaking behavior to the current YamlEditor functionality.
From my previous comment, YamlEditor CURRENTLY wants to get rid of inline comments in the terminal node but does it poorly causing issues with edits with some of the tests actually enabling the bug. (See some of the failing tests when this PR is run. I’ll highlight them on request)
Thus, why I added skipAndExtractComments which skips the inline comments if that’s what YamlEditor wanted to do.
I must admit, maybe the function is a bit overzealous and returns the comments it skipped but I thought why not.
Also, this PR was just a PoC. Nothing more than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kekavc24 how about you start an issue outline what we want to do.
And we try to reach consensus on that first.
I think perhaps it's best to try write tests and make small changes (many small PRs are infinitely better, than large ones). It's more work to split logic into separate PRs. But the challenge is also getting there without introducing bugs (actually, it's getting there while convincing maintainers of this package that we're not introducing new bugs).
|
Closing as the dart-lang/yaml_edit repository is merged into the dart-lang/tools monorepo. Please re-open this PR there! |
I debugged dart-lang/tools#1944 further while submitting #91 and noted several issues that are closely coupled:
YamlListorYamlMapand never aYamlScalarThis PR:
YamlNodeencoded within a block.normalizeEncodeBlockfunction that dynamically prunes the additional line break if:YamlScalarisn't encoded asScalarStyle.LITERALorScalarStyle.FOLDED. Also fixes Literal and folded strings cannot handle "\n \n" at the end of a string dart-lang/tools#1944skipAndExtractCommentsfunction which greedily skips any comments and whitespace belonging to theYamlNodebeing replaced.Further changes made in #93
Contribution guidelines:
dart format.Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.