Skip to content

Java: Add data-flow consistency checks. #2921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 30, 2020

Conversation

aschackmull
Copy link
Contributor

This adds a number of consistency queries (i.e. queries that ought to return zero results) for the shared data-flow implementation. The main file is language-independent and can be synced across languages. The idea is that this should help document expectations and invariants in the shared implementation. For Java, these queries mostly give zero results, but extractor issues can result in nodes without callables, and the TypeFlow library can give multiple type bounds in some corner cases.

A few unresolved questions:

  • One of the checks needs a tiny bit of customisation via the predicate isImmutableOrUnobservable(Node n) to disregard arguments without PostUpdateNodes when they have been consciously excluded. Currently I've put this in DataFlowPrivate.qll, but perhaps this predicate ought to go in its own file?
  • The checks currently reside next to the implementation, as this was the easiest way to access the language-specific imports. Should this move to somewhere in <lang>/ql/test/ instead? A related question is how our plumming for consistency queries work at the moment and whether that gives some restrictions?

When we are happy with the structure, I can sync this to C# and C++.

jbj
jbj previously approved these changes Feb 27, 2020
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the proposed resolution for the two unresolved questions in the PR description.

The next step can be to duplicate this for other languages and import the new module from a few new one-line ql files in the dataflow test directories.

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschackmull, I have questions about two of the consistency checks. They currently don't hold for Go (because I went with a very lazy implementation of post-update nodes), and I'm wondering how bad that is.

}

query predicate postIsNotPre(PostUpdateNode n, string msg) {
n.getPreUpdateNode() = n and msg = "PostUpdateNode should not equal its pre-update node."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment briefly on what happens if this condition is violated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flow that exits a method call through a side-effect on an argument may re-enter the method. E.g. in the following you'll get wrong flow from source to sink if you have a call to someMethod where the PostUpdateNode for the argument equals the ArgumentNode:

someMethod(A a) {
  sink(a.f);
  a.f = source();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

exists(int c |
c = count(n.getPreUpdateNode()) and
c != 1 and
msg = "PostUpdateNode should have one pre-update node but has " + c + "."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One example is that a store to a PostUpdateNode that doesn't have a corresponding pre-update won't be able to exit through a parameter: in someMethod(A a) { a.f = source(); } the store targets the post-update read of a in the qualifier of a.f, combined with local value flow from the ParameterNode to the corresponding pre-update node, the store is observed as a side-effect of the method and can then exit through the parameter and reach a PostUpdateNode of an argument in a call to someMethod. I don't really have an overview of what would happen if a PostUpdateNode has multiple pre-update nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I was mostly asking about the case of multiple pre-update nodes. Glad to hear it's not known to mess things up massively.

@max-schaefer
Copy link
Contributor

Thanks for your responses, @aschackmull. I think we'll stick with our current simple implementation of post-update nodes for Go, and switch to a more sophisticated model when we start seeing too much imprecision.

@MathiasVP
Copy link
Contributor

Should the new module be named DataFlowSanity (or something similar) for consistency with the InstructionSanity module in Instruction.qll?

@aschackmull
Copy link
Contributor Author

Should the new module be named DataFlowSanity (or something similar) for consistency with the InstructionSanity module in Instruction.qll?

For these kinds of queries "consistency" is the preferred term over "sanity".

Copy link
Contributor

@hvitved hvitved left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great initiative, @aschackmull .

query predicate uniqueTypeRepr(Node n, string msg) {
exists(int c |
n instanceof RelevantNode and
c = count(getErasedRepr(n.getTypeBound())) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use getErasedNodeTypeBound here instead...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That requires an additional import of DataFlowImplCommon, so perhaps just leave as-is?

hvitved
hvitved previously approved these changes Mar 10, 2020
@jbj
Copy link
Contributor

jbj commented Mar 11, 2020

As it stands, this PR just introduces a lot of unreachable code in qll files. Is there any value to merging that before we have ql/qlref files that reference it?

For C/C++ (AST and IR), I suggest putting a qlref file in cpp/ql/test/library-tests/dataflow/fields, cpp/ql/test/library-tests/dataflow/dataflow-tests, and cpp/ql/test/library-tests/syntax-zoo.

@aschackmull
Copy link
Contributor Author

I've added the C++ tests as well as 4 additional checks related to toString and location.

@MathiasVP MathiasVP mentioned this pull request Mar 24, 2020
@aschackmull
Copy link
Contributor Author

@hvitved @jbj Anything holding this back? I'd like to get it in before it starts to bitrot.

Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to merge. I just hadn't noticed that 85d6b7c was pushed.

The results for C/C++ look wrong, but we can address that in a follow-up PR.

@hvitved
Copy link
Contributor

hvitved commented Mar 30, 2020

Let's get it in. I will add C# tests in a follow-up PR.

@hvitved hvitved merged commit 9fa9c10 into github:master Mar 30, 2020
@aschackmull aschackmull deleted the dataflow/consistency-checks branch March 30, 2020 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants