-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Java: Add data-flow consistency checks. #2921
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java: Add data-flow consistency checks. #2921
Conversation
java/ql/src/semmle/code/java/dataflow/internal/DataFlowImplConsistency.qll
Outdated
Show resolved
Hide resolved
java/ql/src/semmle/code/java/dataflow/internal/DataFlowImplConsistency.qll
Show resolved
Hide resolved
java/ql/src/semmle/code/java/dataflow/internal/DataFlowImplConsistency.qll
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the proposed resolution for the two unresolved questions in the PR description.
The next step can be to duplicate this for other languages and import the new module from a few new one-line ql files in the dataflow test directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschackmull, I have questions about two of the consistency checks. They currently don't hold for Go (because I went with a very lazy implementation of post-update nodes), and I'm wondering how bad that is.
} | ||
|
||
query predicate postIsNotPre(PostUpdateNode n, string msg) { | ||
n.getPreUpdateNode() = n and msg = "PostUpdateNode should not equal its pre-update node." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment briefly on what happens if this condition is violated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flow that exits a method call through a side-effect on an argument may re-enter the method. E.g. in the following you'll get wrong flow from source
to sink
if you have a call to someMethod
where the PostUpdateNode
for the argument equals the ArgumentNode
:
someMethod(A a) {
sink(a.f);
a.f = source();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
exists(int c | | ||
c = count(n.getPreUpdateNode()) and | ||
c != 1 and | ||
msg = "PostUpdateNode should have one pre-update node but has " + c + "." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One example is that a store to a PostUpdateNode
that doesn't have a corresponding pre-update won't be able to exit through a parameter: in someMethod(A a) { a.f = source(); }
the store targets the post-update read of a
in the qualifier of a.f
, combined with local value flow from the ParameterNode
to the corresponding pre-update node, the store is observed as a side-effect of the method and can then exit through the parameter and reach a PostUpdateNode
of an argument in a call to someMethod
. I don't really have an overview of what would happen if a PostUpdateNode
has multiple pre-update nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I was mostly asking about the case of multiple pre-update nodes. Glad to hear it's not known to mess things up massively.
Thanks for your responses, @aschackmull. I think we'll stick with our current simple implementation of post-update nodes for Go, and switch to a more sophisticated model when we start seeing too much imprecision. |
Should the new module be named |
For these kinds of queries "consistency" is the preferred term over "sanity". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great initiative, @aschackmull .
cpp/ql/src/semmle/code/cpp/dataflow/internal/DataFlowImplConsistency.qll
Outdated
Show resolved
Hide resolved
query predicate uniqueTypeRepr(Node n, string msg) { | ||
exists(int c | | ||
n instanceof RelevantNode and | ||
c = count(getErasedRepr(n.getTypeBound())) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use getErasedNodeTypeBound
here instead...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That requires an additional import of DataFlowImplCommon
, so perhaps just leave as-is?
As it stands, this PR just introduces a lot of unreachable code in qll files. Is there any value to merging that before we have ql/qlref files that reference it? For C/C++ (AST and IR), I suggest putting a |
I've added the C++ tests as well as 4 additional checks related to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to merge. I just hadn't noticed that 85d6b7c was pushed.
The results for C/C++ look wrong, but we can address that in a follow-up PR.
Let's get it in. I will add C# tests in a follow-up PR. |
This adds a number of consistency queries (i.e. queries that ought to return zero results) for the shared data-flow implementation. The main file is language-independent and can be synced across languages. The idea is that this should help document expectations and invariants in the shared implementation. For Java, these queries mostly give zero results, but extractor issues can result in nodes without callables, and the TypeFlow library can give multiple type bounds in some corner cases.
A few unresolved questions:
isImmutableOrUnobservable(Node n)
to disregard arguments withoutPostUpdateNode
s when they have been consciously excluded. Currently I've put this inDataFlowPrivate.qll
, but perhaps this predicate ought to go in its own file?<lang>/ql/test/
instead? A related question is how our plumming for consistency queries work at the moment and whether that gives some restrictions?When we are happy with the structure, I can sync this to C# and C++.