-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Data flow through address-of operator (&) #1884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
526f91e
to
3794a0b
Compare
3794a0b
to
5041f59
Compare
Updated to fix semantic merge conflict with #1900. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of test nits. I'm nervous about the potential false positive impact - have we done any real-world testing?
@@ -245,6 +264,7 @@ edges | |||
| by_reference.cpp:51:10:51:20 | call to getDirectly | by_reference.cpp:50:17:50:26 | call to user_input | by_reference.cpp:51:10:51:20 | call to getDirectly | call to getDirectly flows from $@ | by_reference.cpp:50:17:50:26 | call to user_input | call to user_input | | |||
| by_reference.cpp:57:10:57:22 | call to getIndirectly | by_reference.cpp:56:19:56:28 | call to user_input | by_reference.cpp:57:10:57:22 | call to getIndirectly | call to getIndirectly flows from $@ | by_reference.cpp:56:19:56:28 | call to user_input | call to user_input | | |||
| by_reference.cpp:63:10:63:28 | call to getThroughNonMember | by_reference.cpp:62:25:62:34 | call to user_input | by_reference.cpp:63:10:63:28 | call to getThroughNonMember | call to getThroughNonMember flows from $@ | by_reference.cpp:62:25:62:34 | call to user_input | call to user_input | | |||
| by_reference.cpp:69:8:69:20 | call to nonMemberGetA | by_reference.cpp:68:21:68:30 | call to user_input | by_reference.cpp:69:8:69:20 | call to nonMemberGetA | call to nonMemberGetA flows from $@ | by_reference.cpp:68:21:68:30 | call to user_input | call to user_input | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the comment for this in the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -12,7 +12,7 @@ struct Outer { | |||
}; | |||
|
|||
void absink(struct AB *ab) { | |||
sink(ab->a); // flow x3 [NOT DETECTED] | |||
sink(ab->a); // flow (three paths) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better if it were three separate sources. Losing a path would be easier to miss than losing a source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I don't think we have too many standard queries that would be affected by this. I just tested four data flow queries on MySQL, and this PR made no difference. I take comfort in the fact that my original data flow library prototype (from 2017) had flow through both The IR data flow libraries do not have flow through |
LGTM. Just did some real-world testing of some dataflow queries on Qt with and without this change. No differences in results or measurable differences in performance. @rdmarsh2 how do you feel about this PR now? |
That's reassuring. I'm happy to merge once the test conflict is resolved. |
The data flow library conflates pointers and their objects in some places but not others. For example, a member function call `x.f()` will cause flow from `x` of type `T` to `this` of type `T*` inside `f`. It might be ideal to avoid that conflation, but that's not realistic without using the IR. We've had good experience in the taint tracking library with conflating pointers and objects, and it improves results for field flow, so perhaps it's time to try it out for all data flow.
These new results seem better than the previous ones, but the previous ones are still there. Perhaps the `Buffer.qll` library could use some adjustment, but this seems like an improvement in isolation.
This addresses PR comments by @rdmarsh2.
e041052
to
b2df18a
Compare
Rebased to fix conflicts with #1881. |
This PR is a less ambitious alternative to #1778.