Skip to content

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Nov 11, 2021

Fixes point 2 in https://github.com/github/codeql-c-team/issues/732.

I recommend a commit-by-commit review. Here's an overview of what each commit does:

  • 10bca35 changes a test that tests path explanations. I thought C++: Better InlineExpectation tests for path-explanations #7051 did all the necessary changes to that test so I didn't have to do in this PR, but apparently, I was wrong. This commit is necessary to show some of the bad effects of a later commit (f308be7).
  • dbcd4d6 does what the title says of the PR says. This makes us lose a bunch of good flow (which is why we had ReferenceToInstruction in the list in the first place).
  • f308be7 fixes the lost flow. This is the most intricate part of this PR. An unfortunate side-effect of this commit is that it duplicates some paths that originate from argv.
  • Accepts the test changes. Notice the duplication of paths in the annotate_path_to_sink directory.

…h with the previous node (instead of its source). This gives a better overview of the path.
@MathiasVP MathiasVP added the C++ label Nov 11, 2021
@MathiasVP MathiasVP requested a review from a team as a code owner November 11, 2021 10:36
… interpret as a load. This makes use lose a bunch of flow, and we'll restore this flow in the next commit.
@MathiasVP MathiasVP force-pushed the remove-reference-to-as-load branch 2 times, most recently from 02a75a4 to 92af522 Compare November 11, 2021 11:07
@MathiasVP
Copy link
Contributor Author

Hm. Looks like this is breaking all sorts of things in the internal repo :(. I'll convert this to a draft and investigate.

@MathiasVP MathiasVP marked this pull request as draft November 11, 2021 11:51
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed your code changes, though for the changes inside IR dataflow this hasn't been especially deep.

Results look good to me (diff of commits 2..4).

Have you checked performance?

// If there is more than one predecessor for this node
// we specify the source location explicitly.
n > 1 and
exists(TaintedWithPath::PathNode pred | pred = getAPredecessor(node) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also check irTaint(source, pred, tag)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think it should.

@@ -376,13 +380,13 @@ module TaintedWithPath {
*/
class TaintTrackingConfiguration extends TSingleton {
/** Override this to specify which elements are sources in this configuration. */
predicate isSource(Expr source) { exists(getNodeForSource(source)) }
predicate isSource(Element source) { exists(getNodeForSource(source)) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change (and perhaps a few related ones) looks like it will break existing user code. Is there a way we can keep existing code that overrides isSource(Expr source) working?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's also what I'm worried about. I haven't yet found a good hack around this that would allow us to keep the isSource(Expr) working.

We introduced this TaintTrackingConfiguration mechanism in #3189 to get path explanations to the old queries that still used the security.TaintTracking library (which was changed to map to DefaultTaintTracking). As @jbj mentioned in here new queries really shouldn't be importing security.TaintTracking. The QLDoc on the security.TaintTracking file even says that users should prefer to use semmle.code.cpp.dataflow.TaintTracking when designing new queries. And luckily, since actual DataFlow::Configurations (which is the one you write if you're using the semmle.code.cpp.dataflow.TaintTracking library) doesn't have this Expr vs. Element problem since that's working purely with DataFlow::Nodes instead.

With all that said, I'm still in the process a couple of small bugs. Once I've fixed those, I'll go back and think about a solution that has better backwards compatibility.

@MathiasVP
Copy link
Contributor Author

I've reviewed your code changes, though for the changes inside IR dataflow this hasn't been especially deep.

Results look good to me (diff of commits 2..4).

Have you checked performance?

Thanks for the quick review! Performance looks fine. I've back-linked to the DCA run, so you should be able to find it now.

@MathiasVP MathiasVP force-pushed the remove-reference-to-as-load branch 2 times, most recently from 0bf7fc5 to c157080 Compare November 15, 2021 16:29
@MathiasVP MathiasVP force-pushed the remove-reference-to-as-load branch from c157080 to 44e32a6 Compare November 22, 2021 11:41
…st, it gives us some new good flow (yay). Second, it causes some duplication of results that uses 'argv' as a taint source. The duplication isn't very bad, though. And since it is only for paths that start at 'argv', I think we can live with it for now.
@MathiasVP MathiasVP force-pushed the remove-reference-to-as-load branch from 44e32a6 to 21167f4 Compare November 22, 2021 13:04
@MathiasVP
Copy link
Contributor Author

MathiasVP commented Nov 22, 2021

Completely getting rid of the path duplication turns out to be more tricky than I anticipated, so I've removed the additional code for this. This is just a UI problem for very specific cases:

  1. It requires the use of DefaultTaintTracking without an isSource override.
  2. It requires flow to start at argv.
  3. It requires a combination of both argv and &argv.

So I don't think we should spend too much time on it.

So now this PR just gets rid of an ugly hack left in there from #6825 and adds a bunch of new good flow. I'll take the PR out of draft once we have results from DCA.

@MathiasVP MathiasVP marked this pull request as ready for review November 23, 2021 08:43
@MathiasVP
Copy link
Contributor Author

I've looked at the result changes. All the new results are good (🎉!) and the two lost results are due to two results that are reported twice on main, but only reported once after this PR. So we don't actually lose any results.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the new results are good

Actually I think many of the cpp/unbounded-write results are false positives - though I think due to weaknesses in the query logic rather than data flow (and its only a @precision medium query).

The cpp/uncontrolled-arithmetic changes look great, as do the test changes.

👍

@MathiasVP
Copy link
Contributor Author

Actually I think many of the cpp/unbounded-write results are false positives - though I think due to weaknesses in the query logic rather than data flow (and its only a @precision medium query).

Indeed. By "good" I mean that they're all results that we expect from dataflow improvements, but really they're FPs caused by query logic. Thanks for the clarification!

@geoffw0 geoffw0 merged commit 3e1164f into github:main Nov 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants