Skip to content

C++: IR dataflow edges through outparams #2704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

rdmarsh2
Copy link
Contributor

This connects ReturnIndirectionInstructions to ReadSideEffectInstructions in the data flow library by adding a new case to TReturnKind. It doesn't cover flow into functions via indirect parameters - I think that will need some changes to the shared library to use a class similar to ReturnKind for argument IDs.

@rdmarsh2 rdmarsh2 added the C++ label Jan 28, 2020
@rdmarsh2 rdmarsh2 requested review from a team as code owners January 28, 2020 01:12
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks promising! Please add tests to demonstrate that it works.

@@ -743,7 +743,7 @@ class ReturnValueInstruction extends ReturnInstruction {
final Instruction getReturnValue() { result = getReturnValueOperand().getDef() }
}

class ReturnIndirectionInstruction extends Instruction {
class ReturnIndirectionInstruction extends VariableInstruction {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until now I thought that ReturnIndirectionInstruction was for simulating a read of a returned pointer. I can see now it's for simulating a read of a pointer parameter. Please add QLDoc.

@jbj
Copy link
Contributor

jbj commented Jan 31, 2020

I've got the input parameter indirection case covered in #2737, so it's fine that it's missing from this PR.

@rdmarsh2 rdmarsh2 force-pushed the rdmarsh/cpp/ir-flow-through-outparams branch from 399cb85 to 677f0f0 Compare February 3, 2020 23:15
@rdmarsh2
Copy link
Contributor Author

There's internal tests failing now. The changes in CWE-120 look incorrect, the others are restoring results that were lost when we enabled the IR. I'll look into the CWE-120 changes before I open an internal PR.

@rdmarsh2
Copy link
Contributor Author

It looks like the new false positives in the CWE-120 test come via the Chi chain on argv. I think everything is working as currently intended on the IR side. The easy patch would be to add accesses to argv as barriers in the query to prevent that chi chain flow. The harder solution would be to do an interprocedural analysis to identify functions that take a pointer to non-const but don't write to it and avoid generating side effect writes.

@jbj
Copy link
Contributor

jbj commented Mar 11, 2020

It looks like the new false positives in the CWE-120 test come via the Chi chain on argv. I think everything is working as currently intended on the IR side. The easy patch would be to add accesses to argv as barriers in the query to prevent that chi chain flow. The harder solution would be to do an interprocedural analysis to identify functions that take a pointer to non-const but don't write to it and avoid generating side effect writes.

I don't fully understand that problem. Can you turn it into a test case so we can see the IR? As I wrote in my initial review, I'd also like to see test cases that demonstrate that the new flow is working as intended.

@rdmarsh2
Copy link
Contributor Author

I added an example of that to ssa.cpp - you'll want to look at the unsound version for the moment. There's already tests for the dataflow portion that I added to cpp/ql/test/library-tests/dataflow/DefaultTaintTracking/defaulttainttracking.cpp and cpp/ql/test/library-tests/dataflow/dataflow-tests/test.cpp in earlier commits.

@geoffw0
Copy link
Contributor

geoffw0 commented Mar 18, 2020

This PR will fix 2 out of the 3 SAMATE test regressions identified in https://git.semmle.com/Semmle/code/pull/36464 (the CWE-78 and CWE-197 ones).

Is there anything I can do to help with this PR?

@jbj
Copy link
Contributor

jbj commented Mar 18, 2020

For the record, I believe these are the changes in CWE-120/.../UnboundedWrite.expected from the internal repo that show problematic flow from argv:

Diff:
--- expected
+++ actual
@@ -2,4 +2,9 @@
 | main.c:49:5:49:10 | call to strcpy | This 'call to strcpy' with input from $@ may overflow the destination. | main.c:113:15:113:18 | argv | argv |
 | main.c:49:5:49:10 | call to strcpy | This 'call to strcpy' with input from $@ may overflow the destination. | main.c:114:15:114:18 | argv | argv |
 | main.c:153:7:153:12 | call to strcpy | This 'call to strcpy' with input from $@ may overflow the destination. | main.c:149:20:149:25 | call to getenv | call to getenv |
+| main.c:181:5:181:11 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | main.c:113:15:113:18 | argv | argv |
+| main.c:181:5:181:11 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | main.c:114:15:114:18 | argv | argv |
+| main.c:181:5:181:11 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | main.c:115:15:115:18 | argv | argv |
+| main.c:181:5:181:11 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | main.c:116:15:116:18 | argv | argv |
+| main.c:181:5:181:11 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | main.c:117:15:117:18 | argv | argv |
 | main.c:181:5:181:11 | call to sprintf | This 'call to sprintf' with input from $@ may overflow the destination. | main.c:180:12:180:15 | argv | argv |

@jbj
Copy link
Contributor

jbj commented Mar 18, 2020

Sorry for taking so long to follow up on this.

The additional results in CWE-120 don't look terrible to me, and I don't think they're blocking for this PR. The query still has the same primary-location results as before, and we can address the duplication of sources in a separate PR. All these result changes show that this is a disruptive PR that can uncover other latent bugs, so we'll want to run CPP-Differences. I'll trigger one now.

@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Apr 3, 2020

I've been working my way through the CPP-Differences results with the path explanations from #3189. I'm seeing a lot of results where fields are getting conflated when a ReturnIndirection reads a chi node and then a BufferMayWriteSideEffect has a may-write to a whole allocation. I think #3118 might fix that.

@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Apr 3, 2020

#3118 doesn't immediately fix those. I'll look at it some more tomorrow.

@MathiasVP MathiasVP mentioned this pull request Apr 5, 2020
@jbj jbj added this to the 1.24 milestone Apr 7, 2020
@jbj
Copy link
Contributor

jbj commented Apr 8, 2020

The CPP-Differences results have arrived: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1025/

@jbj
Copy link
Contributor

jbj commented Apr 8, 2020

@rdmarsh2 It looks like your latest merge from master accidentally added dozens of little test files from your working copy 😬. I recommend rebasing them away.

@MathiasVP
Copy link
Contributor

MathiasVP commented Apr 8, 2020

I had a deeper look at the results now and came to a different set of conclusions:

g/openjdk/jdk

cpp/unbounded-write

  • 5 new TP:
    • socketTransport.c:215
    • main.cpp:459
    • socketTransport.c:123
    • socketTransport.c:116
  • 3 New TP that are excluded by jbj/ql@a715866:
    • libproc_impl.c:179
    • symtab.c:213
    • symtab.c:240

These should be recoverable with field flow. Overall jbj/ql@a715866 reduces query results from 92 to 89.

cpp/path-injection

  • New TP (but excluded by jbj@a715866):
    • symtab.c:222
    • symtab.c:244
    • symtab.c:232
    • libproc_impl.c:184
      • This one should be recoverable with field flow.

Overall jbj/ql@a715866 reduces query results from 16 to 8. The remaining results look like true positives.

cpp/uncontrolled-allocation-size

  • New FP (but excluded by jbj@a715866):
    • gifalloc.c:91
    • gifalloc.c:79

Overall jbj/ql@a715866 reduces query results from 4 to 0. All of them seem like false positives.

cpp/uncontrolled-process-operation

  • New TP:
    • java_md_solinux.c:550

jbj/ql@a715866 keeps the 2 query results. They both look like true positives.

g/torvalds/linux

cpp/path-injection

  • 1 new TP:
    • gen_init_cpio.c:445

Overall jbj/ql@a715866 reduces query results from 25 to 23. Both results look like true positives.

@jbj
Copy link
Contributor

jbj commented Apr 8, 2020

Here are my notes about the three snapshots I looked at. TL;DR: cherry-picking jbj@a715866 will fix all new FPs but move four TPs that should be recoverable with field flow.

Git

  • New TP from TaintedPath.ql on git/git in daemon.c:1476.
  • Everything else is FP, and it's fixed by the FieldAddressInstruction exclusion but not by merging from master

Wireshark

  • cpp/uncontrolled-allocation-size: 16 new FPs in PR head, all suppressed by merging from master.
  • cpp/path-injection:
    • new TP in capture_sync.c, preserved by the conflation suppression
    • Four new FP in lemon.c, all suppressed by merging from master

PHP

  • cpp/uncontrolled-allocation-size: all 155 results (all new FPs) fixed by the FieldAddressInstruction exclusion but not by merging from master
  • cpp/path-injection
    • The FieldAddressInstruction exclusion removes TPs in pchar.c, util.c, main.c, and phpdbg_list.c, where field conflation plays the role of field flow and happens to match up the right field names. All results are essentially the same code, just in different main-routines. Can we recover these results with field flow?
    • With output indirections + the FieldAddressInstruction exclusion we get two new results over master, and they are TP.

rdmarsh2 and others added 3 commits April 8, 2020 12:32
@rdmarsh2 rdmarsh2 force-pushed the rdmarsh/cpp/ir-flow-through-outparams branch from b09ff54 to 9f40886 Compare April 8, 2020 21:11
@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Apr 8, 2020

NEW: cpp/unbounded-write (src/java.base/unix/native/libjli/java_md_solinux.c:395)

I think that characters from the getenv call do reach the source buffer of the sprintf, but the string is bounded after the call to snprintf (macro-expanded from JLI_Snprintf) in GetJVMPath. Probably a good place for a query-specific barrier.

NEW: cpp/uncontrolled-process-operation (src/java.base/unix/native/libjli/java_md_solinux.c:550)

Fixed by merging from the latest master.

Can we recover these results with field flow?

I think so, but I hit a join order issue with DataFlowImplCommon::Cached::FlowThrough::Final::parameterValueFlow0#ffff#cur_delta on my merge branch. It looks like the join ordering issue is introduced by something in #3118 but the tuple counts are much larger in the merge branch.

rdmarsh2 added 3 commits April 8, 2020 16:30
This would otherwise have lost a good qltest result at
CWE-134/semmle/funcs/funcsLocal.c:58:9:58:10
@rdmarsh2 rdmarsh2 added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Apr 8, 2020
@rdmarsh2
Copy link
Contributor Author

rdmarsh2 commented Apr 9, 2020

The UnknownType case for flow through partial reads is doing field conflation... I'll take that case out.

@alexet alexet changed the base branch from master to rc/1.24 April 9, 2020 16:03
@MathiasVP
Copy link
Contributor

I think so, but I hit a join order issue with DataFlowImplCommon::Cached::FlowThrough::Final::parameterValueFlow0#ffff#cur_delta on my merge branch. It looks like the join ordering issue is introduced by something in #3118 but the tuple counts are much larger in the merge branch.

I've just been hit by this join order issue as well. I'm investigating it now.

@rdmarsh2
Copy link
Contributor Author

Latest CPP-Differences finished: https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1032

@rdmarsh2
Copy link
Contributor Author

Looking through the latest openjdk results, all the results that @MathiasVP mentioned going away with a cherry-pick of a715866 are back in the current PR after restoring flows through from the source value operand of partial loads of arrays and unions. Results on the other snapshots line up with the master merge and cherry pick at 9f40886

g/openjdk/jdk

cpp/unbounded-write

  • TPs restored as of 3b59118:
    • libproc_impl.c:179
    • symtab.c:213
    • symtab.c:240
      These are all explained by partial loads through arrays.

cpp/path-injection

  • TPs restored as of 3b59118:
    • symtab.c:222
    • symtab.c:244
    • symtab.c:232
    • libproc_impl.c:184
      The results in symtab.c are all explained by partial loads through arrays. The result in libproc_impl.c flows through a field of array type in a struct that is allocated in the same function as the sink, and the flow goes through a call to strcpy. That might mean there's precise copies in the IR, but I need to do more digging to be sure.

cpp/uncontrolled-allocation-size

  • FPs restored as of 3b59118
    • gifalloc.c:91
    • gifalloc.c:79
      I think these are FPs but only because there's an implicit bound at this bitwise "and". I don't think we need to fix them as part of this PR.

@rdmarsh2
Copy link
Contributor Author

The cpp/path-injection result at libproc_impl.c:184 in openjdk comes from pointer/object conflation for modeled functions in DefaultTaintTracking.qll.

@rdmarsh2
Copy link
Contributor Author

I'm now convinced that none of these result changes come from new incorrect flow edges.

@MathiasVP @jbj I think this is mergeable once the QL changes are reviewed.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@MathiasVP MathiasVP merged commit 721e9d5 into github:rc/1.24 Apr 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants