Skip to content

C++: Taint flow to formatting function return values. #3533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented May 20, 2020

Strictly speaking, taint should flow from a tainted argument to a formatting function such as sprintf to the return value (the number of characters written / error code). This is much less important than the existing flow to the output parameter, but could conceivably come up as allowing a user to control the size of an allocation or something like that.

This is consistent with how we handle strftime (the only similar case I could find) and is the final part of https://jira.semmle.com/browse/CPP-466 (which I am keen to close).

The new test results in format.cpp are on lines that look like this:

sink(mysprintf(buffer, 256, "%s", string::source()));

@geoffw0 geoffw0 added the C++ label May 20, 2020
@geoffw0 geoffw0 requested a review from a team as a code owner May 20, 2020 17:07
@jbj
Copy link
Contributor

jbj commented May 25, 2020

Isn't the return value of sprintf very similar to the return value of strlen? And problematic for the same reasons. We might want this taint step in DefaultTaintTracking for the sake of backwards compatibility, but I'm less keen on adding it to models so all taint libraries have it.

I know I change my opinion about what taint means about once per month, but it's currently this: a tainted expression can be influenced by an attacker to have unusual values. An unusual value for an integer is almost never the length of a string.

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 26, 2020

a tainted expression can be influenced by an attacker to have unusual values.

I like this definition. 'unusual' is a bit open to interpretation, but it captures the idea that booleans and things like string lengths don't necessarily need to be considered tainted just because they're computed from things that are.

The old security taint tracking library did consider these return values tainted, but I don't think it was an important detail in practice (real world or samate tests, for example). The change was motivated by consistency with the old code and our handling of strftime. I'll:

  • close this PR
  • open another one to deal with strftime instead
  • close https://jira.semmle.com/browse/CPP-466 (all other parts already resolved)
  • look for somewhere suitable in our QLDoc to state your definition of taint (at least until someone comes up with a better one)

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 26, 2020

Do you think taint should flow through strlen in the models library? (it currently does)

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 26, 2020

Closed in favour of #3569.

@jbj
Copy link
Contributor

jbj commented May 27, 2020

Do you think taint should flow through strlen in the models library? (it currently does)

No, I don't think it should. Right now it's excluded in DefaultTaintTracking.qll, but I'd rather have it excluded in PureStrFunction.hasTaintFlow so the exclusion also reaches semmle.code.cpp.ir.dataflow.TaintTracking.

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 28, 2020

I'm slightly cautious because I don't want to port every hack from the old taint tracking library to the new one, but I agree. I'll make a PR for that change as well.

@geoffw0
Copy link
Contributor Author

geoffw0 commented May 29, 2020

The strlen PR: #3592

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants