Skip to content

CPP: Don't taint the return value of strlen #3592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 29, 2020
Merged

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented May 29, 2020

Remove taint flow to the return value of strlen and similar functions. The current thinking is that string lengths should not be considered tainted - see #3533 where we decided we didn't want the returned lengths from formatting functions to be considered tainted (and discussed strlen), and #3569 where we removed taint flow to the return value of strftime for this reason. This approach should lead to better query results (fewer false positives where the existence of taint is highly dubious).

This approach can be seen as old wisdom from the security taint library (now DefaultTaintTracking), which had long made an exception of strlen for practical reasons. Now that this is encoded in our models library, I was able to remove one of these exceptions from DefaultTaintTracking without affecting results.

@geoffw0 geoffw0 added the C++ label May 29, 2020
@geoffw0 geoffw0 requested a review from a team as a code owner May 29, 2020 13:19
override predicate hasSpecificReadSideEffect(ParameterIndex i, boolean buffer) {
getParameter(i).getUnspecifiedType() instanceof PointerType and
buffer = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see a good alternative to repeating all these predicates? As I understand it, all we want is to exclude strlen and friends from the taint rule. Could that be done with a subclass that just overrides the taint predicate to none()? It might require repeating part of the charpred, but maybe that can be pulled out to a private predicate or class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could certainly avoid the repetition with this strategy, but I think there will be a cost in comprehensibility. I consider the repeated predicates here to have a very low mental cost, because this kind of repetition is expected across the models library and the repeated predicate bodies themselves are either trivial or very simple. Like rows in a database.

If we do make this change, I'd definitely want to add the helper predicate for the StrLenFunction charpred, because otherwise we'd be solving repetition in one place by adding repetition in another. That means we'd be adding a helper predicate and complicating the class hierarchy. Probably slightly fewer lines of code, but arguably harder to read overall.

Another approach for improving things here might be to add helpful tools / defaults to the abstract classes in the model interfaces, so that this stuff can be expressed more concisely in the most common cases (e.g. having the three AliasFunction predicates default to none() so that you only need to override the one(s) you want). That will make the models more concise, though potentially mistakes might be harder to spot?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm convinced by your argument that the intended way to use models is to inherit from all relevant interface classes every time. If that's too much boiler-plate, we should solve it everywhere and not just for strlen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants