Skip to content

CPP: Don't taint the return value of strlen #3592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
May 29, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion change-notes/1.25/analysis-cpp.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,4 +41,4 @@ The following changes in version 1.25 affect C/C++ analysis in all applications.
};
```
* The security pack taint tracking library (`semmle.code.cpp.security.TaintTracking`) now considers that equality checks may block the flow of taint. This results in fewer false positive results from queries that use this library.

* The length of a tainted string (such as the return value of a call to `strlen` or `strftime` with tainted parameters) is no longer itself considered tainted by the `models` library. This leads to fewer false positive results in queries that use any of our taint libraries.
Original file line number Diff line number Diff line change
Expand Up @@ -33,8 +33,7 @@ private predicate predictableInstruction(Instruction instr) {
* Note that the list itself is not very principled; it consists of all the
* functions listed in the old security library's [default] `isPureFunction`
* that have more than one argument, but are not in the old taint tracking
* library's `returnArgument` predicate. In addition, `strlen` is included
* because it's also a special case in flow to return values.
* library's `returnArgument` predicate.
*/
predicate predictableOnlyFlow(string name) {
name = "strcasestr" or
Expand All @@ -43,7 +42,6 @@ predicate predictableOnlyFlow(string name) {
name = "strchrnul" or
name = "strcmp" or
name = "strcspn" or
name = "strlen" or // special case
name = "strncmp" or
name = "strndup" or
name = "strnlen" or
Expand Down
59 changes: 47 additions & 12 deletions cpp/ql/src/semmle/code/cpp/models/implementations/Pure.qll
Original file line number Diff line number Diff line change
Expand Up @@ -20,26 +20,15 @@ class PureStrFunction extends AliasFunction, ArrayFunction, TaintFunction, SideE
name = "strpbrk" or
name = "strcmp" or
name = "strcspn" or
name = "strlen" or
name = "strncmp" or
name = "strnlen" or
name = "strrchr" or
name = "strspn" or
name = "strtod" or
name = "strtof" or
name = "strtol" or
name = "strtoll" or
name = "strtoq" or
name = "strtoul" or
name = "wcslen"
)
or
hasGlobalName(name) and
(
name = "_mbslen" or
name = "_mbslen_l" or
name = "_mbstrlen" or
name = "_mbstrlen_l"
name = "strtoul"
)
)
}
Expand Down Expand Up @@ -90,6 +79,52 @@ class PureStrFunction extends AliasFunction, ArrayFunction, TaintFunction, SideE
}
}

class StrLenFunction extends AliasFunction, ArrayFunction, SideEffectFunction {
StrLenFunction() {
exists(string name |
hasGlobalOrStdName(name) and
(
name = "strlen" or
name = "strnlen" or
name = "wcslen"
)
or
hasGlobalName(name) and
(
name = "_mbslen" or
name = "_mbslen_l" or
name = "_mbstrlen" or
name = "_mbstrlen_l"
)
)
}

override predicate hasArrayInput(int bufParam) {
getParameter(bufParam).getUnspecifiedType() instanceof PointerType
}

override predicate hasArrayWithNullTerminator(int bufParam) {
getParameter(bufParam).getUnspecifiedType() instanceof PointerType
}

override predicate parameterNeverEscapes(int i) {
getParameter(i).getUnspecifiedType() instanceof PointerType
}

override predicate parameterEscapesOnlyViaReturn(int i) { none() }

override predicate parameterIsAlwaysReturned(int i) { none() }

override predicate hasOnlySpecificReadSideEffects() { any() }

override predicate hasOnlySpecificWriteSideEffects() { any() }

override predicate hasSpecificReadSideEffect(ParameterIndex i, boolean buffer) {
getParameter(i).getUnspecifiedType() instanceof PointerType and
buffer = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see a good alternative to repeating all these predicates? As I understand it, all we want is to exclude strlen and friends from the taint rule. Could that be done with a subclass that just overrides the taint predicate to none()? It might require repeating part of the charpred, but maybe that can be pulled out to a private predicate or class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could certainly avoid the repetition with this strategy, but I think there will be a cost in comprehensibility. I consider the repeated predicates here to have a very low mental cost, because this kind of repetition is expected across the models library and the repeated predicate bodies themselves are either trivial or very simple. Like rows in a database.

If we do make this change, I'd definitely want to add the helper predicate for the StrLenFunction charpred, because otherwise we'd be solving repetition in one place by adding repetition in another. That means we'd be adding a helper predicate and complicating the class hierarchy. Probably slightly fewer lines of code, but arguably harder to read overall.

Another approach for improving things here might be to add helpful tools / defaults to the abstract classes in the model interfaces, so that this stuff can be expressed more concisely in the most common cases (e.g. having the three AliasFunction predicates default to none() so that you only need to override the one(s) you want). That will make the models more concise, though potentially mistakes might be harder to spot?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm convinced by your argument that the intended way to use models is to inherit from all relevant interface classes every time. If that's too much boiler-plate, we should solve it everywhere and not just for strlen.

}

class PureFunction extends TaintFunction, SideEffectFunction {
PureFunction() {
exists(string name |
Expand Down
2 changes: 2 additions & 0 deletions cpp/ql/src/semmle/code/cpp/security/TaintTracking.qll
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
/*
* Support for tracking tainted data through the program.
*
* Prefer to use `semmle.code.cpp.dataflow.TaintTracking` when designing new queries.
*/

import semmle.code.cpp.ir.dataflow.DefaultTaintTracking
4 changes: 4 additions & 0 deletions cpp/ql/src/semmle/code/cpp/security/TaintTrackingImpl.qll
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
/**
* DEPRECATED: we now use `semmle.code.cpp.ir.dataflow.DefaultTaintTracking`,
* which is based on the IR but designed to behave similarly to this old
* libarary.
*
* Provides the implementation of `semmle.code.cpp.security.TaintTracking`. Do
* not import this file directly.
*/
Expand Down
21 changes: 21 additions & 0 deletions cpp/ql/test/library-tests/dataflow/taint-tests/format.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -136,3 +136,24 @@ void test1()
sink(buffer); // tainted [NOT DETECTED]
}
}

// ----------

size_t strlen(const char *s);
size_t wcslen(const wchar_t *s);

void test2()
{
char *s = string::source();
wchar_t *ws = wstring::source();
int i;

sink(strlen(s));
sink(wcslen(ws));

i = strlen(s) + 1;
sink(i);

sink(s[strlen(s) - 1]); // tainted
sink(ws + (wcslen(ws) / 2)); // tainted
}
20 changes: 20 additions & 0 deletions cpp/ql/test/library-tests/dataflow/taint-tests/localTaint.expected
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,26 @@
| format.cpp:135:39:135:45 | ref arg & ... | format.cpp:135:40:135:45 | buffer [inner post update] | |
| format.cpp:135:39:135:45 | ref arg & ... | format.cpp:136:8:136:13 | buffer | |
| format.cpp:135:40:135:45 | buffer | format.cpp:135:39:135:45 | & ... | |
| format.cpp:147:12:147:25 | call to source | format.cpp:151:14:151:14 | s | |
| format.cpp:147:12:147:25 | call to source | format.cpp:154:13:154:13 | s | |
| format.cpp:147:12:147:25 | call to source | format.cpp:157:7:157:7 | s | |
| format.cpp:147:12:147:25 | call to source | format.cpp:157:16:157:16 | s | |
| format.cpp:148:16:148:30 | call to source | format.cpp:152:14:152:15 | ws | |
| format.cpp:148:16:148:30 | call to source | format.cpp:158:7:158:8 | ws | |
| format.cpp:148:16:148:30 | call to source | format.cpp:158:20:158:21 | ws | |
| format.cpp:154:6:154:11 | call to strlen | format.cpp:154:6:154:18 | ... + ... | TAINT |
| format.cpp:154:6:154:18 | ... + ... | format.cpp:154:2:154:18 | ... = ... | |
| format.cpp:154:6:154:18 | ... + ... | format.cpp:155:7:155:7 | i | |
| format.cpp:154:18:154:18 | 1 | format.cpp:154:6:154:18 | ... + ... | TAINT |
| format.cpp:157:7:157:7 | s | format.cpp:157:7:157:22 | access to array | TAINT |
| format.cpp:157:9:157:14 | call to strlen | format.cpp:157:9:157:21 | ... - ... | TAINT |
| format.cpp:157:9:157:21 | ... - ... | format.cpp:157:7:157:22 | access to array | TAINT |
| format.cpp:157:21:157:21 | 1 | format.cpp:157:9:157:21 | ... - ... | TAINT |
| format.cpp:158:7:158:8 | ws | format.cpp:158:7:158:27 | ... + ... | TAINT |
| format.cpp:158:7:158:27 | ref arg ... + ... | format.cpp:158:7:158:8 | ws [inner post update] | |
| format.cpp:158:13:158:18 | call to wcslen | format.cpp:158:13:158:26 | ... / ... | TAINT |
| format.cpp:158:13:158:26 | ... / ... | format.cpp:158:7:158:27 | ... + ... | TAINT |
| format.cpp:158:26:158:26 | 2 | format.cpp:158:13:158:26 | ... / ... | TAINT |
| stl.cpp:67:12:67:17 | call to source | stl.cpp:71:7:71:7 | a | |
| stl.cpp:68:16:68:20 | 123 | stl.cpp:68:16:68:21 | call to basic_string | TAINT |
| stl.cpp:68:16:68:21 | call to basic_string | stl.cpp:72:7:72:7 | b | |
Expand Down
2 changes: 2 additions & 0 deletions cpp/ql/test/library-tests/dataflow/taint-tests/taint.expected
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@
| format.cpp:100:8:100:13 | buffer | format.cpp:99:30:99:43 | call to source |
| format.cpp:105:8:105:13 | buffer | format.cpp:104:31:104:45 | call to source |
| format.cpp:110:8:110:14 | wbuffer | format.cpp:109:38:109:52 | call to source |
| format.cpp:157:7:157:22 | access to array | format.cpp:147:12:147:25 | call to source |
| format.cpp:158:7:158:27 | ... + ... | format.cpp:148:16:148:30 | call to source |
| stl.cpp:71:7:71:7 | a | stl.cpp:67:12:67:17 | call to source |
| stl.cpp:73:7:73:7 | c | stl.cpp:69:16:69:21 | call to source |
| stl.cpp:75:9:75:13 | call to c_str | stl.cpp:69:16:69:21 | call to source |
Expand Down
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
| format.cpp:157:7:157:22 | (int)... | format.cpp:147:12:147:25 | call to source |
| format.cpp:157:7:157:22 | access to array | format.cpp:147:12:147:25 | call to source |
| format.cpp:158:7:158:27 | ... + ... | format.cpp:148:16:148:30 | call to source |
| stl.cpp:71:7:71:7 | (const char *)... | stl.cpp:67:12:67:17 | call to source |
| stl.cpp:71:7:71:7 | a | stl.cpp:67:12:67:17 | call to source |
| taint.cpp:8:8:8:13 | clean1 | taint.cpp:4:27:4:33 | source1 |
Expand Down