-
Notifications
You must be signed in to change notification settings - Fork 363
improve validation performance of long string, number and integer arrays #331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The maintainers of this repo are fairly dormant these days. If you feel you can refactor the project in such a way as to meaningfully improve performance I don't think anyone would object, but none of the maintainers are likely to do it for you. As far as changing the API goes, I don't think small changes are necessarily a problem as long as they are well justified. |
|
I see no BC issues here and the tests are all good. I am +1. @shmax are you good with this? |
|
Well, I don't think he means this to be ready for merge--it's more of a demonstration of the core problem. Some of the concepts in this PR are sort of quick-and-dirty and not fully realized. For example, @SereneAnt do you intend to flesh this out into a full solution? |
|
Sure, let me isolate bulk-optimized code into a separate class, say TypedItemsCollectionConstraint and apply the same approach for other primitive types. |
|
Ready for merge. |
| // just one type definition for the whole array | ||
| foreach ($value as $k => $v) { | ||
| $initErrors = $this->getErrors(); | ||
| if (SimpleItemsConstraint::isSupported($schema)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a little inconsistent with the rest of the codebase... the constraint classes don't generally have functional dependencies on each other outside of methods specified in ConstraintInterface ( eg. check() ). Is a new constraint class really necessary? What do you think about just doing everything you need to do inside of CollectionConstraint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, moved the stuff into CollectionConstraint.
The next big optimization would be lazy $path calculation.
For big arrays, it should boost performance up to 2x-3x.
| * @param \stdClass $schema | ||
| * @return bool | ||
| */ | ||
| private function isSupported($schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isSupported is a little vague. Can you think of a better function name that better describes what it does? Given that you only call it in one place, another option would be to just remove it and inline the code you have here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the method has been inlined.
| } | ||
| } | ||
|
|
||
| const OPTIMIZED_ITEM_TYPES = array('string', 'number', 'integer'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to come up with some other way to encode these values, unfortunately; older versions of PHP don't allow assigning arrays to constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| * @param \stdClass $schema | ||
| * @return bool | ||
| */ | ||
| private function isOptimized($schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you're inlining it on line 63, you can remove this, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops)
done
|
LGTM |
| $this->assertTrue($validator->isValid(), print_r($validator->getErrors(), true)); | ||
| } | ||
|
|
||
| private static function millis() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reserved for future use. This unit test looks rather like performance test.
No idea how to organize performance testing in PHP properly.
If you insist - I'll kill that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to add trait named PerformanceTest with a conditional output of running time to console. Since we support PHP 5.3 (no traits), I have no idea of a proper way of doing that.
Validation is unexpectedly slow for large uniform arrays with simple json schema like below:
For 100K string arrays validation lasts for about 2000 ms.
With such a workaround, the performance gain is 3X.
It can be improved twice more with lazy $path calculation on validation error.
Please note, the proposed fix is not the 'true' way - major refactoring with api changes is required,
it rather pinpoints the problem.