Skip to content

Conversation

@SereneAnt
Copy link
Contributor

Validation is unexpectedly slow for large uniform arrays with simple json schema like below:

{
  "type": "array",
    "items": {
      "type": "string"
    }
}

For 100K string arrays validation lasts for about 2000 ms.
With such a workaround, the performance gain is 3X.
It can be improved twice more with lazy $path calculation on validation error.

Please note, the proposed fix is not the 'true' way - major refactoring with api changes is required,
it rather pinpoints the problem.

@shmax
Copy link
Collaborator

shmax commented Nov 9, 2016

The maintainers of this repo are fairly dormant these days. If you feel you can refactor the project in such a way as to meaningfully improve performance I don't think anyone would object, but none of the maintainers are likely to do it for you. As far as changing the API goes, I don't think small changes are necessarily a problem as long as they are well justified.

@bighappyface
Copy link
Collaborator

I see no BC issues here and the tests are all good.

I am +1. @shmax are you good with this?

@shmax
Copy link
Collaborator

shmax commented Nov 9, 2016

Well, I don't think he means this to be ready for merge--it's more of a demonstration of the core problem. Some of the concepts in this PR are sort of quick-and-dirty and not fully realized. For example, isSimpleSchema just checks to see if the type is "string", but what does that mean? Is a string type any simpler, than say, an integer type?

@SereneAnt do you intend to flesh this out into a full solution?

@SereneAnt
Copy link
Contributor Author

SereneAnt commented Nov 9, 2016

Sure, let me isolate bulk-optimized code into a separate class, say TypedItemsCollectionConstraint and apply the same approach for other primitive types.
I'll try to not change api drastically.

@bighappyface
Copy link
Collaborator

With API changes coming in this PR I will combine them with others from @shmax to roll another major release. In the meantime I am going to roll a patch release for the changes in #330

@shmax shmax mentioned this pull request Nov 10, 2016
@SereneAnt SereneAnt changed the title improve validation performance of long string arrays improve validation performance of long string, number and integer arrays Nov 16, 2016
@SereneAnt
Copy link
Contributor Author

Ready for merge.
No API changes made to keep things solid.

// just one type definition for the whole array
foreach ($value as $k => $v) {
$initErrors = $this->getErrors();
if (SimpleItemsConstraint::isSupported($schema)) {
Copy link
Collaborator

@shmax shmax Nov 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a little inconsistent with the rest of the codebase... the constraint classes don't generally have functional dependencies on each other outside of methods specified in ConstraintInterface ( eg. check() ). Is a new constraint class really necessary? What do you think about just doing everything you need to do inside of CollectionConstraint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, moved the stuff into CollectionConstraint.
The next big optimization would be lazy $path calculation.
For big arrays, it should boost performance up to 2x-3x.

* @param \stdClass $schema
* @return bool
*/
private function isSupported($schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSupported is a little vague. Can you think of a better function name that better describes what it does? Given that you only call it in one place, another option would be to just remove it and inline the code you have here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the method has been inlined.

}
}

const OPTIMIZED_ITEM_TYPES = array('string', 'number', 'integer');
Copy link
Collaborator

@shmax shmax Nov 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll have to come up with some other way to encode these values, unfortunately; older versions of PHP don't allow assigning arrays to constants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

* @param \stdClass $schema
* @return bool
*/
private function isOptimized($schema)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you're inlining it on line 63, you can remove this, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops)
done

@shmax
Copy link
Collaborator

shmax commented Nov 27, 2016

LGTM

$this->assertTrue($validator->isValid(), print_r($validator->getErrors(), true));
}

private static function millis()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reserved for future use. This unit test looks rather like performance test.
No idea how to organize performance testing in PHP properly.
If you insist - I'll kill that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to add trait named PerformanceTest with a conditional output of running time to console. Since we support PHP 5.3 (no traits), I have no idea of a proper way of doing that.

@shmax shmax mentioned this pull request Dec 7, 2016
@bighappyface bighappyface merged commit 9da41b0 into jsonrainbow:master Dec 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants