-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Stop Redundantly Serializing ShardId in BulkShardResponse #56094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop Redundantly Serializing ShardId in BulkShardResponse #56094
Conversation
When reading/writing the individual doc responses in the context of a bulk shard response there is no need to serialize the `ShardId` over and over. This can waste a lot of memory when handling large bulk requests.
|
Pinging @elastic/es-distributed (:Distributed/CRUD) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable, nice find. I left some hopefully-small suggestions.
server/src/main/java/org/elasticsearch/action/bulk/BulkItemResponse.java
Show resolved
Hide resolved
| out.writeByte((byte) 1); | ||
| } else if (response instanceof UpdateResponse) { | ||
| out.writeByte((byte) 3); // make 3 instead of 2, because 2 is already in use for 'no responses' | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added that check :)
|
|
||
| // needed for deserialization | ||
| protected DocWriteResponse(StreamInput in) throws IOException { | ||
| super(in); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert anything about in.getVersion() here? Not looked in detail, but I hope that this constructor will become obsolete after this change is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite I think. We still have TransportIndexAction which still uses and logically needs the full deserialization on an IndexResponse as far as I can tell and won't go away in 8 right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we get rid of that yet? It was deprecated over 3 years ago.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right in 8 we can ... opening a PR for that sec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh nevermind, technically we can remove this now but it's a far from trivial change to do so (we're still using that action all over the place in tests). I don't think I can do that in the short-term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, can we mention IndexAction in this method's Javadoc so we keep track of the dependency.
server/src/main/java/org/elasticsearch/action/DocWriteResponse.java
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| public void writeThin(StreamOutput out) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to leave BulkItemResponse#writeTo() as dead code that mostly duplicates this. Does that mean that we don't need BulkItemResponse implements Writeable? If so, can you remove that and rename these methods more appropriately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're still using it in org.elasticsearch.action.bulk.BulkItemRequest#writeTo (though now that I looked over that, we might not have to ever write the ShardId for that one as well logically because it's always part of a BulkShardRequest which never has null for the ShardId).
To me it seems easier to leave it Writable for now so we have the symmetry with the constructor and rename (and not make it a Writable) after working out how to not write the shard id in that last spot as well. (that's easier to do after we clean up the request side of things in #56092 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
…lization-replication-response
…lization-replication-response
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one minor request for a comment but this doesn't need another look.
edit: #56094 (comment) is the request I meant.
|
Thanks David! |
…56866) When reading/writing the individual doc responses in the context of a bulk shard response there is no need to serialize the `ShardId` over and over. This can waste a lot of memory when handling large bulk requests.
Just like #56094 but for the request side. Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id. Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately: * 8 bytes for the `ShardId` object (itself + one field) * + another 4 bytes for the `int` in the `ShardId` * 16 bytes (two fields + the instance itself + the padding) for the `Index` object * + 30 bytes for the `Index` uuid string * + all the bytes in the index name string => 60+ bytes per bulk request item saved on heap and over the wire
Just like elastic#56094 but for the request side. Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id. Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately: * 8 bytes for the `ShardId` object (itself + one field) * + another 4 bytes for the `int` in the `ShardId` * 16 bytes (two fields + the instance itself + the padding) for the `Index` object * + 30 bytes for the `Index` uuid string * + all the bytes in the index name string => 60+ bytes per bulk request item saved on heap and over the wire
Just like #56094 but for the request side. Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id. Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately: * 8 bytes for the `ShardId` object (itself + one field) * + another 4 bytes for the `int` in the `ShardId` * 16 bytes (two fields + the instance itself + the padding) for the `Index` object * + 30 bytes for the `Index` uuid string * + all the bytes in the index name string => 60+ bytes per bulk request item saved on heap and over the wire
When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the
ShardIdover and over. This can waste a lot of memory when handling large bulk
requests.