Stop Redundantly Serializing ShardId in BulkShardResponse #56094

original-brownbear · 2020-05-04T07:03:17Z

When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the ShardId
over and over. This can waste a lot of memory when handling large bulk
requests.
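A rough sketch of the idea, not the actual change: the shard-level response writes the shard id once, and each item is written and read in a "thin" form that omits it. `ThinBulkSketch`, its nested `ShardId` and `ItemResponse` classes, and the `write`/`readThin` helpers are hypothetical stand-ins built on plain `java.io` streams rather than Elasticsearch's `StreamOutput`/`StreamInput`; only the name `writeThin` is taken from the diff in this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the Elasticsearch classes.
final class ThinBulkSketch {

    static final class ShardId {
        final String indexName;
        final String indexUuid;
        final int shard;

        ShardId(String indexName, String indexUuid, int shard) {
            this.indexName = indexName;
            this.indexUuid = indexUuid;
            this.shard = shard;
        }

        void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(indexName);
            out.writeUTF(indexUuid);
            out.writeInt(shard);
        }

        static ShardId readFrom(DataInputStream in) throws IOException {
            return new ShardId(in.readUTF(), in.readUTF(), in.readInt());
        }
    }

    static final class ItemResponse {
        final ShardId shardId;
        final String docId;
        final long version;

        ItemResponse(ShardId shardId, String docId, long version) {
            this.shardId = shardId;
            this.docId = docId;
            this.version = version;
        }

        // Full form: the item carries its own copy of the shard id.
        void writeTo(DataOutputStream out) throws IOException {
            shardId.writeTo(out);
            writeThin(out);
        }

        // Thin form: the enclosing shard-level response has already written the shard id.
        void writeThin(DataOutputStream out) throws IOException {
            out.writeUTF(docId);
            out.writeLong(version);
        }

        // Thin read: the caller passes in the shard id it read once, so all items share one instance.
        static ItemResponse readThin(ShardId shardId, DataInputStream in) throws IOException {
            return new ItemResponse(shardId, in.readUTF(), in.readLong());
        }
    }

    // Writes the shard id once, then every item in either the thin or the full form.
    static byte[] write(ShardId shardId, List<ItemResponse> items, boolean thin) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            shardId.writeTo(out);
            out.writeInt(items.size());
            for (ItemResponse item : items) {
                if (thin) {
                    item.writeThin(out);
                } else {
                    item.writeTo(out);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads a payload produced by write(..., true).
    static List<ItemResponse> readThin(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            ShardId shardId = ShardId.readFrom(in);
            int count = in.readInt();
            List<ItemResponse> items = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                items.add(ItemResponse.readThin(shardId, in));
            }
            return items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch the full form costs one extra serialized shard id per item, and reading the thin form leaves every item pointing at the same `ShardId` instance instead of a fresh copy each.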


elasticmachine · 2020-05-04T07:03:19Z

Pinging @elastic/es-distributed (:Distributed/CRUD)

DaveCTurner

Looks reasonable, nice find. I left some hopefully-small suggestions.


DaveCTurner · 2020-05-04T11:02:30Z

server/src/main/java/org/elasticsearch/action/bulk/BulkItemResponse.java

+                out.writeByte((byte) 1);
+            } else if (response instanceof UpdateResponse) {
+                out.writeByte((byte) 3); // make 3 instead of 2, because 2 is already in use for 'no responses'
+            }


Sure, added that check :)

DaveCTurner · 2020-05-04T11:05:33Z

server/src/main/java/org/elasticsearch/action/DocWriteResponse.java

+
    // needed for deserialization
    protected DocWriteResponse(StreamInput in) throws IOException {
        super(in);


Can we assert anything about in.getVersion() here? Not looked in detail, but I hope that this constructor will become obsolete after this change is complete.

Not quite I think. We still have TransportIndexAction which still uses and logically needs the full deserialization on an IndexResponse as far as I can tell and won't go away in 8 right?

Can't we get rid of that yet? It was deprecated over 3 years ago.

Right in 8 we can ... opening a PR for that sec

Urgh nevermind, technically we can remove this now but it's a far from trivial change to do so (we're still using that action all over the place in tests). I don't think I can do that in the short-term.

Ok, can we mention IndexAction in this method's Javadoc so we keep track of the dependency?


DaveCTurner · 2020-05-04T11:15:12Z

server/src/main/java/org/elasticsearch/action/bulk/BulkItemResponse.java

        }
    }
+
+    public void writeThin(StreamOutput out) throws IOException {


This seems to leave BulkItemResponse#writeTo() as dead code that mostly duplicates this. Does that mean that we don't need BulkItemResponse implements Writeable? If so, can you remove that and rename these methods more appropriately?

We're still using it in org.elasticsearch.action.bulk.BulkItemRequest#writeTo (though now that I've looked over that, logically we might never have to write the ShardId for that one either, because it's always part of a BulkShardRequest, which never has null for the ShardId).
To me it seems easier to leave it Writeable for now, so we keep the symmetry with the constructor, and to rename it (and stop making it a Writeable) after working out how to avoid writing the shard id in that last spot as well. That's easier to do after we clean up the request side of things in #56092.


DaveCTurner

LGTM, one minor request for a comment but this doesn't need another look.

edit: #56094 (comment) is the request I meant.

original-brownbear · 2020-05-04T14:00:01Z

Thanks David!


Just like #56094 but for the request side. Removes a lot of redundant `ShardId` instances from bulk shard requests and stops serializing index names when they are not needed because they do not differ from what is in the shard id. Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:

* 8 bytes for the `ShardId` object (itself + one field)
* + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
* + 30 bytes for the `Index` uuid string
* + all the bytes in the index name string

=> 60+ bytes per bulk request item saved on heap and over the wire
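The arithmetic behind the "60+ bytes" figure can be spelled out as a small sketch. The constants are the approximate numbers quoted in the description above and are not measured here; the class name `ShardIdSavingsEstimate` and the method `savedBytes` are hypothetical names introduced only for this illustration.

```java
// Hypothetical helper; the constants are the approximate figures from the description, not measurements.
final class ShardIdSavingsEstimate {
    static final int SHARD_ID_OBJECT_BYTES = 8;    // ShardId object (itself + one field)
    static final int SHARD_INT_BYTES = 4;          // the int in the ShardId
    static final int INDEX_OBJECT_BYTES = 16;      // Index object (two fields + instance + padding)
    static final int INDEX_UUID_STRING_BYTES = 30; // Index uuid string

    // Estimated bytes saved for the given number of items and index name size in bytes.
    static long savedBytes(long items, int indexNameBytes) {
        long perItem = SHARD_ID_OBJECT_BYTES
            + SHARD_INT_BYTES
            + INDEX_OBJECT_BYTES
            + INDEX_UUID_STRING_BYTES
            + indexNameBytes;
        return items * perItem;
    }
}
```

The fixed parts add up to 58 bytes, so any index name of two bytes or more reaches the quoted 60+ bytes per item. Under these assumptions, 10,000 items against an index with a 10-byte name would come to roughly 680,000 bytes.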


Stop Redundantly Serializing ShardId in BulkShardResponse

3f7b8e6


original-brownbear added the >non-issue, :Distributed Indexing/CRUD, v8.0.0, and v7.8.0 labels May 4, 2020

elasticmachine added the Team:Distributed (Obsolete) label May 4, 2020

original-brownbear requested review from DaveCTurner and henningandersen May 4, 2020 09:18

DaveCTurner reviewed May 4, 2020


original-brownbear added 3 commits May 4, 2020 13:39

Merge remote-tracking branch 'elastic/master' into more-compact-serialization-replication-response

c70fb36

CR: comments

9b083a7

rename

478595f

original-brownbear requested a review from DaveCTurner May 4, 2020 12:32

Merge remote-tracking branch 'elastic/master' into more-compact-serialization-replication-response

8e2e096

DaveCTurner approved these changes May 4, 2020


javadoc with link

328e078

original-brownbear added the backport pending label May 4, 2020

original-brownbear merged commit c0ee6d2 into elastic:master May 4, 2020

original-brownbear deleted the more-compact-serialization-replication-response branch May 4, 2020 14:00

This was referenced May 6, 2020

Stop Redundantly Serializing Index Name in ReplicationRequest #56092

Closed

Save Shard ID Serializations in Bulk Requests #56209

Merged

original-brownbear mentioned this pull request May 11, 2020

Deduplicate Strings in REST Bulk Request Parsing #56506

Merged

original-brownbear added v7.9.0 and removed v7.8.0 backport pending labels May 17, 2020

original-brownbear mentioned this pull request May 17, 2020

Stop Redundantly Serializing ShardId in BulkShardResponse (#56094) #56866

Merged

original-brownbear mentioned this pull request Jun 23, 2020

Save Shard ID Serializations in Bulk Requests (#56209) #58414

Merged

breskeby mentioned this pull request Jun 23, 2020

Remove minimumRuntime sourceSet from build-tools #58417

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021


4 participants
