Skip to content

Conversation

@original-brownbear
Copy link
Contributor

When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the ShardId
over and over. This can waste a lot of memory when handling large bulk
requests.

When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the `ShardId`
over and over. This can waste a lot of memory when handling large bulk
requests.
@original-brownbear original-brownbear added >non-issue :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v8.0.0 v7.8.0 labels May 4, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/CRUD)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 4, 2020
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, nice find. I left some hopefully-small suggestions.

out.writeByte((byte) 1);
} else if (response instanceof UpdateResponse) {
out.writeByte((byte) 3); // make 3 instead of 2, because 2 is already in use for 'no responses'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added that check :)


// needed for deserialization
protected DocWriteResponse(StreamInput in) throws IOException {
super(in);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert anything about in.getVersion() here? Not looked in detail, but I hope that this constructor will become obsolete after this change is complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite I think. We still have TransportIndexAction which still uses and logically needs the full deserialization on an IndexResponse as far as I can tell and won't go away in 8 right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we get rid of that yet? It was deprecated over 3 years ago.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right in 8 we can ... opening a PR for that sec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Urgh nevermind, technically we can remove this now but it's a far from trivial change to do so (we're still using that action all over the place in tests). I don't think I can do that in the short-term.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, can we mention IndexAction in this method's Javadoc so we keep track of the dependency.

}
}

public void writeThin(StreamOutput out) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to leave BulkItemResponse#writeTo() as dead code that mostly duplicates this. Does that mean that we don't need BulkItemResponse implements Writeable? If so, can you remove that and rename these methods more appropriately?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still using it in org.elasticsearch.action.bulk.BulkItemRequest#writeTo (though now that I looked over that, we might not have to ever write the ShardId for that one as well logically because it's always part of a BulkShardRequest which never has null for the ShardId).
To me it seems easier to leave it Writable for now so we have the symmetry with the constructor and rename (and not make it a Writable) after working out how to not write the shard id in that last spot as well. (that's easier to do after we clean up the request side of things in #56092 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one minor request for a comment but this doesn't need another look.

edit: #56094 (comment) is the request I meant.

@original-brownbear
Copy link
Contributor Author

Thanks David!

@original-brownbear original-brownbear merged commit c0ee6d2 into elastic:master May 4, 2020
@original-brownbear original-brownbear deleted the more-compact-serialization-replication-response branch May 4, 2020 14:00
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request May 17, 2020
)

When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the `ShardId`
over and over. This can waste a lot of memory when handling large bulk
requests.
original-brownbear added a commit that referenced this pull request May 17, 2020
…56866)

When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the `ShardId`
over and over. This can waste a lot of memory when handling large bulk
requests.
original-brownbear added a commit that referenced this pull request Jun 23, 2020
Just like #56094 but for the request side.
Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id.

Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:

* 8 bytes for the `ShardId` object (itself + one field)
   * + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
   * + 30 bytes for the `Index` uuid string
   * + all the bytes in the index name string

=> 60+ bytes per bulk request item saved on heap and over the wire
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jun 23, 2020
Just like elastic#56094 but for the request side.
Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id.

Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:

* 8 bytes for the `ShardId` object (itself + one field)
   * + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
   * + 30 bytes for the `Index` uuid string
   * + all the bytes in the index name string

=> 60+ bytes per bulk request item saved on heap and over the wire
original-brownbear added a commit that referenced this pull request Jun 23, 2020
Just like #56094 but for the request side.
Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id.

Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:

* 8 bytes for the `ShardId` object (itself + one field)
   * + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
   * + 30 bytes for the `Index` uuid string
   * + all the bytes in the index name string

=> 60+ bytes per bulk request item saved on heap and over the wire
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.9.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants