Skip to content

Conversation

@lmerino-ep
Copy link
Contributor

@lmerino-ep lmerino-ep commented Aug 23, 2022

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This fixes the issue of incorrect characters showed on IPTC tags that contains non-English characters.

@CLAassistant
Copy link

CLAassistant commented Aug 23, 2022

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@brianpopow brianpopow self-assigned this Aug 23, 2022
Copy link
Collaborator

@brianpopow brianpopow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmerino-ep thanks for the contribution. I have added a test for making sure the envelope data is written when UTF-8 data is present. I think this is good to go into main now.

@lmerino-ep
Copy link
Contributor Author

lmerino-ep commented Aug 24, 2022 via email

/// <summary>
/// This value marks that UTF-8 encoding is used in application records.
/// </summary>
private static readonly byte[] CodedCharacterSetUtf8Value = { 0x1B, 0x25, 0x47 };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static readonly byte[] CodedCharacterSetUtf8Value = { 0x1B, 0x25, 0x47 };
// Uses C#'s optimization to refer to the data segment in the assembly direclty, no allocation occurs.
private static ReadOnlySpan<byte> CodedCharacterSetUtf8Value => new byte[]{ 0x1B, 0x25, 0x47 };

And update WriteRecord (L294) to take a ROS.

if (hasValuesInUtf8)
{
// Additional length for UTF-8 Tag.
length += 5 + CodedCharacterSetUtf8Value.Length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cool thing is that with the above suggestion the JIT will treat CodedCharacterSetUtf8Value.Length as constant (3) here.

Comment on lines 296 to 300
this.Data[offset++] = IptcTagMarkerByte;
this.Data[offset++] = (byte)recordNumber;
this.Data[offset++] = recordBinaryRepresentation;
this.Data[offset++] = (byte)(recordData.Length >> 8);
this.Data[offset++] = (byte)recordData.Length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the JIT needs to introduce a bound check for each access to Data. These can be avoided by

Span<byte> data = this.Data.AsSpan(offset, 5);
data[0] = IptcTagMarkerByte;
data[1] = recordNumber;
data[2] = recordBinaryRepresentation;
data[3] = (byte)(recordData.Length >> 8);
data[4] = (byte)recordData.Length;
offset += 5;

So if the AsSpan succeeds the JIT knows that all index accesses are within bounds, so no bounds check needs to be emitted (tested with .NET 6).

this.Data[offset++] = (byte)recordData.Length;
if (recordData.Length > 0)
{
Buffer.BlockCopy(recordData, 0, this.Data, offset, recordData.Length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will become

Suggested change
Buffer.BlockCopy(recordData, 0, this.Data, offset, recordData.Length);
recordData.CopyTo(this.Data.AsSpan(offset));

@brianpopow
Copy link
Collaborator

thanks @gfoidl, I have added your suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IPTC tags written on jpg files that contains non-English characters can't be correctly displayed on external apps

4 participants