@Mr0grog Mr0grog commented Jan 24, 2025

Some recent additions to our upload script (edgi-govdata-archiving/web-monitoring-processing#855) are now storing response bodies in S3 with the content type binary/octet-stream. This isn’t a valid media type (it should be application/octet-stream), and this appears to be a change in boto3 (the AWS SDK) or maybe a difference between it and AWS SDKs for other languages. Regardless, we now have data stored this way and we should handle it the same as application/octet-stream (essentially: this content-type tells us nothing one way or the other, so ignore it).

Obviously we should fix the upload script, too, but that is a secondary concern vs. actual data we have stored.
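A minimal sketch of the handling described above, not the actual change in this pull request. The names `GENERIC_MEDIA_TYPES`, `normalize_media_type`, and `is_informative_media_type` are hypothetical and exist only for illustration. The idea is that both generic types are treated as carrying no information, so neither counts as evidence for or against HTML.

```python
# Media types that say nothing useful about the body they describe.
GENERIC_MEDIA_TYPES = frozenset({
    'application/octet-stream',
    'binary/octet-stream',  # not a valid media type, but present on stored data
})


def normalize_media_type(content_type):
    """Drop parameters (e.g. charset) and normalize case and whitespace."""
    if not content_type:
        return ''
    return content_type.split(';', 1)[0].strip().lower()


def is_informative_media_type(content_type):
    """Return False when the header is missing or only a generic placeholder."""
    media_type = normalize_media_type(content_type)
    return bool(media_type) and media_type not in GENERIC_MEDIA_TYPES
```

With a check like this, a caller would fall back to other signals (such as sniffing the body) whenever `is_informative_media_type` returns `False`, instead of concluding that the content is not HTML.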

Some recent additions to our upload script are now storing response bodies in S3 with the content type `binary/octet-stream`. This appears to be a change in boto3 (the AWS SDK), where it is now using that as a generic content type instead of `application/octet-stream`. The new type is not actually valid, and I'm not sure why they're doing it, but it is functionally the same as `application/octet-stream` and should not cause us to consider something as "definitely not HTML".

(Obviously we should fix the upload script, too, but that is a secondary concern.)
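For the secondary fix, one possible approach is to set the content type explicitly at upload time. This is a sketch that assumes the upload script writes objects with the boto3 S3 client's `put_object` call. The `upload_body` helper and its parameters are hypothetical, not the script's real code.

```python
def upload_body(s3_client, bucket, key, body, media_type=None):
    """Store a response body, always sending an explicit Content-Type.

    Falls back to the valid generic type when the original response
    did not declare one, so nothing else picks a default for us.
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType=media_type or 'application/octet-stream',
    )
```

A call might look like `upload_body(boto3.client('s3'), 'some-bucket', 'some-key', raw_bytes, 'text/html')`, where the bucket and key are placeholders.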
@Mr0grog Mr0grog merged commit e8c603c into main Jan 24, 2025
7 checks passed
@Mr0grog Mr0grog deleted the hotfix-boto-got-a-little-weird-with-content-types branch January 24, 2025 04:23
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this pull request Jan 24, 2025