Skip to content

Conversation

bashtage
Copy link
Contributor

Ensure that Stata 118 files always use utf-8 encoding

@codecov
Copy link

codecov bot commented Jun 1, 2018

Codecov Report

Merging #21279 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #21279   +/-   ##
=======================================
  Coverage   91.85%   91.85%           
=======================================
  Files         153      153           
  Lines       49549    49549           
=======================================
  Hits        45512    45512           
  Misses       4037     4037
Flag Coverage Δ
#multiple 90.25% <ø> (ø) ⬆️
#single 41.87% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b32fdc4...ac8c2c2. Read the comment docs.

@bashtage bashtage force-pushed the stata-uft8-decode branch from bc2d45e to dcbbb32 Compare June 1, 2018 13:56
@pep8speaks
Copy link

pep8speaks commented Jun 1, 2018

Hello @bashtage! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 05, 2018 at 07:20 Hours UTC

@jschendel jschendel added Unicode Unicode strings IO Stata read_stata, to_stata labels Jun 1, 2018
@bashtage bashtage force-pushed the stata-uft8-decode branch 2 times, most recently from 461dd94 to 4bdb2c4 Compare June 2, 2018 09:51
~~~~~~~~~~~~

-
- :func:`read_stata` and :class:`StataReader` have deprecated the ``encoding`` parameter. Stata files only support a single encoding and so this input has no effect. (:issue:`21244`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you say it didn't have any effect before, but still, we should not introduce deprecation warnings in a bug fix release. So would either move this to 0.24.0 or split the PR

@bashtage bashtage force-pushed the stata-uft8-decode branch from 4bdb2c4 to c285003 Compare June 4, 2018 20:32
@bashtage
Copy link
Contributor Author

bashtage commented Jun 4, 2018

I removed the deprecation.

Ensure that Stata 118 files always use utf-8 encoding
@bashtage bashtage force-pushed the stata-uft8-decode branch from c285003 to ac8c2c2 Compare June 5, 2018 07:20
@jreback jreback modified the milestone: 0.23.1 Jun 5, 2018
@jreback
Copy link
Contributor

jreback commented Jun 5, 2018

@bashtage so the fix looks good. can your restore the encoding paramater then can merge into 0.23.1 (and then come back with another PR to deprecate it)? thxs

@jreback jreback added this to the 0.23.1 milestone Jun 5, 2018
@bashtage
Copy link
Contributor Author

bashtage commented Jun 6, 2018

I removed the deprecation warning but encoding is now a no-op, which is the correct behavior.

I'd rather not restore the (potentially) broken behavior where users are allowed to set an invalid encoding (e.g. latin-1 for more recent file formats), and this can wait to 24 if you think this is too much change for a bug fix release.

@jorisvandenbossche
Copy link
Member

I'd rather not restore the (potentially) broken behavior where users are allowed to set an invalid encoding (e.g. latin-1 for more recent file formats), and this can wait to 24 if you think this is too much change for a bug fix release.

Yes, we should not remove the keyword for 0.23.1, even when it was a no-op, because that breaks people code who inadvertently used it. So I would add it back here, but then a next PR to deprecate it if you want.

@bashtage
Copy link
Contributor Author

bashtage commented Jun 6, 2018 via email

@jorisvandenbossche
Copy link
Member

Ah, sorry, missed that you only removed the internal passing through of the keyword. That's fine then!

@jorisvandenbossche jorisvandenbossche merged commit fbb47d6 into pandas-dev:master Jun 6, 2018
@jorisvandenbossche
Copy link
Member

@bashtage Thanks!

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Ensure that Stata 118 files always use utf-8 encoding
@bashtage bashtage deleted the stata-uft8-decode branch March 21, 2019 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Stata read_stata, to_stata Unicode Unicode strings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_stata always uses 'utf8'
5 participants