Skip to content

Conversation

@kojiromike
Copy link
Contributor

@kojiromike kojiromike commented Dec 16, 2019

Jira

  • My PR addresses AVRO-2656
  • ...and references it in the PR title.
  • My PR does not add any new dependencies.

Tests

  • My PR enhances the existing tests and test kit so that we test in both python2.7 and python3.5.

Commits

Documentation

  • My PR does not change the documentation, because it is already too big; however, the python guides will need to account for python3 support...

@kojiromike kojiromike self-assigned this Dec 16, 2019
@kojiromike kojiromike force-pushed the AVRO-2656/python3 branch 2 times, most recently from d164d04 to dda2d4c Compare December 16, 2019 00:59
@kojiromike
Copy link
Contributor Author

kojiromike commented Dec 16, 2019

Avro lang/py passes tests on all these Python versions, when I run it locally:

$ TOX_PARALLEL_NO_SPINNER=1 time tox -p 2
GLOB sdist-make: /Users/michael/dev/avro/lang/py/setup.py
✔ OK py27 in 11.725 seconds
✔ OK pypy2 in 15.047 seconds
✔ OK py35 in 11.628 seconds
✔ OK py36 in 11.143 seconds
✔ OK py37 in 11.19 seconds
✔ OK py38 in 11.012 seconds
✔ OK pypy3 in 17.517 seconds
_________________________________________________________________________________________________ summary __________________________________________________________________________________________________
  py27: commands succeeded
  pypy2: commands succeeded
  py35: commands succeeded
  py36: commands succeeded
  py37: commands succeeded
  py38: commands succeeded
  pypy3: commands succeeded
  congratulations :)
       52.78 real        34.03 user         8.64 sys

It's quite fast once tox caches all the different versions' pip install packages. Installing the packages fresh adds about a minute for each one.

@kojiromike kojiromike force-pushed the AVRO-2656/python3 branch 2 times, most recently from 2ac1716 to 20dbf70 Compare December 19, 2019 13:12
"""Mixin for methods common to both reading and writing."""

block_count = 0
_meta = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be class variables? There's a self.block_count in one of the subclasses as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean versus instance variables? It doesn't really make a difference for most usage.

>>> class Foo(object):
...   bar = 123
...   def m(self):
...     self.bar = 'abc'
... 
>>> f = Foo()
>>> f.bar
123
>>> g = Foo()
>>> g.bar
123
>>> f.m()
>>> f.bar
'abc'
>>> g.bar
123

Here, setting block_count as a constant class variable just makes it a default for the instance. Things do get gnarly if you use a mutable type here, but mutable default types are to be avoided in any case.

@kojiromike kojiromike merged commit 4cd778a into apache:master Jan 4, 2020
@kojiromike kojiromike deleted the AVRO-2656/python3 branch January 4, 2020 18:45
@kojiromike
Copy link
Contributor Author

In #251 @justinfx asked

Question about how this will be released... Does this PR cause py2 user facing string values to become unicode objects in any cases? And do any of the apis calls now require a Unicode type as opposed to just any basestring? If so, will this be released as major version 2 to account for potentially breaking changes?

Python 2 is officially expired, so the avro project has to decide as a whole how much effort to put into continuing support if a bug comes up that is mainly due to differences in Python 2/3 string handling. My main goal here was to consolidate support into a single lang/py, the one with the more consistent, longer-tenured and pythonic api.

For most changes, the “ruling” under-api is that of python’s own json lib. A schema string has to be Unicode because that’s what json.loads requires. I don’t think that will result in any surprises for the lingering python2 users, and this change will be the first time python3 users can use avro at all from this api, so they shouldn’t have any held-over expectations to violate (at least as far as string types).

I would really love some help testing this code against real-world data in Python 3. Please open bugs and report your findings.

As for the release process, I need to discuss that with @cutting and @nielsbasjes and others, since this is a significant change.

@sekikn
Copy link
Contributor

sekikn commented Jan 7, 2020

Great work @kojiromike! I've just filed a few minor issues I've found:
https://issues.apache.org/jira/browse/AVRO-2675
https://issues.apache.org/jira/browse/AVRO-2676

ecopoesis pushed a commit to ecopoesis/avro that referenced this pull request Jan 8, 2020
* AVRO-2656: Python3 Support

* AVRO-2656: Consolidate DataFile Code

* AVRO-2656: Must Seek after Truncate

* AVRO-2656: Fix Protocol Unicode

* AVRO-2656: Add Additional Trove Classifiers
@julianjk
Copy link

Looking forward to having this available! @kojiromike , any updates to share on releasing these changes?

@julianjk
Copy link

@cutting , @nielsbasjes as you were mentioned to be involved in the discussion of the release process for these changes -- are there any updates or an ETA to share for these changes being released?

@kojiromike
Copy link
Contributor Author

@julianjk I am not sure. These changes are in master now, but they did not go out with 1.9.2, as they are too much for a point release. Therefore I surmise they will go out with 1.10.0.

@Fokko
Copy link
Contributor

Fokko commented Feb 17, 2020

1.10.0 is planned around May 2020: https://lists.apache.org/thread.html/rb9693e90a8141b2c9f0f9c901c488a079fa6245b2e4d475e022ab1e8%40%3Cdev.avro.apache.org%3E

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants