Implement select feature #772
Added a functional test |
harshavardhana
left a comment
Some formatting
|
@sinhaashish one more issue:
~ python --version
Python 2.7.16
~ python select_object_content.py
Traceback (most recent call last):
  File "select_object_content.py", line 70, in <module>
    for d in data.stream(32*1024):
  File "/home/harsha/repos/minio-py/minio/select_object_reader.py", line 289, in stream
    x = self.read(num_bytes)
  File "/home/harsha/repos/minio-py/minio/select_object_reader.py", line 269, in read
    res = self.extract_message()
  File "/home/harsha/repos/minio-py/minio/select_object_reader.py", line 130, in extract_message
    if total_byte_parsed + byte_int(total_byte_length) > read_buffer:
  File "/home/harsha/repos/minio-py/minio/select_object_reader.py", line 56, in byte_int
    return int.from_bytes(data_bytes, byteorder='big')
AttributeError: type object 'int' has no attribute 'from_bytes'
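For context on the traceback: `int.from_bytes` was added in Python 3.2, so it is not available on Python 2.7. A minimal sketch of a helper that works on both, assuming the input is a non-empty big-endian byte string (the name `byte_int_compat` is illustrative and not part of this PR):

```python
import codecs


def byte_int_compat(data_bytes):
    """Convert a non-empty big-endian byte string to an integer.

    Illustrative helper (hypothetical name), intended to behave the same
    on Python 2.7 and Python 3.
    """
    if hasattr(int, 'from_bytes'):
        # Python 3.2+ provides int.from_bytes directly.
        return int.from_bytes(data_bytes, byteorder='big')
    # Python 2 fallback: hex-encode the bytes, then parse the hex digits.
    return int(codecs.encode(data_bytes, 'hex'), 16)


print(byte_int_compat(b'\x00\x00\x01\x00'))
```

The hex-based fallback also runs on Python 3, so a single code path using only `codecs.encode` is another option when keeping one implementation matters more than speed.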
|
Also, @sinhaashish found another major performance problem in the calculate_crc() function: CRC32() is being initialized again on every call. Rather than using PyCRC, it is better to use the standard library function crc32 from binascii:
diff --git a/minio/select_object_reader.py b/minio/select_object_reader.py
index 6c3d2ce..21e675a 100644
--- a/minio/select_object_reader.py
+++ b/minio/select_object_reader.py
@@ -15,8 +15,11 @@
# limitations under the License.
+from __future__ import unicode_literals
import io
-from PyCRC.CRC32 import CRC32
+import codecs
+
+from binascii import crc32
from xml.etree import cElementTree
from .error import InvalidXMLError
from xml.etree.cElementTree import ParseError
@@ -30,13 +33,11 @@ class CRCValidationError(Exception):
Raised in case of CRC mismatch
'''
-
def calcuate_crc(value):
'''
- Returns the CRC using PyCRC
+ Returns the CRC using crc32
'''
- return CRC32().calculate(value)
-
+ return crc32(value) & 0xffffffff
def validate_crc(current_value, expected_value):
'''
@@ -53,8 +54,7 @@ def byte_int(data_bytes):
'''
Convert bytes to big-endian integer
'''
- return int.from_bytes(data_bytes, byteorder='big')
-
+ return int(codecs.encode(data_bytes, 'hex'), 16)
class SelectObjectReader(object):
"""
@@ -291,4 +291,4 @@ class SelectObjectReader(object):
break
elif len(x) < num_bytes:
x += self.read(num_bytes-len(x))
- yield x.decode("utf-8")
+ yield str(x) if isinstance(x, bytearray) else x
diff --git a/setup.py b/setup.py
index 3d554ea..739c4ea 100644
--- a/setup.py
+++ b/setup.py
@@ -45,7 +45,6 @@ requires = [
'pytz',
'certifi',
'python-dateutil',
- 'PyCRC',
]
tests_requires = [
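The `& 0xffffffff` mask in the diff above matters because `binascii.crc32` can return a signed value on Python 2, while Python 3 always returns an unsigned one. A small sketch of the masked call, checked against the standard CRC-32 test vector (the function name `crc32_unsigned` is illustrative, not from the PR):

```python
from binascii import crc32


def crc32_unsigned(value, running=0):
    """Return the CRC-32 of `value` as an unsigned 32-bit integer.

    `running` lets the caller continue a checksum over several chunks.
    Illustrative helper (hypothetical name).
    """
    # Masking keeps the result in [0, 2**32 - 1] on Python 2 and Python 3.
    return crc32(value, running) & 0xffffffff


# 0xCBF43926 is the well-known CRC-32 check value for b'123456789'.
assert crc32_unsigned(b'123456789') == 0xCBF43926

# Feeding the data in chunks gives the same checksum.
partial = crc32_unsigned(b'12345')
assert crc32_unsigned(b'6789', partial) == 0xCBF43926
```

Because `crc32` is a plain function, nothing is constructed per call, which avoids the repeated `CRC32()` initialization mentioned in the comment.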
harshavardhana
left a comment
LGTM - thanks for your patience.
Thanks for your much valued assistance. |
|
|
vadmeste
left a comment
Some comments
docs/API.md
Outdated
|:---|:---|:---|
|``bucket_name`` |_string_ |Name of the bucket. |
|``object_name`` |_string_ |Name of the object. |
|``options`` | _Object_ | Query Options |
s/Object/SelectObjectOptions/ ?
docs/API.md
Outdated
|Param |Type |Description |
|:---|:---|:---|
|``obj``|_Object_ |Select_object_reader object. |
s/Object/SelectObjectReader/ ?
done.
docs/API.md
Outdated
record_data.write(d)

# Get the stats
print(data.stats)
s/stats/stats()/
done.
minio/api.py
Outdated
# Select Object Content
def select_object_content(self, bucket_name, object_name, opts):
    """
    It filters the contents of an object based on a simple
I think a better comment would be: "Executes SQL requests on objects having data in CSV, JSON or Parquet formats"
done.
minio/select_object_reader.py
Outdated
if len(chunked_message) == 0:
    self.close()
    return b''
else:
We can gain an indentation level here since there is a return just before else
done.
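The indentation suggestion in this thread can be sketched as follows. This is only an illustration of the early-return pattern: the class `ChunkReader` and the method `take` are hypothetical and do not mirror the real `SelectObjectReader` code.

```python
class ChunkReader(object):
    """Hypothetical stand-in used only to show the early-return pattern."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def take(self, chunked_message):
        if len(chunked_message) == 0:
            self.close()
            return b''
        # The branch above always returns, so no `else` is needed and the
        # remaining logic sits one indentation level shallower.
        return bytes(chunked_message)


reader = ChunkReader()
print(reader.take(b'abc'), reader.closed)
```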
Praveenrajmani
left a comment
Just a few comments. LGTM otherwise.
log_output = LogOutput(client.copy_object, 'test_copy_object_no_copy_condition')
test_copy_object_no_copy_condition(client, log_output)

if sys.version_info.major == 3:
same
done.
log_output = LogOutput(client.get_bucket_notification, 'test_get_bucket_notification')
test_get_bucket_notification(client, log_output)

if sys.version_info.major == 3:
Can we add a comment on why we are running on 3+ only?
done.
|
@sinhaashish not sure what I am missing here, but the example with AWS S3 doesn't work for me: The error is: |
|
@vadmeste you can comment |
3551173
|
@vadmeste removing |
|
@sinhaashish that is just a spark select implementation limitation. The problem is an XML marshalling bug on our end in minio-py: we are sending incorrect XML to AWS S3. |
Selects and filters out data from stored object. Client for select feature
|
@vadmeste the code runs with S3 now. PTAL |
|
False ascertainment |
vadmeste
left a comment
LGTM & tested
Selects and filters out data from stored object.
Client implementation for select feature
Closes #762