Re-implement select_object_content implementation #793
24320bb to f9f3a17
f9f3a17 to 97a861c
PR is updated with further changes @Praveenrajmani @sinhaashish PTAL
97a861c to aedeea3
Breaking for this
    
In Python, Unicode characters should be passed as input like this:

from minio import Minio
from minio.error import ResponseError
from minio.select.options import (SelectObjectOptions, CSVInput,
                                  JSONInput, RequestProgress,
                                  ParquetInput, InputSerialization,
                                  OutputSerialization, CSVOutput,
                                  JsonOutput)
from minio.select.errors import (SelectCRCValidationError, SelectMessageError)
client = Minio('s3.amazonaws.com',
               access_key='ACCESSKEY',
               secret_key='SECRETKEY')
options = SelectObjectOptions(
    expression="select * from s3object",
    input_serialization=InputSerialization(
        compression_type="GZIP",
        csv=CSVInput(FileHeaderInfo="USE",
                     RecordDelimiter="\n",
                     FieldDelimiter=u'╦',
                     QuoteCharacter='"',
                     QuoteEscapeCharacter='"',
                     Comments="#",
                     AllowQuotedRecordDelimiter="FALSE",
                     ),
        # If input is JSON
        # json=JSONInput(Type="DOCUMENT",)
        ),
    output_serialization=OutputSerialization(
        csv=CSVOutput(QuoteFields="ASNEEDED",
                      RecordDelimiter="\n",
                      FieldDelimiter=u'╦',
                      QuoteCharacter='"',
                      QuoteEscapeCharacter='"',)
        # json = JsonOutput(
        #     RecordDelimiter="\n",
        #     )
        ),
    request_progress=RequestProgress(
        enabled="False"
        )
    )
try:
    data = client.select_object_content('wlk-data-wbrp', '20190612-00690-1/wlk-wbrp-part-0000.csv.gz', options)
    # Get the records
    with open('my-record-file', 'w') as record_data:
        for d in data.stream(10*1024):
            record_data.write(d)
    # Get the stats
    print(data.stats())
except SelectMessageError as err:
    print(err)
except SelectCRCValidationError as err:
    print(err)
except ResponseError as err:
    print(err)
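
The example above catches SelectCRCValidationError. As a rough illustration of what such a check involves, here is a minimal sketch of validating one message in the AWS event-stream framing that select responses use: a 12-byte prelude (total length, headers length, prelude CRC), then headers, payload, and a trailing message CRC. The helper names `parse_message` and `build_message` are hypothetical, headers are treated as opaque bytes, and `ValueError` stands in for the library's own exception; this is not the code from this PR.

```python
import struct
import zlib


def parse_message(buf):
    """Return (headers, payload) for one framed message after both CRC checks."""
    total_len, headers_len, prelude_crc = struct.unpack(">III", buf[:12])
    # The prelude CRC covers the first 8 bytes (the two length fields).
    if zlib.crc32(buf[:8]) & 0xFFFFFFFF != prelude_crc:
        raise ValueError("prelude CRC mismatch")
    # The message CRC covers everything before the final 4 bytes.
    (message_crc,) = struct.unpack(">I", buf[total_len - 4:total_len])
    if zlib.crc32(buf[:total_len - 4]) & 0xFFFFFFFF != message_crc:
        raise ValueError("message CRC mismatch")
    headers = buf[12:12 + headers_len]
    # The payload may legitimately be empty (zero-payload messages).
    payload = buf[12 + headers_len:total_len - 4]
    return headers, payload


def build_message(headers, payload):
    """Frame headers and payload the same way (hypothetical helper for testing)."""
    total_len = 12 + len(headers) + len(payload) + 4
    prelude = struct.pack(">II", total_len, len(headers))
    prelude += struct.pack(">I", zlib.crc32(prelude) & 0xFFFFFFFF)
    body = prelude + headers + payload
    return body + struct.pack(">I", zlib.crc32(body) & 0xFFFFFFFF)


# A zero-payload message still round-trips through both checks.
headers, payload = parse_message(build_message(b"hdr", b""))
```

Treating an empty payload as valid rather than as an error is the point of the last two lines: the length arithmetic simply yields an empty slice.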
    
Tested with different inputs and LGTM.
Just one fix: SelectSelectCRCValidationError -> SelectCRCValidationError in examples/select_object_content.py
LGTM
This change fixes multiple issues

- handle unicode boundaries properly for special delimiters
- handle zero payload 'Cont' event messages
- handle error messages properly
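
To illustrate the unicode boundary problem mentioned above: a multi-byte delimiter such as '╦' (three bytes in UTF-8) can be split across two chunks of a streamed response, and decoding each chunk on its own would then fail. Below is a minimal sketch of one way to handle this with the standard library's incremental decoder. The helper name `decode_chunks` is hypothetical, and this is not necessarily how the PR implements it.

```python
import codecs


def decode_chunks(chunks):
    """Yield text from byte chunks, holding back incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # Bytes of a partial character are buffered inside the decoder.
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


payload = u"a╦b\n".encode("utf-8")
chunks = [payload[:2], payload[2:]]  # the split falls inside '╦'
result = "".join(decode_chunks(chunks))
```

Calling `payload[:2].decode("utf-8")` directly would raise `UnicodeDecodeError`, which is the failure mode the incremental decoder avoids.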
d6a8826
aedeea3 to d6a8826
  
    