You can also pass a custom S3 client, for example if you want to zip files from an S3-compatible storage. There is an example of s3-zip in combination with AWS Lambda. We use archiver to create archives; to pass your options to it, use the setArchiverOptions method. You can pass an array of objects of type EntryData to organize your archive.
While they provide ways to download and deal with files from S3 and with gzipped files respectively, these do not help in dealing with a gzipped file located in S3.
Serving Compressed Files
How would I do this? I cannot implement the example in the second question given above because the file is not located locally; it must first be downloaded from S3.
I wasn't looking for this exact issue, but I felt like improving the quality of this thread by explaining why the provided solution actually works.
No, it's not because of the Scanner, as was suggested; it's because the stream is being un-gzipped by the wrapping of fileObj.
How to download a GZip file from S3? What should I do? Why can't you ungzip it and then read it to a file? Have you tried getting the S3Object, wrapping it in an input stream, wrapping that in a gzip stream, and then writing it out to a file? I solved the issue using a Scanner instead of an InputStream.
I had the same issue, which I fixed a week or so ago: I put it into a GZIPInputStream and read from it with a BufferedReader. Really, that worked for you? Will it decompress the file in memory or on S3?
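In Python with boto3, the same idea (wrap the object's stream in a gzip reader and read line by line) looks roughly like the sketch below. The S3 body is simulated with an in-memory buffer so the example is self-contained; with a real object you would buffer `s3.get_object(Bucket=..., Key=...)["Body"].read()` instead, and the bucket and key names would be your own.

```python
import gzip
import io

def read_gzipped_lines(stream):
    """Yield decoded lines from a stream of gzip-compressed bytes.
    Decompression happens client-side, in memory -- S3 never
    decompresses the object for you."""
    with gzip.GzipFile(fileobj=stream, mode="rb") as gz:
        for line in io.TextIOWrapper(gz, encoding="utf-8"):
            yield line.rstrip("\n")

# Simulate an S3 object body with an in-memory gzipped payload.
payload = gzip.compress(b"hello\nworld\n")
lines = list(read_gzipped_lines(io.BytesIO(payload)))
print(lines)  # ['hello', 'world']
```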
This will produce the compressed production file. Remove the .gz suffix from its name.
Just remember: if you make a change to a file that is cached in CloudFront, make sure you invalidate the cache after making this type of change. I've been looking for ways of making my site load faster, and one way I'd like to explore is making greater use of CloudFront. Because CloudFront was originally not designed as a custom-origin CDN, and because it didn't support gzipping, I have so far been using it to host all my images, which are referenced by their CloudFront CNAME in my site code and optimized with far-future headers.
There were workarounds to solve this issue, but essentially they didn't work. Now, it seems Amazon CloudFront supports custom origins, and it is now possible to use the standard HTTP Accept-Encoding method for serving gzipped content if you are using a custom origin [link]. I haven't so far been able to implement the new feature on my server. The blog post I linked to above, which is the only one I found detailing the change, seems to imply that you can only enable gzipping (bar workarounds, which I don't want to use) if you opt for a custom origin, which I'd rather not: I find it simpler to host the corresponding files on my CloudFront server and link to them from there.
Despite carefully reading the documentation, I still don't know. CloudFront connects to your server via HTTP 1.0. By default some web servers, including nginx, don't serve gzipped content to HTTP 1.0 clients. This has the side effect of making keep-alive connections not work for HTTP 1.0 clients. Serving content that is gzipped on the fly through Amazon CloudFront is dangerous and probably shouldn't be done. Basically, if your web server is gzipping the content, it will not set a Content-Length header and will instead send the data chunked.
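For nginx specifically, the HTTP 1.0 issue described above comes down to the gzip_http_version directive, which defaults to 1.1. An illustrative fragment (adapt to your own configuration):

```nginx
# CloudFront fetches from the origin over HTTP/1.0, but by default
# nginx only gzips responses to HTTP/1.1 clients. Lowering the
# threshold lets CloudFront receive gzipped content.
gzip on;
gzip_http_version 1.0;
```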
If the connection between Cloudfront and your server is interrupted and prematurely severed, Cloudfront still caches the partial result and serves that as the cached version until it expires.
The accepted answer, gzipping the file first on disk and then serving the gzipped version, is a better idea, as nginx will be able to set the Content-Length header, and so CloudFront will discard truncated versions. We've made a few optimisations for uSwitch.
S3, however, does not support gzip compression by default; instead, a static website can be put behind CloudFront to enable gzip compression. CloudFront is not expensive and can be used, but there is also a workaround to serve gzipped files straight from S3.
To serve a gzipped file from S3, the file needs to be compressed using gzip, and the Content-Encoding metadata for the S3 object needs to be set to gzip. This allows serving the file in gzipped format. Most modern browsers support gzip compression, so this should be no problem; however, if a user is on an older browser that does not support gzip, they will not be able to read the content. With web servers like IIS, the server checks the Accept-Encoding request header; if it is present and its value includes gzip, the web server compresses the content and serves it to the user.
If the Accept-Encoding request header is not present, the uncompressed data is sent back. This is also somewhat true for CloudFront: if the Accept-Encoding request header is present, CloudFront compresses the content and then serves it.
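That negotiation can be modelled in a few lines of Python (a simplified illustration of the header check, not how IIS or CloudFront is actually implemented):

```python
import gzip

def respond(body: bytes, accept_encoding: str = ""):
    """Return (payload, headers): compress only when the client
    advertises gzip support in its Accept-Encoding header."""
    if "gzip" in accept_encoding.lower():
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    # No gzip in Accept-Encoding: send the bytes uncompressed.
    return body, {}

body = b"<html>hello</html>"
compressed, headers = respond(body, "gzip, deflate")
plain, no_headers = respond(body, "")
print(headers, plain == body)  # {'Content-Encoding': 'gzip'} True
```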
Add a Content-Encoding header field for each compressed file and set the field value to gzip. It would be nice if s3cmd sync had an option to do this automatically, as uploading compressed files is the only way to serve compressed content from S3.
I intentionally removed the Content-Encoding header we used to send if the content was compressed with gzip already.
This was causing additional problems: one could not store and retrieve a compressed file in compressed encoding, because S3's web servers would helpfully decompress the object in transit when we wanted to receive it in compressed encoding.
I'd be open to a more thoughtful solution that allows both behaviours: the ability to download a compressed file in compressed format, and the ability to download a compressed file in uncompressed format. We had neither correct before. I don't understand what you're saying: adding the Content-Encoding property to an already uploaded archive clearly does not prevent uploading the archive.
Similarly, adding the property does not prevent downloading the archive in compressed form, as you can verify by following my use case:
Here is an example archive, served via S3 and via CloudFront. My request is simply for, say, s3cmd sync --guess-content-encoding to automatically set Content-Encoding for every uploaded file with a .gz suffix. Instead of s3cmd detecting gzip-encoded content, as I think this issue asks for, couldn't it have a filter to gzip the content while uploading it and add the header?
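The requested guessing behaviour (the --guess-content-encoding flag itself is hypothetical) boils down to a suffix check at upload time, something like:

```python
def guess_content_encoding(key: str):
    """Return the Content-Encoding value an upload tool could set
    automatically: files ending in .gz are assumed to hold
    gzip-compressed content."""
    return "gzip" if key.endswith(".gz") else None

print(guess_content_encoding("js/production.min.js.gz"))  # gzip
print(guess_content_encoding("index.html"))               # None
```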
I think pretty much all browsers support gzip now. Pretty much all browsers support gzip, but not all of them, and other useful tools like curl don't support it or don't enable it by default. The main reason to have s3cmd support Content-Encoding is to support CloudFront serving of gzip-compressed content, and that requires both the original content and the .gz files. The need to call s3cmd twice (once for uncompressed files, once for compressed files) means two CloudFront invalidations are generated.
There can be at most three CloudFront invalidations in flight, which means re-running my Makefile too quickly will cause one of the s3cmd calls to fail; hence the s3cmd … fallback.
There also appears to be a bug with s3cmd not invalidating the root index. From what I researched, all browsers support gzip decoding, so I am not sure it's worth the effort of your dual approach, which is confusing because, IIUC, you have to rewrite paths based on the browser header; that is a huge PITA because you need some dynamic component. Anyway, back to my proposal: doesn't --gzip-include sound reasonable? I would rather just have s3cmd do it in one step.
I'd also like to request this feature. Currently, when I specify --header 'Content-Encoding: gzip', S3 sets x-amz-meta-content-encoding: gzip instead. Chrome still unpacks and renders the file, but Firefox and Safari display garbled text.
In my case I don't need to specify the headers to upload gzip content; I upload everything with s3cmd. If you are sharing your website's static content using Amazon Web Services, you can gzip your content and save some bandwidth.
The size of your files will be reduced and your website will load faster for your users. If you have a static website in Amazon, you should really consider using CloudFront since it is very cheap and is a CDN service.
However, this guide is not a tutorial of how to configure your website to use CloudFront or S3 because there are already many good tutorials for that. Since CloudFront uses an Origin Server to find your files and to distribute them among their servers, what I want to show here is that all you need to serve gzipped content in CloudFront is serving gzipped content at your Origin Server.
With a custom origin server, like an IIS or Apache server, this is very easy to accomplish because you just have to set some configuration and you're ready to serve gzipped content. However, if your origin server is Amazon S3, you need to gzip each file manually, and I will show how this is done. You can read more about it in this Better Explained guide, but if you don't want to, the extracted images that follow sum it up. There is a tricky part to serving gzipped files from Amazon S3.
Since gzipping is commonly done by the web server, which compresses and caches the content, S3 will not do it for you: it saves CPU time by not compressing content.
So, what you need to do is gzip the file upfront and set its Content-Encoding to gzip. The following guide shows how to do it. First, download a gzip program; you can use the official application or find another one that fits you better. The file myscrypt. However, do not gzip your images or other binary content: they are already highly compressed, and the CPU cost to decompress them will not be worth it. The last step is to remove the .gz part of the name and upload the file to Amazon S3, setting its Content-Encoding to gzip.
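The whole recipe (compress upfront, drop the .gz suffix, upload with Content-Encoding: gzip) can be scripted with Python's gzip module. The boto3 call is left as a comment because it needs real credentials, and the bucket name is made up:

```python
import gzip

def prepare_for_s3(name: str, data: bytes, content_type: str):
    """Gzip `data` and return (key, body, extra_args) for upload.
    The key keeps the original name (no .gz suffix); ContentEncoding
    tells browsers to decompress transparently."""
    extra = {"ContentEncoding": "gzip", "ContentType": content_type}
    return name, gzip.compress(data), extra

key, body, extra = prepare_for_s3("myscript.js", b"console.log('hi');",
                                  "application/javascript")
# Illustrative upload (requires boto3 and AWS credentials):
# boto3.client("s3").put_object(Bucket="my-bucket", Key=key,
#                               Body=body, **extra)
print(key, extra["ContentEncoding"])  # myscript.js gzip
```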
Since you'll need to do the same action for many files for each deploy, I highly recommend that you automate this process using Amazon APIs to upload your content.
To verify that your content is being gzipped, open your developer tools and check the Content-Encoding header on the network tab. Browsers will honor the Content-Encoding header and decompress the content automatically. In practice, all real browsers accept it.
Most programming language HTTP libraries also handle it transparently but not boto3, as demonstrated above.
It is worth noting that curl does not request compressed content unless you have specifically asked it to; I strongly recommend adding --compressed to your. Hi Vince, can you please comment on this Stack Overflow question? I've been trying to read, and avoid downloading, CloudTrail logs from S3 and had nearly given up on the get ['Body'].
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position: ordinal not in range(128).
How to store and retrieve gzip-compressed objects in AWS S3.
See the License for the specific language governing permissions and limitations under the License. We do not want to write to disk, so we use a BytesIO as a buffer. Reading it back requires a little dance, because GzipFile insists that its underlying file-like object implement tell and seek, but boto3's I/O stream does not.
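That dance, written out. The S3 streaming body is simulated with an in-memory buffer so the sketch runs standalone; with a real object you would build the readable buffer from io.BytesIO(obj["Body"].read()).

```python
import gzip
import io

# Write: compress into an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"some text to store compressed")
stored = buf.getvalue()  # what put_object would send to S3

# Read: boto3's StreamingBody lacks tell()/seek(), which GzipFile
# needs, so read the whole body into a seekable BytesIO first.
readable = io.BytesIO(stored)
with gzip.GzipFile(fileobj=readable, mode="rb") as gz:
    text = gz.read()
print(text)  # b'some text to store compressed'
```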
Good stuff, saved me in the world of Lambda, thanks. Very helpful, thank you! Great code, I was looking for this online!
Thanks a lot for this! Looked all over for this! Finally got it to work; saved my day. This is a good fix, but I don't think it works for multi-file archives. Great code, man, thanks! Thank you for sharing. I tried with Python 3; here is the code. The decompression works, and that's all I needed! Thanks for this code. However, in certain cases I get this error on this line gz.