Bucket Management


Whether you use SpinUp Object Storage as a repository for files or as a platform for publishing content to CDN, it helps to plan both your bucket structure and how you will maintain the objects in it. Because object counts for some applications can reach the millions, following the recommendations below simplifies working with this content, especially when you build automation around it, and improves performance.

Organize your buckets

Structuring for ease of use

The topmost layer of organization for your files in Object Storage is the bucket. Because your SpinUp account supports the creation of up to 500,000 buckets, we encourage you to split your content across as many buckets as make sense for your applications or tasks. For example, if you are hosting static website content to be published to CDN, you can sort your objects into buckets by file type: JavaScript, CSS, images, and so on.

Structuring for better performance

Up to a certain point, you only need to organize your objects so that they are easy to locate. A bucket with 100 objects tends to perform the same as one with 1,000. However, if your object counts go into the hundreds of thousands or millions, splitting your content into multiple, smaller buckets aids not only organization but also performance, because the platform is designed around consistent, reliable access to your files rather than high-performance IO speeds.

Write operations to an individual bucket (uploads, deletions, and modifications of header data) are limited to 100 per second. Uploading a large number of objects into a single bucket puts you at risk of exceeding this rate limit, which causes write attempts to fail. Having to retry failed writes can, in turn, negatively impact your application’s performance. Single buckets with high object counts might also list their contents more slowly and take longer to reflect additions and deletions.
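One way to stay under the per-bucket write limit is to pace requests on the client side. The following is a minimal sketch in Python, not an official SpinUp client: the WritePacer class and its parameters are hypothetical names introduced for illustration, and it assumes a single process is writing to the bucket.

```python
import time


class WritePacer:
    """Spaces out calls so at most max_per_second are issued (hypothetical helper)."""

    def __init__(self, max_per_second=100, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_slot = None  # earliest time the next write may start

    def wait(self):
        """Block until the next write slot is free; return the seconds slept."""
        now = self.clock()
        if self.next_slot is None or now >= self.next_slot:
            # No backlog: go immediately and reserve the following slot.
            self.next_slot = now + self.interval
            return 0.0
        delay = self.next_slot - now
        self.sleep(delay)
        self.next_slot += self.interval
        return delay
```

Call wait() on one pacer per bucket before each upload, deletion, or header update. Choosing a max_per_second somewhat below 100 leaves headroom for other writers to the same bucket.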

Taking the previous example of website static content, suppose you need to plan Object Storage bucket structure for a website that hosts photos of real estate listings in multiple cities. Because these images can easily accumulate into millions of objects, a single bucket for “images” isn’t very descriptive or productive. For this type of environment, consider dividing up images into more granular categories such as upload date, geographic region, or some other logical grouping to distribute the number of objects across more than one bucket.
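As an illustration of this kind of grouping, the sketch below derives a bucket name from a listing’s region and upload month. The function name, the "images" prefix, and the naming pattern are assumptions made for this example, not a SpinUp requirement.

```python
import re
from datetime import date


def bucket_for_listing(region: str, uploaded: date, prefix: str = "images") -> str:
    """Return a bucket name such as images-<region>-<YYYY-MM> (hypothetical scheme)."""
    # Lowercase the region and collapse anything that is not a letter or digit into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", region.lower()).strip("-")
    return f"{prefix}-{slug}-{uploaded:%Y-%m}"


print(bucket_for_listing("San Antonio", date(2024, 3, 9)))
# prints "images-san-antonio-2024-03"
```

Grouping by both region and month keeps each bucket’s object count bounded as new listings arrive, because older buckets stop growing once their month has passed.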

Structuring for website acceleration on CDN

For users who mainly use Object Storage to publish files to CDN, organize buckets by how often the objects inside them change, whether that means overwriting a file with a new object of the same name or changing the metadata headers. The Time to Live (TTL) setting on each CDN-enabled bucket determines how often the CDN requests a fresh copy of an object from SpinUp Object Storage, so place files that need frequent updates in buckets with a low TTL. To minimize bandwidth costs, place files that you expect to change infrequently (or never) after upload in buckets with a high TTL. A properly designed web application with TTLs tuned to its needs should only require a purge (a forced deletion of an object from the CDN edge nodes) for emergency removal of sensitive content from the CDN. Mass purges should not be part of a maintenance plan for refreshing CDN content whose TTL is set too high.
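The sketch below shows one way to decide which bucket a file belongs in, based on how often you expect it to change. The tier names and TTL values are purely illustrative assumptions. Choose values that match your own content.

```python
# (bucket name, TTL in seconds), ordered from shortest to longest TTL.
# All names and values here are hypothetical examples.
TTL_TIERS = [
    ("site-volatile", 15 * 60),
    ("site-daily", 24 * 3600),
    ("site-static", 30 * 24 * 3600),
]


def bucket_for_update_interval(expected_seconds_between_updates):
    """Pick the tier with the longest TTL that does not exceed the update interval."""
    chosen = TTL_TIERS[0]  # fall back to the shortest TTL for very volatile files
    for tier in TTL_TIERS:
        if tier[1] <= expected_seconds_between_updates:
            chosen = tier
    return chosen
```

With these example tiers, a file expected to change about once a week lands in the bucket with the one-day TTL, so the CDN never serves a copy that is more than a day old.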

Planning ahead and bucket naming

Keeping the potential for those high object counts in mind, it’s far less painful to plan ahead than it is to rearrange your objects when problems arise. Buckets cannot be renamed after the fact. Instead, objects must be copied to a new bucket and then deleted from the original bucket to avoid duplicate storage. Similarly, changing the virtual folder structure around an object requires making a copy with the correct file path, then deleting the original. Adding an incremental number to the end of your bucket names also makes it easier to roll over to a new bucket if the number of objects becomes too large. For example, starting with a bucket named reports-00001 makes it simple to increment the number and add more buckets that store the same type of files.
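Rolling over to the next numbered bucket can be automated. The following sketch assumes bucket names end in a zero-padded number, as in the reports-00001 example. The function name is hypothetical.

```python
import re


def next_bucket_name(name: str) -> str:
    """Increment the trailing number of a bucket name, preserving zero padding."""
    match = re.fullmatch(r"(.*?)(\d+)", name)
    if match is None:
        raise ValueError(f"bucket name has no trailing number: {name!r}")
    stem, digits = match.groups()
    # Keep at least the original width; the number may grow past it.
    return f"{stem}{int(digits) + 1:0{len(digits)}d}"


print(next_bucket_name("reports-00001"))
# prints "reports-00002"
```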

Keep a database of your bucket structure

Especially if you use Object Storage for file storage attached to your applications (rather than for publishing to CDN), we recommend that you keep a database (or at least a listing) of your bucket structure and the objects in storage, and update it each time objects are added or removed. This reduces the risk of naming conflicts and removes the need to search through all buckets to locate an object. It also reduces the time needed to update an object, because locating it through a database is significantly faster than listing all your storage contents and then searching for a match. You can verify bucket contents against your database or other record keeping by making an API HEAD request for that bucket and checking the X-Container-Object-Count header in the response.
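A check like this can be scripted with Python’s standard library. In the sketch below, only the X-Container-Object-Count header comes from this article. The bucket URL format and the X-Auth-Token request header are assumptions, so consult the API documentation for the endpoint and authentication your account actually uses.

```python
import urllib.request


def object_count_from_headers(headers) -> int:
    """Read the object count from the headers of a bucket HEAD response."""
    value = headers.get("X-Container-Object-Count")
    if value is None:
        raise KeyError("X-Container-Object-Count header is missing")
    return int(value)


def fetch_object_count(bucket_url: str, token: str) -> int:
    """Send a HEAD request for the bucket and return its object count.

    bucket_url and the X-Auth-Token header are assumptions for this sketch.
    """
    request = urllib.request.Request(
        bucket_url, method="HEAD", headers={"X-Auth-Token": token}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return object_count_from_headers(response.headers)
```

Comparing the returned number with the row count your database holds for the same bucket is a cheap way to detect drift without listing every object.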

Removing content from Object Storage

Set an X-Delete-At header

You can use Object Storage to retain objects for only a set period of time, rather than retaining those objects until you explicitly delete them. To use this function, set the X-Delete-At metadata header on individual objects, either at the time of upload or with a follow-up request to modify headers, and the backend services for Object Storage will delete the object from storage on the requested date.
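The sketch below builds the header for a chosen expiry time. It assumes the service expects the value as a Unix epoch timestamp in seconds, which is the convention in OpenStack Swift. Confirm the expected format in the SpinUp API documentation before relying on it. The function name is hypothetical.

```python
from datetime import datetime, timezone


def delete_at_header(when: datetime) -> dict:
    """Return an X-Delete-At header for the given expiry time.

    Assumes the value is a Unix epoch timestamp in seconds (an assumption
    based on OpenStack Swift, not confirmed by this article).
    """
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    return {"X-Delete-At": str(int(when.timestamp()))}


print(delete_at_header(datetime(2030, 1, 1, tzinfo=timezone.utc)))
# prints {'X-Delete-At': '1893456000'}
```

Send the returned header with the upload request, or with the follow-up request that modifies the object’s headers.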

Multithread your bulk delete requests

Yet another reason to keep your content divided into multiple buckets is that it simplifies removing a large number of objects. If you’ve followed the best practices for organizing buckets outlined here, you can multithread any scripts or automation you use to remove objects in bulk, with each thread working on a different bucket, rather than being throttled to approximately 100 deletes per second in a single bucket (assuming no other write operations are occurring at the same time).
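A minimal sketch of this approach uses one worker thread per bucket, so that each bucket’s deletes run sequentially and count only against that bucket’s own limit. The delete_object callable is a hypothetical stand-in for whatever client call you use to delete a single object. It is not part of any SpinUp library.

```python
from concurrent.futures import ThreadPoolExecutor


def delete_in_parallel(objects_by_bucket, delete_object, max_workers=8):
    """Delete objects from many buckets concurrently, one thread per bucket.

    objects_by_bucket maps a bucket name to the object names to remove.
    delete_object(bucket, name) is a caller-supplied function (hypothetical).
    Returns a dict of bucket name -> number of objects deleted.
    """

    def drain(bucket):
        deleted = 0
        for name in objects_by_bucket[bucket]:
            delete_object(bucket, name)
            deleted += 1
        return bucket, deleted

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Iterating the dict yields bucket names; each is handled by one worker.
        return dict(pool.map(drain, objects_by_bucket))
```

If a single thread can issue deletes faster than the per-bucket limit allows, add pacing or retries inside delete_object.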

