How to List All Objects on an S3 Bucket

Written by Tom Wilkins
Sat, 3 Oct 2020

Listing the contents of an S3 bucket has several use cases. This post shows you how to do it with the AWS SDK.

The Simple Storage Service (S3) from AWS can be used to store data, host images or even a static website. It behaves much like a file system: files (called objects) are stored under keys, and slashes in those keys give the appearance of a directory structure.

There are many use cases for wanting to list the contents of the bucket. My use case involved a bucket used for static website hosting, where I wanted to use the contents of the bucket to construct an XML sitemap.

The AWS Software Development Kit (SDK) exposes a method that allows you to list the contents of the bucket, called listObjectsV2, which returns an entry for each object on the bucket looking like this:

{
  Key: 'index.html',
  LastModified: 2020-10-03T10:04:19.849Z,
  Size: 149860,
  StorageClass: 'STANDARD'
}

The only required parameter when calling listObjectsV2 is Bucket, which is the name of the S3 bucket. You must ensure that the environment where this code will be used has permissions to read from the bucket, whether that be a Lambda function or a user running on a machine.

A single call to listObjectsV2 returns a maximum of 1000 objects, which might be enough to cover the entire contents of your S3 bucket. But what if you have more than 1000 objects on your bucket?
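To put a number on that limit, here is a small sketch of how many calls a full listing would take. The pagesNeeded helper is a hypothetical name of mine, not part of the SDK, and it assumes every page is filled to the 1000-object maximum.

```typescript
// Hypothetical helper: how many listObjectsV2 calls a full listing needs,
// assuming each response carries up to `pageSize` objects.
function pagesNeeded(objectCount: number, pageSize: number = 1000): number {
  // Even an empty bucket takes one call to find out that it is empty.
  return Math.max(1, Math.ceil(objectCount / pageSize));
}

console.log(pagesNeeded(149));  // 1
console.log(pagesNeeded(2500)); // 3
```

So a bucket holding 2500 objects cannot be read in one go; it takes three calls, each picking up where the previous one stopped.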

Listing all S3 objects

As well as providing the contents of the bucket, listObjectsV2 includes metadata with the response. This includes IsTruncated, which tells us whether more objects remain, and NextContinuationToken, which tells the next call where to resume. We can use these to recursively call a function and return the full contents of the bucket, no matter how many objects are held there.
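To make those two fields concrete, here is a sketch of the part of the response that matters for pagination. The ListPage interface and the nextToken helper are hypothetical names of mine: ListPage is a trimmed-down view of the SDK's output type, and the sample pages are made-up values rather than real responses.

```typescript
// Hypothetical, trimmed-down view of a listObjectsV2 response.
interface ListPage {
  Contents?: { Key?: string }[];
  IsTruncated?: boolean;
  NextContinuationToken?: string;
}

// Returns the token to send with the next request, or undefined when
// the listing is complete.
function nextToken(page: ListPage): string | undefined {
  return page.IsTruncated ? page.NextContinuationToken : undefined;
}

// Made-up sample pages for illustration.
const firstPage: ListPage = {
  Contents: [{ Key: 'index.html' }],
  IsTruncated: true,
  NextContinuationToken: 'abc',
};
const lastPage: ListPage = {
  Contents: [{ Key: 'about.html' }],
  IsTruncated: false,
};

console.log(nextToken(firstPage)); // 'abc'
console.log(nextToken(lastPage));  // undefined
```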

Here's an example using TypeScript:

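import { S3 } from 'aws-sdk';

const s3 = new S3();

// Recursively fetches every page of results and returns all objects in the bucket.
async function listObjects(
  Bucket: string,
  data: S3.ObjectList = [],
  ContinuationToken: string | undefined = undefined
): Promise<S3.ObjectList> {
  const response = await s3.listObjectsV2({ Bucket, ContinuationToken }).promise();
  // Contents is typed as optional, so fall back to an empty array rather than asserting it exists.
  data.push(...(response.Contents ?? []));
  if (response.IsTruncated) {
    // More objects remain: request the next page using the token from this response.
    return listObjects(Bucket, data, response.NextContinuationToken);
  }
  return data;
}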

Here I've used default arguments for data and ContinuationToken, so the first call only needs the bucket name. The contents of each response are pushed into the data array, and the response is then checked for truncation. If it is truncated, the function calls itself with the data collected so far and the continuation token provided by the response.

This will continue to call itself until a response is received without truncation, at which point the data array it has been pushing into is returned, containing all objects on the bucket!
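If you would rather not recurse, the same idea can be written as a loop. This is an alternative sketch, not the approach used above. To keep it runnable without AWS credentials, it takes the client as an argument and is exercised against an in-memory stand-in. The Lister interface, listAllObjects, and fakeClient are all hypothetical names of mine. Lister is meant to describe just the slice of the S3 client used here, and I have not checked it against every SDK version.

```typescript
// Minimal structural types (assumptions): only the parts of the SDK used here.
interface S3Entry { Key?: string }
interface ListPage {
  Contents?: S3Entry[];
  IsTruncated?: boolean;
  NextContinuationToken?: string;
}
interface ListParams { Bucket: string; ContinuationToken?: string }
interface Lister {
  listObjectsV2(params: ListParams): { promise(): Promise<ListPage> };
}

// Loop-based equivalent of the recursive function: keep requesting pages
// until a response comes back without truncation.
async function listAllObjects(client: Lister, Bucket: string): Promise<S3Entry[]> {
  const all: S3Entry[] = [];
  let ContinuationToken: string | undefined;
  do {
    const page: ListPage = await client
      .listObjectsV2({ Bucket, ContinuationToken })
      .promise();
    all.push(...(page.Contents ?? []));
    ContinuationToken = page.IsTruncated ? page.NextContinuationToken : undefined;
  } while (ContinuationToken !== undefined);
  return all;
}

// In-memory stand-in for S3 (hypothetical): serves `pageSize` keys per call
// and uses the next start index as its continuation token.
function fakeClient(keys: string[], pageSize: number): Lister {
  return {
    listObjectsV2({ ContinuationToken }: ListParams) {
      const start = ContinuationToken ? Number(ContinuationToken) : 0;
      const end = start + pageSize;
      const page: ListPage = {
        Contents: keys.slice(start, end).map((Key) => ({ Key })),
        IsTruncated: end < keys.length,
        NextContinuationToken: end < keys.length ? String(end) : undefined,
      };
      return { promise: () => Promise.resolve(page) };
    },
  };
}

// Three keys served two at a time, so the loop makes two requests.
listAllObjects(fakeClient(['a.html', 'b.html', 'c.html'], 2), 'my-bucket')
  .then((objects) => console.log(objects.map((o) => o.Key)));
// logs the three keys: a.html, b.html, c.html
```

Passing the client in, rather than creating it inside the function, is what makes it possible to try the pagination logic against a fake before pointing it at a real bucket.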

Tags: TIL, Node.js, JavaScript, Blog, AWS, S3, AWS SDK, Serverless