Configure an AWS access key and secret access key so that PySpark can read from and write to an S3 bucket.
- Step 1: Create an S3 bucket (a CLI sketch follows this list).
- Step 2: Create an IAM user and attach an IAM policy with specific S3 permissions to it (example below).
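For Step 1, the bucket can be created from the AWS CLI. This is a sketch, assuming the CLI is already configured and that the bucket is named "test" as in the rest of this guide; bucket names are globally unique, so substitute your own:

```sh
# Create the S3 bucket used in the examples below ("test" is a placeholder;
# pick a globally unique name and adjust the region to yours).
aws s3 mb s3://test --region us-east-1
```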
Sample IAM policy that grants programmatic read-write access to the "test" S3 bucket (a minimal sketch; adjust the bucket name and trim the actions to what your jobs actually need):
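```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::test"]
    },
    {
      "Sid": "ReadWriteObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::test/*"]
    }
  ]
}
```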
- Step 3: Once the IAM user is created, generate an access key for it and note down the Access key ID and the Secret access key.
- Step 4: Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables so that PySpark can authenticate to S3 (see the sketches after this list).
- Persisting the variables system-wide (as sketched below) saves them permanently for all users and user sessions. Both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are required for PySpark code to be able to read/write to an S3 bucket.
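One way to persist the variables for all users is /etc/environment; this is a sketch and an assumption about your setup (a login shell profile or your cluster's own configuration mechanism works too). The placeholder values come from Step 3:

```sh
# Append the credentials to /etc/environment so every user and session
# picks them up (placeholders; paste the values noted in Step 3).
echo 'AWS_ACCESS_KEY_ID=<your-access-key-id>' | sudo tee -a /etc/environment
echo 'AWS_SECRET_ACCESS_KEY=<your-secret-access-key>' | sudo tee -a /etc/environment
# Log out and back in (or reboot) for the variables to take effect.
```

With the variables in place, a minimal PySpark sketch that reads and writes through the s3a connector might look like the following. It assumes the hadoop-aws package matching your Hadoop version is on the classpath; the bucket name and file paths are hypothetical:

```python
import os
from pyspark.sql import SparkSession

# Hand the credentials from the environment to Hadoop's s3a connector.
spark = (
    SparkSession.builder
    .appName("s3-read-write")
    .config("spark.hadoop.fs.s3a.access.key", os.environ["AWS_ACCESS_KEY_ID"])
    .config("spark.hadoop.fs.s3a.secret.key", os.environ["AWS_SECRET_ACCESS_KEY"])
    .getOrCreate()
)

df = spark.read.csv("s3a://test/input.csv", header=True)   # read from the bucket
df.write.mode("overwrite").parquet("s3a://test/output/")   # write back to it
```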