Um blog sobre nada

A collection of useless things that may turn out to be useful

Archive for the ‘AWS’ Category

AWS Athena – Visualising usage and cost

Posted by Diego on June 10, 2021

This is a placeholder for this article:

https://aws.amazon.com/blogs/big-data/auditing-inspecting-and-visualizing-amazon-athena-usage-and-cost/

It provides a fully working solution (a CFN template) to analyze, amongst other things, how much data each query submitted to Athena scanned.

Posted in Athena, AWS | Leave a Comment »

AWS S3 – Copying objects between AWS accounts (TO and FROM)

Posted by Diego on December 21, 2020

There are two possible situations in which you’d want to move S3 objects between different AWS accounts. You could be trying to copy an object FROM a different AWS account to your account, or you could be trying to copy an object that resides in your account TO a different AWS account. In both cases the approach is similar but slightly different.

OPTION 1 – Copy FROM another account
(you are on the destination account and want to copy the data from a source account)

First, add this POLICY to the source bucket, where:

  • DESTACCOUNT: the destination account ID
  • SOURCEACCOUNT: the source account ID
  • YOURROLE: the role on the destination account that performs the copy
  • SOURCEBUCKET: the name of the bucket where the data is
  • DESTINATIONBUCKET: the name of the bucket you want to copy the data to
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DelegateS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:sts::DESTACCOUNT:assumed-role/YOURROLE"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::SOURCEBUCKET/*",
                "arn:aws:s3:::SOURCEBUCKET"
            ]
        }
    ]
}

Second, add this policy to the role that will perform the copy (YOURROLE):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::SOURCEBUCKET",
                "arn:aws:s3:::SOURCEBUCKET/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::DESTINATIONBUCKET",
                "arn:aws:s3:::DESTINATIONBUCKET/*"
            ]
        }
    ]
}
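
With both policies in place, the copy itself can be performed from the destination account. A minimal sketch, assuming the code is already running under YOURROLE and reusing the same placeholder names (the object key is illustrative):

import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'SOURCEBUCKET',
    'Key': 'myprefix/myobject'
}
# Running under YOURROLE in the destination account, so the copied object is owned by that account
s3.Bucket('DESTINATIONBUCKET').copy(copy_source, 'myprefix/myobject')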

OPTION 2 – Copy TO another account
(for example, a lambda function copies the data from the account it runs in to a different account)

First, add this policy to the destination bucket, on the destination account:

{
    "Version": "2012-10-17",
    "Id": "MyPolicyID",
    "Statement": [
        {
            "Sid": "mySid",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::SOURCEACCOUNT:root"
            },
            "Action": "s3:PutObject",
            "Resource": [
                "arn:aws:s3:::DESTINATIONBUCKET/*",
                "arn:aws:s3:::DESTINATIONBUCKET"
            ]
        }
    ]
}

Second, add the exact same policy mentioned in the “second” item above to the role that will perform the copy (YOURROLE).

IMPORTANT: Object Ownership

If you are copying from account A TO account B (a lambda running on A, for example), the objects on account B will be owned by the user that performed the copy on account A. That may (definitely will) cause problems on account B, so make sure to add the “bucket-owner-full-control” ACL when copying the object. For example:

import boto3

s3 = boto3.resource('s3')
copy_source = {
    'Bucket': 'sourceBucket',
    'Key': 'sourceKey'
}
bucket = s3.Bucket('destBucket')
# Grant the destination bucket's owner full control over the copied object
extra_args = {'ACL': 'bucket-owner-full-control'}
bucket.copy(copy_source, 'destKey', ExtraArgs=extra_args)

Posted in AWS, S3 | Leave a Comment »

AWS S3 – ListObjectsV2 operation: Access Denied

Posted by Diego on May 27, 2020

Every now and then I fall for this. Upon getting a:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied

error, my first course of action is to try to add that exact permission to whatever role I need (something like s3:ListObjectsV2).

That won’t work (despite the fact that the CloudFormation template will run just fine!!), because there is no IAM action with that name. The correct permission to grant is ‘s3:ListBucket‘, which is the action behind the bucket-listing operations:

  • ListObjects
  • ListObjectsV2
  • HeadBucket
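
For reference, a minimal policy statement granting it (MYBUCKET is a placeholder; note that s3:ListBucket applies to the bucket ARN itself, not to MYBUCKET/*):

{
    "Effect": "Allow",
    "Action": "s3:ListBucket",
    "Resource": "arn:aws:s3:::MYBUCKET"
}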

Posted in AWS, S3 | Leave a Comment »

26/02/2020 – AWS CodePipeline – Notifications EventTypeIds listed

Posted by Diego on February 26, 2020

CodePipeline notifications can be created very easily through the AWS console, but in the spirit of 100% automation, I was trying to create them using CloudFormation. Very quickly I found out that this is “AWS::CodeStarNotifications::NotificationRule” functionality, part of a newish AWS service that is not very well documented.

The first thing you’ll notice is that you need to specify which “EventTypeIds” you want the pipeline to notify on. These are the events you can select in the console.

At the time of writing, these events were not documented anywhere. I went through the CloudFormation documentation and could see that the allowed values are not listed, nor is there any reference to where they can be found.

So I reached out to AWS and their support acknowledged that this info was missing from their docs and provided me with the list below (which will probably be reflected in their docs pretty soon):

  1. CodePipeline pipelines:
    codepipeline-pipeline-action-execution-succeeded
    codepipeline-pipeline-action-execution-failed
    codepipeline-pipeline-stage-execution-started
    codepipeline-pipeline-pipeline-execution-failed
    codepipeline-pipeline-manual-approval-failed
    codepipeline-pipeline-pipeline-execution-canceled
    codepipeline-pipeline-action-execution-canceled
    codepipeline-pipeline-pipeline-execution-started
    codepipeline-pipeline-stage-execution-succeeded
    codepipeline-pipeline-manual-approval-needed
    codepipeline-pipeline-stage-execution-resumed
    codepipeline-pipeline-pipeline-execution-resumed
    codepipeline-pipeline-stage-execution-canceled
    codepipeline-pipeline-action-execution-started
    codepipeline-pipeline-manual-approval-succeeded
    codepipeline-pipeline-pipeline-execution-succeeded
    codepipeline-pipeline-stage-execution-failed
    codepipeline-pipeline-pipeline-execution-superseded
  2. CodeDeploy Applications
    codedeploy-application-deployment-succeeded
    codedeploy-application-deployment-failed
    codedeploy-application-deployment-started
  3. CodeCommit repositories
    codecommit-repository-comments-on-commits
    codecommit-repository-approvals-status-changed
    codecommit-repository-pull-request-source-updated
    codecommit-repository-pull-request-created
    codecommit-repository-approvals-rule-override
    codecommit-repository-comments-on-pull-requests
    codecommit-repository-pull-request-status-changed
    codecommit-repository-branches-and-tags-created
    codecommit-repository-pull-request-merged
    codecommit-repository-branches-and-tags-deleted
    codecommit-repository-branches-and-tags-updated
  4. CodeBuild projects:
    codebuild-project-build-state-failed
    codebuild-project-build-state-succeeded
    codebuild-project-build-phase-failure
    codebuild-project-build-phase-success
    codebuild-project-build-state-in-progress
    codebuild-project-build-state-stopped

Sample template:
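
A minimal sketch of what such a template can look like, assuming an existing pipeline and SNS topic (rule name, pipeline ARN, target topic and the chosen events are placeholder assumptions):

  PipelineNotificationRule:
    Type: AWS::CodeStarNotifications::NotificationRule
    Properties:
      Name: MyPipelineNotificationRule
      DetailType: FULL
      # Placeholder pipeline ARN – point this at your own pipeline
      Resource: !Sub arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:MyPipeline
      EventTypeIds:
        - codepipeline-pipeline-pipeline-execution-succeeded
        - codepipeline-pipeline-pipeline-execution-failed
      Targets:
        # Placeholder SNS topic assumed to be defined elsewhere in the template
        - TargetType: SNS
          TargetAddress: !Ref PipelineNotificationTopic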

Posted in AWS, CodePipeline | Leave a Comment »

21/02/2020 – AWS SNS – Lambda Notification not working when created from CloudFormation

Posted by Diego on February 21, 2020

Objective: Create an SNS topic subscription to a lambda function (when something publishes to the topic, we want to run a lambda function).

If we do it manually through the console, it works just fine.

After creating the subscription, AWS will automatically add a trigger to the lambda function, which will allow the topic to invoke the lambda.

Here, for example, on the “LambdaTest” topic, I created a subscription to the “test” lambda, and the trigger shows up in the lambda’s configuration.

Problem: that will not happen if we create the topic + subscription using CloudFormation, as AWS won’t create that trigger automatically.

“Common sense” would say that you can create the lambda and the topic in CloudFormation (something like the sketch below) and that AWS will create the trigger automatically as well, like it does from the console – but that is not the case.
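
A minimal sketch of such a template (resource names, runtime and inline code are illustrative assumptions):

  LambdaTestTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: LambdaTest
      Subscription:
        - Protocol: lambda
          Endpoint: !GetAtt TestFunction.Arn

  TestFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: test
      Runtime: python3.8
      Handler: index.handler
      # Placeholder execution role – replace with a real role ARN
      Role: !Sub arn:aws:iam::${AWS::AccountId}:role/MyLambdaExecutionRole
      Code:
        ZipFile: |
          def handler(event, context):
              print(event)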

You need to create the trigger yourself as well – which kind of creates a “chicken and egg” situation, because the topic needs to point to the lambda (as a subscription) and the lambda’s trigger (EventSource) needs to point to the topic.

Fortunately (or not – who knows?) from CloudFormation you can create SNS subscriptions to lambdas that don’t yet exist (only the console enforces an existing lambda by throwing a “ResourceNotFoundException” error message).

Alternatively, you can add an “AWS::Lambda::Permission” to your function, which allows the SNS topic to call the Lambda function. These are called “resource-based policies” and enable you to grant usage permission to other accounts on a per-resource basis. You also use a resource-based policy to allow an AWS service to invoke your function.
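
A minimal sketch of that permission, reusing the placeholder names from the sketch above:

  TestFunctionSnsPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref TestFunction
      Principal: sns.amazonaws.com
      SourceArn: !Ref LambdaTestTopic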

Posted in AWS, SNS | Leave a Comment »

AWS CodePipeline – SNS Notifications using existing Topics

Posted by Diego on November 20, 2019

AWS recently released the functionality of setting up notifications for CodePipeline using SNS.

“You can now receive notifications about events in repositories, build projects, deployments, and pipelines when you use AWS CodeCommit, AWS CodeBuild, AWS CodeDeploy, and/or AWS CodePipeline. Notifications will come in the form of Amazon SNS notifications. Each notification will include a status message as well as a link to the resources whose event generated that notification.”

When I tested the functionality the first time, I created the SNS topic using the console (during the notification rule creation) and everything worked as expected.

After the test, I decided to create the resources (especially the SNS topic) using CloudFormation and I noticed that the notifications weren’t being published to the topic anymore.

After some research I found this on the AWS documentation:

“If you want to use an existing Amazon SNS topic instead of creating a new one, in Targets, choose its ARN. Make sure the topic has the appropriate access policy,….”

And indeed I realised that, when the topic was being created by the console, it added a permission allowing “codestar” to publish to the topic… something I never imagined necessary, because I didn’t know codestar was part of the equation.

In CloudFormation terms, what I needed to do was something like this:

FYI: the __default_statement_ID Sid is the statement SNS creates automatically when you don’t specify a “TopicPolicy”. Since we are now supplying our own policy to add the “codestar” permission, we need to include that default statement ourselves (if, of course, you actually need those permissions).

  PipelineNotificationTopic:
    Type: AWS::SNS::Topic
    Properties: 
      DisplayName: MyTopicDisplayName
      TopicName: MyTopicName
  
  PipelineNotificationTopicPolicy:
    Type: AWS::SNS::TopicPolicy
    Properties: 
      PolicyDocument: 
        Version: '2008-10-17'
        Statement:
        - Sid: CodeNotification_publish
          Effect: Allow
          Principal:
            Service: codestar-notifications.amazonaws.com
          Action: SNS:Publish
          Resource: !Ref PipelineNotificationTopic

        - Sid: __default_statement_ID
          Effect: Allow
          Principal:
            AWS: "*"
          Action:
          - SNS:GetTopicAttributes
          - SNS:SetTopicAttributes
          - SNS:AddPermission
          - SNS:RemovePermission
          - SNS:DeleteTopic
          - SNS:Subscribe
          - SNS:ListSubscriptionsByTopic
          - SNS:Publish
          - SNS:Receive
          Resource: !Ref PipelineNotificationTopic
          Condition:
            StringEquals:
              AWS:SourceOwner: !Sub ${AWS::AccountId}

      Topics: 
        - !Ref PipelineNotificationTopic

Posted in AWS, CodePipeline, DevOps, SNS | 2 Comments »

AWS EMR – Notebook permissions

Posted by Diego on November 13, 2019

I was recently working on an “EMR Notebook” attached to a cluster and noticed a strange behaviour: I was getting the infamous “An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied” error while trying to read data from S3 from a Python notebook.

So, I checked the notebook’s role (which is called EMR_Notebooks_DefaultRole and is the default role you are presented with when creating a Notebook) and noticed it HAD the “AmazonS3FullAccess” policy attached to it.

To my further amusement, I realised that I DIDN’T receive that error when running similar code from a PySpark notebook, EVEN THOUGH the IAM role attached to the EMR cluster that the notebook was connected to DIDN’T have S3 permissions.

Very quickly I realised that:

  1. Since I was running the code from a PySpark notebook, it was being submitted to the EMR cluster associated with it and run as a Spark job;
  2. AND I remembered that the job runs from the underlying EC2 instance(s), so it assumes the IAM role associated with them (which HAD read permission to S3);

So great, one problem “solved” – but I was still clueless as to why the notebook didn’t have access to S3.

While debugging the logs associated with the boto3 call to S3, I came across an awkward response from the API call:


Response body: b'<?xml version="1.0" encoding="UTF-8"?>\n
<Error>
  <Code>AuthorizationHeaderMalformed</Code>
  <Message>The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-west-1'</Message>
...

And I say “awkward” because all my resources were created in eu-west-1… so why was it using us-east-1?

So, I opened a terminal (on the notebook – not on EMR) and checked that the role it ACTUALLY uses when submitting API calls is called “prod-EditorInstanceRole”.
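
For reference, a quick way to check which identity the notebook’s local kernel is actually using (a sketch; the same information comes from aws sts get-caller-identity in the terminal):

import boto3

# Print the ARN of the identity used for local (non-Spark) API calls from the notebook
print(boto3.client('sts').get_caller_identity()['Arn'])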

Upon further research, I’ve learned that this is a role maintained by AWS and it does not have permission to the resources in your account, because “EMR Notebook” is a managed service designed to grant only limited features and permissions, and it cannot assume a customer role to access S3 resources. 😐

The “us-east-1” message, I can only assume, is happening because that’s where the underlying EC2 instances are running from (?)

The role EMR_Notebooks_DefaultRole is the notebook service role, which is used by the notebook service to manage AWS resources, for example provisioning the EMR cluster and loading and saving notebooks to S3. This role is not the one assumed when running command-line calls or code on the notebook instance.

I guess that, if someone needs to use their IAM roles to manage resources from the notebook locally, they would need to launch an EMR cluster with JupyterHub.

In summary:

  • On a Pyspark notebook:
    • Commands will be issued to the EMR cluster, which means the permissions will be determined by the IAM role assigned to the worker EC2 instances (not the EMR’s role);
  • Python notebook:
    • The IAM role attached to it is used to manage resources (save the notebooks to S3 for example);
    • Commands run on it use the “prod-EditorInstanceRole”;
  • If that’s not enough, create an EMR cluster with JupyterHub;

Posted in AWS, EMR | Leave a Comment »

AWS CodePipeline – Using SAM to create a Sample Template

Posted by Diego on November 11, 2019

This is very useful if you don’t want to write a template from scratch.



  1. Install the AWS Serverless Application Model (SAM);
  2. Create the CloudFormation template:
    1. Navigate to an empty folder
    2. run: sam init --location gh:aws-samples/cookiecutter-aws-sam-pipeline
    3. Follow the instructions
    4. (the error message is expected – I always get it but it works fine)
    5. adapt the files sample_buildspec.yaml and sample_pipeline.yaml as you like
  3. Deploy pipeline.yaml with CloudFormation to create the pipeline (see the sketch after this list);
  4. As a further suggestion, place the buildspec.yaml and pipeline.yaml files at the root of your CodeCommit repository to keep track of their changes;
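
For step 3, one possible way to do it is via the AWS CLI (the stack name is a placeholder; add whatever parameter overrides sample_pipeline.yaml expects):

aws cloudformation deploy \
    --template-file pipeline.yaml \
    --stack-name my-sam-pipeline \
    --capabilities CAPABILITY_NAMED_IAM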



Posted in AWS, CodePipeline, DevOps | Leave a Comment »

AWS Redshift – How to programmatically script body and signature of Functions

Posted by Diego on November 7, 2019

This is particularly useful if you want to:



  1. Script all functions in your database;
  2. Automatically manage permissions on your functions, since you have to include the function’s signature in the grant/revoke statement;



SELECT proname,
       n.nspname || '.' || p.proname || '(' || pg_catalog.oidvectortypes(p.proargtypes) || ')' AS signature,
       prosrc AS body
FROM pg_catalog.pg_namespace n
     JOIN pg_catalog.pg_proc p ON pronamespace = n.oid
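
For example, a sketch of how the signature can be used to generate GRANT statements (my_schema and my_group are placeholder assumptions):

-- Generate one GRANT EXECUTE statement per function in a given schema
SELECT 'GRANT EXECUTE ON FUNCTION '
       || n.nspname || '.' || p.proname || '(' || pg_catalog.oidvectortypes(p.proargtypes) || ')'
       || ' TO GROUP my_group;' AS grant_statement
FROM pg_catalog.pg_namespace n
     JOIN pg_catalog.pg_proc p ON pronamespace = n.oid
WHERE n.nspname = 'my_schema';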



Posted in AWS, Redshift | Leave a Comment »

AWS SageMaker – “HeadObject not found” error when training SKLearn Model

Posted by Diego on October 18, 2019

The code below is a pretty straightforward example of how to create an SKLearn estimator and run a training job using the SageMaker Python SDK.

import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn

sagemaker_session = sagemaker.Session()
role = get_execution_role()
script_path = 'myPythonFile.py'
source_dir = 'myFolder'

sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type='ml.c4.xlarge',
    role=role,
    source_dir=source_dir,
    sagemaker_session=sagemaker_session,
    hyperparameters={'max_leaf_nodes': 10})

# train_input is the S3 URI of the training data
training_job = sklearn.fit({'train': train_input}, job_name='myJob')

It works perfectly fine if the “entry_point” script and the “source_dir” directory are in the same location where the code is being executed (a SageMaker notebook, for example). However, if you try to use files located on S3, like so:

source_dir = "s3://mybucket/myfolder/ "

you will get one of the errors below at the “Invoking user training script” step:

  • “HeadObject not found”
  • “UnexpectedStatusException: Error for Training job testSMSM10: Failed. Reason: AlgorithmError: framework error”
  • “tarfile.ReadError: empty file”
  • “tarfile.EmptyHeaderError: empty header”

That happens because, when referencing S3, source_dir must point to a .tar.gz file in an S3 bucket and not just to the directory itself, which is not mentioned anywhere in the documentation:

source_dir Path (absolute or relative) to a directory with any other training source code dependencies including the entry point file. Structure within this directory will be preserved when training on SageMaker.

So, your source_dir should be:

source_dir = "s3:// mybucket / myfolder /sklearn.tar.gz

where sklearn.tar.gz contains all the required files.
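
For reference, a minimal sketch of how the archive could be built and uploaded (bucket, key and folder names are placeholder assumptions):

import tarfile
import boto3

# Package the training code (myPythonFile.py plus any dependencies) at the root of a tar.gz archive
with tarfile.open('sklearn.tar.gz', 'w:gz') as tar:
    tar.add('myFolder', arcname='.')

# Upload it to the location referenced by source_dir
boto3.client('s3').upload_file('sklearn.tar.gz', 'mybucket', 'myfolder/sklearn.tar.gz')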

Oh, and BTW, it must be a tar.gz file – otherwise you’ll get an error like “OSError: Not a gzip”.

Posted in AWS, Machine Learning, SageMaker | Leave a Comment »