
heyteacher/sam-forecast-automation-covid-19-ita


AWS SAM template for AWS Forecast automation


An AWS SAM template that automates the AWS Forecast process with an AWS Step Functions state machine, based on a real case study: forecasting new daily positive cases from the Italian COVID-19 datasets.

This AWS SAM template is running in my AWS account and pushes the forecast daily to this repository: https://github.com/heyteacher/COVID-19 (folder dati_json_forecast)

Furthermore, the datasets and forecasts are visualized by this charts dashboard https://heyteacher.github.io/COVID-19, an Angular 9 project hosted in this repository: https://github.com/heyteacher/ng-covid-19-ita-charts

This AWS SAM template is general purpose, so it can be adapted to other forecasts based on AWS Forecast by removing or replacing the tasks specific to this use case.

Automating the AWS Forecast process is difficult because:

  1. AWS Forecast tasks are long-running processes, and a task cannot start until the previous step has finished successfully

  2. AWS Forecast doesn't implement push notifications (for example via AWS SNS) to signal the end of a task, so it isn't possible to create an event-driven flow of AWS Forecast tasks. The only option is to poll the status of an entity after creation to find out whether it was created successfully.
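The polling described in point 2 can be sketched as follows. This is a minimal illustration, not the code used by this template: `waitUntilActive` and its `describe` argument are hypothetical names, and `describe` is assumed to wrap one of the AWS Forecast Describe* calls and to resolve to an object with a `Status` field (`ACTIVE`, `CREATE_PENDING`, `CREATE_IN_PROGRESS`, `CREATE_FAILED`, ...).

```javascript
// Minimal polling sketch. `describe` is a hypothetical async function that
// wraps an AWS Forecast Describe* call and resolves to { Status: '...' }.
async function waitUntilActive(describe, options = {}) {
  const { intervalMs = 60000, maxAttempts = 60 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { Status } = await describe();
    if (Status === 'ACTIVE') {
      return attempt; // number of polls that were needed
    }
    if (typeof Status === 'string' && Status.endsWith('_FAILED')) {
      throw new Error(`Entity ended in status ${Status}`);
    }
    if (attempt < maxAttempts) {
      // Sleep before the next poll.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Entity not ACTIVE after ${maxAttempts} polls`);
}
```

A loop like this inside a single Lambda would hit the Lambda timeout for long-running tasks, which is one reason to move the waiting into the state machine.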

Why automate AWS Forecast tasks using AWS Step Functions?

Because AWS Step Functions is a serverless state machine that orchestrates the AWS Lambda functions which make the AWS Forecast API calls and manage the AWS Forecast entities, and it supports Retry, Fallback and other flow controls.

Only the first state machine execution creates the persistent entities Dataset, Dataset Group and Predictor. On the following daily executions, the forecast is updated by creating a Dataset Import Job, a Forecast and an Export Job.

The AWS Step Functions state machine is launched by an AWS CloudWatch Event Rule, which fires according to the schedule expression defined in the StateMachineEventRuleScheduleExpression parameter. The forecast, however, is generated only on the days of the week defined in the ForecastDaysOfWeekExecution parameter.
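One way to tell the first execution from the following ones is to describe the entity and treat a "not found" error as "does not exist yet". The sketch below illustrates that idea under stated assumptions: `entityExists` and `describe` are hypothetical names, and `describe` is assumed to wrap a call such as DescribeDataset or DescribePredictor that fails with `ResourceNotFoundException` when the entity is missing.

```javascript
// `describe` is a hypothetical async wrapper around a Describe* call
// (for example DescribeDataset or DescribePredictor).
async function entityExists(describe) {
  try {
    await describe();
    return true;
  } catch (err) {
    // AWS SDK for JavaScript v2 exposes the error type as `code`, v3 as `name`.
    if (err.code === 'ResourceNotFoundException' || err.name === 'ResourceNotFoundException') {
      return false;
    }
    throw err; // anything else is a real failure and must not be swallowed
  }
}
```

Rethrowing unknown errors matters here: treating, say, a throttling error as "entity missing" would make the flow try to create a Dataset or Predictor that already exists.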
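The day-of-week gate can be sketched as below. The format of ForecastDaysOfWeekExecution is not documented here, so this example assumes a comma-separated list of day numbers as returned by `getUTCDay()` (0 = Sunday ... 6 = Saturday). The function name `isForecastDay` and the environment variable `FORECAST_DAYS_OF_WEEK` are hypothetical.

```javascript
// Assumption: the parameter is a comma-separated list of day numbers,
// 0 = Sunday ... 6 = Saturday. The real template may use another format.
function isForecastDay(daysOfWeekParam, date = new Date()) {
  const allowedDays = daysOfWeekParam
    .split(',')
    .map((day) => day.trim())
    .filter((day) => day !== '')
    .map(Number);
  return allowedDays.includes(date.getUTCDay());
}

// Lambda-style handler returning the flag that a Choice state can branch on.
// FORECAST_DAYS_OF_WEEK is a hypothetical environment variable name.
const handler = async () => ({
  isToExecuteForecast: isForecastDay(process.env.FORECAST_DAYS_OF_WEEK || ''),
});
```

With this split, the Event Rule can fire every day (so that daily tasks still run) while the more expensive forecast steps run only on the selected days.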

AWS Step Functions Forecast

Below is the daily flow of the AWS Step Functions steps:

  1. Extend Dataset is a task specific to the case study, so you can drop it. It downloads the daily official dataset, extends it and pushes it to the configured GitHub repository. It retries until a new dataset is pushed to the official repository

  2. CheckDaysOfWeekForecastExec is a simple inline Lambda which sets isToExecuteForecast = true if today's day of the week is in the ForecastDaysOfWeekExecution parameter

  3. ChoiceForecastExecution is a choice on isToExecuteForecast: if true, the forecast is generated, otherwise the flow goes to the Done task and exits

  4. CheckDatasetExist is the start state of the forecast generation. It checks whether the Dataset (and Dataset Group) exists.

    • If it doesn't exist, this is the first execution, and CreateDataset creates the Dataset and Dataset Group

  5. WaitGithubRawRefresh is another task specific to the case study which can be dropped. It waits a few minutes to make sure the GitHub raw cache is refreshed after the push

  6. CreateDatasetImportJob downloads the dataset (the new daily COVID-19 time series) from the configured GitHub repository, transforms the data to match the Forecast dataset structure, uploads it to the S3 Input Bucket and creates the daily Dataset Import Job

  7. CheckPredictorExists checks if the Predictor exists.

    • If the Predictor doesn't exist (which means this is the first execution), CreatePredictor runs and creates the Predictor. The Predictor can be created only when at least one Dataset Import Job has been loaded. Then WaitPredictorCreation waits 50 minutes to make sure the Predictor has been created

    • otherwise WaitDatasetImport runs, which sleeps 5 minutes

  8. CreateForecast creates the daily Forecast based on the Predictor updated by the daily Dataset Import Job

  9. WaitForecastCreation sleeps 15 minutes to make sure the forecast creation has finished

  10. CreateForecastExportJob exports the daily forecast to the S3 Output Bucket. The upload triggers PushForecastInGithubFunction, which downloads the forecast and pushes it to the configured GitHub repository (this AWS Lambda is specific to the case study)

  11. WaitExportJob sleeps 3 minutes to make sure the export has finished

  12. DeleteDatasetImportExportJob deletes the daily Dataset Import Job and the daily Export Job

  13. WaitDeleteDatasetImportExportJob sleeps 5 minutes to make sure the deletion has finished

  14. DeleteForecast deletes the daily Forecast

  15. Done is the end state of the workflow
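The transformation in step 6 can be illustrated with a small sketch. It is not the code of CreateDatasetImportJob. It assumes that the source records carry the date in a `data` field and the daily count in a `nuovi_positivi` field, and that the target is a headerless CSV with the columns item_id, timestamp, target_value. The function name `toForecastCsv` and the item id `ITA` are hypothetical.

```javascript
// Assumptions (not taken from the template): input records look like
// { data: '2020-03-01T18:00:00', nuovi_positivi: 10 } and the Forecast
// dataset schema is item_id, timestamp (yyyy-MM-dd), target_value.
function toForecastCsv(records, itemId = 'ITA') {
  return records
    .map((record) => {
      const day = record.data.slice(0, 10); // keep only yyyy-MM-dd
      return `${itemId},${day},${record.nuovi_positivi}`;
    })
    .join('\n');
}

// Usage with made-up sample values:
const sampleCsv = toForecastCsv([
  { data: '2020-03-01T18:00:00', nuovi_positivi: 10 },
  { data: '2020-03-02T18:00:00', nuovi_positivi: 20 },
]);
console.log(sampleCsv);
```

The resulting string is what would be uploaded to the S3 Input Bucket before the Dataset Import Job is created.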

Some tasks retry after a failure in order to wait until the previous step has finished successfully.

The AWS SAM template assigns to each AWS Lambda function the minimum permissions needed to complete its task. All the entities (S3 buckets, AWS Lambda functions, IAM roles, AWS Step Functions state machine, Event Rule) are created, updated and deleted by the AWS SAM template stack, so no manual activity is needed.
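A retrying task uses the Retry field of the Amazon States Language. The sketch below writes such a state as a JavaScript object only to keep the examples in one language (in a template it would be JSON or YAML). The interval, attempt and backoff values and the ARN placeholder are illustrative assumptions, not the values used by this template. The helper `maxRetryWaitSeconds` is hypothetical and only adds up the waiting time that a retry policy allows.

```javascript
// Illustrative Task state with a Retry policy (values are assumptions).
const createForecastState = {
  Type: 'Task',
  Resource: '<CREATE_FORECAST_FUNCTION_ARN>', // placeholder, not a real ARN
  Retry: [
    {
      ErrorEquals: ['States.ALL'], // retry on any error
      IntervalSeconds: 60,         // wait before the first retry
      MaxAttempts: 5,              // number of retries
      BackoffRate: 2,              // multiplier applied to each next wait
    },
  ],
  Next: 'WaitForecastCreation',
};

// Total time spent waiting if every retry is used:
// IntervalSeconds * (1 + BackoffRate + BackoffRate^2 + ...).
function maxRetryWaitSeconds({ IntervalSeconds, MaxAttempts, BackoffRate }) {
  let total = 0;
  let wait = IntervalSeconds;
  for (let i = 0; i < MaxAttempts; i++) {
    total += wait;
    wait *= BackoffRate;
  }
  return total;
}

console.log(maxRetryWaitSeconds(createForecastState.Retry[0])); // 1860
```

Computing the total wait is a quick check that a retry policy covers the time the previous step usually needs to finish.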

Note

  • this project is inspired by https://github.com/aws-samples/amazon-automated-forecast

  • BE CAREFUL if you try to create a stack from this SAM template. The first execution costs about 4.00 EUR and each following daily execution costs about 1.00 EUR.

  • I already run a stack in my AWS account which publishes the forecast here https://github.com/heyteacher/COVID-19. So you can support this project by making a donation on Liberpay

  • If you decide to delete the AWS SAM template stack, only the AWS Forecast entities Predictor, the first Dataset Import Job, Dataset and Dataset Group must be deleted manually.

  • All AWS Lambda functions are implemented in Node.js 12.x

  • AWS Forecast doesn't provide an epidemiological forecasting scenario for series like the Italian COVID-19 new cases, so the algorithm is chosen by PerformAutoML=True. I'm not an expert, so help in tuning the algorithm for this use case is appreciated https://docs.aws.amazon.com/forecast/index.html

  • I spent a lot of time improving the AWS SAM template, but I'm sure it could be better. So do not hesitate to submit an Issue or Pull Request

Install

  1. install nodejs, aws-cli, aws-sam-cli and docker

  2. generate aws_access_key_id and aws_secret_access_key for an AWS user with the permissions to create/update/delete CloudFormation stacks

  3. create the github repository <GITHUB_REPO> in your account <GITHUB_USER>

  4. generate a <GITHUB_TOKEN> in https://github.com/settings/tokens with scope repo

  5. to test Lambda functions locally (for example ExtendDataFunction)

    sam local invoke ExtendDataFunction \
    --parameter-overrides  GitHubToken=<GITHUB_TOKEN> GitHubRepo=<GITHUB_REPO> GitHubUser=<GITHUB_USER> 
    

    The bash scripts sam_local_invoke.sh.template and sam_local_invoke_push_github.sh can be customized to run Lambda functions locally

Packaging and Deploying

The bash script deploy_stack.sh.template can be customized to automate the stack deployment (the package and deploy steps)

  1. delete old stack

    aws cloudformation delete-stack --stack-name forecast-automation-covid-19-ita
    
  2. package

    aws cloudformation package --template-file template.yaml \
    --output-template-file packaged.yaml \
    --s3-bucket <SAM_TEMPLATE_BUCKET>
    
  3. deploy

    aws cloudformation deploy --template-file packaged.yaml  \
    --stack-name forecast-automation-covid-19-ita \
    --capabilities CAPABILITY_IAM \
    --parameter-overrides  GitHubToken=<GITHUB_TOKEN> GitHubRepo=<GITHUB_REPO> GitHubUser=<GITHUB_USER> 
    
  4. show stack events

    aws cloudformation describe-stack-events --stack-name forecast-automation-covid-19-ita
    
  5. tail lambda logs (for example ExtendDataFunction)

    sam logs -n ExtendDataFunction --stack-name forecast-automation-covid-19-ita --tail
    
    
