Debugging Elastic Beanstalk

Recently, I’ve had the extremely mixed blessing of working with AWS and Elastic Beanstalk to deploy an app. This app has continuous delivery going using the Code* Suite. While not wonderful, CodePipeline/CodeBuild are at least straightforward.

Enter Elastic Beanstalk.

I’ve been attempting to use the CLI to trigger Elastic Beanstalk version updates. This would seem quite straightforward:

Elastic Beanstalk has Applications
Applications have Environments
Applications have Versions
Environments have deployed versions (which are called “Running version” in the AWS console but “version label” or “version-label” or “VersionLabel” in the CLI/API)

So, to update an Application, you just need to create a new Application Version and then update its Environment with a new Version Label. Bish bash bosh. In fact, that’s exactly what the eb deploy command does (https://github.com/aws/aws-elastic-beanstalk-cli/blob/master/ebcli/operations/deployops.py#L23).
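For illustration, here is a minimal Python sketch of those two steps. It assumes the source bundle is already uploaded to S3 and that `client` is a boto3 Elastic Beanstalk client; the function name and all the argument values are hypothetical, and this is not the code eb deploy itself runs.

```python
def deploy_version(client, app_name, env_name, version_label, bucket, key):
    """Create an Application Version, then point the Environment at it.

    `client` is assumed to be boto3.client("elasticbeanstalk"); `bucket` and
    `key` are assumed to locate a source bundle that already exists in S3.
    """
    # Step 1: register the new Application Version under the Application.
    client.create_application_version(
        ApplicationName=app_name,
        VersionLabel=version_label,
        SourceBundle={"S3Bucket": bucket, "S3Key": key},
    )
    # Step 2: update the Environment with the new Version Label.
    return client.update_environment(
        EnvironmentName=env_name,
        VersionLabel=version_label,
    )

# Hypothetical usage (names are placeholders):
#   import boto3
#   deploy_version(boto3.client("elasticbeanstalk"),
#                  "my-app", "my-env", "v42", "my-bucket", "bundles/v42.zip")
```

Passing the client in rather than creating it inside the function keeps the sketch easy to exercise with a stand-in object.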

My problems started here. All of the above worked fine. However, if the app was already deployed, it would (and as of the most recent update of this post, still will) not redeploy the app. In the UI, this manifests as a “degraded” state where the “Running version” and the “deployed version” are not the same. Annoyingly, Elastic Beanstalk will want to revert to the previously successful version and it will tell you that its most recently deployed version is unexpected. I mean, if the revert worked, okay fair enough, but since the revert is failing, that just leaves things in an even more confusing state.

Naturally, what I try to do is redeploy the correct version manually. That’s where the wheels really come off. Manually redeploying either in the console or via the CLI or via continuous delivery results in an unsuccessful command execution and the entire environment becomes unrecoverable.

So, what on earth is going wrong here? Well, in the AWS Console, I don’t get much:

ERROR During an aborted deployment, some instances may have deployed the new application version. To ensure all instances are running the same version, re-deploy the appropriate application version.

ERROR Failed to deploy application.

ERROR Unsuccessful command execution on instance id(s) 'i-XXXXXXXXXXXXXXXXX'. Aborting the operation.

INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].

ERROR [Instance: i-XXXXXXXXXXXXXXXXX] Command failed on instance. An unexpected error has occurred [ErrorCode: 0000000001].

Not much help there.

Out of curiosity, I SSH’d into one of the EC2 instances.

Running sudo docker ps, I could see there were no containers running. So, then I turned to /var/log/ to see if I could find anything. I did indeed.

Digging into /var/log/eb-engine.log on the EC2 instance, I can see that Beanstalk is trying to restart nginx, but nginx is putting up a fight:

2020/04/15 00:34:55.708346 [INFO] Executing instruction: register Nginx process
2020/04/15 00:34:55.708425 [INFO] Register process nginx
2020/04/15 00:34:55.708516 [INFO] Running command /bin/sh -c systemctl show -p PartOf nginx.service
2020/04/15 00:34:55.714774 [WARN] Warning: process nginx is already registered...
Deregistering the process ...
2020/04/15 00:34:55.714869 [INFO] Running command /bin/sh -c systemctl show -p PartOf nginx.service
2020/04/15 00:34:55.722615 [INFO] Running command /bin/sh -c systemctl is-active nginx.service
2020/04/15 00:34:55.729960 [INFO] Running command /bin/sh -c systemctl show -p PartOf nginx.service
2020/04/15 00:34:55.737441 [INFO] Running command /bin/sh -c systemctl stop nginx.service
2020/04/15 00:34:56.066516 [ERROR] Job for nginx.service canceled.

2020/04/15 00:34:56.066664 [ERROR] stopProcess Failure: stopping process nginx failed: Command /bin/sh -c systemctl stop nginx.service failed with error exit status 1. Stderr:Job for nginx.service canceled.

2020/04/15 00:34:56.066717 [ERROR] deregisterProcess Failure: process nginx failed to stop:
stopProcess Failure: stopping process nginx failed: Command /bin/sh -c systemctl stop nginx.service failed with error exit status 1. Stderr:Job for nginx.service canceled.


2020/04/15 00:34:56.066810 [ERROR] An error occurred during execution of command [app-deploy] - [register Nginx process]. Stop running the command. Error: register process nginx failed with error deregisterProcess Failure: process nginx failed to stop:
stopProcess Failure: stopping process nginx failed: Command /bin/sh -c systemctl stop nginx.service failed with error exit status 1. Stderr:Job for nginx.service canceled.

What the logs above show is that the Elastic Beanstalk engine itself is trying to stop and restart nginx on the EC2 instance, but those commands are failing for reasons I don’t yet understand. Because these commands are failing, the deployment process halts and the instance gets stuck in a down state.
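If you are wading through a long eb-engine.log, a small script can pull out just the failures. The sketch below assumes every entry starts with a timestamp and a bracketed level, as in the excerpt above, and that lines without that prefix continue the previous entry. The function name is made up for illustration.

```python
import re

# Matches lines such as:
# 2020/04/15 00:34:56.066516 [ERROR] Job for nginx.service canceled.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\[(?P<level>[A-Z]+)\] (?P<msg>.*)$"
)

def error_entries(lines):
    """Return (timestamp, message) tuples for each [ERROR] entry.

    Non-empty lines that lack a timestamp prefix are treated as
    continuations and appended to the preceding ERROR message.
    """
    entries = []
    current = None  # the ERROR entry being built, if any
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            current = None
            if match.group("level") == "ERROR":
                current = [match.group("ts"), match.group("msg")]
                entries.append(current)
        elif current is not None and line.strip():
            current[1] += " " + line.strip()
    return [tuple(entry) for entry in entries]

# Hypothetical usage on the instance:
#   with open("/var/log/eb-engine.log") as fh:
#       for ts, msg in error_entries(fh):
#           print(ts, msg)
```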

By running sudo systemctl stop nginx.service, I can get the instance back and re-deploy successfully, but so far every new deployment has failed in the same way. Weird.

To get more clarity (though I have literally zero idea what the heck this means), I tried running the same command without sudo. I get an… interesting… error:

[ec2-user@ip-XXXXXXXXXXX log]$ systemctl stop nginx.service
Failed to stop nginx.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files
See system logs and 'systemctl status nginx.service' for details.

Wat. But the same command with sudo (as you might expect) works a treat. sudo make me a sandwich to the rescue.

[ec2-user@ip-XXXXXXXXXXX log]$ sudo systemctl stop nginx.service
[ec2-user@ip-XXXXXXXXXXX log]$

Ho hum. I’m still chasing the root cause. Maybe this is the “unreliable deployment” cited as an issue here: https://medium.com/@acamp/elastic-beanstalk-advantages-and-drawbacks-be814615af01.

Update: For anyone working against the same issue, I managed to get around the problem by switching to immutable deployments. While *significantly* slower, immutable deployments have the advantage of always deploying to a clean EC2 instance. Because they always spin up a new instance, they don’t suffer from the restart failures described above.
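For reference, the deployment policy is an environment option, so the switch can be scripted as well as clicked through in the console. This is a rough sketch, assuming `client` is a boto3 Elastic Beanstalk client; the function names and the environment name are placeholders, and I'm not claiming this is the only way to set it.

```python
def immutable_deployment_settings():
    """Option settings that select the Immutable deployment policy."""
    return [
        {
            "Namespace": "aws:elasticbeanstalk:command",
            "OptionName": "DeploymentPolicy",
            "Value": "Immutable",
        }
    ]

def switch_to_immutable(client, env_name):
    """Apply the Immutable deployment policy to an existing Environment.

    `client` is assumed to be boto3.client("elasticbeanstalk").
    """
    return client.update_environment(
        EnvironmentName=env_name,
        OptionSettings=immutable_deployment_settings(),
    )

# Hypothetical usage:
#   import boto3
#   switch_to_immutable(boto3.client("elasticbeanstalk"), "my-env")
```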

Jackson Gabbard's Blog

A Scattered Repository
