We have some crawler for gathering data from internet. EC2 spot is a very

Question


Editorial Team

Asked: June 7, 2026 (2026-06-07T08:01:26+00:00)


We have a crawler that gathers data from the internet.
EC2 spot instances are a very inexpensive way to run it.

In our case, we set up the crawler with the following steps:

1 Launch an instance from an Amazon Quick Start AMI.
2 Install the dependency libraries.
3 Copy the crawler app to the instance.
4 Set up a launcher that starts the crawler once boot has completed.
5 Create an AMI from the instance.

But we need to repeat step 3, and then create a new AMI, every time the crawler is updated.
The new AMI has a new 'ami-id', which affects other settings such as the auto scaling
configuration and our spot instance request scripts.

Managing the application inside the AMI is a deployment issue, so we would like suggestions for making it as easy as possible. We have also tried another approach, which uses our source code management tool. Steps 3 and 4 then become:

3 git clone from the source code repo.
3.1 Compile the app from source.
3.2 Remove the previous build.
3.3 Install the latest build.
4 The launcher always rebuilds the crawler from the latest release before it starts the crawler.
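To make the flow concrete, steps 3 to 3.3 could be sketched in Python roughly as below. This is only an illustration under assumptions, not our actual launcher: the function names, the use of `make`, the `install` target and the `PREFIX` variable are all hypothetical, and the repo URL and directories are placeholders supplied by the caller.

```python
import subprocess
from typing import Callable, List

Command = List[str]


def build_deploy_commands(repo_url: str, src_dir: str, install_dir: str) -> List[Command]:
    """Return the commands for steps 3 to 3.3, in the order they run."""
    return [
        # Step 3: fetch the source. Assumes src_dir does not exist yet,
        # which holds on a freshly booted instance.
        ["git", "clone", repo_url, src_dir],
        # Step 3.1: compile. Building with a Makefile is an assumption.
        ["make", "-C", src_dir],
        # Step 3.2: remove the previous build.
        ["rm", "-rf", install_dir],
        # Step 3.3: install the latest build. The "install" target and the
        # PREFIX variable are assumptions about the hypothetical Makefile.
        ["make", "-C", src_dir, "install", "PREFIX=" + install_dir],
    ]


def run_deploy(commands: List[Command],
               runner: Callable[..., object] = subprocess.run) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for command in commands:
        runner(command, check=True)
```

The launcher from step 4 would call `run_deploy(build_deploy_commands(...))` and only then start the crawler. Keeping the command list separate from the runner makes it possible to inspect the plan without executing anything.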

The new method keeps the ami-id from changing, but it has to check out the source code each time the launcher runs. As a result, it takes more time to start, and fetching gets slower because the source is growing every day.

How do you manage your artifacts on an AMI?
I'm not sure that always building from source is the best choice.
It only solves part of the deployment problem, and it does not address updating the crawler after an instance is already running.


1 Answer

Editorial Team · Answer 1 · 2026-06-07T08:01:27+00:00

Well, if your crawler is not updated every hour of the day, I think you should write a script that combines both of your ideas, the previous one and the new one. Have the script check with your server whether the build on the instance is the latest. If it is, go straight to normal crawling; if it is older, run the git clone steps. That way, if you are not modifying the crawler very often, you keep startup efficient.

With this approach you avoid the rebuild most of the time. As you describe the rebuild process, you must currently be running those steps mostly for no reason.
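A minimal sketch of that check, in Python, might look like the following. It assumes the installed build id is recorded in a local file and that the latest build id has already been fetched from your server by some means; the function names, the version file and the `rebuild` callback are all hypothetical names for illustration.

```python
from pathlib import Path
from typing import Callable


def read_installed_version(version_file: Path) -> str:
    """Return the build id recorded at install time, or '' if none exists."""
    try:
        return version_file.read_text().strip()
    except FileNotFoundError:
        return ""


def ensure_latest(version_file: Path, latest_version: str,
                  rebuild: Callable[[], None]) -> bool:
    """Rebuild only when the installed build differs from the latest one.

    Returns True if a rebuild ran, False if the build was already current.
    """
    if read_installed_version(version_file) == latest_version:
        return False  # already up to date: go straight to crawling
    rebuild()  # e.g. the git clone, compile, and install steps
    # Record the new build id only after the rebuild succeeded.
    version_file.write_text(latest_version + "\n")
    return True
```

Because the version file is written only after `rebuild()` returns, a failed rebuild leaves the old id in place and the next start tries again.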

Hope this helps you
