We have a crawler that gathers data from the internet.
EC2 spot instances are a very inexpensive solution for our application.
In our case, we set up the crawler with the following steps:
- launch an instance from an Amazon quick start AMI
- install the dependency libraries
- send the crawler app to the instance
- set up the launcher for our crawler so that it starts after boot completes
- create an AMI from the instance
But we have to repeat step 3, and then rebuild the AMI, whenever the crawler needs an update.
The new AMI changes other settings, such as the ‘ami-id’ in auto scaling
or in other spot instance request scripts.
Managing the application inside an AMI is a deployment issue, so we would like suggestions to make it as easy as possible. There is now another way we manage it. We use a source code management tool, and the deployment steps look like this:
- 3 git clone from the source code repo
- 3.1 compile the app from source
- 3.2 remove the previous build
- 3.3 install the latest build
- 4 the launcher always rebuilds the crawler from the latest release before it starts the crawler
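To make steps 3 to 4 concrete, here is a rough Python sketch of such a launcher. The repo URL, the directories, the `make` targets, and the `bin/crawler` path are all placeholders I made up for illustration; substitute whatever your build actually uses.

```python
import subprocess

# Hypothetical values: adjust to your own repo, paths and build tool.
REPO_URL = "https://example.com/crawler.git"
SRC_DIR = "/opt/crawler-src"
INSTALL_DIR = "/opt/crawler"


def deployment_commands(repo_url=REPO_URL, src_dir=SRC_DIR, install_dir=INSTALL_DIR):
    """Return the commands for steps 3 to 3.3, in the order they run."""
    return [
        ["rm", "-rf", src_dir],                # start from an empty checkout directory
        ["git", "clone", repo_url, src_dir],   # step 3: clone from the source repo
        ["make", "-C", src_dir],               # step 3.1: compile (assumes a Makefile)
        ["rm", "-rf", install_dir],            # step 3.2: remove the previous build
        ["make", "-C", src_dir, "install",     # step 3.3: install the latest build
         "PREFIX=" + install_dir],             # (assumes the Makefile honours PREFIX)
    ]


def rebuild(run=subprocess.check_call, **kwargs):
    """Run every step; check_call raises on the first failing command."""
    for cmd in deployment_commands(**kwargs):
        run(cmd)


def launch(run=subprocess.check_call):
    """Step 4: always rebuild, then start the crawler."""
    rebuild(run=run)
    run([INSTALL_DIR + "/bin/crawler"])  # hypothetical crawler entry point
```

Calling `launch()` from the boot-time launcher reproduces the behaviour described above: every boot pays for a full clone and build before the crawler starts.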
The new method keeps the ami-id from changing, but it has to check out the source code every time. As a result, it takes longer and longer to fetch the source (the source is growing every day).
How do you manage your artifacts on an AMI?
I’m not sure that always building from source is the best choice.
It only solves some of the deployment problems, and it does not address updating the crawler after an instance is already running.
Well, if your crawler is not being updated every hour of the day, then I think you should write a script that combines both of your ideas, the previous one and the new one. Have the script check with your server whether the current build is the latest. If it is, go straight to normal crawling; if it is older, move on to the git clone steps. That way, if you are not modifying the crawler very often, you get efficient performance.
You will actually be skipping the rebuild most of the time, because with the rebuild process you describe, you must be doing these steps mostly for no reason.
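Something like the following Python sketch is what I have in mind. It is only one possible way to do the check: I am assuming the "server" is your git remote and using `git ls-remote` to read the latest commit hash without cloning, and the repo URL and the `BUILD_REVISION` stamp file are names I invented for the example. `rebuild` and `start_crawler` stand for your existing rebuild steps and crawler start-up.

```python
import subprocess
from pathlib import Path

# Hypothetical values: adjust to your own repo and install location.
REPO_URL = "https://example.com/crawler.git"
STAMP_FILE = "/opt/crawler/BUILD_REVISION"


def remote_revision(repo_url=REPO_URL, run=subprocess.check_output):
    """Ask the remote for the commit hash of HEAD without cloning anything."""
    out = run(["git", "ls-remote", repo_url, "HEAD"])
    return out.decode().split()[0]  # output looks like "<hash>\tHEAD"


def installed_revision(stamp_file=STAMP_FILE):
    """Return the revision recorded at the last rebuild, or None if there is none."""
    path = Path(stamp_file)
    return path.read_text().strip() if path.exists() else None


def launch(rebuild, start_crawler, stamp_file=STAMP_FILE, get_latest=remote_revision):
    """Rebuild only when the installed build is not the latest, then crawl."""
    latest = get_latest()
    if installed_revision(stamp_file) != latest:
        rebuild()                                   # the git clone / build steps
        Path(stamp_file).write_text(latest + "\n")  # remember what was built
    start_crawler()
```

If the AMI is baked with a build already installed and its stamp file written, a freshly booted spot instance only pays for one `git ls-remote` round trip unless a newer revision exists.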
Hope this helps you