Backing up MongoDB on AWS EC2 to S3
At Adobe, we had been using MongoDB to store our massive user data. Given the scale, we had experts handling it.
At ShotPitch, we decided to use MongoDB too, but without the perk of a full-time DevOps team. A week back, we had the scare of our lives when we thought we had lost all our data. At that moment I bumped up the priority of MongoDB backup tasks.
However, we are very small in scale; we currently run our entire MongoDB deployment on a single instance. Most of the solutions on the market are paid or not straightforward, until I came across this article. I just tweaked it for my usage. Here is what I did.
Assumption: You are doing this exercise on your MongoDB node
- Install S3cmd
On Amazon Linux / CentOS (yum)
sudo yum --enablerepo epel install s3cmd
On Ubuntu
wget -O- -q http://s3tools.org/repo/deb-all/stable/s3tools.key | sudo apt-key add -
sudo wget -O/etc/apt/sources.list.d/s3tools.list http://s3tools.org/repo/deb-all/stable/s3tools.list
sudo apt-get install s3cmd
- Configure S3cmd
s3cmd --configure
You will be asked to provide your access key and secret key, which you can get from the AWS Console; this doc explains how.
Also make sure you go to the Permissions tab of the IAM user you created and attach the AmazonS3FullAccess policy to it.
- Test S3cmd
s3cmd ls
This command lists all your S3 buckets (if any).
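If the listing comes back empty, you can create a bucket for the backups straight from the CLI; the bucket name below is a placeholder, replace it with your own:
```
s3cmd mb s3://your-backup-bucket-name
```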
- Backup Script
Create a new shell file named mongo_backup.sh and put the following code in it:
#!/bin/bash
#Force file synchronization and lock writes
mongo admin --eval "printjson(db.fsyncLock())"
MONGODUMP_PATH="/usr/bin/mongodump"
MONGO_DATABASE="dbname_here" #replace with your database name
TIMESTAMP=`date +%F-%H%M`
S3_BUCKET_NAME="bucketname_here" #replace with your bucket name on Amazon S3
S3_BUCKET_PATH="mongodb-backups"
# Create backup
$MONGODUMP_PATH -d $MONGO_DATABASE
# Add timestamp to backup
mv dump mongodb-$HOSTNAME-$TIMESTAMP
tar cf mongodb-$HOSTNAME-$TIMESTAMP.tar mongodb-$HOSTNAME-$TIMESTAMP
# Upload to S3
s3cmd put mongodb-$HOSTNAME-$TIMESTAMP.tar s3://$S3_BUCKET_NAME/$S3_BUCKET_PATH/mongodb-$HOSTNAME-$TIMESTAMP.tar
#Unlock database writes
mongo admin --eval "printjson(db.fsyncUnlock())"
#Delete local files
rm -rf mongodb-*
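The mv and tar steps above name each archive after the host and a timestamp, so successive backups never overwrite each other. A minimal sketch of the naming scheme (the hostname here is a made-up example, not anything from the setup above):

```shell
# Hypothetical example: how the archive name is built from the host and time
HOST_EXAMPLE="ip-10-0-0-1"      # stand-in for $HOSTNAME on your node
TIMESTAMP=$(date +%F-%H%M)      # e.g. 2016-03-01-0900
ARCHIVE="mongodb-$HOST_EXAMPLE-$TIMESTAMP.tar"
echo "$ARCHIVE"
```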
- Verify the script
bash mongo_backup.sh
What you should see is a tar file in your S3 bucket under mongodb-backups. If you download and extract that file, you will see the mongodump of your database.
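To actually restore from such a backup, you would extract the archive and feed the dump directory to mongorestore. A hedged sketch, assuming a hypothetical archive name and a mongod running locally:

```shell
#!/bin/bash
# Hypothetical restore sketch -- the archive name below is an example;
# substitute the file you downloaded from S3.
ARCHIVE="mongodb-myhost-2016-03-01-0900.tar"

if [ -f "$ARCHIVE" ]; then
    tar xf "$ARCHIVE"    # unpacks into mongodb-myhost-2016-03-01-0900/
    # mongorestore replays the BSON dump into the running mongod;
    # --drop removes each collection before restoring it
    mongorestore --drop "${ARCHIVE%.tar}/"
else
    echo "Download $ARCHIVE from S3 first (s3cmd get)"
fi
```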
- Automatic backup using Cron
We now want to automate this process of taking the backup of MongoDB to S3.
sudo su
crontab -e
Once in the file, you need to add a line to schedule your job. Here is a cheat sheet for understanding Cron Job scheduling:
+---------------- minute (0 - 59)
| +------------- hour (0 - 23)
| | +---------- day of month (1 - 31)
| | | +------- month (1 - 12)
| | | | +---- day of week (0 - 6) (Sunday=0 or 7)
| | | | |
* * * * * command to be executed
So if I want to schedule a backup on the 1st of every month, I would write the following lines in the crontab file:
#1st of every month at 9 am
00 09 1 * * /bin/bash /home/ubuntu/mongo_backup.sh
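Cron runs silently, so it helps to capture the script's output somewhere you can inspect after a failed backup. A variant of the same crontab entry (the log path is my own choice, not mandated by anything above):
```
#1st of every month at 9 am, with stdout and stderr appended to a log
00 09 1 * * /bin/bash /home/ubuntu/mongo_backup.sh >> /home/ubuntu/mongo_backup.log 2>&1
```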
This is a very basic way to make sure you do not lose data. As we grow bigger and add more instances, I will keep you guys updated about my new learnings and our new approaches.