Things I've learned and suspect I'll forget.
I have updated the architecture of the website. Hopefully nothing looks different and no one will notice. The website is now stored in Amazon S3 and served by CloudFront. A git repository stores my posts, and when that repository is updated, a webhook triggers a lambda function that runs habu and syncs the files.
There are lots of ways to make a website. They range from full services like Medium or hosted WordPress to more customizable, yet still free, services like GitHub Pages. But even with all of these platforms available, I still wanted the control of hosting my own solution.
Back when I started this website, it ran WordPress on a cheap VPS. After a while I got tired of making sure WordPress was always up to date, so I moved to a static website generated by habu. I customized the code a good bit to get the features I wanted and set up a basic nginx static website. But after getting a Chromebook, I got really tired of needing a full Python environment just to publish a post. At the same time, I was becoming more and more familiar with AWS services like Lambda and S3.
Using Lambda and S3 allows me to push a change to my git repository and have it automatically update the S3 bucket. I no longer have to maintain a Python environment to generate the static HTML, and I can edit my repository from within the browser.
There are several frameworks that offer similar S3 static websites. One that I looked at was hugo-lambda, which watches for uploaded Markdown files and generates the static HTML pages. It even comes with CloudFormation templates for creating the IAM roles and S3 buckets, which can be time-consuming to set up by hand. But at the end of the day, I wanted the experience of configuring everything myself so that I would know how it all works.
I had two major principles when creating the website. The first was that everything would be backed by source control, including both the lambda code and the website's posts. The second was that I didn't want to run any software locally. I can still write posts in Atom on my desktop, but I also have the option of using the web editor for my git repository.
An overview of the architecture is shown below.
The action starts when a commit is pushed to the git repository. The repository is configured to call a webhook tied to API Gateway. API Gateway passes the webhook request to a Lambda function, which parses the request and passes the repository name and commit to the habu and sync lambda functions. Those functions then fetch a tarball of the repository from the git server. Not shown is a separate lambda function, called by both habu and sync, that provides the OAuth credentials they use to connect to the git server.
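The webhook-handling function could look roughly like the sketch below. It assumes an API Gateway proxy integration (the raw request body arrives as a string in `event["body"]`), a GitHub-style push payload with `repository.full_name` and `after` fields, and downstream functions literally named `habu` and `sync`. The post doesn't say which git server is used, so the payload field names are guesses, and `parse_push_event` and `dispatch` are hypothetical helper names.

```python
import json

# Assumed deployed names of the two downstream functions.
TARGET_FUNCTIONS = ("habu", "sync")


def parse_push_event(event):
    """Pull the repository name and commit out of an API Gateway proxy event.

    Assumes a GitHub-style push payload; other git servers name these
    fields differently.
    """
    payload = json.loads(event["body"])
    return {
        "repository": payload["repository"]["full_name"],
        "commit": payload["after"],
    }


def dispatch(event, invoke):
    """Send the parsed push to every target function.

    `invoke` is any callable that accepts the keyword arguments of boto3's
    Lambda `invoke`, so a fake can be swapped in for testing.
    """
    job = parse_push_event(event)
    for name in TARGET_FUNCTIONS:
        # InvocationType="Event" queues an asynchronous invocation, so habu
        # and sync run concurrently and this handler returns immediately.
        invoke(
            FunctionName=name,
            InvocationType="Event",
            Payload=json.dumps(job).encode("utf-8"),
        )
    return {"statusCode": 202, "body": json.dumps(job)}


def handler(event, context):
    # Imported here so the helpers above can be exercised without AWS.
    import boto3

    return dispatch(event, boto3.client("lambda").invoke)
```

Because the invocations are asynchronous, the webhook gets its response right away instead of waiting for the site to finish generating.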
The habu function runs the habu static generation script but skips the static files. The sync function copies the static files to the S3 bucket. Splitting the work this way allows the two functions to run concurrently.
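The sync side might look something like this sketch. It assumes the tarball has already been downloaded (the OAuth request to the git server is left out, since its URL and token format depend on the server), that static assets live under a top-level `static/` folder, and that uploads go through boto3's S3 `put_object`. The helper names `static_files_from_tarball` and `sync_static` are hypothetical.

```python
import io
import mimetypes
import tarfile


def static_files_from_tarball(data, static_dir="static"):
    """Yield (S3 key, file bytes, content type) for files under static_dir.

    Git server archives usually wrap everything in a single top-level folder
    (for example "repo-<commit>/"), so that first path component is dropped.
    The "static" folder name is an assumption.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # "repo-abc123/static/css/site.css" -> "static/css/site.css"
            parts = member.name.split("/", 1)
            if len(parts) < 2 or not parts[1].startswith(static_dir + "/"):
                continue
            key = parts[1]
            content_type, _ = mimetypes.guess_type(key)
            body = archive.extractfile(member).read()
            yield key, body, content_type or "application/octet-stream"


def sync_static(data, bucket, put_object):
    """Upload every static file; put_object takes boto3 S3 put_object kwargs."""
    for key, body, content_type in static_files_from_tarball(data):
        put_object(Bucket=bucket, Key=key, Body=body, ContentType=content_type)
```

In the Lambda handler, `put_object` would be `boto3.client("s3").put_object`. Setting `ContentType` explicitly matters: without it S3 stores objects as `binary/octet-stream`, and browsers may refuse to apply stylesheets served that way.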
Once the files are in the S3 bucket, CloudFront can serve them. CloudFront is configured with a 30-second TTL for cached HTML files (since they change with each post), while the static files have a longer TTL because they rarely change and are larger.
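TTLs can be set on CloudFront cache behaviors, or driven by a `Cache-Control` header stored on each S3 object; CloudFront honors an origin `max-age` as long as it falls within the behavior's minimum and maximum TTL. As an alternative sketch, not necessarily how this site is configured, the helper below picks a header value per key. The 30 seconds for HTML comes from the post; the one-day value for everything else is an assumed placeholder, and `cache_control_for` is a hypothetical name.

```python
import posixpath

# 30 s for HTML is from the post; one day for static files is an assumption.
HTML_MAX_AGE = 30
STATIC_MAX_AGE = 86400


def cache_control_for(key):
    """Return a Cache-Control header value for an S3 object key."""
    ext = posixpath.splitext(key)[1].lower()
    max_age = HTML_MAX_AGE if ext in (".html", ".htm") else STATIC_MAX_AGE
    return "max-age=%d" % max_age
```

The value would be passed as `CacheControl=cache_control_for(key)` in the same `put_object` call that uploads the file, which keeps the caching policy in source control alongside the rest of the code.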
The habu and sync lambda functions need their timeout raised to 10-20 seconds from the default of 3. Most of this time is spent not on computation but on transmitting files to S3. Currently, every post is regenerated and uploaded whenever a new post is added, because every page includes the Recent Posts section. If regenerating all of the posts starts to take too long, I'll consider changing how the Recent Posts section is built so that only the new post needs to be uploaded.
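The timeout can be raised in the console, with `aws lambda update-function-configuration --function-name habu --timeout 20`, or from boto3. A minimal sketch, assuming the functions are deployed under the names `habu` and `sync`; `raise_timeouts` is a hypothetical helper:

```python
# Assumed deployed function names.
FUNCTIONS = ("habu", "sync")


def raise_timeouts(update_function_configuration, seconds=20):
    """Raise each function's timeout above the 3-second default.

    `update_function_configuration` is boto3's Lambda client method (or a
    fake with the same keyword arguments).
    """
    # Lambda accepts timeouts from 1 second up to 900 seconds (15 minutes).
    if not 1 <= seconds <= 900:
        raise ValueError("Lambda timeouts must be between 1 and 900 seconds")
    for name in FUNCTIONS:
        update_function_configuration(FunctionName=name, Timeout=seconds)
```

Usage would be `raise_timeouts(boto3.client("lambda").update_function_configuration)`.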
CloudFront's default caching levels make it tricky to see changes right away. There are two ways to deal with changes to the static website: you can invalidate the changed objects, or you can set the cache TTL to a shorter time frame.
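Invalidation can be scripted with the `create_invalidation` call on boto3's CloudFront client. A minimal sketch follows; `build_invalidation` and `invalidate` are hypothetical helper names, and the distribution ID is whatever your distribution uses.

```python
import uuid


def build_invalidation(paths):
    """Build the InvalidationBatch argument for CloudFront create_invalidation."""
    # Every path must start with "/"; a trailing "*" matches many objects.
    items = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique for each new invalidation request.
        "CallerReference": str(uuid.uuid4()),
    }


def invalidate(create_invalidation, distribution_id, paths):
    """Ask CloudFront to drop cached copies of the given paths."""
    return create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation(paths),
    )
```

Usage would be `invalidate(boto3.client("cloudfront").create_invalidation, distribution_id, ["/index.html", "/posts/*"])`. Invalidations are billed per path beyond a monthly free allowance (a wildcard path counts as one), which is one reason a short TTL on the HTML can be the simpler option for a site that changes with every post.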
The current sync mechanism only adds new files; it does not handle files that are moved or deleted. In practice this shouldn't matter much, but it does make it more likely that a page keeps using an image or other resource that no longer exists in the source.
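Handling deletions mostly comes down to a set difference between the keys already in the bucket and the keys produced by the current build. A minimal sketch, assuming both key lists are already available and scoped to the prefix the build writes to (for example from paginating `list_objects_v2` over that prefix and from walking the generated output); `stale_keys` and `delete_batches` are hypothetical helpers:

```python
def stale_keys(remote_keys, local_keys):
    """Keys present in the bucket but no longer produced from the repository."""
    return sorted(set(remote_keys) - set(local_keys))


def delete_batches(keys, batch_size=1000):
    """Yield `Delete` arguments for S3 delete_objects.

    S3 accepts at most 1,000 keys per delete_objects request.
    """
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        # Quiet mode only reports keys that failed to delete.
        yield {"Objects": [{"Key": k} for k in chunk], "Quiet": True}
```

Each batch would be passed as `s3.delete_objects(Bucket=bucket, Delete=batch)`. Removing stale objects also means a lingering reference to a deleted image shows up as a broken link instead of quietly working.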
Static resources are always uploaded from my git repository to S3, even if they already exist there. I don't think it is worth fixing at the moment, but the solution is fairly straightforward: using the list_objects method in boto3, I can get the MD5 hash (from the ETag) of each object already in the static directory, compare it with the files to sync, and skip those with the same MD5.
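The ETag comparison could look like the sketch below. It takes the `Contents` list returned by `list_objects` (or `list_objects_v2`) and a hypothetical `{key: bytes}` map of the files about to be synced. One caveat: an ETag only equals the object's MD5 for single-part uploads that aren't encrypted with SSE-KMS or SSE-C, so multipart ETags (which contain a `-`) are treated as unknown and the file is re-uploaded.

```python
import hashlib


def remote_md5s(contents):
    """Map key -> MD5 hex digest from list_objects `Contents` entries."""
    md5s = {}
    for obj in contents:
        # S3 returns the ETag wrapped in double quotes.
        etag = obj["ETag"].strip('"')
        # Multipart ETags look like "<hash>-<parts>" and are not plain MD5s.
        if "-" not in etag:
            md5s[obj["Key"]] = etag
    return md5s


def files_to_upload(local_files, contents):
    """Return the keys whose local bytes differ from what S3 already holds."""
    existing = remote_md5s(contents)
    return sorted(
        key
        for key, body in local_files.items()
        if existing.get(key) != hashlib.md5(body).hexdigest()
    )
```

Each `list_objects` call returns at most 1,000 objects, so a larger static directory would need the `list_objects_v2` paginator to collect every entry before comparing.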
published on 2017-07-13 by alex