A few weeks ago I started looking for a new apartment where I could move in with my girlfriend. We began our search mainly on one of the widely known classifieds websites here in Canada. A few days in, I was getting frustrated with how poorly that website let us filter the listings we cared about. Our main concerns were budget and location. We could filter by a range of rent prices, but that would usually surface a lot of listings in neighborhoods too far from our daily commutes. And since our jobs are far apart, we wanted to find an apartment located somewhere in between them. That was not something we could filter by, so we would often waste our time checking listings that were great on the budget side but not on the location one.
With that in mind, I wanted to somehow fetch the listings based on our budget while also calculating their distances to our workplaces. That way we could discard, beforehand, listings too far away from our destinations. Being a software developer, I started coding. I chose a few technologies I already knew to get things rolling fast, although this was the first time I used Node and Express for backend stuff. In a couple of days I got something working, after struggling for a while with the Google Maps API and with ZEIT's serverless platform, which I used to deploy the application.
To achieve what I wanted, I decided to write a cron job that would fetch the listings every 10 minutes or so. Luckily the website had a handy real-time RSS feed I could use to fetch the apartments. Each XML response had 20 items, and among their attributes were the latitude and longitude, which I could feed into the Distance Matrix API to get the distance and commute time to our two destinations. With the distance matrix computed, I would filter out the listings with a commute time greater than a threshold and store the remaining candidates in a database. Then it was just a matter of adding an endpoint to get those listings and putting together a simple UI to list the results, showing rent prices, distances, and a link to the original post.
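The core of that pipeline looks roughly like this. This is a sketch rather than the actual source: the listing shape, the `MAX_COMMUTE_MINUTES` threshold, and the function names are my assumptions, but the Distance Matrix endpoint and its `origins`/`destinations` query parameters are the real Google Maps API.

```javascript
// Build a Distance Matrix request URL: one origin per listing,
// both commute destinations in a single call.
function buildDistanceMatrixUrl(listings, destinations, apiKey) {
  const origins = listings.map(l => `${l.lat},${l.lng}`).join('|');
  const dests = destinations.map(d => `${d.lat},${d.lng}`).join('|');
  return (
    'https://maps.googleapis.com/maps/api/distancematrix/json' +
    `?origins=${encodeURIComponent(origins)}` +
    `&destinations=${encodeURIComponent(dests)}` +
    `&key=${apiKey}`
  );
}

const MAX_COMMUTE_MINUTES = 40; // assumed threshold

// Keep only listings whose commute time to EVERY destination is
// under the threshold. matrix.rows[i] corresponds to listings[i].
function filterByCommute(listings, matrix, maxMinutes = MAX_COMMUTE_MINUTES) {
  return listings.filter((listing, i) =>
    matrix.rows[i].elements.every(
      el => el.status === 'OK' && el.duration.value <= maxMinutes * 60
    )
  );
}
```

Batching all 20 origins and both destinations into a single request keeps it to one API call per cron run, although — as I was about to learn — it does not change how many elements the call costs.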
Well, on the second day of the project (around midnight) I deployed it and went to bed, only to find out the following morning that something must have broken during the night, because I was not seeing any new listings. I went to work intrigued by what could have caused the problem, and my first guess was something related to the Maps API, since I was on a free-tier plan. Later that day I confirmed that was it. The usage had exceeded my quota by (around) 55x, and the requests were failing with a 403 response. I just had no idea why it had that insane amount of usage. After some digging, I found a few issues. One was related to the serverless service: when you deploy to ZEIT, it spins up new instances as needed to ensure availability. That makes sense for a website, as it needs to scale based on traffic, but for my use case it was duplicating my cron job, and that was not what I wanted. The other issue, with the Maps API, was related to the limit on the number of elements you are allowed to calculate in a day. At first I did not pay attention to what their definition of an element was, but I later learned it is the number of origins multiplied by the number of destinations per request.
"Each query sent to the Distance Matrix API generates elements, where the number of origins times the number of destinations equals the number of elements."
Since my cron job was running every 10 minutes and every fetch was returning 20 listings, that meant 20 origins × 2 destinations = 40 elements per run, and 144 runs a day, for a total of 5,760 elements. That alone already went over my quota limit, but given that the cron job was duplicated, the number went up to 11,520. When I checked the API console, though, it was showing a number 6~7x higher than that, and I am still confused by what could have caused so high a number. Maybe it was spawning more instances of the script than I am aware of… I dunno.
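The arithmetic behind those numbers is straightforward enough to put in a helper (the function name is mine, not from the project):

```javascript
// Elements consumed per day = origins × destinations × runs per day.
// Here: 20 listings per fetch, 2 commute destinations.
function elementsPerDay(listings, destinations, minutesBetweenRuns) {
  const runsPerDay = (24 * 60) / minutesBetweenRuns;
  return listings * destinations * runsPerDay;
}

console.log(elementsPerDay(20, 2, 10));     // 5760 — one cron instance
console.log(elementsPerDay(20, 2, 10) * 2); // 11520 — with the duplicated job
```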
Anyway, I then started looking for possible ways to mitigate those issues. I realized I didn't have any code in place to account for previously fetched listings. So, in that 10-minute window between jobs, if there were no new listings, it would fetch the same 20 listings from the previous run and calculate their distances again. Even though there were rules for saving only unique listings in the DB, the ones that had been rejected in the first run would be calculated again only to be rejected one more time, which just added to the API usage. So I refactored the script to keep a rejected collection: before hitting the API I would filter out those rejected listings and not waste my quota on useless elements. I also lowered the cron frequency to once every 30 minutes. It didn't make much sense to run it every 10 minutes, since I would only be checking the listings at lunchtime and when I got home in the evenings.
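The pre-filter itself is a few lines. Again a sketch, not the actual source — it assumes each listing has a stable `id` and that the ids of previously rejected and previously saved listings have been loaded from the database:

```javascript
// Drop listings we have already seen (rejected or saved) so the
// Distance Matrix API is only hit for genuinely new listings.
function dropKnownListings(listings, rejectedIds, savedIds) {
  const seen = new Set([...rejectedIds, ...savedIds]);
  return listings.filter(l => !seen.has(l.id));
}
```

Only the listings that survive this filter get sent to the API, so refetching the same 20 items in a quiet window costs zero elements.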
On the deployment side, I first had to downgrade ZEIT from v2 to v1 to get a cron job running; they don't seem to support it in v2 (at least not yet). Then, going through v1's documentation, I found out I needed a fixed scaling configuration, which would ensure only one instance of the script was running:
{
  "scale": {
    "sfo1": {
      "min": 1,
      "max": 1
    }
  }
}
After those changes were in place, it was time to re-deploy and hope for the best. And indeed, all of that did the trick. After a day of running the script, I checked the Maps API dashboard again and the usage had dropped to 700+ elements/day, which was roughly half my quota limit. More importantly, now we could see right away whether a listing was worth checking, and we wouldn't waste our time on click-bait-with-nice-pictures-and-low-budget-far-far-away listings.
Long story short, constraints have a huge impact on development and should not be overlooked. Sometimes you wish you could just use whatever resource you have without limits. Most of the time that is not the case, and limitations can be useful to keep the scope of a project within reach. Constraints are everywhere, not only for you as a developer but for those using your application as well, be it their low-end devices, slow network connections, or even impaired abilities of interaction. Not sure any of this makes sense, but I am not used to writing blog posts. This is my first one ever, but I once heard we should record our accomplishments and failures so we can learn from and remember what we did. Other people might find value in that too. Maybe I should do it more often.
You can find the backend and frontend code in cest-qui-jiji-bot and cest-qui-jiji, respectively. The live app can be found here.
Hats off to Wes Bos for talking about a similar idea he'd developed on his Better Living Through Side Projects podcast, which helped me find the motivation to hack this little project together.
Anyhow... I should run and write a cron job to remove old listings since my MongoDB cluster tier is also limited 😂