I have recently launched a new website, emissionstracker.ca (repo), with two colleagues, Rose Scoville and Nick Richardson. Rose and Nick focused primarily on the front end and visual design, while I focused on the back end, server, containerization, and the design of the overall stack. The application's data comes from open data on greenhouse gas emitters published by the Canadian federal government. Users can search for emitters by emitter type, year, facility type, or province.
The application uses the Java Spring framework for the API, React for the front end, and PostgreSQL for persistence. In development, the React app is served by Vite; in production, the built JavaScript is served by Nginx as static files. The entire application is containerized as four separate services: Spring, React/Vite, React/Nginx, and PostgreSQL. The front end uses Leaflet JS to generate the map and cluster groupings.
Initially, when I was writing the application, only one dataset could exist in the database at any given time; if a new dataset was uploaded, the old one had to be deleted. This meant that if a new dataset failed to parse correctly, perhaps because of changes in the upstream data source, the old data could be lost, rendering the application unusable until the problem was fixed. To resolve this, I created a dataset table in the database. Each time new data is uploaded, metadata about the dataset is stored in this table, along with an "active" boolean column that is set to true when the dataset should be served to the front end, and queries return only rows whose dataset foreign key points to the active dataset. If the application is functioning as expected, the old dataset can then be deleted; if not, it is easy to roll back to the previous dataset until the problem is resolved.
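A minimal sketch of the idea with Spring Data JPA is below. The entity, field, and repository names here are illustrative assumptions for the post, not the actual schema in the repo:

```java
import jakarta.persistence.*;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.time.Instant;
import java.util.List;

// Hypothetical dataset entity: holds metadata about each upload plus the "active" flag.
@Entity
class Dataset {
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;
    String sourceFile;     // metadata about the uploaded file
    Instant uploadedAt;
    boolean active;        // true for the one dataset served to the front end
}

// Hypothetical emitter row: every row points back to the dataset it came from.
@Entity
class Facility {
    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
    Long id;
    String province;
    @ManyToOne
    Dataset dataset;
}

// Derived query: only rows belonging to the active dataset are ever returned.
interface FacilityRepository extends JpaRepository<Facility, Long> {
    List<Facility> findByDatasetActiveTrueAndProvince(String province);
}

interface DatasetRepository extends JpaRepository<Dataset, Long> {
    List<Dataset> findByActiveTrue();
}

@Service
class DatasetService {
    private final DatasetRepository datasets;

    DatasetService(DatasetRepository datasets) {
        this.datasets = datasets;
    }

    // Flipping the active flag is all it takes to roll forward to a new dataset
    // or back to the previous one.
    @Transactional
    public void activate(Long datasetId) {
        datasets.findByActiveTrue().forEach(d -> d.active = false);
        datasets.findById(datasetId).ifPresent(d -> d.active = true);
    }
}
```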
A major drawback of the application is that fetching the data requires a fair amount of processing. The application runs a SQL query for each province's data and then calculates the aggregate emissions for each emitter based on the year or years selected by the user. Given the limited server resources, the resulting response times made the application feel extremely unresponsive. The solution was to enable caching on the service that returns the DTO. The data changes infrequently, as little as once a year, making this a perfect use case for caching the JSON data. To invalidate the cache correctly, I annotated the function that sets the active dataset so that the cache is cleared whenever a new dataset is activated. This way, up-to-date data is always returned while data stays cached for as long as possible. In testing, this lowered response times from 4,000-5,000 ms to 50-100 ms.
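With Spring's cache abstraction this is mostly annotation-driven. A rough sketch of the pattern follows; the cache name, method names, and DTO here are assumptions for illustration, not the actual code in the repo:

```java
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching   // turns on Spring's cache abstraction (simple in-memory store by default)
class CacheConfig {
}

@Service
class EmissionsService {

    // Expensive aggregation: the result is cached per province/year until explicitly evicted.
    @Cacheable(value = "emissions", key = "#province + '-' + #year")
    public EmissionsDto getEmissions(String province, int year) {
        // ... run the per-province query and aggregate emitter totals ...
        return new EmissionsDto();
    }

    // Activating a new dataset is the only event that changes the data,
    // so evicting everything here keeps responses correct while maximizing cache hits.
    @CacheEvict(value = "emissions", allEntries = true)
    public void activateDataset(Long datasetId) {
        // ... flip the active flag as described above ...
    }
}

// Placeholder DTO so the sketch compiles on its own.
class EmissionsDto {
}
```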
Aside from minor bug fixes and improvements, I would like to make a version 2 that covers the whole country, but this will significantly increase the amount of data the back end needs to process. Caching will help a great deal, but a further improvement would be to detect which region the user is in when they load the app and load that region first, making the app usable before the data for the rest of the country arrives. In the meantime, I will be moving on to other projects for a while. Hopefully, I will find some time to work on NUTS a bit more!