A free database ticker for everyone?
In my two previous posts, I shared my pursuit of improving my model's post reward predictions by using the HBD reward converted into Hive.
You can read about it here: @cocaaladioxine/building-a-hive-hbd-ticker
and @cocaaladioxine/using-a-hbd-hive-ticker.
It was not the right path to follow, as this approach yielded worse prediction accuracy. I was disappointed and wondered how to repurpose this work for something else. I had spent quite a lot of time on this experiment, and it seemed a waste not to use it.
I'm always looking for new things to try and learn, especially when they can help me in my day job. I use BigQuery every day at work, but our data pipelines run on a tool called "Kestra". It does a nice job but is not yet an industry standard.
Since I had never implemented a data pipeline using Cloud Functions on GCP, I decided to leverage my familiarity with Google Cloud Platform (GCP) and BigQuery to build a cloud function that retrieves the HBD/Hive rate, and to offer the result as a freely available resource.
For the techies, I will write a complete article going over every major step of the development. But here's the global idea.
Architecture
GCP is one of the three main cloud platforms, alongside Amazon AWS and Microsoft Azure. Since I already had some knowledge of GCP and an active project, it was the obvious choice.
The code is a simple Python function deployed as a "Cloud Function", eliminating the need for a Docker container or a virtual machine. A Cloud Scheduler job pushes a message to a Pub/Sub topic every hour, which in turn triggers the function.
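To make the flow concrete, here is a minimal sketch of what such a Pub/Sub-triggered function could look like. This is my illustration, not the author's actual code: the entry-point name `hive_ticker`, the API node, and the use of `condenser_api.get_ticker` for the rate are all assumptions.

```python
# Hypothetical sketch of the hourly Pub/Sub-triggered Cloud Function.
# The function name, API node, and rate source are assumptions for
# illustration, not the actual deployed code.
import json
import urllib.request
from datetime import datetime, timezone

API_NODE = "https://api.hive.blog"  # any public Hive API node would do

def parse_hbd_per_hive(ticker: dict) -> float:
    """The internal-market ticker reports 'latest' as the HBD-per-HIVE price."""
    return float(ticker["latest"])

def fetch_ticker() -> dict:
    """Call condenser_api.get_ticker on a public Hive node via JSON-RPC."""
    body = json.dumps({"jsonrpc": "2.0", "id": 1,
                       "method": "condenser_api.get_ticker",
                       "params": []}).encode()
    req = urllib.request.Request(
        API_NODE, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]

def hive_ticker(event, context):
    """Entry point: triggered hourly by Cloud Scheduler -> Pub/Sub."""
    # google-cloud-bigquery is available in the Cloud Functions runtime
    from google.cloud import bigquery
    row = {"ticker_timestamp": datetime.now(timezone.utc).isoformat(),
           "hbd_per_hive": parse_hbd_per_hive(fetch_ticker())}
    bigquery.Client().insert_rows_json(
        "personal-projects-sri.open_hive.tf_hive_hbd_history", [row])
```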
Accessing the Data
You will need a Google account for the following steps:
- Go to BigQuery
- Enjoy your trial or use the free tier
- Enter this basic SELECT statement in the console
SELECT * FROM `personal-projects-sri.open_hive.tf_hive_hbd_history`;
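If you'd rather query the table from code than from the console, the official `google-cloud-bigquery` Python client can run the same statement. The snippet below is a sketch: it assumes you have installed the client (`pip install google-cloud-bigquery`) and authenticated, e.g. with `gcloud auth application-default login`; the helper `latest_rate` is mine, for illustration.

```python
# Sketch: querying the public table with the google-cloud-bigquery client.
# Assumes local GCP credentials; `latest_rate` is an illustrative helper.
from typing import Iterable, List, Tuple

TABLE = "personal-projects-sri.open_hive.tf_hive_hbd_history"

def latest_rate(rows: Iterable[Tuple[str, float]]) -> float:
    """Return the most recent hbd_per_hive from (iso_timestamp, rate) pairs."""
    # ISO-8601 timestamps sort lexicographically, so max() by timestamp works.
    return max(rows, key=lambda r: r[0])[1]

def query_history() -> List[Tuple[str, float]]:
    """Fetch the full history (requires GCP credentials and billing project)."""
    from google.cloud import bigquery
    client = bigquery.Client()
    job = client.query(
        f"SELECT ticker_timestamp, hbd_per_hive FROM `{TABLE}`")
    return [(row.ticker_timestamp.isoformat(), row.hbd_per_hive)
            for row in job]
```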
Table description
The history starts on 2023-05-14 10:41:16 and the table provides hourly entries indicating the current HBD/Hive ratio.
The table currently has two fields:
- ticker_timestamp: the timestamp of the data extraction.
- hbd_per_hive: the HBD/Hive ratio.
The table is partitioned by the ticker_timestamp field, using monthly partitions.
GCP Pricing
In BigQuery, you are billed based on the amount of data your queries scan. There are different ways to reduce the amount scanned, one of them being partitioning.
Filtering on the partition field narrows down the amount of data scanned.
For example:
SELECT * FROM `personal-projects-sri.open_hive.tf_hive_hbd_history`
WHERE ticker_timestamp >= TIMESTAMP(CURRENT_DATE);
Considering the very low amount of data, I opted for monthly partitioning. This means that even when you filter on a specific day, BigQuery still scans (and bills you for) the whole month's partition.
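If you want to check what a query would scan before running it, BigQuery supports a "dry run" that reports the bytes processed without executing (or billing) the query. The sketch below assumes working GCP credentials; the `fmt_bytes` helper is just mine for readable output.

```python
# Sketch: estimating query cost with a BigQuery dry run.
# `dry_run` requires GCP credentials; `fmt_bytes` is a local helper.

def fmt_bytes(n: float) -> str:
    """Human-readable byte count, e.g. 1048576 -> '1.00 MiB'."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.2f} {unit}"
        n /= 1024

def dry_run_bytes(sql: str) -> int:
    """Return the bytes the query would scan, without running it."""
    from google.cloud import bigquery
    cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = bigquery.Client().query(sql, job_config=cfg)
    return job.total_bytes_processed
```

With this, comparing a full-table `SELECT *` against one filtered on ticker_timestamp shows the effect of partition pruning directly in the reported byte counts.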
With the GCP free tier, you can query up to 1 TB per month without incurring any cost. More details can be found here: https://cloud.google.com/free/docs/free-cloud-features#bigquery
Conclusion
Feel free to use the data for any of your projects. I'm open to any suggestions for improvement, so don't hesitate to share your ideas with me. If you have a use case, please tell me; I'll be excited to see how you make use of it!
The thumbnail picture was created with Bing Image Creator and reworked with Canva.
Prompt: A desk with a computer screen showing SQL code, cartoon style.