Sample Airflow DAG - solar data feed

  • timandrews1
  • Sep 8, 2023
  • 2 min read

It has been a while since my last post. After my previous update I started a new job, which has kept me busy.


Recently, though, I spent some time reworking a set of Python scripts that I originally set up on a Windows server. These scripts monitor and forecast the production of my home solar panel system. Rather than keeping them as independent scripts run by Windows Task Scheduler, I decided to consolidate them into a single Apache Airflow DAG (Directed Acyclic Graph).


The project integrates three APIs: the SolarEdge API, which provides near-real-time production data for my solar panels, and two separate weather APIs. Because API calls can be costly, I designed the DAG to be modular. The basic workflow is:

  1. Connect to the API and temporarily store the retrieved data in a .csv file.

  2. Parse the .csv file and load the data into intermediate "raw," "staging," or "transient" tables, depending on the target database. In my case, the targets are my local Postgres installation and my Google BigQuery project.

  3. Run the necessary transformations and merge the data from the raw tables into the final destination tables.
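The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from my repository: `fetch_production`, the `raw_production` table, and the column names are hypothetical, the API call is replaced by fixed sample rows, and the standard-library `sqlite3` module stands in for Postgres so the example runs anywhere. In an Airflow DAG, each function would be the callable behind its own task.

```python
import csv
import sqlite3


def fetch_production():
    """Hypothetical stand-in for an API call; returns fixed sample rows."""
    return [
        {"reading_time": "2023-09-08T12:00:00", "energy_wh": 1250.0},
        {"reading_time": "2023-09-08T12:15:00", "energy_wh": 1310.5},
    ]


def extract_to_csv(csv_path, fetch=fetch_production):
    """Step 1: call the API once and persist the response to a .csv file."""
    rows = fetch()
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["reading_time", "energy_wh"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def load_csv_to_raw(csv_path, conn):
    """Step 2: reload the raw table from the .csv file, never from the API."""
    with open(csv_path, newline="") as f:
        rows = [
            (r["reading_time"], float(r["energy_wh"]))
            for r in csv.DictReader(f)
        ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_production "
        "(reading_time TEXT, energy_wh REAL)"
    )
    # Empty the raw table first so that re-running this step is idempotent.
    conn.execute("DELETE FROM raw_production")
    conn.executemany("INSERT INTO raw_production VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

Because `load_csv_to_raw` only reads the file, it can be re-run as often as needed after a single call to `extract_to_csv`, without any further API requests.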


Because each API response is saved to a file first, I can restart individual downstream tasks without calling the API again, which avoids extra requests that could lead to overages or charges.


To keep this DAG simple, I used SQLAlchemy to execute SQL statements against my databases rather than integrating with dbt. For the code and more details, see my GitHub repository.
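As a rough sketch of what a merge statement of this kind can look like, the example below upserts rows from a raw table into a final table. The table and column names (`raw_production`, `production`, `reading_time`, `energy_wh`) are hypothetical, and the standard-library `sqlite3` module stands in for SQLAlchemy and Postgres so the example is self-contained. With SQLAlchemy, a statement like this would typically be wrapped in `text()` and passed to a connection's `execute()`.

```python
import sqlite3

# Upsert from the raw table into the final table, keyed on reading_time.
# "WHERE true" is needed by SQLite so that ON CONFLICT is not parsed as
# part of a join clause in the SELECT.
MERGE_SQL = """
INSERT INTO production (reading_time, energy_wh)
SELECT reading_time, energy_wh FROM raw_production WHERE true
ON CONFLICT (reading_time) DO UPDATE SET energy_wh = excluded.energy_wh
"""


def merge_raw_into_final(conn):
    """Merge raw rows into the final table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS production "
        "(reading_time TEXT PRIMARY KEY, energy_wh REAL)"
    )
    conn.execute(MERGE_SQL)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM production").fetchone()[0]
```

Running the merge twice against the same raw rows leaves the final table unchanged, so the task is safe to restart.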


The result streamlines my solar monitoring and forecasting and gives me a more efficient, maintainable data integration workflow.


©2022 by Tim's BigQuery blog.