Data Collection in a Nutshell

One of the more important aspects of machine learning is having an abundance of data. For our SolarCast-ML project, we built a custom data collection setup on AWS S3. Essentially, our weather condition collection device sends a custom POST request to Node-RED, and the blocks in Node-RED then upload the data to S3.
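To give a feel for the device side, here is a minimal sketch of posting one reading to Node-RED as JSON. The endpoint URL, the `/solarcast` path, and the field names are all hypothetical stand-ins, not our actual configuration; it assumes Node-RED exposes an "http in" node at that path.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical endpoint: assumes an "http in" node in Node-RED listens here.
NODE_RED_URL = "http://localhost:1880/solarcast"


def build_request(reading, url=NODE_RED_URL):
    """Wrap one sensor reading in a JSON POST request (not sent yet)."""
    payload = dict(reading)
    # Add a UTC timestamp only if the device did not supply one.
    payload.setdefault(
        "timestamp", datetime.now(timezone.utc).isoformat(timespec="seconds")
    )
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_reading(reading, url=NODE_RED_URL, timeout=10):
    """POST the reading to Node-RED and return the HTTP status code."""
    with urllib.request.urlopen(build_request(reading, url), timeout=timeout) as resp:
        return resp.status
```

Calling `send_reading({"temperature_c": 21.5})` would then hand the reading to the Node-RED flow, which takes care of the S3 upload.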

Data collected from the outdoor array, ground truth circled
Reporting setup, the node is Node-RED

A word of caution, however: using timestamps as the filenames in S3 is a pretty standard way of going about the upload, but the download back to a Windows machine will break because Windows does not allow colons in filenames. We addressed this issue by first downloading to an Ubuntu computer, adjusting the filenames, then sending the files to the Windows machine for final processing.
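The filename adjustment can be sketched in a few lines of Python, run on the Ubuntu machine after the download. This is an illustration rather than the exact script we used: the function names are made up, and swapping each colon for a hyphen is just one reasonable choice of replacement.

```python
from pathlib import Path


def windows_safe_name(name: str) -> str:
    """Replace colons, which Windows forbids in filenames, with hyphens."""
    return name.replace(":", "-")


def rename_downloads(folder):
    """Rename every file in `folder` whose name contains a colon."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        safe = windows_safe_name(path.name)
        if path.is_file() and safe != path.name:
            target = path.with_name(safe)
            # Refuse to overwrite a file that already has the safe name.
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            path.rename(target)
            renamed.append(target)
    return renamed
```

For example, a file named `2020-06-01T12:30:00.json` would become `2020-06-01T12-30-00.json`, which Windows accepts. Applying the same replacement before the upload would avoid the detour through Ubuntu altogether.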

Testing our full data path earlier would have caught the issue, and changing the filename format up front would have saved us the hassle.

