Note: this document is still in draft.

This document outlines the proposal for the data infrastructure project, which is spun off from the SCEL weatherbox server subsystem.

The purpose of the data infrastructure is to create a platform that can be used to easily collect and store sensor data from any source. This platform will eventually fully support the efforts of the forecasting team, who, given good access to data, will be able to do their own independent research.

https://www.draw.io/#G0Bxowpw1NF2d3YmRaRjBHWTJTckE

I. Motivation
II. Details
III. Goals
IV. Technical Modules

Issues with the previous project

Documentation was poor
Difficult to contribute
Availability was poor
Hard to interface with
Limited to one data type

Motivation and Summary

It is currently very difficult to reliably gather time series data from embedded sensor devices. This project aims to provide the software infrastructure to reliably collect data, add new sensors, support new sensor types, and analyze the collected data.

High Level Categories:

Availability
Interfaces
Libraries
Graphing
Contributions
Documentation
Extensibility
Logging
Verification
Validation

Misc:

Outside users should be able to easily view nodes publicly
It should be possible to track each node deployment
Lab users should be able to download datasets using any scripting language
We should be able to validate the data that is collected
We should be able to detect whether a sensor is down
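
As a rough illustration of the last point, one way to decide whether a sensor is down is to compare the time of each node's most recent reading against a silence threshold. The sketch below is only an assumption about how this could work: the function name `find_down_nodes`, the 15-minute threshold, the node IDs, and the sample timestamps are all hypothetical and not part of the existing system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: a node counts as "down" if it has been
# silent for longer than this. The real value would depend on how
# often each sensor is expected to report.
MAX_SILENCE = timedelta(minutes=15)


def find_down_nodes(last_seen, now, max_silence=MAX_SILENCE):
    """Return the sorted IDs of nodes whose latest reading is older than max_silence.

    last_seen maps a node ID to the timestamp of its most recent reading.
    """
    return sorted(
        node_id
        for node_id, seen_at in last_seen.items()
        if now - seen_at > max_silence
    )


# Made-up sample data, for illustration only.
now = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "node-a": now - timedelta(minutes=2),  # reported recently
    "node-b": now - timedelta(hours=3),    # silent for hours
}
print(find_down_nodes(last_seen, now))  # prints ['node-b']
```

In practice the `last_seen` mapping would presumably come from a query against the stored readings rather than a hard-coded dictionary.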

Okay this is really hard.

Here is a block diagram:

Client - Primary interface into the data infrastructure. Sensors with transport layers such as ZigBee will send their data to these clients.
Messaging Bus/Gateway - Monitors all of the clients and makes sure that they are authorized to send data. Rejects invalid clients.
Data Backend - Contains all of the logic necessary to store and process data. Contains a publicly accessible API that can be used to build client applications.
Compute Backend - Able to run large compute jobs such as graphing or analysis scripts. Serves dataset results and graphs through a filesystem or the API.
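
To make the gateway's role more concrete, here is a minimal sketch of the authorization step described above: readings from known clients are accepted and readings from unknown clients are rejected. The `Gateway` class, its `submit` method, the client IDs, and the shape of a reading are hypothetical names chosen for illustration. The proposal does not specify how authorization is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Toy gateway that only accepts readings from authorized clients."""

    # Hypothetical allow-list of client IDs. A real gateway would
    # likely check credentials rather than bare IDs.
    authorized_clients: set
    accepted: list = field(default_factory=list)

    def submit(self, client_id, reading):
        """Store the reading and return True if the client is authorized.

        Readings from unknown clients are dropped and False is returned.
        """
        if client_id not in self.authorized_clients:
            return False
        self.accepted.append((client_id, reading))
        return True


gateway = Gateway(authorized_clients={"client-1"})
ok = gateway.submit("client-1", {"temperature_c": 21.5})
rejected = gateway.submit("rogue-client", {"temperature_c": 99.0})
print(ok, rejected, len(gateway.accepted))  # prints True False 1
```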

Client Gateway
Queue
Client Connector

Reverse Proxy/Balancer
Gateway Database
Gateway
Gateway Queue

Worker - Processes data and makes sure that they
Node Manager - Manages the registration and display of data nodes
API - Serves data from the core/mirror database. Used by the node manager.
Core database - Database that stores all of our data. Currently PostgreSQL.
Mirror database - Public database that is read-only for public users. Mirrored from the core database.
Gateway Queue - Main queue that exists between the gateway and the worker scripts. This makes it possible to upgrade the gateway without losing any data in the network.
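
The decoupling that the gateway queue provides can be sketched with Python's standard `queue` module: the gateway only enqueues readings, and a worker drains them whenever it runs. This is an in-process illustration under stated assumptions, not the planned design. The function names `gateway_accept` and `worker_drain` are hypothetical, and a real deployment would need a durable queue so that buffered readings survive a restart.

```python
import queue

# In-memory stand-in for the gateway queue. An in-process queue is
# NOT durable; it only illustrates the producer/consumer split.
gateway_queue = queue.Queue()


def gateway_accept(reading):
    """Gateway side: enqueue a reading without processing it."""
    gateway_queue.put(reading)


def worker_drain(store):
    """Worker side: move every queued reading into store.

    Returns the number of readings processed in this pass.
    """
    processed = 0
    while True:
        try:
            reading = gateway_queue.get_nowait()
        except queue.Empty:
            return processed
        store.append(reading)
        gateway_queue.task_done()
        processed += 1


# Readings accumulate while no worker is running...
gateway_accept({"node": "node-a", "value": 1})
gateway_accept({"node": "node-b", "value": 2})

# ...and are processed in arrival order once a worker starts.
stored = []
print(worker_drain(stored))  # prints 2
```

Because the two sides only share the queue, either one can be stopped and restarted independently as long as the queue itself stays available.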

To make sure that upgrades to the system complete successfully, we need to have proper validation tools and processes in place.

We can use tools such as Docker and Vagrant to help us test.

Initial Proposal

Outline

Previous Work

Motivation and Summary

Specifications

Technical Modules

High Level Blocks

Client

Messaging Bus

Compute/Data Backend

System Validation

Education

Possible Names

Authors

Smart Campus Energy Lab Wiki