Initial Proposal
Note: this document is still in draft.
This document outlines the proposal for the data infrastructure project, which is spun off from the SCEL weatherbox server subsystem.
The purpose of the data infrastructure is to create a platform that makes it easy to collect and store sensor data from any source. This platform will eventually fully support the efforts of the forecasting team, which, given good access to data, will be able to do its own independent research.
https://www.draw.io/#G0Bxowpw1NF2d3YmRaRjBHWTJTckE
Outline
I. Motivation
II. Details
III. Goals
IV. Technical Modules
Previous Work
Issues with the previous project
Motivation and Summary
It is currently very difficult to reliably gather time series data from embedded sensor devices.
This project aims to provide the software infrastructure to reliably collect data, add new
sensors, support new sensor types, and analyze the collected data.
Specifications
High Level Categories:
Availability
Interfaces
Libraries
Graphing
Contributions
Documentation
Extensibility
Logging
Verification
Validation
Misc:
Outside users should be able to easily view nodes publicly
Each node deployment should be trackable
Lab users should be able to download datasets using any scripting language
We should be able to validate the data that is collected
We should be able to detect whether a sensor is down
Okay this is really hard.
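As a rough illustration of the last two requirements, here is a minimal sketch of what validating readings and detecting silent sensors could look like. Everything in it is an assumption made for illustration: the field names, the value ranges, the 15 minute silence threshold, and the function names are hypothetical and are not part of the current design.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical plausible ranges per measurement. Real limits would come
# from each sensor's datasheet, not from this sketch.
VALID_RANGES = {
    "temperature_c": (-40.0, 85.0),
    "humidity_pct": (0.0, 100.0),
}


def validate_reading(reading):
    """Return a list of problems found in one reading (empty if none)."""
    problems = []
    for field, (low, high) in VALID_RANGES.items():
        value = reading.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems


def find_down_nodes(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the sorted IDs of nodes that have been silent longer than max_silence."""
    return sorted(
        node for node, seen in last_seen.items() if now - seen > max_silence
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_seen = {
        "node-a": now - timedelta(minutes=5),
        "node-b": now - timedelta(hours=2),
    }
    print(find_down_nodes(last_seen, now))
    print(validate_reading({"temperature_c": 120.0, "humidity_pct": 40.0}))
```

A check like this only says that a node has been quiet for too long. It cannot tell a dead sensor apart from a broken transport link, which is part of why this requirement is hard.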
Technical Modules
Here is a block diagram:
High Level Blocks
Client - Primary interface into the data infrastructure - sensors with transport layers such as ZigBee will dump their data to these clients.
Messaging Bus/Gateway - Monitors all of the clients and makes sure that they are authorized to send data. Rejects invalid clients.
Data Backend - Contains all of the logic necessary to store and process data. Contains a publicly accessible API that can be used to build client applications.
Compute Backend - Able to run large compute jobs such as graphing or analysis scripts. Serves dataset results and graphs through a filesystem or the API.
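To make the gateway's role more concrete, here is a minimal sketch of the "authorize, then accept or reject" step described above. The shared-token scheme, the Gateway class, and its method names are assumptions made for illustration only; the proposal does not specify how clients are authorized.

```python
import hmac
from collections import deque


class Gateway:
    """Toy gateway: accepts data only from registered clients."""

    def __init__(self, registered_clients):
        # Hypothetical scheme: maps client ID -> shared secret token.
        self._clients = dict(registered_clients)
        # In-memory stand-in for the queue that feeds the data backend.
        self.accepted = deque()
        self.rejected_count = 0

    def submit(self, client_id, token, payload):
        """Queue the payload if the client is authorized; return True on success."""
        expected = self._clients.get(client_id)
        # compare_digest avoids leaking token contents through timing.
        if expected is None or not hmac.compare_digest(
            expected.encode(), token.encode()
        ):
            self.rejected_count += 1
            return False
        self.accepted.append((client_id, payload))
        return True


if __name__ == "__main__":
    gateway = Gateway({"client-1": "s3cret"})
    print(gateway.submit("client-1", "s3cret", {"temperature_c": 21.5}))
    print(gateway.submit("client-1", "wrong", {"temperature_c": 21.5}))
    print(gateway.submit("unknown", "s3cret", {"temperature_c": 21.5}))
```

The point of the sketch is the boundary: nothing reaches the data backend unless the gateway has first matched it to a known client.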
Client
Client Gateway
Queue
Client Connector
Messaging Bus
Reverse Proxy/Balancer
Gateway Database
Gateway
Gateway Queue
Compute/Data Backend
Worker - Processes data and makes sure that they
Node Manager - Manages the registration and display of data nodes
API - Serves data from the core/mirror database. Used by the node manager.
Core database - Database that stores all of our data. Currently PostgreSQL.
Mirror database - Public database that is read-only for public users. Mirrored from the core database.
Gateway Queue - Main queue that exists between the gateway and the worker scripts. This makes it possible to upgrade the gateway without losing any data in the network.
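The decoupling that the gateway queue provides can be sketched as follows. This is only an in-memory illustration using Python's standard queue module; the function names are hypothetical, and a real deployment would presumably need a durable queue that survives process restarts, which this sketch does not provide.

```python
import queue


def gateway_accept(gateway_queue, readings):
    """Gateway side: enqueue each reading and return without waiting for a worker."""
    for reading in readings:
        gateway_queue.put(reading)


def worker_drain(gateway_queue, store):
    """Worker side: move everything currently queued into the store.

    Returns the number of readings processed.
    """
    processed = 0
    while True:
        try:
            reading = gateway_queue.get_nowait()
        except queue.Empty:
            return processed
        store.append(reading)
        gateway_queue.task_done()
        processed += 1


if __name__ == "__main__":
    gateway_queue = queue.Queue()
    store = []

    # Readings accepted before one side is taken down for an upgrade.
    gateway_accept(gateway_queue, [{"node": "a", "t": 1}, {"node": "a", "t": 2}])
    # Readings accepted after the upgrade; the queue itself was never touched.
    gateway_accept(gateway_queue, [{"node": "b", "t": 3}])

    print(worker_drain(gateway_queue, store))
    print(store)
```

Because the two sides only share the queue, either one can be stopped and restarted while readings that were already queued wait in order for a worker.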
System Validation
To make sure that upgrades to the system complete successfully, we need to have proper validation tools and processes in place.
We can use tools such as Docker and Vagrant to help us test.
More on this later.
Education
A large motivation behind this project is to educate students and expose them to the engineering process. We should think about how to bring students who have almost no experience up to speed quickly.
Code Review Process
Alumni Contributors
Problems
Possible Names
Data Platform
Sensor Platform
Data Sensor Platform
Authors
Contributing authors:
kluong
Created by kluong on 2016/06/20 00:05.