Initial Proposal

Note: this document is still in draft.

This document outlines the proposal for the data infrastructure project, which is spun off from the SCEL weatherbox server subsystem.

The purpose of the data infrastructure is to create a platform that makes it easy to collect and store sensor data from any source. This platform will eventually fully support the efforts of the forecasting team, which, given good access to data, will be able to do its own independent research.

Outline

I. Motivation
II. Details
III. Goals
IV. Technical Modules

Motivation

Currently, the core project in the Smart Campus Energy Lab (SCEL) involves collecting data.

Issues with the previous project:

  • Documentation was poor
  • Difficult to contribute to
  • Availability was poor
  • Hard to interface with
  • Limited to one data type

Goals

  • Highly available
  • Easy to interface with
  • Easy to contribute to
  • Communicate development well

Specifications

Categories:

  • Maintenance
  • Data Interface
  • Nodes

Misc:

  • Outside users should be able to view nodes publicly
  • Each node deployment should be trackable
  • Lab users should be able to download datasets using any scripting language
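
One way to meet the "any scripting language" requirement is to offer datasets in a plain format such as CSV, which nearly every language can parse. The following is only a minimal sketch under that assumption: the function name `export_csv` and the column names are hypothetical and are not defined anywhere in this proposal.

```python
import csv
import io

# Hypothetical column layout; the proposal does not define a dataset schema.
FIELDS = ["node_id", "timestamp", "sensor", "value"]


def export_csv(readings):
    """Serialize a list of reading dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(readings)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {"node_id": "node-1", "timestamp": "2016-06-24T00:00:00Z",
         "sensor": "temperature", "value": 24.5},
    ]
    print(export_csv(sample))
```

A dataset served this way could be read with a few lines of Python, R, MATLAB, or shell, without requiring a client library for each language.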

Okay this is really hard.

Technical Modules

Here is a block diagram:

[Figure: block diagram, sensor_infrastructure_proposal.png]

  • Client - Primary interface into the data infrastructure. Sensors with transport layers such as ZigBee send their data to these clients.
  • Messaging Bus/Gateway - Monitors all of the clients and makes sure that they are authorized to send data. Rejects invalid clients.
  • Compute / Data Backend - Responsible for processing packets as they come in and putting them into the database.
  • Client Gateway
  • Queue
  • Client Connector
  • Reverse Proxy/Balancer
  • Gateway Database
  • Gateway
  • Gateway Queue
  • Worker - Processes data and makes sure that they
  • Node Manager - Manages the registration and display of data nodes
  • API - Serves data from the core/mirror database. Used by the node manager.
  • Core database - Database that stores all of our data. Currently PostgreSQL.
  • Mirror database - Public database that is read-only for public users. Mirrored from the core database.
  • Gateway Queue - Main queue that exists between the gateway and the worker scripts. This makes it possible to upgrade the gateway without losing any data in the network.
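
The flow from gateway to queue to worker described in the list above can be sketched roughly as follows. This is an illustrative toy, not the project's implementation: it uses Python's in-process `queue.Queue` in place of a real message queue and a plain list in place of the core database, and the names `AUTHORIZED_CLIENTS`, `gateway_accept`, and `worker_drain` are all hypothetical.

```python
import queue

# Hypothetical set of client IDs the gateway accepts; not from the proposal.
AUTHORIZED_CLIENTS = {"client-a", "client-b"}


def gateway_accept(packet, gateway_queue):
    """Enqueue a packet if its client is authorized; return True if accepted."""
    if packet.get("client_id") not in AUTHORIZED_CLIENTS:
        return False  # reject invalid clients
    gateway_queue.put(packet)
    return True


def worker_drain(gateway_queue, database):
    """Move every queued packet into the database (a plain list here)."""
    stored = 0
    while True:
        try:
            packet = gateway_queue.get_nowait()
        except queue.Empty:
            return stored
        database.append(packet)
        stored += 1


def demo():
    """Push three packets through the gateway, then let a worker store them."""
    gateway_queue = queue.Queue()
    database = []
    packets = [
        {"client_id": "client-a", "sensor": "temperature", "value": 24.5},
        {"client_id": "unknown", "sensor": "temperature", "value": 99.0},
        {"client_id": "client-b", "sensor": "humidity", "value": 61.0},
    ]
    accepted = [gateway_accept(p, gateway_queue) for p in packets]
    stored = worker_drain(gateway_queue, database)
    return accepted, stored, database
```

Because the gateway only writes to the queue and the worker only reads from it, either side can be stopped and restarted independently. With a durable queue in place of the in-process one used here, packets accepted before an upgrade would still be waiting for the worker afterwards.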

System Validation

To make sure that upgrades to the system complete successfully, we need proper validation tools and processes in place.

We can use tools such as Docker and Vagrant to help us test.

More on this later.

Education

A large motivation behind this project is educating students and exposing them to the engineering process. We should think about how to bring students with almost no experience up to speed quickly.

  • Code Review Process
  • Alumni Contributors

Problems

  • How can contributions be made small enough for students? Can we design the system so that small contributions are easier to make?

Possible Names

  • Data Platform
  • Sensor Platform
  • Data Sensor Platform

Last modified: 2021/09/19 21:59