====== Feature Idea - Staging Environment ======

===== Background =====

A typical pattern when running systems is to have multiple environments. There's the production environment, which handles the "real" data or traffic, and usually a staging environment that consists of "test" or fake data and has no impact on what happens in the real system. The staging environment is useful because it lets you test changes to your software or configuration in an environment that closely mirrors production, giving you confidence that your changes won't break anything in the real world.

===== Motivation =====

It would be great if we had one for our sensor network - that way we could test changes to any part of the network and see whether they would work in production. This would be useful for testing gateway or database changes, for example.

  * useful for testing
  * good educational experience - you get to walk through the whole process
  * one outcome of setting up the staging environment is a document that tells the story of what we had to do to set it up (useful for others to read later on)

In the past, folks have set up something like this - the "test" gateway running on a Raspberry Pi. That's a great test platform and lets the firmware and software team try things out, but it's not something that's running all the time. A true staging environment should be running all the time, and should be free for anyone to use for testing whenever they need it.

===== Implementation =====

Here is a starting list of tasks - this can be added to later on:

  * Go through the tasks and put them onto a Google Doc for collaboration / documentation
  * Get a new XBee and USB-serial adapter for use as a coordinator
  * Configure the new XBee with the proper parameters and a new PAN ID for the staging environment
  * Plug the new XBee into scelserver-1
  * Create a new postgres database for staging
  * Create new database credentials for staging
  * Create the tables required by the gateway in the staging database
  * Create a new gateway instance on scelserver-1
  * Create a new systemd service for the staging gateway
  * Set up a stubbed weatherbox node that transmits fake samples to the new staging XBee
  * Create a grafana dashboard to check on the status of the staging network

===== Technical Details =====

==== Systemd Intro ====

We currently use systemd to run the production gateway. We use systemd primarily to manage the gateway process and to make sure that it starts back up when the system reboots or if the process crashes somehow.

More details:
  * https://en.wikipedia.org/wiki/Systemd

**Simple Example**

Here's a simple example to provide a bit more context. Let's say I had a server sitting somewhere that I wanted to run a website on. I can use python's SimpleHTTPServer to do that. It's a simple, one-command webserver that serves up the files in a directory so they're accessible in a browser. So let's create a directory and run our webserver:

<code>
kluong@kserver:~$ mkdir test-webserver
kluong@kserver:~$ cd test-webserver/
kluong@kserver:~/test-webserver$ ls
kluong@kserver:~/test-webserver$ echo "hello world" > index.html
kluong@kserver:~/test-webserver$ python -m SimpleHTTPServer 7000
Serving HTTP on 0.0.0.0 port 7000 ...
</code>

In a browser, if I go to localhost:7000, I'll see "hello world".
You can also use the `curl` command in a separate terminal window:

<code>
kluong@kserver:~/test-webserver$ curl localhost:7000
hello world
</code>

Now this is great, but if you close the terminal window or reboot the machine, the website becomes unavailable because the program is no longer running. You need something to manage this program and make sure it gets run every time the machine starts, without someone having to open up a terminal window. This is where systemd comes in. A systemd config file could look something like this:

<code ini>
[Unit]
Description=My Website

[Service]
Type=simple
User=kluong
WorkingDirectory=/home/kluong/test-webserver
ExecStart=/usr/bin/python -m SimpleHTTPServer 7000
Restart=on-failure

[Install]
WantedBy=multi-user.target
</code>

You would save this as something like /etc/systemd/system/my-website.service, run `sudo systemctl daemon-reload` so systemd picks it up, and then `sudo systemctl enable --now my-website` to start the webserver immediately and on every boot. Because of the `Restart=on-failure` line, systemd will also restart it if it crashes.

**Details from the production gateway**

You can inspect the current systemd unit by logging into scelserver-1 and running `systemctl status`, which shows where the unit file is defined:

<code>
kluong@scelserver-1:~$ systemctl status xbee-gateway
● xbee-gateway.service - Scel XBee Gateway
   Loaded: loaded (/etc/systemd/system/xbee-gateway.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2021-04-20 14:26:43 HST; 5 months 17 days ago
 Main PID: 661 (run_production.)
      CPU: 16h 58min 11.722s
   CGroup: /system.slice/xbee-gateway.service
           ├─661 /bin/bash /home/scel/control-tower/gateway/run_production.sh auto
           └─725 python gateway_server.py auto
</code>

Looks like it's defined at /etc/systemd/system/xbee-gateway.service. Let's take a look at it:

<code ini>
[Unit]
Description=Scel XBee Gateway
After=network.target
After=xbee-pty-bridge.service

[Service]
Type=simple
# Another Type option: forking
User=scel
WorkingDirectory=/home/scel/control-tower/gateway
ExecStart=/home/scel/control-tower/gateway/run_production.sh auto
Restart=on-failure
# Other Restart options: always, on-abort, etc.

[Install]
WantedBy=multi-user.target
</code>

It's possible to re-use most of this existing systemd configuration to create the staging configuration - the directories and names just have to be changed appropriately.
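For example, a staging unit could look roughly like the sketch below. To be clear, the unit name `xbee-gateway-staging.service`, the `control-tower-staging` directory, and the `run_staging.sh` script are placeholders I'm assuming here - they'd need to match however the staging checkout and scripts actually end up being laid out on scelserver-1.

<code ini>
# Sketch only: the file name, directories, and script name below are placeholders,
# not things that exist yet. Intended location: /etc/systemd/system/xbee-gateway-staging.service
[Unit]
Description=Scel XBee Gateway (staging)
After=network.target

[Service]
Type=simple
User=scel
WorkingDirectory=/home/scel/control-tower-staging/gateway
ExecStart=/home/scel/control-tower-staging/gateway/run_staging.sh auto
Restart=on-failure

[Install]
WantedBy=multi-user.target
</code>

Installing and enabling it would work the same way as the simple example above (`daemon-reload`, then `enable --now`). Whether it also needs an `After=xbee-pty-bridge.service` line like the production unit depends on how the staging XBee ends up being attached.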
==== Database Changes ====

We'll share the same postgres instance as our production system - that way we don't need a completely new instance of postgres. What we do want, however, is a database separate from the one the production data uses, so we can isolate things a bit.

Let's take a look at what the production instance uses. It turns out our gateway is hardcoded to use certain values:

https://github.com/scel-hawaii/control-tower/blob/master/gateway/src/decoder.py#L127-L128

<code python>
con = psycopg2.connect("dbname='control_tower' user='control_tower' password='' host='localhost'")
</code>

So it's configured to connect to localhost and use the 'control_tower' database as the 'control_tower' user. We'll have to change this so it's not hardcoded if we want the staging instance to use a different database, but we can revisit that in another change later on.

So we'll need a new database and a new user - how can we do that?

=== Creating the database and user ===

Here are some docs from the postgres website:

  * https://www.postgresql.org/docs/9.0/sql-createuser.html
  * https://www.postgresql.org/docs/9.0/sql-createdatabase.html

We can create a new user within postgres with:

<code sql>
CREATE USER control_tower_staging;
</code>

And we can create a new database with:

<code sql>
CREATE DATABASE control_tower_staging OWNER control_tower_staging;
</code>

To be able to do this, you'll need to be logged in as a user with the `superuser` permission, which the `postgres` user typically has. On the server, switch to the postgres user and run the `psql` command to get a prompt where you can run these statements:

<code>
sudo su postgres
psql
</code>

**Note** - be sure to take care when using sudo! It gives you access to do a lot of things to the existing system, including removing system files that you normally wouldn't be able to touch.

To check the users, use the ''%%\du%%'' command in postgres:

<code>
postgres=# \du
                                      List of roles
       Role name       |                   Attributes                   | Member of
-----------------------+------------------------------------------------+-----------
 control_tower         |                                                | {}
 control_tower_ro      |                                                | {}
 control_tower_staging |                                                | {}
 kluong                |                                                | {}
 postgres              | Superuser, Create role, Create DB, Replication | {}
</code>

To check the databases, use the ''%%\l%%'' command in postgres:

<code>
postgres=# \l
                                      List of databases
     Name      |     Owner     | Encoding |   Collate   |    Ctype    |      Access privileges
---------------+---------------+----------+-------------+-------------+------------------------------
 bears         | control_tower | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 control_tower | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres                +
               |               |          |             |             | postgres=CTc/postgres       +
               |               |          |             |             | control_tower=CTc/postgres  +
               |               |          |             |             | control_tower_ro=c/postgres
 kluong        | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres                +
               |               |          |             |             | postgres=CTc/postgres       +
               |               |          |             |             | kluong=CTc/postgres
 postgres      | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0     | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres                 +
               |               |          |             |             | postgres=CTc/postgres
 template1     | postgres      | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres                 +
               |               |          |             |             | postgres=CTc/postgres
</code>

=== Populating the tables ===

Okay, now the database and user are set up, but we also need to create the tables inside the new database. There's a schema file in the control-tower repo under the db/ folder that will do this. Make sure your current directory is the control-tower repo, then run the following as the postgres user:

<code>
psql -U control_tower_staging -d control_tower_staging -f db/multi-table.sql
</code>

After this is done, that's all the database configuration you'll need; the remaining changes are on the gateway side and are described below. You can check that the tables were created properly by running:

<code>
psql -U control_tower_staging -d control_tower_staging -c "SELECT * FROM pg_catalog.pg_tables WHERE schemaname='public'"
</code>
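Before moving on to the gateway, it may be worth a quick smoke test that the staging role can connect on its own over localhost, since that's how the gateway will connect. One caveat - this is an assumption about how authentication is configured on scelserver-1, so check `pg_hba.conf` there: if local/TCP connections require password authentication, the new role will need a password first, e.g. `ALTER ROLE control_tower_staging WITH PASSWORD '...';` run from the postgres superuser session.

<code bash>
# Connect as the staging role over localhost (the same way the gateway will)
# and list the tables it can see. If this is rejected or prompts unexpectedly,
# check pg_hba.conf and/or set a password for the role as noted above.
psql -U control_tower_staging -d control_tower_staging -h localhost -c "\dt"
</code>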
==== Gateway Change - ability to pass in a database URI ====

I mentioned earlier that we'll have to change the way the gateway works. This is how the current code connects to the database:

<code python>
con = psycopg2.connect("dbname='control_tower' user='control_tower' password='' host='localhost'")
</code>

https://github.com/scel-hawaii/control-tower/blob/master/gateway/src/decoder.py#L127-L128

What we want is to be able to specify the database configuration separately - using a configuration file, a flag, or an environment variable. The environment variable is probably the easiest here, so let's go with that. Python programs can read in variables that are set by the environment:

  * https://able.bio/rhett/how-to-set-and-get-environment-variables-in-python--274rgt5

So we can modify the code to do something like this:

<code python>
db_uri = os.environ["GATEWAY_DB_URI"]
con = psycopg2.connect(db_uri)
</code>

Note that we're now passing a connection URI instead of the keyword-style string we used previously. A shell script that calls the gateway would look something like this for staging:

<code bash>
#!/bin/bash

# setup the python env
source ./env/bin/activate

export GATEWAY_DB_URI="postgresql://control_tower_staging@localhost/control_tower_staging"
python gateway.py /dev/serial/by-id/usb-FTDI_FT231X_USB_UART_DN01DS3L-if00-port0
</code>

You'd need to see some RX frames on the tty device to test this properly, though, since the gateway doesn't reach out to the database until it has a message to decode. To get this working in production, you would have to modify the run_production.sh script in the same way and update the repository on the production server.

==== Gateway - testing changes ====

Options for testing:

  * Deploy to production - see what happens
  * Deploy to staging - see what happens
  * Set up a personal network (on your laptop, or maybe on the server)
  * Connect the gateway to a "fake" XBee virtually, using a pty and another script
  * Mock out the serial device completely in python

It's not always practical to have hardware to test with, and often hardware isn't available anyway. Luckily, with software we can work around this.

**Note: development for the gateway should generally be done in a linux-based environment.** The gateway code was primarily designed for linux-based environments and may not run locally on a non-linux machine.

You can make a "fake" XBee network using the following python script:

<code python>
import os
import errno
import time


def symlink_force(target, link_name):
    """Create a symlink, replacing it if it already exists."""
    try:
        os.symlink(target, link_name)
    except OSError as e:
        if e.errno == errno.EEXIST:
            os.remove(link_name)
            os.symlink(target, link_name)
        else:
            raise


def valid_packets():
    """A handful of valid XBee API frames, keyed by node name."""
    packets = {}
    packets['heartbeat'] = b"\x7e\x00\x16\x90\x00\x7d\x33\xa2\x00\x40\xe6\x4b\x5e\x03\xfd\x01\x00\x00\xff\xff\xf0\xfa\x23\x00\x2b\x02\xb2"
    packets['apple'] = b"\x7e\x00\x22\x90\x00\x7d\x33\xa2\x00\x40\x9f\x27\xa7\x29\x6c\x01\x01\x00\xff\xff\x80\x6f\x69\x3d\x06\x0f\x71\x7d\x33\x5a\x8a\x01\x00\x76\x01\x22\x00\x6e\x09\x55"
    packets['cranberry'] = b"\x7e\x00\x22\x90\x00\x7d\x33\xa2\x00\x41\x25\xe5\x88\x0c\x83\x01\x02\x00\xff\xff\x7c\xf3\x05\x00\xba\x0f\x5c\x08\x05\x00\x20\x73\x3b\x00\xdd\x8b\x01\x00\x7a"
    packets['dragonfruit'] = b"\x7e\x00\x24\x90\x00\x7d\x33\xa2\x00\x40\xe6\x72\x7d\x5e\x30\x18\x01\x03\x00\xff\xff\x30\xc8\x07\x00\x6b\x0d\xf4\x00\x06\x00\x00\x00\xb6\x72\x37\x00\xfe\x8b\x01\x00\x00"
    packets['snapdragon'] = b"\x7e\x00\x22\x90\x00\x7d\x33\xa2\x00\x40\xa3\x53\x7d\x5e\x20\x9c\x01\x04\x00\xff\xff\x12\xe4\x49\x00\xca\x0d\x44\x0c\x2c\x31\x01\x00\x2f\x01\x34\x00\x64\x00\xbb"
    return packets


# Open a pseudo-terminal pair: frames written to the master end show up as
# incoming serial data to whatever opens the slave end (the gateway).
master_fd, slave_fd = os.openpty()
symlink_force(os.ttyname(slave_fd), '/tmp/fakexbee')

packets = valid_packets()
while True:
    for key in packets:
        os.write(master_fd, packets[key])
        time.sleep(1)
</code>
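To tie this together: with the script above running in one terminal, `/tmp/fakexbee` looks like a serial device that's constantly receiving frames, so the gateway can be pointed at it instead of real hardware. The commands below are only a sketch - they assume the environment-variable change described earlier has been made, that the script above was saved as `fake_xbee.py` (any name works), and that the staging database from the previous section exists.

<code bash>
# Terminal 1: run the fake XBee, which writes roughly one frame per second to the pty
python fake_xbee.py

# Terminal 2: run the gateway against the fake serial device and the staging database
source ./env/bin/activate
export GATEWAY_DB_URI="postgresql://control_tower_staging@localhost/control_tower_staging"
python gateway.py /tmp/fakexbee
</code>

If everything is wired up correctly, decoded samples for the fake packets should start landing in the staging tables, which you can confirm with psql or the staging grafana dashboard.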