
Forecasting

Updated: September 19, 2020

Forecasting focuses on building a statistical model from data and then using it to predict future trends, with the predictions checked against test data. It is considered a field of data science and statistics. Machine learning has also attracted a great deal of attention recently; the purpose of this team is to learn to adapt statistical and machine learning analysis techniques to weather-related data.
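
As a rough illustration of this workflow, the Python sketch below fits a straight-line trend to the earlier part of a series and then compares its predictions with the final points that were held back for testing. The function name fit_and_forecast, the temperature readings, and the choice of a linear trend are all assumptions made for the example; they are not SCEL data or the team's method.

```python
import numpy as np


def fit_and_forecast(values, n_test):
    """Fit a linear trend to all but the last n_test points, then predict those points."""
    values = np.asarray(values, dtype=float)
    t = np.arange(len(values))  # time index: 0, 1, 2, ...

    # Split into a training part (used to fit) and a test part (held back).
    t_train, t_test = t[:-n_test], t[-n_test:]
    y_train, y_test = values[:-n_test], values[-n_test:]

    # Least-squares fit of y = slope * t + intercept on the training part only.
    slope, intercept = np.polyfit(t_train, y_train, deg=1)

    # Predict the held-back points and measure the root-mean-square error.
    predictions = slope * t_test + intercept
    rmse = float(np.sqrt(np.mean((predictions - y_test) ** 2)))
    return predictions, rmse


# Hypothetical hourly temperature readings (made up for this example).
temps = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0, 23.5]
predictions, rmse = fit_and_forecast(temps, n_test=2)
print(predictions, rmse)
```

Real weather data is rarely this well behaved, so a straight line is only a starting point. The split into training and test data stays the same when the model becomes more complex.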


Machine learning and data science are two extremely popular fields right now. They have many different applications, from analyzing genetic data and correlating it with biological processes to recognizing different types of objects against a database using layered neural networks for image recognition. Thanks to the scale of the internet and steady advances in technology, a large amount of information is now available as a basis for more complex models that are only possible with large amounts of data.

Part of SCEL's mission is to predict the weather in order to optimize the use of electricity and perhaps even the generation of energy. Renewable energy sources are considered unreliable because they depend on the weather: wind energy depends on wind direction and speed, and solar energy depends on sunlight intensity and cloud patterns. This is where we come in: our forthcoming efforts are considered one of the key points of SCEL's mission. Our primary source of data will be the weather boxes developed as part of SCEL, such as Apple, Cranberry, and Guava.


*Fall 2020 Edit*

The SCEL Forecasting team was started in the Fall 2015 semester, headed by Jeremy Garcia under Dr. Anthony Kuh. In the preceding months, a few people from the University of California, Santa Cruz, came down to give a presentation about the new IPython (then recently renamed Jupyter) platform. They explained that the University of California system had invested quite heavily in the platform as an accessible, flexible, powerful, and convenient way to teach analysis alongside the modern Python programming language. Eager to try out the new platform, Dr. Kuh wanted the team to explore it and, through it, begin to learn more about machine learning, which was a hot topic along with “big data” at the time.

Due to Dr. Kuh's acceptance into the National Science Foundation (NSF) in 2017, the Forecasting team was placed on the back burner and was largely left untouched for several years. In 2020, the Forecasting team was rebooted with the recruitment of graduate students Josh Renzo Claudio and Keolakanealohanokeakua Macloves.


*Fall 2020 Edit*

We will focus on reviewing various analytical models and techniques in order to give ourselves a solid grounding in machine learning. To do this, we will use a programming language that is popular throughout data science: Python. For our development environment, we are going with Google Colaboratory because of its convenient integration with Google Drive and its free access to Google's compute hardware.

Although previous SCEL teams have already made progress in machine learning, their documentation is scarce. For this reason, we will largely be starting over in Fall 2020.

Our objectives for this semester:

  • Learn Python and Google Colab.
  • Learn and review various statistical techniques and models as applied in Python.
  • Start on a new and well-documented machine learning framework for SCEL.

*Fall 2020 Update*

This semester, we have decided to register for an online machine learning course on Udemy. The course we have chosen is Machine Learning A-Z™: Hands-On Python & R In Data Science by Kirill Eremenko. At the time of writing, it is the highest-rated machine learning course on Udemy. The course is listed at over $100, but Udemy runs frequent sales during which courses can be purchased for only 10% of the listed price. Just be patient.

The class is designed mostly for computer science students, so you should not expect to learn much about the mathematical proofs behind machine learning algorithms.

Previous SCEL machine learning teams used Jupyter as their development environment. The Udemy course mentioned above uses Google Colab, so we have switched to it as well.

General iPython Notes
Linear Regression
Cross Validation
How to improve Predictions by doing nothing
Large sets of Data
How we implemented moving averages and variances (moving_average.pdf)
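
For readers unfamiliar with the last topic, a moving average and a moving variance can be sketched in a few lines of Python. This is only a minimal illustration and not the implementation described in moving_average.pdf. The function names, the window size, and the wind speed values are assumptions made for the example, and the variance shown is the population variance (divided by the window length).

```python
import numpy as np


def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(values, kernel, mode="valid")


def moving_variance(values, window):
    """Population variance of each run of `window` consecutive values."""
    values = np.asarray(values, dtype=float)
    mean = moving_average(values, window)
    mean_of_squares = moving_average(values ** 2, window)
    # Var[x] = E[x^2] - (E[x])^2; clip tiny negatives caused by rounding.
    return np.maximum(mean_of_squares - mean ** 2, 0.0)


# Hypothetical wind speed readings (made up for this example).
wind_speeds = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = moving_average(wind_speeds, window=3)
spread = moving_variance(wind_speeds, window=3)
print(smoothed, spread)
```

A larger window smooths out more of the short-term noise but responds more slowly to real changes in the weather.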

Authors

Contributing authors:

atasato bsundberg cobatake jclaudio jeremygg jobatake kluong tbesas travisnt

Created by kluong on 2015/07/29 04:39.

  • forecasting/start.txt
  • Last modified: 2020/09/19 22:15
  • by jclaudio