The Specialties of a Machine Learning Workflow in the Weather Domain

When we model the weather, the workflow differs from the common machine learning workflow. The general process is: collect historical and validation data, design the model architecture, train the model, evaluate and improve it, and deploy it to an operational application. In the weather domain, data preparation and model development differ from this general process because of three characteristics: complex data sources, multi-scale system interactions, and an evolving atmosphere.

The common model development workflow (from Datatron)

1. The weather system is dynamic: Time-related data

When deciding which features to include for weather prediction, time-related data matter. Because the weather system evolves over time, time-derived features such as time-series aggregations and lagged values can serve as input features to the machine learning model. On the algorithm side, an LSTM (long short-term memory) model is a good tool for retaining information from different time steps.
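To make the idea of lagged and aggregated features concrete, here is a minimal sketch in plain Python. The function name `make_time_features`, the lag choices, the window size, and the temperature values are all assumptions made for illustration; a real pipeline would choose them from the data.

```python
from statistics import mean


def make_time_features(series, lags=(1, 2), window=3):
    """Build lagged and rolling-mean features for each time step.

    Rows are produced only for steps that have a full history, so the
    first max(max(lags), window) values serve as context only.
    """
    start = max(max(lags), window)
    rows = []
    for t in range(start, len(series)):
        # Lagged values: the observation k steps before step t.
        row = {f"lag_{k}": series[t - k] for k in lags}
        # Aggregate over the `window` values that precede step t.
        # The value at t itself is the target, so it is excluded.
        row[f"mean_{window}"] = mean(series[t - window:t])
        row["target"] = series[t]
        rows.append(row)
    return rows


# Hypothetical hourly temperatures in degrees Celsius.
temps = [10.0, 11.0, 13.0, 12.0, 14.0, 15.0]
features = make_time_features(temps)
```

Each row of `features` pairs the value to predict with what was known before it, which is the form a regression model or an LSTM input window is usually built from.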

2. There is something we can’t cover: Parameterization and statistical modeling

A computer model has limits in its spatial and temporal resolution. Weather observations may be scattered, and there are always phenomena at spatial scales smaller and temporal scales shorter than the model can resolve. For these smaller “off-scale” phenomena, meteorologists usually use “parameterization” to represent their effects. Common parameterizations include cumulus, radiation, and boundary-layer parameterizations. In addition, they can use statistical modeling, such as the Model Output Statistics (MOS) technique, to forecast localized detail. Statistical modeling can use a regression model based on historical records to correct the output of a broader-scale model for local effects.

MOS modeling from NOAA website (https://vlab.noaa.gov/web/mdl/mos)
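The regression idea behind MOS can be sketched with a single predictor. This is a simplified illustration, not NOAA's procedure: operational MOS uses multiple linear regression over many predictors, and the data, the `fit_linear` helper, and the `mos_correct` function below are hypothetical.

```python
def fit_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: temperature from a coarse model at the nearest
# grid point, and the temperature observed at a local station (Celsius).
model_temp = [10.0, 12.0, 14.0, 16.0, 18.0]
station_temp = [8.0, 9.5, 11.0, 12.5, 14.0]

intercept, slope = fit_linear(model_temp, station_temp)


def mos_correct(raw_forecast):
    """Map a new coarse-model forecast to a station-level estimate."""
    return intercept + slope * raw_forecast


corrected = mos_correct(20.0)
```

The fitted line captures the systematic local difference between the broad-scale model and the station, so a new model forecast can be adjusted before it is reported for that location.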

3. There is something new: Hybrid modeling for evolving weather

When we develop a weather model, we need to consider both the historical record and the ongoing status, as a hybrid model. Even though meteorologists summarize weather patterns so that they can report the weather efficiently, new situations can still appear later (this sounds like the way we keep finding new types of virus, doesn’t it?). This is unlike common image recognition, where the image classes may have fixed patterns. Because the atmosphere evolves and the weather keeps changing, a weather model needs to keep improving to adapt to new conditions. Thus, a hybrid model should be trained not only on historical data but also on the ongoing status.
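One possible way to combine historical and ongoing data is incremental (online) learning: fit on the historical record first, then keep updating as new observations arrive. The sketch below is an assumption about how this could look, not a method prescribed here. The `OnlineLinearModel` class, the learning rate, and the synthetic data (a shift of +1 in the new regime) are hypothetical.

```python
class OnlineLinearModel:
    """1-D linear model y = weight * x + bias, updated one sample at a time."""

    def __init__(self, weight=0.0, bias=0.0, learning_rate=0.1):
        self.weight = weight
        self.bias = bias
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        """Take one gradient step on the squared error for a single sample."""
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        return error


# Phase 1: fit on the (synthetic) historical record, where y = 2x.
history_x = [0.0, 0.5, 1.0]
history_y = [2.0 * x for x in history_x]
model = OnlineLinearModel()
for _ in range(200):
    for x, y in zip(history_x, history_y):
        model.update(x, y)

# Phase 2: the regime shifts to y = 2x + 1. For each new observation,
# record the forecast error first, then let the model learn from it.
stream_x = [0.0, 0.5, 1.0] * 6
stream_errors = []
for x in stream_x:
    y = 2.0 * x + 1.0
    stream_errors.append(abs(model.predict(x) - y))
    model.update(x, y)
```

In this sketch the errors in `stream_errors` are large right after the shift and shrink as updates accumulate, which is the adaptation behavior the hybrid approach aims for. Retraining periodically on a sliding window of recent data would be another way to reach the same goal.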

When we apply machine learning to weather prediction, we need to consider the specialties of the weather domain. The changing weather requires us to include time-related data as features in the model. Because weather is continuous but observations are limited, we use parameterization and statistical techniques to represent the weather status. Because the weather evolves, the prediction model needs to include both historical and ongoing weather information. With this data preprocessing and model architecture, the network can work with more complete information.
