
Event-Based Information Fusion for the Self-Adaptive Cloud

Researchers: Johan Ruuskanen, Anton Cervin

Funding: WASP

Short summary: Successful self-adaptive resource provisioning in the cloud relies on accurate tracking of workload variations and timely detection of changes in the infrastructure. The project will develop novel, event-based estimation techniques for information fusion in cloud server systems using Monte Carlo-based inference methods.

Project Description

Self-Adaptive Cloud Systems

The idea of the self-adaptive cloud is to handle workload variations and structural changes by regulating the resources provided to the cloud service. The goal is to provide just the right amount of computing resources at all times, so that the cost is minimized while good performance is still maintained. This can be viewed as a classical feedback control loop (see the figure to the right), where the cloud service is the plant under control and the adaptation mechanism is the controller. Workload variations are viewed as disturbances that should be countered by adjustments in the resource provisioning. Performance can be measured, for instance, by average or Xth-percentile response times, throughput, utilization, and power usage. Resource provisioning can be handled by, e.g., scaling the number of compute units allocated to the cloud service up or down.

Event-Based Estimation and Control

The control loop depicted above looks fairly conventional, but if we zoom in, some interesting features can be noted. The arrows that connect the different blocks in the diagram do not represent continuous signals but rather discrete events. Measurement information is available only when something happens in the system, for instance when a new customer arrives or when a request is completed. Likewise, the resources are typically quantized and can only be set at fixed levels. To deal with these special features, new control techniques need to be developed that can handle event-based rather than continuous signals.
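The quantized-actuation aspect can be illustrated with a minimal sketch: a continuous resource request produced by a controller must be snapped to one of the allowed allocation levels. The function name and the example levels are illustrative, not from the project.

```python
def quantize_resources(u, levels):
    """Map a continuous resource request u to the nearest allowed
    allocation level (e.g. a discrete number of compute units)."""
    return min(levels, key=lambda level: abs(level - u))


# Example: a controller asks for 3.4 units, but only 1, 2, 4, or 8
# compute units can actually be allocated.
allocation = quantize_resources(3.4, [1, 2, 4, 8])
```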

In recent years, new theory for event-based control has started to appear. The main idea is to act only when the magnitude of the control error is larger than a certain threshold, thereby saving resources and reducing wear and tear. In this project we will focus on event-based information fusion. Similar to a Kalman filter, the general idea is to estimate states and parameters of the cloud system by using a model of the system together with various measurements. Some of the key challenges of estimation in cloud systems are:

  • All primary measurements are event-based.
  • The number of events – observable as well as unobservable – is massive.
  • Events of different types and on very different time scales need to be fused.
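The threshold idea behind event-based control can be sketched in a few lines: the controller only updates its output when the error magnitude exceeds a deadband. The function name, the integral-style update, and the gain value are all illustrative assumptions, not part of the project.

```python
def event_triggered_controller(setpoint, threshold, measurements, gain=0.5):
    """Update the control signal only when |error| exceeds the
    threshold (a send-on-delta / deadband rule); otherwise keep
    the previous value, saving actuation events."""
    u = 0.0
    actions = []
    for y in measurements:
        error = setpoint - y
        if abs(error) > threshold:
            u += gain * error  # simple integral-style update
        actions.append(u)
    return actions
```

With measurements staying inside the deadband, the control signal is never changed; only large deviations trigger an event.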

Information Fusion Using Particle Filters

The principle of event-based information fusion is illustrated in the figure below. Known inputs to the cloud service are for instance the commands from the Adaptation Mechanism, while the unknown inputs represent for instance customer arrivals that cannot be measured. Combining a priori model knowledge with measurements, the Information Fusion system needs to take all types of events (and also the absence of events) into account when forming its estimates of key parameters and states.

The event-based information fusion problem is challenging because of the non-linear behavior of the cloud service and because new information is only available at discrete events. One promising approach to tackle the problem is to use particle filters, a family of Monte Carlo-based inference methods that has gained much attention in recent decades. Using particle filters for cloud systems is, however, not straightforward. New dynamical system models need to be developed, and the filters need to be adapted to handle event-based rather than time-based measurements. Another research challenge is how to weigh together the information from different types of events in an optimal way.
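To make the particle-filter idea concrete, the following is a minimal bootstrap-filter sketch that estimates a (possibly drifting) arrival rate from the inter-arrival times of observed events, using only the standard library. The random-walk drift on the log-rate, the prior range, and all numerical values are illustrative assumptions; it is not the project's method, only an instance of the technique.

```python
import math
import random


def particle_filter_rate(interarrival_times, n_particles=1000, seed=1):
    """Bootstrap particle filter for an arrival rate observed only
    through event inter-arrival times (exponential likelihood).
    Returns the posterior-mean rate estimate after each event."""
    rng = random.Random(seed)
    # Prior: rates drawn uniformly on (0.1, 10) events per time unit.
    particles = [rng.uniform(0.1, 10.0) for _ in range(n_particles)]
    estimates = []
    for dt in interarrival_times:
        # Propagate: small random-walk drift on the log-rate,
        # modeling a slowly varying workload.
        particles = [max(1e-3, lam * math.exp(rng.gauss(0.0, 0.05)))
                     for lam in particles]
        # Weight: exponential likelihood of the observed gap dt.
        weights = [lam * math.exp(-lam * dt) for lam in particles]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Posterior-mean estimate, then multinomial resampling.
        estimates.append(sum(l * w for l, w in zip(particles, weights)))
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Feeding the filter a stream of exponentially distributed gaps makes the estimate concentrate around the true rate; the same skeleton extends to richer state (queue lengths, service rates) and to fusing several event types by multiplying their likelihoods.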

Experimental Evaluation

The novel information fusion schemes developed in this project will be evaluated both in simulations and in a server test-bed at the department. Starting from single-server systems, modeled as M/M/1 queueing systems, we will gradually scale the models and experiments to include more servers and concurrent cloud services. In the experiments we may also include various self-adaptive mechanisms that are being developed in parallel research projects at the department.
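As a starting point for such experiments, an M/M/1 queue can be simulated with a short discrete-event loop that emits exactly the kind of event log (arrival and departure timestamps) an event-based estimator would consume. This is a standard textbook simulation, not code from the project's test-bed; names and parameters are illustrative.

```python
import math
import random


def simulate_mm1(arrival_rate, service_rate, horizon, seed=0):
    """Minimal discrete-event simulation of an M/M/1 queue.
    Returns a time-ordered list of (time, event_type) tuples."""
    rng = random.Random(seed)
    t, queue = 0.0, 0
    events = []
    next_arrival = rng.expovariate(arrival_rate)
    next_departure = math.inf  # no departure while the queue is empty
    while min(next_arrival, next_departure) < horizon:
        if next_arrival <= next_departure:
            t = next_arrival
            queue += 1
            events.append((t, "arrival"))
            next_arrival = t + rng.expovariate(arrival_rate)
            if queue == 1:  # server was idle, start a service
                next_departure = t + rng.expovariate(service_rate)
        else:
            t = next_departure
            queue -= 1
            events.append((t, "departure"))
            next_departure = (t + rng.expovariate(service_rate)
                              if queue > 0 else math.inf)
    return events
```

Scaling this up to networks of queues and multiple concurrent services mainly means adding more event types to the same loop.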