Data^3 – Data ingestion and processing pipeline from fog to cloud



Sensors and other edge devices produce continuous, high-volume data streams that must be ingested and processed in the cloud to provide comprehensive results and insights for the user. Typically, each device emits raw values in its own schema, so a single platform has to ingest all of these streams and pre-process them into a common, expected format. The pre-processing step can also be used to detect patterns, outliers and other early insights about the data and the devices themselves.
To tackle the heterogeneity of the data sources and to provide closer-to-the-source processing capabilities, we combine and build upon a number of open-source distributed and centralized frameworks, creating a unique pipeline that starts at the edge/fog devices and ends at the cloud nodes. In our solution, fog nodes (e.g., Raspberry Pis) act as low-cost processing units close to the sources: they ingest the data streams from the local edge devices, pre-process them and finally send them to the cloud. The cloud nodes then ingest these pre-processed streams, as well as batch data from other sources, continue the processing with greater resources and store the data, along with any results, in a distributed warehouse.
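As a rough illustration of the pre-processing a fog node might perform, the following Python sketch maps two differently shaped payloads onto one common record format and flags readings that deviate strongly from recent values. It is not the project's code: the device kinds, field names, `SCHEMAS` table, `normalize` function and `OutlierFlagger` class are all hypothetical, and the rolling z-score rule is only one simple choice of outlier test.

```python
from collections import deque
from statistics import mean, pstdev

# Hypothetical device schemas: which raw fields hold the id, timestamp and
# reading, and what to divide by to reach seconds and degrees Celsius.
SCHEMAS = {
    "thermo_a": {"id": "dev", "ts": "t", "ts_div": 1,
                 "value": "temp_c", "value_div": 1},
    "thermo_b": {"id": "sensorId", "ts": "timestamp_ms", "ts_div": 1000,
                 "value": "tempMilliC", "value_div": 1000},
}


def normalize(kind, raw):
    """Map a raw payload of the given device kind onto the common format."""
    s = SCHEMAS[kind]
    return {
        "device_id": str(raw[s["id"]]),
        "ts": raw[s["ts"]] / s["ts_div"],
        "value": raw[s["value"]] / s["value_div"],
    }


class OutlierFlagger:
    """Flag values more than k standard deviations from a rolling mean."""

    def __init__(self, window=20, min_samples=5, k=3.0):
        self.values = deque(maxlen=window)  # most recent readings only
        self.min_samples = min_samples
        self.k = k

    def check(self, value):
        flagged = False
        # Only judge a value once enough history has been collected.
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            flagged = sigma > 0 and abs(value - mu) > self.k * sigma
        # Every value, flagged or not, joins the rolling window.
        self.values.append(value)
        return flagged
```

A fog node could call `normalize` on each incoming message and pass the resulting `value` through one `OutlierFlagger` per device before forwarding the record to the cloud.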

The pipeline conforms to the Kappa architecture paradigm and consists of several open-source frameworks that are modified and tailored to each use case. MQTT and Apache Kafka are used for data transfer and queuing, while Apache Flink is available for cloud processing. Main-memory databases, such as Redis and Apache Ignite, as well as consistent distributed warehouse frameworks, such as Apache Druid, are used for intermediate and persistent storage throughout the data processing pipeline. All of these systems are scalable and fault tolerant by design, which makes the whole pipeline a fault-resistant and workload-independent solution.
The pipeline has been implemented and tested in several different use cases, including an urban traffic monitoring and prediction tool, a smart agriculture scenario and a health-care system for improving the quality of life of cancer survivors.
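The core idea of a Kappa architecture is that all data flows through one replayable, append-only log and one stream-processing path; when the processing logic changes, the log is replayed from the beginning instead of maintaining a separate batch layer. The Python sketch below illustrates this with an in-memory stand-in for the log. It is a simplified illustration, not the project's implementation: `AppendOnlyLog`, `average_per_device` and `max_per_device` are hypothetical names, and in the actual pipeline roles like these would be played by systems such as Apache Kafka and Apache Flink.

```python
from collections import defaultdict


class AppendOnlyLog:
    """In-memory stand-in for a replayable, append-only topic."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset=0):
        """Replay all records starting at the given offset."""
        return iter(self._records[offset:])


def average_per_device(records):
    """First version of the stream job: per-device average value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        sums[r["device_id"]] += r["value"]
        counts[r["device_id"]] += 1
    return {d: sums[d] / counts[d] for d in sums}


def max_per_device(records):
    """Revised version of the stream job: per-device maximum value."""
    result = {}
    for r in records:
        d = r["device_id"]
        result[d] = max(result.get(d, r["value"]), r["value"])
    return result


log = AppendOnlyLog()
for rec in [
    {"device_id": "a", "value": 10.0},
    {"device_id": "b", "value": 5.0},
    {"device_id": "a", "value": 20.0},
]:
    log.append(rec)

# The same log feeds both versions of the job; changing the logic only
# requires replaying from offset 0, with no separate batch pipeline.
view_v1 = average_per_device(log.read_from(0))
view_v2 = max_per_device(log.read_from(0))
```

Keeping the log as the single source of truth is what lets a derived view be rebuilt at any time from the raw records.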


  • Ioannis Mavroudopoulos
  • Styliani Kyrama
  • Ilias Dimitriadis
  • Nikodimos Nikolaidis
  • Anna-Valentini Michailidou
  • Theodoros Toliopoulos
  • Anastasios Gounaris
  • Athina Vakali (Datalab director)