1st International Workshop on Data Platform Design, Management, and Optimization
Co-located with EDBT/ICDT 2022
Special Issue invitation to Elsevier FGCS for best papers

What DataPlat is about

Since big data have imposed a paradigm change in the way data are stored, managed, and queried, information systems have evolved into complex data platforms or data ecosystems supporting data-intensive storage, computation, and analysis of data with heterogeneous structures. Yet smart and comprehensive support for data scientists and architects to govern the data through the whole life-cycle is still lacking.

Data management support in data platforms requires the collection of a wide set of metadata capturing the distinguishing features of the data; this enables advanced functionalities spanning from research and data profiling to provenance control, orchestration of data transformation pipelines, incremental data integration, and efficient querying. The challenges begin with the management of the metadata itself in terms of modeling effort, storage, complexity of retrieval activities, and effective exploitation. Besides addressing the Vs of big data, the enabled functionalities must cope with the heterogeneity of storage and computation engines (which include DBMSs supporting multiple data models and cloud storage systems with limited control and predictability) while meeting suitability requirements for less-skilled users.

This workshop invites researchers and practitioners to propose innovative solutions to the aforementioned challenges, and welcomes papers that advance data platforms by optimizing and simplifying the different aspects of data and metadata management and use.

Topics

The scope of the workshop includes, but is not limited to, the following topics:

  • Metadata modeling for data platforms
  • Techniques for metadata discovery and management
  • Advanced search, exploration, and profiling of data and metadata
  • Semantic enrichment of metadata
  • Data governance
  • Data wrangling
  • Provenance and data versioning control
  • Orchestration and optimization of data transformation pipelines
  • Data integration and querying in multimodel databases, multistores, polystores
  • Query processing, optimization, and performance
  • Entity resolution and data fusion
  • Big data management and querying
  • Artificial Intelligence solutions for data platforms
  • AutoML techniques
  • Cloud computing and architectures
  • Advanced architectures for data lakes and data platforms
  • Analysis, design, implementation, and testing of data platforms
  • Case studies and project experiences

Submission

[NEW 02.03.22] Program published.

[NEW 22.12.21] Submission deadline has been extended to January 20, 2022.

[NEW 10.12.21] Submission deadline has been extended to December 22, 2021.

[NEW 03.12.21] ACM Proceedings Format updated.

[NEW 02.11.21] Authors of the best papers will be invited to submit an extended version to a Special Issue with Elsevier's Future Generation Computer Systems (FGCS) journal (IF: 7.187).

Submissions should present original results and substantial new work not currently under review or published elsewhere. DataPlat 2022 will follow a single-blind review process to evaluate submissions on the basis of originality, relevance, quality, and technical contribution. The following submissions are accepted:

  • Regular and short research papers (up to 10 and 5 pages, respectively)
  • Vision papers (up to 5 pages)
  • Application papers (up to 5 pages)

Papers must be submitted via EasyChair, in PDF, according to the EDBT Proceedings Format.

DataPlat 2022 Submission Site on EasyChair

Accepted papers will be published online at CEUR. Workshop papers must follow the ACM Proceedings Format, appropriately modified for EDBT/ICDT 2022 workshops (available here). Please follow all instructions available here. DataPlat uses single-blind reviewing, which means that the authors should list their names and affiliations as part of their submission.

All accepted papers are expected to be presented at the workshop, and at least one author is required to register.

Important dates

Paper submission: January 20, 2022 (extended from December 12, 2021)

Author notification: February 18, 2022 (postponed from January 17, 2022)

Camera ready: February 25, 2022

Workshop date: March 29, 2022

Committees

Program Chairs & Organizers

Matteo Francia

DISI - University of Bologna

Enrico Gallinucci

DISI - University of Bologna

Patrick Marcel

LIFAT – Université de Tours

Stefano Rizzi

DISI - University of Bologna

Program Committee

  • Alberto Abelló - Universitat Politècnica de Catalunya, Spain
  • Amin Beheshti - Macquarie University, Australia
  • Sandro Bimonte - INRAE Clermont Ferrand, France
  • Bogdan Cautis - University of Paris-Sud, France
  • Jérôme Darmont - University of Lyon, France
  • Alin Deutsch - University of California, USA
  • Young-Koo Lee - Kyung Hee University, South Korea
  • Esther Pacitti - University of Montpellier 2, France
  • Franck Ravat - Université Paul Sabatier, France
  • Duncan Ruiz - Escola Politécnica - PUCRS, Brazil
  • Riccardo Torlone - Università Roma Tre, Italy
  • Ming-Chuan Wu - Apple

Keynote speaker

Alberto Abelló

FIB - Universitat Politècnica de Catalunya

Big DataBase Management System

A Big Data system is a tiny fraction of analytical code surrounded by a lot of "plumbing" devoted to managing the generated models and the associated data. Hence, we can consider that plumbing to be mimicking a DBMS, which is itself a complex system that has to serve different purposes and hence provide multiple, independent functionalities. Thus, it can be neither studied nor built monolithically as an atomic unit. Instead, there are different interdependent software components that interact in different ways to achieve the global purpose. As with a DBMS, in a Big Data system we have to understand, among other issues, how our system is going to collect data; how these are going to be used; where they are going to be stored; how they are going to be related to the corresponding metadata; if we are going to use any kind of master data, where these will come from and how they will be integrated; how the data are going to be processed; how replicas are going to be managed and their consistency guaranteed; etc. In this talk, I will discuss the difficulties of building such a system, paying special attention to how metadata can help storage and processing.

Workshop program

The schedule is in GMT+1 (i.e., BST) - Edinburgh Time

Time | Title | Speaker / Authors
09:00 - 09:05 | Opening
09:05 - 10:05 | Big DataBase Management System (Invited Talk) | Alberto Abelló
10:05 - 10:30 | Unidata - A Modern Master Data Management Platform | Sergey Kuznetsov, Alexey Tsyryulnikov, Vlad Kamensky, Ruslan Trachuk, Mikhail Mikhailov, Sergey Murskiy, Dmitrij Koznov and George Chernishev
10:30 - 11:00 | Coffee Break
11:00 - 11:25 | RTGEN: A Relative Temporal Graph GENerator | Maria Massri, Zoltan Miklos, Philippe Raipin Parvedy and Pierre Meye
11:25 - 11:40 | Towards a Holistic Data Preparation Tool | Valerie Restat, Meike Klettke and Uta Störl
11:40 - 11:55 | Towards Human-centric AutoML via Logic and Argumentation | Joseph Giovanelli and Giuseppe Pisano
11:55 - 12:10 | ECDP: A Big Data Platform for the Smart Monitoring of Local Energy Communities | Luca Gagliardelli, Luca Zecchini, Domenico Beneventano, Giovanni Simonini, Sonia Bergamaschi, Mirko Orsini, Luca Magnotta, Emma Mescoli, Andrea Livaldi, Nicola Gessa, Piero De Sabbata, Gianluca D'Agosta, Fabrizio Paolucci and Fabio Moretti
12:10 - 12:25 | Darwin: A Data Platform for Schema Evolution Management and Data Migration | Uta Störl and Meike Klettke
12:25 - 12:30 | Farewell