DataPlat

What DataPlat is about

Since big data have imposed a paradigm change in the way data are stored, managed, and queried, information systems have evolved into complex data platforms or data ecosystems supporting data-intensive storage, computation, and analysis of data with heterogeneous structures. Yet smart and comprehensive support for data scientists and architects to govern the data through the whole life-cycle is still lacking.

Data management support in data platforms requires the collection of a wide set of metadata capturing the distinguishing features of the data; this enables advanced functionalities spanning from research and data profiling to provenance control, orchestration of data transformation pipelines, incremental data integration, and efficient querying. The challenges begin with the management of metadata itself in terms of modeling effort, storage, complexity of retrieval activities, and effective exploitation. Besides addressing the Vs of big data, the enabled functionalities must cope with the heterogeneity of storage and computation engines – that include DBMSs supporting multiple data models and cloud storage systems with limited control and predictability – while meeting suitability requirements for less-skilled users.

This workshop calls for researchers and practitioners to propose innovative solutions to the aforementioned challenges, and welcomes papers that advance data platforms by optimizing and simplifying the different aspects of data and metadata management and use.

Topics

The scope of the workshop includes, but is not limited to, the following topics:

Metadata modeling for data platforms
Techniques for metadata discovery and management
Advanced search, exploration, and profiling of data and metadata
Semantic enrichment of metadata
Data governance
Data wrangling
Provenance and data versioning control
Orchestration and optimization of data transformation pipelines
Data integration and querying in multimodel databases, multistores, polystores
Query processing, optimization, and performance
Entity resolution and data fusion
Big data management and querying
Artificial Intelligence solutions for data platforms
AutoML techniques
Cloud computing and architectures
Advanced architectures for data lakes and data platforms
Analysis, design, implementation, and testing of data platforms
Case studies and project experiences

Submission

[NEW 02.03.22] Program published.

[NEW 22.12.21] Submission deadline has been extended to January 20, 2022.

[NEW 10.12.21] Submission deadline has been extended to December 22, 2021.

[NEW 03.12.21] ACM Proceedings Format updated.

[NEW 02.11.21] Authors of the best papers will be invited to submit an extended version to a Special Issue with Elsevier's Future Generation Computer Systems (FGCS) journal (IF: 7.187).

Submissions should present original results and substantial new work not currently under review or published elsewhere. DataPlat 2022 will follow a single-blind review process to evaluate submissions on the basis of originality, relevance, quality, and technical contribution. The following submissions are accepted:

Regular and short research papers (up to 10 and 5 pages, respectively)
Vision papers (up to 5 pages)
Application papers (up to 5 pages)

Papers must be submitted via EasyChair, in PDF format, according to the EDBT Proceedings Format.

DataPlat 2022 Submission Site on EasyChair

Accepted papers will be published online at CEUR. Workshop papers must follow the ACM Proceedings Format, appropriately modified for EDBT/ICDT 2022 workshops (available here). Please follow all instructions available here. DataPlat uses single-blind reviewing, which means that the authors should list their names and affiliations as part of their submission.

All accepted papers are expected to be presented at the workshop, and at least one author is required to register.

Committees

Program Chairs & Organizers

Matteo Francia

DISI - University of Bologna

Enrico Gallinucci

DISI - University of Bologna

Patrick Marcel

LIFAT – Université de Tours

Stefano Rizzi

DISI - University of Bologna

Program Committee

Alberto Abelló - Universitat Politècnica de Catalunya, Spain
Amin Beheshti - Macquarie University, Australia
Sandro Bimonte - INRAE Clermont Ferrand, France
Bogdan Cautis - University of Paris-Sud, France
Jérôme Darmont - University of Lyon, France
Alin Deutsch - University of California, USA
Young-Koo Lee - Kyung Hee University, South Korea
Esther Pacitti - University of Montpellier 2, France
Franck Ravat - Université Paul Sabatier, France
Duncan Ruiz - Escola Politécnica - PUCRS, Brazil
Riccardo Torlone - Università Roma Tre, Italy
Ming-Chuan Wu - Apple

Keynote speaker

Alberto Abelló

FIB - Universitat Politècnica de Catalunya

Big DataBase Management System

A Big Data system is a tiny fraction of analytical code surrounded by a lot of "plumbing" devoted to managing the generated models and the associated data. Hence, we can consider that plumbing to be mimicking a DBMS, which is indeed a complex system that has to serve different purposes and hence provide multiple, independent functionalities. Thus, it can be neither studied nor built monolithically as an atomic unit. On the contrary, it consists of different interdependent software components that interact in different ways to achieve the global purpose. As with a DBMS, in a Big Data system we have to understand, among other issues, how our system is going to collect data; how these are going to be used; where they are going to be stored; how they are going to be related to the corresponding metadata; if we are going to use any kind of master data, where these will come from and how they will be integrated; how the data are going to be processed; how replicas are going to be managed and their consistency guaranteed; etc. In this talk, I will discuss the difficulties of building such a system, paying special attention to how metadata can help storage and processing.

Workshop program

The schedule is in GMT+1 (i.e., BST) - Edinburgh Time

time	title	speaker / authors
09:00 - 09:05	Opening
09:05 - 10:05	Big DataBase Management System (Invited Talk)	Alberto Abelló
10:05 - 10:30	Unidata - A Modern Master Data Management Platform	Sergey Kuznetsov, Alexey Tsyryulnikov, Vlad Kamensky, Ruslan Trachuk, Mikhail Mikhailov, Sergey Murskiy, Dmitrij Koznov and George Chernishev
10:30 - 11:00	Coffee Break
11:00 - 11:25	RTGEN: A Relative Temporal Graph GENerator	Maria Massri, Zoltan Miklos, Philippe Raipin Parvedy and Pierre Meye
11:25 - 11:40	Towards a Holistic Data Preparation Tool	Valerie Restat, Meike Klettke and Uta Störl
11:40 - 11:55	Towards Human-centric AutoML via Logic and Argumentation	Joseph Giovanelli and Giuseppe Pisano
11:55 - 12:10	ECDP: A Big Data Platform for the Smart Monitoring of Local Energy Communities	Luca Gagliardelli, Luca Zecchini, Domenico Beneventano, Giovanni Simonini, Sonia Bergamaschi, Mirko Orsini, Luca Magnotta, Emma Mescoli, Andrea Livaldi, Nicola Gessa, Piero De Sabbata, Gianluca D'Agosta, Fabrizio Paolucci and Fabio Moretti
12:10 - 12:25	Darwin: A Data Platform for Schema Evolution Management and Data Migration	Uta Störl and Meike Klettke
12:25 - 12:30	Farewell