DataPlat

What DataPlat is about

Information systems have evolved into complex data platforms supporting end-to-end data-intensive needs, such as storage, computation, and analysis of data with heterogeneous structures. However, data scientists and architects still need smart and comprehensive support to govern data throughout its whole life cycle.

Supporting data management and governance requires the collection of metadata capturing the distinguishing features of the data; this enables advanced functionalities spanning from data research and profiling to provenance control, orchestration of data pipelines, incremental data integration, efficient querying, automated analytics, and homogeneous data access. The challenges begin with metadata management in terms of the modeling effort, storage, complexity of retrieval activities, and effective exploitation. While coping with big-data issues, the enabled functionalities must: (i) handle the heterogeneity of storage and computation engines (including DBMSs supporting multiple data models and cloud storage systems with limited control and predictability), (ii) meet suitability requirements for less-skilled users, and (iii) limit the costs of pay-as-you-go resources.

This workshop calls for innovative solutions, from both researchers and practitioners, that address the aforementioned challenges. We welcome papers that advance data platforms by engineering, optimizing, and simplifying the different aspects of data and metadata management and use.

Topics

The scope of the workshop includes, but is not limited to, the following topics.

Metadata modeling for data platforms
Techniques for metadata discovery and management
Advanced search, exploration, and profiling of data and metadata
Semantic enrichment of metadata
Data governance
Data wrangling
Provenance and data versioning control
Orchestration and optimization of data transformation pipelines
Data integration and querying in multimodel databases, multistores, polystores
Query processing, optimization, and performance
Entity resolution and data fusion
Big data management and querying
Artificial Intelligence solutions for data platforms
AutoML techniques
Cloud computing and architectures
Advanced architectures for data lakes and data platforms
Analysis, design, implementation, and testing of data platforms
Case studies and project experiences

Submission

[NEW 28.03.23] Papers are available here.

[NEW 08.01.23] The submission deadline has been extended to January 22, 2023.

[NEW 22.11.22] Authors of the best papers will be invited to submit an extended version to a special issue of Springer's Information Systems Frontiers journal (IF: 5.261).

Submissions should present original results and substantial new work not currently under review or published elsewhere. DataPlat 2023 will follow a single-blind review process to evaluate submissions on the basis of originality, relevance, quality, and technical contribution. The following submissions are accepted:

Regular and short research papers (up to 10 and 5 pages, respectively)
Vision papers (up to 5 pages)
Application papers (up to 5 pages)

Papers must be submitted in PDF format via Microsoft CMT.

DataPlat 2023 Submission Site on Microsoft CMT

Accepted papers will be published online at CEUR. Papers should be in two-column style (including all material) and must be formatted with the same rules as all EDBT Workshop papers using the CEUR-ART style (templates for LaTeX and DOCX are available here and on Overleaf). Please make sure to enable the two-column style in the template. All accepted workshop papers will be published in the CEUR-WS series, in a joint volume with all EDBT 2023 workshops.

All accepted papers are expected to be presented at the workshop, and at least one author is required to register.

Keynote speaker

Angela Bonifati

Lyon 1 University & CNRS Liris, France

The Quest for Schemas in Graph Databases

Property graphs are a widespread data model for representing interconnected multi-labeled data enhanced with properties as key/value pairs. These highly expressive graphs are used in a wide range of domains, such as social and transportation networks, biological networks, finance, cybersecurity, logistics and planning, to name a few. Property graphs are the building blocks of future graph ecosystems, in which OLTP and OLAP processes are intertwined with complex advanced processes, such as learning, scientific computing and business intelligence. While property graphs are currently used in a variety of graph databases, a rather fragmented landscape emerges in terms of the supported query and schema languages. In particular, the coverage of schema and constraints is limited if not completely lacking in these systems. In this talk, I will present recent advances in terms of schemas and constraints for property graphs, as part of our work within the LDBC community groups. I will also focus on graph schema discovery and constraint satisfaction following these proposals for property graph schema and constraints. Finally, I will pinpoint future directions of research in this new exciting area of data management.

Workshop program

The schedule is in EEST (UTC+3) - Athens Time

time	title	speaker / authors
08:30 - 09:00	Conference registration
09:00 - 10:30	[Keynote] Angela Bonifati - Shared with DOLAP
10:30 - 11:00	Coffee Break
11:00 - 11:05	Opening of the DataPlat - Comonos Workshop
11:05 - 11:20	MongoDB Data Versioning Performance: local versus Atlas	Ela Pustulka and Lucia de Espona Pernas
11:20 - 11:45	Easy-to-use interfaces for supporting the user in the semantic annotation of web tables	Sara Bonfitto, Paolo Perlasca, and Marco Mesiti
11:45 - 12:10	Toulouse: Learning Join Order Optimization Policies for Rule-based Data Engines	Antonios Karvelas, Alkis Simitsis, Yannis E Foufoulas, and Yannis Ioannidis
12:10 - 12:35	Mining Data Wrangling Workflows for Patterns, Reuse and Optimisation Opportunities	Abdullah Kh Almasaud, Sandra Sampaio, and Pedro Sampaio
12:35 - 12:50	Prediction of user-brand associations based on sentiment analysis	Mariella Bonomo, Simona Ester Rombo, and Filippo Rotolo
12:50 - 14:30	Lunch Break
14:30 - 14:45	Towards a Multi-Model Approach to Support User-Driven Extensibility in Data Warehouses: Agro-ecology Case Study	Sandro Bimonte, Fagnine Alassane Coulibaly, Stefano Rizzi, Sylvie Malembic-Maher, and Frederic Fabre
14:45 - 15:10	HEALER: A Data Lake Architecture for Healthcare	Carlo Manco, Tommaso Dolci, Fabio Azzalini, Enrico Barbierato, Marco Gribaudo, and Letizia Tanca
15:10 - 15:25	Data migration in column family database evolution using MDE	Pablo Suárez-Otero, Michael Mior, María José Suárez-Cabal and Javier Tuya
15:25 - 15:50	Propagating schema changes to code: An approach based on a unified data model	Alberto Hernández Chillón, Jesus Garcia-Molina, José Ramón Hoyos and María José Ortín Ibáñez
15:50 - 16:15	Effective queries for mega-analysis in cognitive neuroscience	Mateusz Pawlik, Anna Ravenschlag, Monique Denissen, Bianca Löhnert, Nicole Himmelstoß and Florian Hutzler
16:15 - 16:20	Farewell