DataPlat

News

[2024-05-08] The special issue has been published

[2024-04-07] The program is out

[2024-03-04] The list of accepted papers is out

[2024-01-31] The keynote speaker is Felix Naumann

[2023-12-22] Submission deadline extended to Jan 22, 2024

What DataPlat is about

Information systems have evolved into complex data platforms supporting end-to-end data-intensive needs, such as storage, computation, and analysis of data with heterogeneous structures. However, smart and comprehensive support that helps data scientists and architects govern data throughout its whole life cycle is still needed.

Supporting data management and governance requires collecting metadata that captures the distinguishing features of the data; this enables advanced functionalities ranging from data search and profiling to provenance control, orchestration of data pipelines, incremental data integration, efficient querying, automated analytics, and homogeneous data access. The challenges begin with metadata management itself, in terms of modeling effort, storage, complexity of retrieval, and effective exploitation. While coping with big-data issues, the enabled functionalities must (i) handle the heterogeneity of storage and computation engines (including DBMSs supporting multiple data models and cloud storage systems with limited control and predictability), (ii) be suitable for less-skilled users, and (iii) limit the costs of pay-as-you-go resources.

This workshop calls for innovative solutions from researchers and practitioners that address the aforementioned challenges. We welcome papers that contribute to the advancement of data platforms by engineering, optimizing, and simplifying the different aspects of data and metadata management and use.

Topics

The scope of the workshop includes, but is not limited to, the following topics:

Metadata modeling for data platforms
Techniques for metadata discovery and management
Advanced search, exploration, and profiling of data and metadata
Semantic enrichment of metadata
Data governance
Data wrangling
Provenance and data version control
Orchestration and optimization of data transformation pipelines
Data integration and querying in multimodel databases, multistores, polystores
Query processing, optimization, and performance
Entity resolution and data fusion
Big data management and querying
Artificial Intelligence solutions for data platforms
AutoML techniques
Cloud computing and architectures
Advanced architectures for data lakes and data platforms
Analysis, design, implementation, and testing of data platforms
Case studies and project experiences

Submission

Submissions should present original results and substantial new work that is not currently under review or published elsewhere. DataPlat will follow a single-blind review process and evaluate submissions on the basis of originality, relevance, quality, and technical contribution. The following types of submissions are accepted:

Regular research papers (up to 10 pages)
Short research, application, and vision papers (up to 5 pages)

Page limits include references and figures. Papers must be submitted via Microsoft CMT. Papers should be formatted following the same rules as ICDE conference papers, and manuscripts must be prepared in accordance with the IEEE format. Only electronic submissions in PDF format will be considered. A paper submitted to DataPlat cannot be under review at any other conference or journal during the entire time it is being considered for DataPlat. All accepted papers will appear in the conference proceedings.

All accepted papers are expected to be presented at the workshop, and at least one author of each paper is required to register.

Following the ICDE guidelines, we are using IEEE Conference Publishing Services (CPS) to collect camera-ready papers and copyright forms.

**Please do NOT upload your camera-ready papers or copyright forms on the CMT site.**
Submit your camera-ready papers and copyright forms at the link provided in the email you received; the email also contains more detailed instructions on preparing them (including instructions for validation with PDF eXpress). When submitting the camera-ready paper at ieeecps.org, use the paper ID provided by CMT.
Since the camera-ready (CR) version is not uploaded to CMT and we cannot keep track of uploads in CPS, please also email the CR version to m.francia@unibo.it.
Please ensure that your paper adheres to the page limits set by the workshop, as described in the call for papers. Papers that exceed these limits will be excluded from the proceedings.
Should you have any questions about submitting the camera-ready paper and/or the copyright release form, please contact the workshop chairs and the publication chair, Odysseas Papapetrou (o.papapetrou@tue.nl).

Committees

Program Chairs & Organizers

Matteo Francia

DISI - University of Bologna

Enrico Gallinucci

DISI - University of Bologna

Patrick Marcel

University of Orléans

Stefanie Scherzinger

University of Passau

Program Committee

Alexandre Chanson (Université de Tours)
Christoph Quix (Fraunhofer FIT)
Duncan Ruiz (Pontifícia Universidade Católica do Rio Grande do Sul)
Franck Ravat (IRIT)
Matteo Brucato (Microsoft Research)
Nicolas Labroche (University of Tours)
Panos Vassiliadis (University of Ioannina)
Riccardo Torlone (Roma Tre University)
Sana Sellami (Aix Marseille University)
Sandra Sampaio (University of Manchester)
Sandro Bimonte (INRAE)
Sergi Nadal (Universitat Politècnica de Catalunya)
Sergio Lifschitz (PUC-Rio)
Shaleen Deep (Microsoft Gray Systems Lab)
Theodoros Toliopoulos (Aristotle University of Thessaloniki)
Thomas Bodner (Hasso Plattner Institute)
Thorsten Papenbrock (Philipps University of Marburg)
Zezhou Huang (Columbia University)

Keynote speaker

Felix Naumann

Hasso Plattner Institute, Germany

Prof. Felix Naumann studied mathematics, economics, and computer science at the University of Technology in Berlin. After receiving his diploma (MA) in 1997, he completed his PhD thesis in the area of data quality at Humboldt University of Berlin in 2000. In 2001 and 2002 he worked at the IBM Almaden Research Center on data integration topics. From 2003 to 2006 he was assistant professor for information integration, again at Humboldt University of Berlin. Since 2006 he has held the chair for information systems at the Hasso Plattner Institute (HPI) at the University of Potsdam in Germany. He has been a visiting researcher at QCRI, AT&T Research, IBM Research, and SAP. His research interests include data profiling, data cleansing, and data integration, and he has over 200 scientific publications. Besides numerous PC memberships for international conferences, he has organized several conferences in various roles, including VLDB 2021 as PC co-chair, and he was a trustee of the VLDB Endowment.

Data Profiling for Data Integration

Data profiling comprises a broad range of methods to efficiently analyze a given dataset. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, the tables of a relational database are scanned to derive metadata such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and various data dependencies. The talk highlights the key insights behind recent state-of-the-art methods and presents various use cases in the areas of data cleaning and data integration: violations of dependencies point to errors in the data; key discovery identifies the core entities of a data source; inclusion dependencies are candidates for joining multiple sources; and, in general, data profiling results can be used to organize data lakes.
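As a rough illustration of the kind of metadata mentioned in the abstract, the following Python sketch computes column completeness and uniqueness, derives single-column key candidates, and finds unary inclusion dependencies between two small in-memory tables. It is a naive, hypothetical example: the table representation, function names, and sample data are assumptions made for illustration, and it does not reproduce the efficient discovery methods presented in the keynote.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical representation: a table maps column names to value lists; None marks a missing value.
Table = Dict[str, List[Optional[Any]]]


def completeness(values: List[Optional[Any]]) -> float:
    """Fraction of non-null values in a column."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def uniqueness(values: List[Optional[Any]]) -> float:
    """Distinct non-null values divided by non-null values (1.0 means no duplicates)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    return len(set(non_null)) / len(non_null)


def key_candidates(table: Table) -> List[str]:
    """Single columns that are complete and duplicate-free, i.e., unary key candidates."""
    return [
        col for col, vals in table.items()
        if vals and completeness(vals) == 1.0 and uniqueness(vals) == 1.0
    ]


def inclusion_dependencies(dep: Table, ref: Table) -> List[Tuple[str, str]]:
    """Pairs (A, B) such that every non-null value of dep.A also occurs in ref.B."""
    result = []
    for a, a_vals in dep.items():
        a_set = {v for v in a_vals if v is not None}
        if not a_set:
            continue  # an all-null column trivially satisfies every inclusion; skip it
        for b, b_vals in ref.items():
            if a_set <= set(b_vals):
                result.append((a, b))
    return result


# Hypothetical sample data: orders reference customers.
customers: Table = {"id": [1, 2, 3], "name": ["Ann", "Bob", None]}
orders: Table = {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, 2]}

print(key_candidates(customers))                # ['id']
print(key_candidates(orders))                   # ['order_id']
print(inclusion_dependencies(orders, customers))  # [('customer_id', 'id')]
```

Here orders.customer_id ⊆ customers.id is discovered as an inclusion dependency and customers.id as a key candidate, so together they suggest a foreign-key join between the two tables. The exhaustive pairwise check is quadratic in the number of columns and is only meant to make the definitions concrete.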

Program

Time	Title	Speaker / authors
8:30 - 9:00	Conference registration
9:00 - 9:05	Opening
9:05 - 9:25	Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly	Alexander Taylor, Yicong Huang, Junheng Hao, Xinyuan Lin, Xiusi Chen, Wei Wang, Chen Li
9:25 - 9:40	Towards an End-to-End Data Quality Optimizer	Valerie Restat, Meike Klettke, Uta Störl
9:40 - 10:00	CASA: Classification-based Adjusted Slot Admission Control for Query Processing Engines	Timothy Zeyl, Harshwin Venugopal, Calvin Sun, Paul Larson
10:00 - 10:30	Coffee Break
10:30 - 11:20	[Keynote] Data Profiling for Data Integration	Felix Naumann
11:20 - 11:40	Design and Development of a Provenance Capture Platform for Data Science	Riccardo Torlone, Paolo Missier, Luca Gregori, Alessandro Wood, Matthew Stidolph
11:40 - 12:00	Collaboration Management for Federated Learning	Marius Schlegel, Daniel Scheliga, Kai-Uwe Sattler, Marco Seeland, Patrick Mäder
12:00	Farewell