ESAPCA: enabling the analysis of extremely large data sets by scalable and hardware-accelerated PCA and DMD

Running

Organisational Unit
05 March 2024

Duration: 18 months

Objective

Singular Value Decomposition (SVD) is indispensable and ubiquitous in data science and engineering: it is either part of important tools (PCA, POD, DMD, etc.) or used for dimensionality reduction in pre- or post-processing. However, for very large data sets, which now arise in many disciplines inside and outside the space sector, computing the SVD becomes challenging because runtime and memory footprint usually grow superlinearly with data size. Specific use cases at ESA include, but are not limited to, long-term thermospheric density data, Earth observation SAR and optical imaging data, and in situ measurements of powder bed solidification. The goal of this project is to develop a parallel, GPU-accelerated implementation of SVD and related techniques, optimized for scalability on high-performance computing (HPC) systems and focused on interoperability with the Python (NumPy/SciPy/scikit-learn) data science ecosystem. To this end, we will exploit the existing infrastructure for multi-node array computing in the Heat research software library (Refs. [1-3]). With this project, we want to close the gap between the ease of use of the Python NumPy/SciPy/scikit-learn ecosystem and the need for highly efficient, hardware-accelerated matrix decomposition in space science and engineering.
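To illustrate how SVD underlies PCA, here is a minimal single-node sketch in plain NumPy. It is not the project's implementation and does not use Heat or a GPU. The function name `pca_via_svd` and the synthetic test matrix are assumptions made purely for illustration. The dense `np.linalg.svd` call in this sketch is the step whose cost grows superlinearly with data size.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Illustrative PCA of X (n_samples x n_features) through a thin SVD.

    Hypothetical helper for this sketch only; not part of any library.
    """
    # Center each feature (column) so the SVD describes the covariance structure.
    mean = X.mean(axis=0)
    Xc = X - mean

    # Thin SVD: Xc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

    # Principal axes are the leading right singular vectors.
    components = Vt[:n_components]

    # Variance explained by each axis (sample variance, hence n_samples - 1).
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)

    # Coordinates of the samples in the reduced space.
    scores = U[:, :n_components] * s[:n_components]
    return scores, components, explained_variance, mean


# Synthetic data (an assumption): 200 samples, 50 features, rank at most 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))

scores, components, explained_variance, mean = pca_via_svd(X, n_components=5)

# Because the data has rank at most 5, five components reconstruct it
# up to floating-point error.
X_reconstructed = scores @ components + mean
```

In this sketch the whole matrix must fit in the memory of one machine, which is the limitation that a distributed, hardware-accelerated implementation is meant to remove.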

Contract number
4000144045
OSIP Idea Id
I-2023-00566
Related OSIP Campaign
Open Channel
Main application area
Generic for multiple space applications
Budget
174996€