High-Performance Data Analytics with Python
from Monday, 15 June 2026 (11:00) to Tuesday, 16 June 2026 (15:00)
Monday, 15 June 2026
11:00
Introduction to HPC & the Athena Supercomputer
Leszek Grzanka
11:00 - 11:30
A brief overview of High-Performance Computing (HPC) concepts, logging into the Athena supercomputer at ACK Cyfronet AGH, and the basics of accessing computational resources.
11:30
Environment Setup & Python Memory Model
Leszek Grzanka
11:30 - 12:15
Initializing the JupyterHub environment, verifying dependencies, and comparing the memory use and speed of standard Python lists and NumPy arrays.
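As a rough illustration of the list versus array comparison, the sketch below measures memory footprint and summation time for one million integers. The element count and the choice of `sum` as the benchmark are assumptions made for this example, not taken from the course material, and the timings will vary by machine.

```python
import sys
import timeit

import numpy as np

n = 1_000_000  # illustrative size, chosen for this sketch
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects, while the
# array stores raw 8-byte integers in one contiguous buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_array.nbytes

# Time five repetitions of summing every element.
list_time = timeit.timeit(lambda: sum(py_list), number=5)
array_time = timeit.timeit(lambda: np_array.sum(), number=5)

print(f"list:  {list_bytes / 1e6:.1f} MB, {list_time:.4f} s")
print(f"array: {array_bytes / 1e6:.1f} MB, {array_time:.4f} s")
```

The list figure is an estimate: it adds the size of the pointer table to the size of each integer object.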
12:15
Coffee break
12:15 - 12:30
12:30
Introduction to Data Analysis with Pandas
Leszek Grzanka
12:30 - 13:45
Hands-on introduction to Pandas using real-world weather datasets. Covers basic data manipulation, filtering, and preparation for parallel processing.
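The kind of manipulation and filtering mentioned above might look like the following minimal sketch. The tiny in-memory table and its column names (`station`, `date`, `temp_c`) are hypothetical stand-ins for the session's weather datasets.

```python
import pandas as pd

# Hypothetical weather readings; the real datasets are not shown here.
weather = pd.DataFrame(
    {
        "station": ["KRK", "KRK", "WAW", "WAW", "GDN"],
        "date": pd.to_datetime(
            ["2026-01-01", "2026-01-02", "2026-01-01", "2026-01-02", "2026-01-01"]
        ),
        "temp_c": [-3.5, 1.0, -1.0, None, 2.5],
    }
)

# Drop rows with a missing temperature before analysing.
clean = weather.dropna(subset=["temp_c"])

# Boolean-mask filtering: keep only sub-zero readings.
freezing = clean[clean["temp_c"] < 0]

# Split-apply-combine: mean temperature per station.
mean_by_station = clean.groupby("station")["temp_c"].mean()

print(freezing)
print(mean_by_station)
```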
13:45
Coffee break
13:45 - 14:00
14:00
Scaling Up: First Steps with Dask
Leszek Grzanka
14:00 - 15:00
Transitioning from Pandas to Dask DataFrames. An introduction to distributed data processing and executing our first operations on larger-than-memory datasets.
Tuesday, 16 June 2026
11:00
Dask Performance & Lazy Evaluation
Leszek Grzanka
11:00 - 11:45
A condensed overview of how Dask works under the hood. Understanding task graphs, lazy evaluation mechanics, and benchmarking Pandas versus Dask performance.
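Lazy evaluation can be shown with `dask.delayed`: building the graph runs nothing, and work happens only on `compute()`. The functions `square` and `total` and the `calls` list are hypothetical helpers invented for this sketch. The synchronous scheduler is chosen so that the side effect on `calls` is visible in the same process.

```python
from dask import delayed

calls = []  # records which inputs were actually processed


@delayed
def square(x):
    calls.append(x)
    return x * x


@delayed
def total(values):
    return sum(values)


# Building the task graph: no function body has run yet.
graph = total([square(i) for i in range(4)])
calls_before = len(calls)

# Execution happens here, in the current thread.
result = graph.compute(scheduler="synchronous")

print(calls_before, result, sorted(calls))
```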
11:45
Coffee break
11:45 - 12:00
12:00
Processing Scientific Data at Scale
Leszek Grzanka
12:00 - 13:15
Analyzing real high-frequency physics datasets (Parquet format) from proton therapy beam measurements using single-node Dask clusters.
13:15
Coffee break
13:15 - 13:30
13:30
Multi-node Computing & Final Challenge
Leszek Grzanka
13:30 - 15:00
Utilizing dask-jobqueue and SLURMCluster to scale out computations across multiple Athena nodes. Plenty of hands-on time to tackle the open-ended multi-dataset challenge.
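Scaling out with dask-jobqueue might be set up roughly as follows. The partition name, account name, and resource figures are placeholders, not actual Athena values, and the helper functions are hypothetical. The cluster itself can only be started on a machine with access to a SLURM scheduler.

```python
def slurm_cluster_settings(cores=8, memory="16GB", walltime="00:30:00"):
    """Build keyword arguments for SLURMCluster (all values are placeholders)."""
    return {
        "queue": "PARTITION_NAME",  # placeholder: SLURM partition to submit to
        "account": "GRANT_NAME",    # placeholder: account charged for the jobs
        "cores": cores,             # CPU cores requested per SLURM job
        "memory": memory,           # memory requested per SLURM job
        "walltime": walltime,       # time limit of each SLURM job
    }


def start_cluster(n_jobs=2):
    """Submit n_jobs SLURM jobs as Dask workers and connect a client."""
    # Imported here so the settings helper works without SLURM tooling.
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(**slurm_cluster_settings())
    cluster.scale(jobs=n_jobs)  # each job typically lands on its own node allocation
    return cluster, Client(cluster)
```

Once `start_cluster()` returns, existing Dask code runs unchanged against the returned client, with the work spread over the submitted jobs.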