AWS · Pharma R&D

R&D data lake accelerated molecule screening 4× for a global pharma.

We built a governed R&D data lake on AWS with elastic HPC — collapsing molecule screening cycles by 4× while keeping every dataset lineage-tracked.

[Screening speed]

4×

[HPC capacity]

elastic

[Lineage tracked]

100%

[ The Problem ]

Where the business was stuck.

R&D workflows blocked by fragmented data across labs, CROs and legacy HPC.

[ Key Challenges ]

Our approach

STEP 01

Governed data lake on S3 + Lake Formation

STEP 02

Elastic HPC on ParallelCluster with spot

STEP 03

Lineage graph via OpenLineage

[ Solution highlights ]

[ Tech stack ]

AWS

ParallelCluster

Lake Formation

OpenLineage

Our scientists spend time on science, not on chasing data.

— Head of R&D Informatics

Keep exploring