to the manually segmented images. While GANSeg and U-Net achieved Dice performance comparable to that of human experts on the labeled Heidelberg test dataset, only GANSeg achieved comparable Dice scores, with the best performance for the GCL+IPL layer (90%, 95% CI: 68%-96%) and the worst for intraretinal fluid (58%, 95% CI: 18%-89%), which was statistically similar to that of human graders (79%, 95% CI: 43%-94%). GANSeg
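Both results are reported as Dice coefficients, which measure the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). This ranges from 0 (no overlap) to 1 (identical masks). The following is a minimal, standard-library-only sketch of that definition, applied per label to flattened label maps (one integer per voxel, e.g. one class per retinal layer or lesion type) and summarized by the median across scans. The function names `dice`, `per_label_dice` and `median_dice` are hypothetical. Scoring two empty masks as 1.0 is an assumed convention. The sketch does not reproduce either study's evaluation pipeline or its confidence-interval method.

```python
from statistics import median
from typing import Hashable, Iterable, Sequence


def dice(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def per_label_dice(pred_labels: Sequence[Hashable],
                   truth_labels: Sequence[Hashable],
                   labels: Iterable[Hashable]) -> dict:
    """Dice for each class of a multi-class label map, one binary mask per label."""
    return {
        lab: dice([p == lab for p in pred_labels], [t == lab for t in truth_labels])
        for lab in labels
    }


def median_dice(cases: Sequence[tuple], label: Hashable) -> float:
    """Median per-scan Dice for one label; cases holds (pred_labels, truth_labels) pairs."""
    return median(per_label_dice(p, t, [label])[label] for p, t in cases)


if __name__ == "__main__":
    pred = [0, 1, 1, 2, 2, 0]   # toy predicted label map (flattened)
    truth = [0, 1, 0, 2, 2, 2]  # toy reference label map (flattened)
    print(per_label_dice(pred, truth, [1, 2]))  # label 1: 2/3, label 2: 0.8
```

Reporting the median rather than the mean keeps a few scans with near-zero overlap, for example small lesions that are missed entirely, from dominating the summary.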
of the ANN, an independent longitudinal dataset of 40 patients with data from 239 MRI scans (Heidelberg test dataset) was collected at Heidelberg University Hospital in parallel with the training dataset. In addition, 2034 MRI scans from 532 patients at 34 institutions, collected in the EORTC-26101 study between Oct 26, 2011, and Dec 3, 2015, were of sufficient quality to be included in the EORTC-26101 test dataset. The ANN performed excellently in detecting and segmenting CE tumours and NE volumes in both longitudinal test datasets (median Dice coefficient for CE tumours 0·89 [95% CI 0·86-0·90] and for NEs 0·93 [0·92-0·94] in the Heidelberg test dataset; for CE tumours 0·91 [0·90-0·92] and for NEs 0·93 [0·93-0·94] in the EORTC-26101 test dataset). Time to progression from quantitative ANN