PhD thesis proposal (CIFRE LaTIM - AQUILAB)
Harmonization methods for radiomics
Mathieu Hatt and David Gibon
We are looking for a motivated student to conduct
a PhD under the CIFRE convention between the Laboratory of Medical Information
Processing (LaTIM, INSERM UMR 1101) and the company AQUILAB.
The PhD candidate will be recruited by AQUILAB
and will work full time for 3 years within the LaTIM. Both videoconference
meetings and in-person stays at the AQUILAB offices will be
scheduled regularly. The candidate must hold a master's and/or an engineering
degree in computer science, machine learning or statistics. Additional
experience/expertise with medical imaging and clinical applications will be
considered a plus but is not a prerequisite. The candidate must be proficient in
English (both written and spoken).
A CV and a letter of motivation should be sent
to both hatt@univ-brest.fr and david.gibon@aquilab.com
Cancer is a major worldwide
health issue, with 13 million cases and 8 million deaths reported in 2008,
and projections of 22 million cases and 13 million deaths for 2030 [1]. Nowadays, multimodal medical imaging such as computed
tomography (CT), positron emission tomography (PET/CT) and magnetic resonance
tomography (CT), positron emission tomography (PET/CT) and magnetic resonance
imaging (MRI) is crucial in oncology, with numerous applications including
early diagnosis, staging, treatment decision and planning, monitoring, and
patient follow-up [2]. A
patient’s treatment and follow-up strategy could be optimized based on improved
diagnosis (“virtual biopsy”) and predictive models able to identify
patients at risk of future local failures and recurrence at diagnosis, before
initiating treatment. In addition, the need to integrate data from several
sources (clinical, imaging, dosimetry, genetics, toxicity) in order to improve
predictive ability has been emphasized [3].
Radiomics denotes the high
throughput extraction of numerous quantitative metrics (including shape,
intensity or textural features) from images, with the goal of providing a full
macroscopic phenotyping of tissues (tumors, organs, etc.) that could reflect at
least in part the underlying pathophysiological processes (such as necrosis,
proliferation, etc.), down to the genomic level [4]. The field has seen exponential growth over
the last 5 years (fewer than 10 publications in 2014, ~450 in 2018, and already
as many in 2019). This is mostly due to its potential to provide a
quantitative signature of tumor characteristics that cannot necessarily be
appreciated with the naked eye, even a trained one [5]. With
the quick and overwhelming development of deep learning in all fields of
science including medical imaging [6],
radiomics is evolving rapidly, with techniques based on deep neural networks
being used either to automate or improve parts of the radiomics workflow,
or to replace it entirely [7–9]. Recent
attempts to exploit convolutional neural networks (CNNs) for radiomics and predictive modeling in oncology and
radiotherapy [10–12] were
met with limited success, given the numerous algorithmic and study design
challenges that still need to be addressed in this context. A particular issue
concerns the limited amount of available data for training in medical imaging,
compared to other applications (e.g.,
the millions of images used in ImageNet [13]).
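To make the notion of feature extraction concrete, the sketch below computes a handful of first-order radiomic features over a segmented region. The data, function name and bin count are hypothetical choices for illustration; this is a minimal sketch only, not an IBSI-compliant implementation (which would, for instance, control intensity discretization explicitly).

```python
import numpy as np

def first_order_features(image, mask):
    """A few first-order radiomic features over a segmented region.
    Illustrative sketch only; real studies should use an IBSI-compliant
    implementation with controlled intensity discretization."""
    voxels = image[mask > 0].astype(float)
    mean, std = voxels.mean(), voxels.std()
    hist, _ = np.histogram(voxels, bins=32)
    p = hist[hist > 0] / voxels.size  # non-empty bin probabilities
    return {
        "mean": mean,
        "std": std,
        "skewness": ((voxels - mean) ** 3).mean() / std ** 3,
        "entropy": -(p * np.log2(p)).sum(),
    }

# Toy example: a synthetic 3D "image" and a cubic "tumor" mask.
rng = np.random.default_rng(0)
img = rng.normal(100, 15, size=(32, 32, 32))
msk = np.zeros_like(img, dtype=bool)
msk[8:24, 8:24, 8:24] = True
feats = first_order_features(img, msk)
```

A real radiomics workflow extracts dozens to hundreds of such features (including shape and textural ones) per region, which is precisely why standardized definitions and implementations matter.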
Radiomics has shown promising results in
identifying tumor subtypes, aggressiveness as well as in predicting response to
therapy and outcome of patients in several cancers [9]; however,
most of these results have been obtained in small, retrospective, monocentric cohorts. Importantly, recently published
recommendations and guidelines [9,14–16]
regarding the use of automated segmentation, standardized features, and proper
machine learning schemes for statistical analysis have not been followed in the
majority of these studies, except the most recent ones. As a result, even well-known studies in
radiomics have been criticized as potential false discoveries [17], or even as erroneous interpretations, for example
a radiomics signature acting as a surrogate of tumor volume [18].
In the following, we will therefore use the
term “radiomics” to denote any methodological workflow (relying on deep
learning or not) aiming at extracting clinically relevant information from
images (whatever the modality or scale).
Radiomics is currently
implemented in the research community as a rather complex sequential workflow,
which suffers from several limitations that hamper its potential transfer to
the clinical routine, including i) the lack of standardization, ii) the lack of
automation, iii) the lack of harmonization and iv) the “black box” effect [9,14].
Methods relying on deep neural networks could
help solve most of these limitations (e.g.,
fully automatic detection and segmentation of tumors in the images, instead of
the semi-automatic approaches followed in most studies) [8,9].
Figure 1 below illustrates the radiomics
workflow as well as how it can be implemented following the standard machine
learning or the more recent deep learning pipelines.
Figure 1: the radiomics workflow and its
implementation following machine or deep learning pipelines.
On the
one hand, standardization was identified early on as a major limitation
preventing radiomics from entering clinical practice, because of the lack of
comparability of results. No meta-analysis could be carried out, because each research
group relied on different methodological workflows, software, nomenclature and implementation
choices, and did not provide sufficient details for their work to be reproduced
[14]. These issues have been addressed over the last two years by the Image
Biomarker Standardization Initiative (IBSI) [15,19] and the radiomics ontology.
On the
other hand, it has been shown for PET [20–22], CT [23,24] and MRI
[25,26] that
most radiomic features exhibit moderate to high sensitivity to variability in
scanner models, acquisition protocols and reconstruction settings, which
constitutes the biggest challenge for multicentric studies [27]. The lack of harmonization across scanner models,
reconstruction algorithms and acquisition protocols leads to high inter-site
variability in images and in the resulting features. This is the current
clinical reality, and it is unlikely to change. This is why currently proposed
radiomics and/or deep learning models remain limited in terms of validation on
external datasets [30].
Our long-term goal is to achieve societal
impact by improving patient management. This will be achieved thanks to more
robust and accurate predictive models that will help identify patients at risk
before initiating treatment. In order for these tools to be exploited in the
clinical routine, a high level of proof is necessary, which in turn requires
larger-scale, multicentric (ideally prospective) studies on the use of
radiomics and/or deep learning techniques in patient management relying on
multimodal medical images; such studies are currently lacking. The objectives of this PhD are thus to develop
harmonization techniques in both image and feature domains in order to improve,
facilitate or even render feasible otherwise impossible radiomic analyses of large,
multicentric, heterogeneous cohorts in all types of multimodal imaging and
cancer applications.
Standardization can be
defined as a concept whereby agreement of results is achieved by establishing
traceability to higher-order reference materials and/or measurement
procedures. Harmonization is defined as the process of reaching agreement in
order to produce a consistent interpretation where no reference measurement
procedure exists.
In the following, we use the term standardization to denote a process to achieve common
and standard practice, nomenclature, mathematical definitions and
implementation in the overall methodological workflow of radiomics. It may
also denote the similar process in achieving comparable acquisition protocols
and reconstruction settings in the generation of medical multimodal images,
such as computed tomography (CT), positron emission tomography (PET) or
magnetic resonance imaging (MRI). In the
present PhD we will rely on existing guidelines and standards, such as the Image
Biomarker Standardization Initiative (IBSI) [15,19], in order to
ensure the highest level of standardization of our developments and to increase
the likelihood of the reproducibility of our results.
On the other hand, we will use the term harmonization to denote the process by which we
make multimodal medical images and/or features extracted from these images, comparable
and suitable for pooling, irrespective of where and how they were produced.
Three
main approaches can be considered: i) harmonizing images before feature
extraction, ii) harmonizing the extracted features, and iii) a combination of
i) and ii); transfer learning will also be investigated.
In order to harmonize images of a given
modality but from different clinical centers (acquired and reconstructed using
different scanner models/generations, acquisition protocols and/or
reconstruction algorithms and settings), we will rely on generative adversarial
networks (GANs) to transform images so they are made more similar to each other
while preserving their respective informative content, which will be the main
challenge.
We will
develop a GAN-based framework in which multicentric, heterogeneous images are
translated to match the properties of a standard dataset, such as a template
reference image, or alternatively, one set of images chosen as a reference (in
the absence of an appropriate standard). The first challenge is to determine
the relevant properties within images (local or global metrics, texture, edges,
contrast, signal-to-noise ratio, etc.) that should be reproduced. The second
challenge is to ensure the ability of the framework to harmonize images without
losing their clinically-relevant informative content, which will be one of the
most important criteria for its evaluation. Although numerous studies have demonstrated the use
of GANs to synthesize images, such as generating a CT from an MRI for MR-based
radiotherapy treatment planning, these techniques have not yet been
extensively exploited for the purpose of multicentric image harmonization. A
very recent example concerns the reduction of variance due to the use of
different kernels in CT reconstruction by relying on a CNN [31].
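As a much simpler, non-deep point of comparison for this image-domain idea, quantile (histogram) matching maps one center's intensity distribution onto a reference distribution. The GAN framework aims to generalize such transforms while preserving the spatial, clinically relevant content that a purely global mapping ignores. The arrays below are hypothetical synthetic stand-ins for real images.

```python
import numpy as np

def match_histogram(source, reference):
    """Map source intensities onto the reference distribution via
    quantile matching. A crude global baseline: unlike a learned,
    spatially aware transform, it is blind to image structure."""
    s_vals, s_idx, s_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_quantiles = np.cumsum(s_counts) / source.size
    r_quantiles = np.cumsum(r_counts) / reference.size
    # For each source quantile, look up the matching reference intensity.
    mapped = np.interp(s_quantiles, r_quantiles, r_vals)
    return mapped[s_idx].reshape(source.shape)

rng = np.random.default_rng(0)
src = rng.normal(50, 5, (64, 64))    # "center A" image intensities
ref = rng.normal(100, 20, (64, 64))  # reference intensities
out = match_histogram(src, ref)      # out now follows ref's distribution
```

After matching, the global intensity statistics of `out` agree with the reference, but nothing constrains local contrast or texture, which is exactly the informative content the GAN-based framework must preserve.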
In the features (image-derived variables)
space, numerous statistical approaches can be applied, such as normalization [32] or batch-effect compensation [33]. In radiomics, the ComBat method, initially
developed for genomics batch correction through Bayesian estimates [33], has been used to carry out
multicentric studies [34]. It was chosen because it had previously been shown to
outperform other similar statistical approaches [35]. Figure 2 below illustrates the benefit of
ComBat for multicentric validation of radiomics models.
Figure 2: Kaplan-Meier curves for loco-regional
control in locally advanced cervical cancer obtained using the FDG PET + ADC
map radiomics model in the testing multicentric cohort with and without
features harmonization using the standard ComBat approach [34].
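For intuition, the sketch below implements a simplified, non-Bayesian location-scale adjustment in the spirit of ComBat: each batch (center) is shifted and rescaled to the pooled per-feature statistics. The actual ComBat method additionally shrinks per-batch estimates with empirical Bayes and can preserve known biological covariates; the data and function name here are hypothetical.

```python
import numpy as np

def location_scale_harmonize(X, batch):
    """Align each batch's per-feature mean/std to the pooled values.
    A stripped-down stand-in for ComBat (no empirical Bayes shrinkage,
    no covariate preservation)."""
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    out = np.empty_like(X)
    grand_mean = X.mean(axis=0)
    grand_std = X.std(axis=0)
    for b in np.unique(batch):
        rows = batch == b
        mu, sd = X[rows].mean(axis=0), X[rows].std(axis=0)
        sd = np.where(sd == 0, 1.0, sd)  # guard constant features
        out[rows] = (X[rows] - mu) / sd * grand_std + grand_mean
    return out

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)),   # "center A" features
               rng.normal(5, 2, (30, 5))])  # "center B": shifted/scaled
batch = np.array(["A"] * 30 + ["B"] * 30)
Xh = location_scale_harmonize(X, batch)     # batch means/stds now agree
```

The sketch also makes ComBat's limitations tangible: each batch needs enough samples for stable `mu`/`sd` estimates, all features are moved to the pooled reference, and adding a new center changes `grand_mean`/`grand_std`, forcing a full re-run.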
However, ComBat suffers from a number of
limitations: a minimal number of annotated samples per batch is required; the
batch-corrected variables may lose their absolute values, and hence their clinical
meaning, because all features are moved to an arbitrary average reference; and
the harmonization is cumbersome to apply to newly acquired patients or when an
additional center is added to the database (the harmonization over the
entire database has to be re-run each time). We will improve these methods' robustness (e.g., adding Monte Carlo
estimation for small samples) and flexibility (the ability to choose a
reference amongst the available batches), as well as their ability to learn
transforms so they can be applied to newly added data. The combination of
batch-correction methods with unsupervised clustering will also be investigated
to deal with data presenting very high heterogeneity and a very small number of
samples per batch.
It might be beneficial and
complementary to combine image-based and feature-based harmonization methodologies to improve the results of
multicentric radiomics studies. This will involve evaluating the potential
added benefit of each of the previously developed approaches.
The goal of this task will be to determine whether the first or the second
approach (or the combination of both) is the most efficient, taking into account not only the absolute
improvement observed in the results, but also the computing time and effort
required to implement each approach in practice. This will be crucial,
especially to facilitate the transfer of the developed methods to the clinical
practice through industrial implementation.
Multiparametric models trained using standard
machine learning methods or deep neural networks cannot be directly applied to
external datasets with different properties. This task will investigate the use of transfer learning to address this
issue, i.e., fine-tuning pre-trained
models (whether based on deep networks or on “shallower” modeling
approaches such as random forests or support vector machines) so that they
perform better on new, unseen data with important differences in images and/or
data.
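The warm-start idea behind fine-tuning can be sketched as follows, using a shallow logistic model and hypothetical synthetic cohorts rather than a deep network: a model pre-trained on a large "center A" dataset performs poorly on a small, distribution-shifted "center B" dataset, and continuing training from the pre-trained weights adapts it with little new data.

```python
import numpy as np

def train_logreg(X, y, w=None, epochs=200, lr=0.1):
    """Batch-gradient logistic regression. Passing existing weights `w`
    warm-starts the optimization -- the fine-tuning idea, applied here
    to a shallow linear model for illustration."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    if w is None:
        w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # sigmoid probabilities
        w = w - lr * Xb.T @ (p - y) / len(y)   # gradient step
    return w

def accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return float(((Xb @ w > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
# Hypothetical cohorts: large "center A", small shifted "center B".
X_a = rng.normal(0, 1, (500, 10)); y_a = (X_a[:, 0] > 0).astype(int)
X_b = rng.normal(2, 1, (40, 10));  y_b = (X_b[:, 0] > 2).astype(int)

w = train_logreg(X_a, y_a)                    # pre-train on center A
acc_before = accuracy(w, X_b, y_b)            # poor: distribution shift
w = train_logreg(X_b, y_b, w=w, epochs=1000)  # fine-tune on center B
acc_after = accuracy(w, X_b, y_b)             # adapted to the new center
```

The same principle carries over to deep networks (freezing early layers, retraining the head) and, with adaptations, to tree ensembles and kernel methods.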
The PhD
supervisor is Mathieu Hatt from the team ACTION (therapeutic action guided by
multimodal imaging in oncology) led by Dimitris Visvikis, in the Laboratory of
medical information processing, LaTIM (INSERM UMR 1101). M. Hatt is in charge
of the group “multiparametric modeling for therapy optimization” within the
team ACTION. His main field of expertise is the development of image processing
and analysis methods, especially dedicated to positron emission tomography
(PET), such as automatic segmentation, partial volume effects correction or
filtering. On the topic of automatic PET image segmentation, the team
contributed to the report of the international task group 211 of the AAPM
(American Association of Physicists in Medicine) [36,37] and organized
the first MICCAI challenge [38]. During the
last three years, the group has also developed its expertise around machine
(deep) learning methods and extended its expertise to CT and MR imaging.
Regarding radiomics, the team is amongst the pioneers with a first publication
in 2011 [39]. Since then,
the group has published more than 30 papers related to radiomics in PET, CT and
MRI, from methodological developments to more clinical applied studies. The
reviews, editorials and invited perspectives [7–9,14,27] also indicate
the level of recognition of the team on the topic.
Most of the
proposed developments will require extensive computing power in order to
process datasets, train and validate models (especially deep learning ones).
The LaTIM is managing a high performance computing platform (PLACIS, http://placis.univ-brest.fr/english)
which is a hybrid cluster with 800 CPU cores and 50 GPUs dedicated to
calculations, with a 150 TB storage facility. Access to this platform will be
granted to the PhD student.
Aquilab is a
French company created in 2000, based on a technological transfer of
methodological developments by research and clinical teams in Lille. The
company has been led by David Gibon since its creation. David Gibon has a
background in computer science and dedicated 10 years of his career to
research on radiotherapy and the exploitation of medical imaging in therapeutic
action before founding Aquilab in 2000.
Aquilab
developed software solutions for quality control of medical imaging and
radiotherapy hardware. Its ARTISCAN solution is installed in more than 350
centers worldwide, and the company is the leader in France, equipping more than
80% of oncology centers. The company has also developed the ARTIVIEW solution
for preparing and evaluating radiotherapy treatment plans. In recent years, this
expertise has been combined with a web platform (Share Place) for managing
databases in multicentric imaging and radiotherapy trials.
The team ACTION has already established
a collaborative research effort with Aquilab. First, some methods and code (PET
image segmentation, radiomics) are undergoing industrial transfer into the software
solutions of Aquilab through the SATT Ouest Valorisation. Second, a Labcom
(MALICE, machine learning against cancer) associating ACTION and Aquilab is
currently under submission for funding to the ANR. Finally, ACTION and Aquilab
are also partners in a large project on data sharing and analysis in pediatric
cancer under submission to INCa.
References
1. Bray F,
Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer
statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36
cancers in 185 countries. CA Cancer J Clin. 2018;68:394–424.
2. Zafra M, Ayala F, Gonzalez-Billalabeitia E,
Vicente E, Gonzalez-Cabezas P, Garcia T, et al. Impact of
whole-body 18F-FDG PET on diagnostic and therapeutic management of Medical
Oncology patients. Eur J Cancer. 2008;44:1678–83.
3. Jaffray DA, Das S, Jacobs PM, Jeraj R, Lambin P. How Advances in
Imaging Will Affect Precision Radiation Oncology. Int J Radiat Oncol Biol Phys.
2018;101:292–8.
4. Segal E, Sirlin CB, Ooi C, Adler AS, Gollub J, Chen X, et al.
Decoding global gene expression programs in liver cancer by noninvasive
imaging. Nat Biotechnol. 2007;25:675–80.
5. Aerts H. Radiomics: there is more than meets the eye in medical
imaging. SPIE Medical Imaging 2016: Computer-Aided Diagnosis. 2016.
p. 97850O. Available from: http://dx.doi.org/10.1117/12.2214251
6. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M,
et al. A survey on deep learning in medical image analysis. Med Image Anal.
2017;42:60–88.
7. Visvikis D, Cheze Le Rest C, Jaouen V, Hatt M. Artificial
intelligence, machine (deep) learning and radio(geno)mics: definitions and
nuclear medicine imaging applications. Eur J Nucl Med Mol Imaging. 2019;
8. Hatt M, Parmar C, Qi J, Naqa IE. Machine (Deep) Learning Methods
for Image Processing and Radiomics. IEEE Trans Radiat Plasma Med Sci.
2019;3:104–8.
9. Hatt M, Le Rest CC, Tixier F, Badic B, Schick U, Visvikis D.
Radiomics: Data Are Also Images. J Nucl Med Off Publ Soc Nucl Med.
2019;60:38S-44S.
10. Antropova N, Huynh BQ, Giger ML. A deep
feature fusion methodology for breast cancer diagnosis demonstrated on three
imaging modality datasets. Med Phys. 2017;44:5162–71.
11. Bibault J-E, Giraud P, Housset M, Durdux C, Taieb J, Berger A, et
al. Deep Learning and Radiomics predict complete response after neo-adjuvant
chemoradiation for locally advanced rectal cancer. Sci Rep. 2018;8:12611.
12. Diamant A, Chatterjee A, Vallières M, Shenouda G, Seuntjens J.
Deep learning in head & neck cancer outcome prediction. Sci Rep.
2019;9:2764.
13. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al.
ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis.
2015;115:211–52.
14. Vallières M, Zwanenburg A, Badic B, Cheze Le Rest C, Visvikis D,
Hatt M. Responsible Radiomics Research for Faster Clinical Translation. J Nucl
Med Off Publ Soc Nucl Med. 2018;59:189–93.
15. Zwanenburg A, Leger S, Vallières M, Löck S, for the Image Biomarker
Standardisation Initiative. Image biomarker standardisation initiative.
arXiv:1612.07003 [cs]. 2016. Available from: http://arxiv.org/abs/1612.07003
16. Zwanenburg A. Radiomics in nuclear medicine: robustness,
reproducibility, standardization, and how to avoid data analysis traps and
replication crisis. Eur J Nucl Med Mol Imaging. 2019;
17. Chalkidou A, O’Doherty MJ, Marsden PK. False Discovery Rates in
PET and CT Studies with Texture Features: A Systematic Review. PloS One.
2015;10:e0124165.
18. Welch ML, McIntosh C, Haibe-Kains B, Milosevic MF, Wee L, Dekker
A, et al. Vulnerabilities of radiomic signature development: The need for
safeguards. Radiother Oncol J Eur Soc Ther Radiol Oncol. 2019;130:2–9.
19. Hatt M, Vallieres M, Visvikis D, Zwanenburg A. IBSI: an
international community radiomics standardization initiative. J Nucl Med.
2018;59:287–287.
20. Galavis PE, Hollensen C, Jallow N, Paliwal B, Jeraj R. Variability
of textural features in FDG PET images due to different acquisition modes and
reconstruction parameters. Acta Oncol. 2010;49:1012–6.
21. Yan J, Chu-Shern JL, Loi HY, Khor LK, Sinha AK, Quek ST, et al.
Impact of Image Reconstruction Settings on Texture Features in 18F-FDG PET. J
Nucl Med. 2015;56:1667–73.
22. Pfaehler E, Beukinga RJ, de Jong JR, Slart RHJA, Slump CH, Dierckx
RAJO, et al. Repeatability of 18 F-FDG PET radiomic features: A phantom study
to explore sensitivity to image reconstruction settings, noise, and delineation
method. Med Phys. 2019;46:665–78.
23. Mackin D, Fave X, Zhang L, Fried D, Yang J, Taylor B, et al.
Measuring Computed Tomography Scanner Variability of Radiomics Features. Invest
Radiol. 2015;50:757–65.
24. Berenguer R, Pastor-Juan MDR, Canales-Vázquez J, Castro-García M,
Villas MV, Mansilla Legorburo F, et al. Radiomics of CT Features May Be
Nonreproducible and Redundant: Influence of CT Acquisition Parameters.
Radiology. 2018;288:407–15.
25. Yang F, Dogan N, Stoyanova R, Ford JC. Evaluation of radiomic
texture feature error due to MRI acquisition and reconstruction: A simulation
study utilizing ground truth. Phys Medica PM Int J Devoted Appl Phys Med Biol
Off J Ital Assoc Biomed Phys AIFB. 2018;50:26–36.
26. Um H, Tixier F, Bermudez D, Deasy JO, Young RJ, Veeraraghavan H.
Impact of image preprocessing on the scanner dependence of multi-parametric MRI
radiomic features and covariate shift in multi-institutional glioblastoma
datasets. Phys Med Biol. 2019;64:165011.
27. Hatt M, Lucia F, Schick U, Visvikis D. Multicentric validation of
radiomics findings: challenges and opportunities. EBioMedicine. 2019;
28. Fiset S, Welch ML, Weiss J, Pintilie M, Conway JL, Milosevic M, et
al. Repeatability and reproducibility of MRI-based radiomic features in
cervical cancer. Radiother Oncol J Eur Soc Ther Radiol Oncol. 2019;135:107–14.
29. Traverso A, Kazmierski M, Shi Z, Kalendralis P, Welch M, Nissen
HD, et al. Stability of radiomic features of apparent diffusion coefficient
(ADC) maps for locally advanced rectal cancer in response to image
pre-processing. Phys Medica PM Int J Devoted Appl Phys Med Biol Off J Ital
Assoc Biomed Phys AIFB. 2019;61:44–51.
30. Zwanenburg A, Löck S. Why validation of prognostic models matters?
Radiother Oncol. 2018;127:370–3.
31. Choe J, Lee SM, Do K-H, Lee G, Lee J-G, Lee SM, et al. Deep
Learning-based Image Conversion of CT Reconstruction Kernels Improves Radiomics
Reproducibility for Pulmonary Nodules or Masses. Radiology. 2019;292:365–73.
32. Chatterjee A, Vallières M, Dohan A, Levesque IR, Ueno Y, Saif S,
et al. Creating robust predictive radiomic models for data from independent
institutions using normalization. IEEE Trans Radiat Plasma Med Sci. 2019;1–1.
33. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in
microarray expression data using empirical Bayes methods. Biostat Oxf Engl. 2007;8:118–27.
34. Lucia F, Visvikis D, Vallières M, Desseroit
M-C, Miranda O, Robin P, et al. External validation of a combined PET and
MRI radiomics model for prediction of recurrence in cervical cancer patients
treated with chemoradiotherapy. Eur J Nucl Med Mol Imaging. 2019;46:864–77.
35. Chen C, Grennan K, Badner J, Zhang D, Gershon E, Jin L, et al.
Removing batch effects in analysis of expression microarray data: an evaluation
of six batch adjustment methods. PloS One. 2011;6:e17238.
36. Hatt M, Lee JA, Schmidtlein CR, Naqa IE, Caldwell C, De Bernardi
E, et al. Classification and evaluation strategies of auto-segmentation
approaches for PET: Report of AAPM task group No. 211. Med Phys. 2017;44:e1–42.
37. Berthon B, Spezi E, Galavis P, Shepherd T, Apte A, Hatt M, et al.
Toward a standard for the evaluation of PET-Auto-Segmentation methods following
the recommendations of AAPM task group No. 211: Requirements and
implementation. Med Phys. 2017;44:4098–111.
38. Hatt M, Laurent B, Ouahabi A, Fayad H, Tan S, Li L, et al. The
first MICCAI challenge on PET tumor segmentation. Med Image Anal. 2018;44:177–95.
39. Tixier F, Le Rest CC, Hatt M, Albarghach N, Pradier O, Metges JP,
et al. Intratumor heterogeneity characterized by textural features on baseline
18F-FDG PET images predicts response to concomitant radiochemotherapy in
esophageal cancer. J Nucl Med. 2011;52:369–78.