As biomedical scientists, we find the idea of having an identical twin tantalizing: a completely separate you on whom to test out the effects of different health variables and therapies.
The chance of being an identical twin is about 0.4%. But if that twin could be virtual, everyone could have a simulated counterpart grounded in real-world data.
This is the concept of a “digital twin.”
A virtual model is built from someone’s health and biomedical data (e.g., clinical, digital/wearable, or molecular data), helping scientists understand a patient’s unique health profile. These characteristics could then be used to simulate a patient’s response to different medical interventions, help us uncover the impact of new interventions, and speed up the drug discovery process.
Digital twins have been widely used in the aerospace and automobile industries to test innovations safely, but their adoption in healthcare is only just beginning. This late adoption is driven by our limited understanding of disease biology, a lack of access to appropriate data and to mechanistic models that reflect disease trajectory, and a lack of trust in computational models and their outputs.
Here we explore the reasons behind this lack of trust in models generating digital twins, and what we can do to address some of the challenges.
Access to Reliable Data
One of the greatest challenges in fully realizing the benefits of biomedical digital twins is the lack of access to adequate and appropriate data. Generating digital twins of patients for various diseases requires training a combination of mechanistic, AI-based generative, and forecasting models. These models need large amounts of multi-modal data collected over time from each individual in order to model and predict the behavior of the digital twins.
It’s unlikely that any single medical or research organization will have all the required data and expertise for these models. We foresee the need for coalitions of researchers from biology, medicine, computer science, and physical sciences to contribute to model development based on data that are generated across institutions and disciplines, as well as across patient populations.
Large, organized data management efforts focused on modeling digital twins can facilitate this trans-disciplinary research and development. Community engagement efforts are also needed for such cross-institutional coalitions to succeed. Because digital twin research is still nascent, there is a unique opportunity to proactively develop purposeful tools and platforms that ensure transparent and patient-centered digital twin development.
Effective data management for digital twins involves supporting two crucial stages: Model Generation and Twin Generation. This approach ensures seamless data transfer between those who generate data and those who use it.
What We Need to Build the Models
Data and Model Sharing Platforms
A cloud-based, scalable solution for data ingest and management that transcends institutional and platform-specific boundaries is essential to support building digital twin models. Such a system would need to efficiently gather vast amounts of data from diverse sources, process it swiftly, and prepare it for on-demand distribution, ensuring that the flow of information is seamless and uninterrupted.
Digital twin models will be dynamic, evolving entities. As new data-modality-capture systems emerge and our understanding of biology deepens, these models must continually adapt. This requires platforms that support an ongoing update cycle, where data is not only fed into models but also refined through iterative feedback between data generators and model creators. This collaborative loop ensures that the most relevant and accurate data is collected, enabling models to evolve and improve continuously.
A key challenge in digital twin research is the disparity in data across different institutions. Platforms that can measure these disparities and harmonize data across sources are crucial for breaking down silos and promoting a unified approach to digital twin development. By integrating and standardizing data on such a scale, we can drive innovation in digital twin model building capabilities.
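As a rough illustration of what "measuring disparities" between sources could mean, the sketch below computes a standardized mean difference for one measurement collected at two sites, then applies a deliberately crude location-only adjustment. The function names and the choice of metric are our assumptions for illustration, not a description of any existing platform, and real harmonization methods are considerably more careful.

```python
import math
import statistics


def standardized_mean_difference(site_a, site_b):
    """Difference in site means, scaled by the pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(site_a) + statistics.variance(site_b)) / 2
    )
    if pooled_sd == 0:
        raise ValueError("both sites have zero variance")
    return (statistics.mean(site_a) - statistics.mean(site_b)) / pooled_sd


def center_by_site(values_by_site):
    """Subtract each site's own mean (illustrative, location-only adjustment)."""
    return {
        site: [value - statistics.mean(values) for value in values]
        for site, values in values_by_site.items()
    }
```

Note that centering by site also removes genuine differences between patient populations, which is one reason disparities should be measured and reviewed before any adjustment is applied.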
Beyond data management, we foresee the development of platforms for storing and sharing digital twin models. These platforms must adhere to rigorous standards for model description to ensure that models are findable and accessible. Detailed model descriptions that cover the relevant parameters at each scale (cellular, tissue, or organism) are essential for clarity and usability.
Accuracy is the cornerstone of digital twin technology. Therefore, model-sharing platforms that enable continuous benchmarking against gold standard datasets will be invaluable. This capability not only enhances the precision of models but also provides transparency regarding their performance, fostering trust within the research community. By setting realistic expectations for output, these platforms will play a critical role in advancing the field of digital twins.
Data and Metadata Standards
Digital twin models can use data originating at various scales, e.g., cellular, tissue, and organism-level data. Existing data models need to be enhanced to successfully connect and integrate such multiscale data. To generate reliable models, data must also undergo rigorous quality checks and be prepared to meet FAIR (findable, accessible, interoperable, and reusable) standards.
To integrate data successfully, detailed metadata must be included to account for the nuances and biases in data capture. Tools that make it easy to annotate data files, combining automated capture from laboratory information management systems (LIMS) or electronic laboratory notebooks (ELNs) with manual input by researchers, are crucial for scalability while maintaining data quality.
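To make the annotation idea concrete, here is a minimal sketch of merging automatically captured fields with manual researcher input and then checking the result. The field names, the allowed scale values, and both functions are hypothetical; a real schema would come from an agreed community standard.

```python
# Hypothetical required fields, chosen only for illustration.
REQUIRED_FIELDS = {"assay", "scale", "specimen_id", "instrument", "collected_on"}
ALLOWED_SCALES = {"cellular", "tissue", "organism"}


def merge_annotations(auto_captured, manual):
    """Combine LIMS/ELN fields with researcher input; manual entries win on conflict."""
    return {**auto_captured, **manual}


def validate(annotation):
    """Return a list of problems; an empty list means the record passes."""
    missing = sorted(REQUIRED_FIELDS - annotation.keys())
    problems = [f"missing field: {name}" for name in missing]
    scale = annotation.get("scale")
    if scale is not None and scale not in ALLOWED_SCALES:
        problems.append(f"unknown scale: {scale}")
    return problems
```

Returning a list of problems rather than raising on the first one lets an annotation tool show a researcher everything that still needs filling in at once.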
What We Need to Generate the Twins
Twin-Data Management
Once the models to generate digital twins are developed, several measures should be implemented on data-sharing platforms to facilitate the generation and storage of patients’ digital twins. Clinical sites must have access to twin-generating platforms to support data ingestion for individual patients, and integrate newly acquired data on a rolling basis. Both new and old data require integration using unique patient identifiers while maintaining HIPAA compliance and patient privacy. All data that is ingested, processed, formatted, and then used downstream in the models for generating digital twins must preserve unbroken provenance to improve and protect transparency of data use.
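One way to think about "unbroken provenance" is as a chain of records in which each processing step carries a hash that covers the previous step. The sketch below is an assumption-laden illustration using only the Python standard library: the step names and payload fields are hypothetical, and payloads should reference pseudonymous identifiers rather than protected health information.

```python
import hashlib
import json


def _digest(step, payload, prev_hash):
    """Hash a step together with the hash of the step before it."""
    body = json.dumps(
        {"step": step, "payload": payload, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def add_step(chain, step, payload):
    """Return a new chain with one more provenance entry appended."""
    prev_hash = chain[-1]["hash"] if chain else ""
    entry = {
        "step": step,
        "payload": payload,
        "prev": prev_hash,
        "hash": _digest(step, payload, prev_hash),
    }
    return chain + [entry]


def is_unbroken(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = ""
    for entry in chain:
        expected = _digest(entry["step"], entry["payload"], prev_hash)
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash depends on the one before it, changing an early ingestion record invalidates every later step, which is what makes silent edits detectable.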
Model Descriptions as Twin Metadata
Digital twins are only as robust as the generative models they are built on, making it crucial to implement specific standards for describing and deploying these models. These descriptions act as essential metadata for the digital twins, much like assay parameters serve as valuable metadata for data files in biological research.
A critical aspect of maintaining the reliability of digital twins is measuring the congruence between data from real patients and the predictions made by the digital twins. This comparison highlights the strengths and weaknesses of the models, ensuring transparency in their performance. Any platform supporting digital twins should facilitate the display of these congruence metrics, allowing users to assess and trust the digital twins, especially when multiple models are in competition.
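As a simple illustration of congruence metrics, the sketch below compares a patient's observed measurements with the values a digital twin predicted for the same time points. The specific metrics (mean absolute error, root mean squared error, and bias) are our choice for the example; the right metrics will depend on the disease and the data type.

```python
import math


def congruence(observed, predicted):
    """Summarize how closely twin predictions track real patient data."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,            # typical size of a miss
        "rmse": math.sqrt(sum(e * e for e in errors) / n),  # penalizes large misses
        "bias": sum(errors) / n,                            # systematic over/under-prediction
    }
```

Reporting bias alongside the error magnitudes helps distinguish a twin that is noisy from one that is consistently off in one direction.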
Additionally, the containerization of models—packaging them so they can be deployed by users other than the original model creators—presents an opportunity to test both the generalizability of the models and the fidelity of the resulting digital twins. Platforms that provide insights into model deployment and generalizability will be invaluable in building user trust in digital twins and ensuring their reliable application in downstream processes.
Keeping Twins Secure
As we embrace the potential of digital twins in healthcare, we must also address the ethical and security implications of these digital replicas. Digital twins, containing vast stores of personal health data, require stringent security measures to protect the individuals they represent.
Informed consent and risk minimization strategies are paramount to preserving the privacy and trust of research participants. Patients must be informed about how their data will be used, who will have access to their digital counterparts, and each study context in which their digital twin is enrolled. This speaks to the concept of “digital dignity,” which is gaining traction as the public becomes more aware of the use of personal data in tracking, marketing, and other potentially invasive applications. The principles of digital dignity can be extended to research participants, enabling them to monitor the use of their data in current and future studies through a dynamic attribution and consent process.
Data governance takes on increased importance with digital twins. Unlike traditional medical records, these comprehensive digital models cannot be fully de-identified and require robust frameworks to protect privacy while preserving data context. It’s essential to guard against potential data misuse and exploitation, such as using insights gained from models to influence health insurance decisions or making broad predictions about disease risk that could harm specific demographics.
Implementing strict access controls is another crucial aspect of digital twin management. Access should be limited to authorized parties within secure environments, with permissions tailored to different stages of twin development. To ensure compliance with research objectives and ethical guidelines, we should consider the monitoring and auditing of protection measures, reliability of results, decisions made on the use of digital twins, and handling of other data outputs.
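A minimal sketch of stage-specific permissions with an audit trail might look like the following. The roles, actions, and policy table are hypothetical placeholders; only the two stages (model generation and twin generation) come from the discussion above.

```python
from enum import Enum


class Stage(Enum):
    MODEL_GENERATION = "model_generation"
    TWIN_GENERATION = "twin_generation"


# Hypothetical policy: which actions each role may take at each stage.
POLICY = {
    (Stage.MODEL_GENERATION, "modeler"): {"read_training_data", "write_model"},
    (Stage.TWIN_GENERATION, "clinical_site"): {"ingest_patient_data", "read_twin"},
    (Stage.TWIN_GENERATION, "auditor"): {"read_audit_log"},
}


def is_allowed(stage, role, action, audit_log):
    """Check a request against the policy and record the decision either way."""
    allowed = action in POLICY.get((stage, role), set())
    audit_log.append(
        {"stage": stage.value, "role": role, "action": action, "allowed": allowed}
    )
    return allowed
```

Denied requests are logged as well as granted ones, so that auditors can review attempted access and not just successful access.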
Finally, Ethical, Legal, and Social Implications (ELSI) frameworks should be applied whenever the dissemination of results from digital twin studies is considered. While sharing research insights with clinical care teams could significantly enhance model validity and improve research outcomes, it’s imperative to weigh the potential benefits against the risks to participants. This delicate balance requires a thoughtful approach to transparency in the pursuit of knowledge, without compromising individual privacy or dignity.
Trusting Our Twins
Given that drug discovery and clinical decision-making may be among the most important use cases of digital twins, we need to ensure the trustworthiness of the twins. For data used in model generation, we must establish and maintain comprehensive metrics for quality assessment and harmonization. These metrics should be clearly defined, readily accessible, and continuously updated.
Predictive modeling inherently involves uncertainty, which must be acknowledged and quantified in digital twin outputs. Uncertainty metrics should be documented and displayed alongside predicted data to prevent misinterpretation. This transparency enables informed decision-making, enhances the reliability of models in healthcare applications, and can foster confidence among clinicians and the public.
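To show what displaying uncertainty alongside a prediction could look like, the sketch below summarizes a set of simulated outcomes for one twin as a median with an empirical interval. It assumes the model can produce repeated samples (for example, from an ensemble), and the nearest-rank percentile rule and function names are our illustrative choices.

```python
import math
import statistics


def summarize(samples, level=0.9):
    """Median plus an empirical central interval (nearest-rank percentiles)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to describe uncertainty")
    ordered = sorted(samples)
    tail = (1 - level) / 2

    def nearest_rank(q):
        rank = max(1, math.ceil(q * len(ordered)))
        return ordered[rank - 1]

    return {
        "median": statistics.median(ordered),
        "lower": nearest_rank(tail),
        "upper": nearest_rank(1 - tail),
        "level": level,
    }


def display(summary):
    """Render the prediction so the interval is never separated from the estimate."""
    return (
        f"{summary['median']:.1f} "
        f"({summary['level']:.0%} interval {summary['lower']:.1f} to {summary['upper']:.1f})"
    )
```

Keeping the interval in the same string as the point estimate is a small design choice that makes it harder for a downstream report to quote the prediction without its uncertainty.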
Final thoughts
As we at Sage Bionetworks get ready to navigate this frontier in healthcare technology, we see immense potential to help pave the way. We bring a long history of enabling data management at scale and continuous benchmarking of machine learning models while preserving appropriate data governance. As we continue to explore opportunities to support digital twin research, we’d love to hear from you. We welcome your thoughts, your experiences, and opportunities to collaborate!