
Nucleotide Transformer: building and evaluating robust foundation models for human genomics

Dec 07, 2024 - nature.com
Researchers have developed the Nucleotide Transformer (NT), a family of foundation models that encode genomic sequences. The NT models were trained on three datasets: the human reference genome, a collection of 3,202 diverse human genomes, and 850 genomes from various species. The models were then evaluated on 18 curated genomic prediction tasks and compared with three alternative DNA foundation models. The NT models performed strongly, matching or surpassing the baseline models in most tasks. The researchers also found that the NT models could detect known genomic elements within their embeddings in an unsupervised manner, which could be harnessed for efficient downstream genomics task predictions.

Among the genomics foundation models compared, the NT models achieved the highest overall performance across tasks. The models were also able to assess the severity of various genetic variants and prioritize those with functional significance. The researchers concluded that the NT models have the potential to transform the field of genomics and provide a robust tool for genomic prediction tasks.

Key takeaways:

  • The study presents the Nucleotide Transformer (NT), a robust foundation model that encodes genomic sequences and can be adapted to a wide range of predictive tasks in genomics.
  • The NT models were trained on extensive datasets and matched or outperformed previous methods on genomic prediction tasks, even in data-scarce regimes.
  • The NT models can detect known genomic elements within their embeddings in an unsupervised manner, which can be harnessed for efficient downstream genomics task predictions.
  • The study also demonstrated the NT models' ability to assess the severity of various genetic variants and prioritize those with functional significance.
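
To make the idea of reusing embeddings for downstream prediction concrete, here is a minimal Python sketch of "probing": a small classifier is trained on top of fixed sequence embeddings while the encoder itself is never updated. Everything in it is an illustrative assumption rather than the authors' method. The `embed` function is a hypothetical stand-in that returns normalised 3-mer frequencies so the example runs without model weights, the nearest-centroid probe is a deliberately simple choice, and the "AT-rich" and "GC-rich" labels are toy data.

```python
from collections import Counter
from itertools import product
import math

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 64 possible 3-mers


def embed(seq):
    """Hypothetical stand-in for a frozen foundation-model embedding.

    Returns L2-normalised k-mer frequencies; a real pipeline would call
    the pretrained model here instead.
    """
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    vec = [float(counts.get(kmer, 0)) for kmer in KMERS]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def fit_centroids(seqs, labels):
    """Train the probe: one mean embedding per class (embed() stays fixed)."""
    sums, counts = {}, Counter()
    for seq, label in zip(seqs, labels):
        vec = embed(seq)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, seq):
    """Assign the label whose centroid is closest to the sequence embedding."""
    vec = embed(seq)
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))


# Toy training set (made up for illustration only).
train_seqs = ["TATAAATATATAAA", "ATATATAAATATTA", "GCGCGGCGCCGCGG", "CGGCGCGCCGGCGC"]
train_labels = ["AT-rich", "AT-rich", "GC-rich", "GC-rich"]

centroids = fit_centroids(train_seqs, train_labels)
prediction = predict(centroids, "TTATATAAATAT")
print(prediction)
```

Because only the lightweight probe is fitted, this kind of setup needs few labelled examples, which is one reason frozen embeddings are attractive when annotated data is scarce.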