AI Will Not Replace Biologists. But Biologists Who Understand AI Systems Will Replace Those Who Only Use AI Tools
When I first moved from genomics and computational biology into AI biology, I thought the biggest challenge would be learning new models.
Large language models. Graph neural networks. Foundation models. AI agents. Embedding spaces. Retrieval-augmented generation. Model evaluation. MLOps.
All of these were important. But over time, I realized something deeper:
The real challenge is not simply learning how to use AI tools.
The real challenge is learning how to think in AI systems.
This distinction matters because biology is entering a new era. AI is no longer just a convenient assistant for writing code, summarizing papers, or generating figures. It is beginning to shape how we organize knowledge, design experiments, prioritize hypotheses, analyze multimodal data, and make decisions in discovery pipelines.
We have already seen powerful examples. AlphaFold changed how many scientists think about protein structure prediction [1]. Protein language models showed that large-scale learning from sequences can capture biological structure and function [2]. Deep learning models such as Enformer demonstrated that sequence-based models can improve gene expression prediction by modeling long-range regulatory information [3]. More broadly, deep learning has become an important part of modern genomics and genome interpretation [4,5].
But these advances also reveal something important: the future of AI biology is not only about bigger models. It is about building better scientific systems around those models.
In this new era, the biologists who thrive will not necessarily be the ones who use the most AI tools. They will be the ones who understand how AI systems work, where they fail, how to evaluate them, and how to connect them to real biological questions.
That is why I believe:
AI will not replace biologists. But biologists who understand AI systems will replace those who only use AI tools.
From using tools to understanding systems
Many scientists today are already using AI. They ask ChatGPT to polish manuscripts, write Python scripts, explain error messages, summarize literature, or draft emails. These uses are helpful. They save time. They reduce friction. They make technical work more accessible.
But using AI as a tool is only the first layer.
A tool user asks:
“Can AI help me finish this task faster?”
A systems thinker asks:
“What is the input? What is the output? What knowledge does the model have? What knowledge is missing? How do I evaluate whether the answer is correct? How does this fit into a larger scientific workflow?”
This difference may sound small, but it changes everything.
For example, if I ask an AI model to summarize papers about drought tolerance in maize, I may receive a fluent answer. But a fluent answer is not necessarily a reliable answer. A biologist who only uses the tool may accept the summary too quickly. A biologist who understands AI systems will ask: Which papers were retrieved? Are they current? Are the claims supported by experimental evidence? Are the genes discussed in the right tissue, developmental stage, and environmental context? Are we mixing evidence from Arabidopsis, rice, sorghum, and maize as if they were interchangeable?
In biology, context is not decoration. Context is the science.
A gene is not simply “important.” It is important in a genotype, tissue, cell type, developmental stage, environment, and evolutionary history. A variant is not simply “associated.” It has allele frequency, linkage disequilibrium, population structure, effect size, uncertainty, and biological plausibility. A regulatory element is not simply “predicted.” It has chromatin accessibility, transcription factor binding, conservation, activity, target gene ambiguity, and experimental validation limits.
This is why biological AI requires more than prompting. It requires system-level thinking.
Biology is not just data
One common misunderstanding in AI biology is the idea that biology is simply a data problem.
More data, bigger model, better prediction.
Sometimes that is true. Often, it is incomplete.
Biological data is noisy, biased, incomplete, heterogeneous, and deeply contextual. Different data types capture different layers of life: genome sequence, chromatin accessibility, gene expression, methylation, protein structure, metabolites, phenotypes, environmental variables, clinical outcomes, and evolutionary constraints. Each layer has its own measurement errors, assumptions, and missingness.
A machine learning model can find patterns. But not every pattern is meaningful. Not every correlation is causal. Not every prediction is actionable.
This is where trained biologists remain essential.
Biologists understand experimental design. They understand confounding. They know that a beautiful heatmap can hide a batch effect. They know that a significant association can be driven by population structure. They know that a gene expression signal may reflect cell type composition rather than regulation. They know that a model trained on one species, tissue, or condition may not generalize to another.
AI can accelerate discovery, but it does not automatically understand what makes a biological conclusion trustworthy.
That judgment still comes from scientists.
The future belongs to scientists who can combine biological judgment with AI system design.
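The population structure point above can be made concrete with a toy example. What follows is a minimal sketch, not a real analysis: the eight samples are hand-made numbers chosen for illustration, and the `pearson` helper and the variable names are assumptions introduced here. Within each population the allele has no relationship to the trait, yet pooling the two populations produces a clear correlation, simply because allele frequency and trait mean both differ between populations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hand-made toy data, not real genotypes: (population, allele dosage, trait value).
samples = [
    ("A", 0, 1.0), ("A", 0, 2.0), ("A", 1, 1.0), ("A", 1, 2.0),
    ("B", 1, 5.0), ("B", 1, 6.0), ("B", 2, 5.0), ("B", 2, 6.0),
]

# Pooled analysis ignores the population labels entirely.
pooled_r = pearson([d for _, d, _ in samples], [t for _, _, t in samples])

# Stratified analysis computes the same statistic within each population.
within_r = {}
for pop in sorted({p for p, _, _ in samples}):
    dosages = [d for p, d, _ in samples if p == pop]
    traits = [t for p, _, t in samples if p == pop]
    within_r[pop] = pearson(dosages, traits)

print(f"pooled r = {pooled_r:.2f}")
for pop, r in within_r.items():
    print(f"population {pop}: r = {r:.2f}")
```

In this toy case the pooled correlation is about 0.69, while the correlation inside each population is zero. A model that only sees the pooled table would happily learn the association. Recognizing that it reflects ancestry rather than biology is the kind of judgment the text describes.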
The next skill is not just coding
For the past decade, many biologists were told: “Learn to code.”
That advice was useful. Coding opened the door to bioinformatics, genomics, data analysis, and reproducible research. It allowed biologists to work directly with large datasets rather than relying entirely on others.
But in the AI era, coding alone is no longer enough.
The next skill is understanding how biological knowledge becomes computable.
This includes questions such as:
How do we represent biological entities and relationships?
How do we connect genes, variants, traits, pathways, tissues, environments, publications, and experimental evidence?
How do we integrate structured databases with unstructured literature?
How do we build workflows where AI agents can retrieve, reason, analyze, and report?
How do we evaluate whether an AI-generated hypothesis is biologically meaningful?
How do we prevent models from producing confident but unsupported conclusions?
These are not just computer science questions. They are scientific questions.
A good AI system for biology is not just a model. It is a carefully designed connection between data, knowledge, algorithms, evaluation, and human decision-making.
That is why I believe knowledge representation will become one of the most important skills in AI biology.
Biomedical knowledge graphs already show why representation matters. They provide a way to connect entities such as genes, proteins, diseases, drugs, phenotypes, pathways, and publications into structured relationships that both humans and machines can query and reason over [6]. Graph representation learning further extends this idea by learning from the topology and semantics of biological and biomedical networks [7].
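As a rough illustration of what "representation" means in practice, here is a minimal Python sketch of a knowledge graph stored as a list of typed edges. Everything in it is hypothetical: the identifiers (`gene:G1`, `pub:P1`, and so on), the field names, and the `supporting_edges` helper are placeholders invented for this example, not entries from any real database or the schema of any particular tool. The point is only that each relationship carries its own context (species, evidence type, source publication), so a query can refuse to mix evidence across species.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One relationship in the graph, stored together with its context."""
    source: str
    relation: str
    target: str
    species: str
    evidence: str      # "experimental" or "predicted" in this toy schema
    publication: str


# Hypothetical placeholder records, not real genes, variants, or papers.
edges = [
    Edge("gene:G1", "associated_with", "trait:drought_tolerance",
         "maize", "experimental", "pub:P1"),
    Edge("gene:G1", "associated_with", "trait:drought_tolerance",
         "arabidopsis", "experimental", "pub:P2"),
    Edge("gene:G2", "associated_with", "trait:drought_tolerance",
         "maize", "predicted", "pub:P3"),
    Edge("variant:V1", "located_in", "gene:G1",
         "maize", "experimental", "pub:P1"),
]


def supporting_edges(edges, target, species, evidence=None):
    """Return edges pointing at `target`, restricted to one species and,
    optionally, to one evidence type."""
    return [
        e for e in edges
        if e.target == target
        and e.species == species
        and (evidence is None or e.evidence == evidence)
    ]


maize_hits = supporting_edges(edges, "trait:drought_tolerance", "maize")
maize_experimental = supporting_edges(
    edges, "trait:drought_tolerance", "maize", evidence="experimental"
)

for e in maize_experimental:
    print(e.source, e.relation, e.target, "from", e.publication)
```

Because species and evidence type are part of each edge rather than an afterthought, the Arabidopsis record never leaks into the maize answer, and predicted links can be separated from experimentally supported ones. A production system would likely use a graph database or an established ontology instead of a Python list, but the design question is the same.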
In genomics, we often start with sequences. But discovery rarely ends with sequence alone. We need to connect sequence variation to gene regulation, gene regulation to cellular function, cellular function to phenotype, and phenotype to environment or disease. This chain is complex. It is full of uncertainty. But it is also where the real biological meaning lives.
AI systems that ignore this complexity may generate answers. AI systems that model this complexity may generate insight.
The danger of becoming only an AI consumer
There is a risk in the current AI wave: scientists may become passive consumers of AI outputs.
The model suggests a candidate gene.
The model ranks a variant.
The model proposes a pathway.
The model writes the interpretation.
If we are not careful, scientists may slowly lose the habit of questioning the reasoning behind the output.
That would be dangerous.
Science advances through skepticism. We ask why. We ask how. We ask what evidence supports the claim. We ask whether there is another explanation. We ask what experiment could prove us wrong.
AI should not weaken this habit. It should make it stronger.
A biologist who understands AI systems does not blindly trust the model. But she/he also does not reject it out of fear. Instead, she/he treats AI as a powerful but imperfect collaborator.
She/he asks:
What data was this model trained on?
What assumptions are built into the system?
What is the failure mode?
What kind of uncertainty is being hidden?
What evidence would increase my confidence?
What experiment should come next?
This is the mindset we need.
Not AI worship.
Not AI fear.
AI literacy with scientific discipline.
What should biologists learn now?
Not all biologists need to become machine learning engineers. But I do think more biologists need to understand the architecture of AI-enabled discovery.
At minimum, future-ready biologists should understand five things.
First, they should understand data. Not only how to download it, but how it was generated and normalized, and where it is biased or limited.
Second, they should understand representation. In biology, how we represent a problem often determines what the model can learn. A sequence, a graph, a table, an image, a time series, and a knowledge graph all expose different aspects of the same biological system.
Third, they should understand models. They do not need to derive every equation, but they should know what different models are good at, what they assume, and when they are likely to fail.
Fourth, they should understand evaluation. In AI biology, a high benchmark score is not the same as biological usefulness. We need to evaluate models based on generalization, interpretability, robustness, experimental relevance, and decision value. Recent discussions of large language models in scientific discovery also emphasize that these systems should be integrated into scientific workflows with clear human goals and clear evaluation metrics [8].
Fifth, they should understand workflows. AI is most powerful when embedded into real scientific workflows: literature mining, data integration, hypothesis generation, prioritization, experiment design, and feedback from validation.
This is the shift from using AI tools to building AI-assisted scientific systems.
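The fourth point, evaluation, is one place where a small amount of code changes the conclusion. The sketch below shows a leave-one-group-out split in plain Python, assuming each record carries a `species` label. The record contents, the `leave_one_group_out` function, and the field names are illustrative assumptions, not a reference to any specific benchmark. The idea is that a random split lets the same species appear in both training and test data, which can inflate a benchmark score, while holding out a whole species asks the harder and more biologically relevant question.

```python
from collections import defaultdict


def leave_one_group_out(records, group_key):
    """Yield (held_out_group, train, test) with one whole group held out
    per split, so no group appears on both sides."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    for held_out in sorted(groups):
        test = groups[held_out]
        train = [r for g, rs in groups.items() if g != held_out for r in rs]
        yield held_out, train, test


# Hypothetical sample records; the ids and labels are placeholders.
records = [
    {"id": "s1", "species": "maize"},
    {"id": "s2", "species": "maize"},
    {"id": "s3", "species": "rice"},
    {"id": "s4", "species": "rice"},
    {"id": "s5", "species": "sorghum"},
]

splits = list(leave_one_group_out(records, "species"))
for held_out, train, test in splits:
    print(f"held out {held_out}: {len(train)} train, {len(test)} test")
```

The same pattern applies to tissues, conditions, or populations: choose the grouping that matches how the model will actually be used. Libraries such as scikit-learn provide grouped cross-validation utilities for real projects. This version only exists to show the logic.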
A personal transition
My own path into AI biology did not start from computer science. It started from population genetics, evolutionary biology, and genomics.
Population genetics trained me to think about variation, structure, uncertainty, history, and selection. Genomics trained me to work with large-scale biological data. Bioinformatics trained me to build pipelines and extract signals from complexity. AI is now teaching me to think about representation, reasoning, automation, and decision systems.
No stage replaced the one before it. Each expanded it.
This is why I do not see AI as a departure from biology. I see it as a new language for asking biological questions.
But learning this language requires humility.
We need to admit that many AI methods are unfamiliar. We need to learn new concepts. We need to collaborate with engineers, data scientists, and machine learning experts. But we also need to remember that biological insight is not outdated. It is more important than ever.
The scientist of the future will not be defined by one discipline. She/he will be able to move between biology, computation, data infrastructure, AI models, and real-world decisions.
She/he will not simply ask, “What can this tool do?”
She/he will ask, “What kind of scientific system are we building?”
The future biologist
The future biologist will still care about genes, cells, organisms, evolution, disease, crops, ecosystems, and patients.
But she/he will also understand embeddings, knowledge graphs, agents, multimodal data, model evaluation, and feedback loops.
She/he will know how to ask good biological questions and how to design AI systems that make those questions computable.
She/he will be skeptical but not afraid.
Technical but not narrow.
Biological but not limited by traditional boundaries.
Curious enough to learn new tools, and wise enough not to be ruled by them.
AI will change biology. There is no doubt about that.
But the deepest transformation will not come from replacing scientists. It will come from changing what scientists are capable of doing.
The most valuable biologists in the AI era will not be those who simply use AI to work faster.
They will be those who understand enough biology to ask meaningful questions, enough AI to build powerful systems, and enough scientific judgment to know when the answer is real.
References and further reading
[1] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). doi:10.1038/s41586-021-03819-2.
[2] Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences 118, e2016239118 (2021). doi:10.1073/pnas.2016239118.
[3] Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods 18, 1196–1203 (2021). doi:10.1038/s41592-021-01252-x.
[4] Zou, J., Huss, M., Abid, A., Mohammadi, P., Torkamani, A. & Telenti, A. A primer on deep learning in genomics. Nature Genetics 51, 12–18 (2019). doi:10.1038/s41588-018-0295-5.
[5] Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modeling techniques for genomics. Nature Reviews Genetics 20, 389–403 (2019). doi:10.1038/s41576-019-0122-6.
[6] Nicholson, D. N. & Greene, C. S. Constructing knowledge graphs and their biomedical applications. Computational and Structural Biotechnology Journal 18, 1414–1428 (2020). doi:10.1016/j.csbj.2020.05.017.
[7] Li, M. M., Huang, K. & Zitnik, M. Graph representation learning in biomedicine and healthcare. Nature Biomedical Engineering 6, 1353–1369 (2022). doi:10.1038/s41551-022-00942-x.
[8] Zhang, Y. et al. Exploring the role of large language models in the scientific method: from hypothesis to discovery. npj Artificial Intelligence 1, Article 14 (2025). doi:10.1038/s44387-025-00019-5.
[9] Bommasani, R. et al. On the opportunities and risks of foundation models. arXiv:2108.07258 (2021).