Sunday, June 21, 2026

From Genomics to AI Biology (Series 1)

 AI Will Not Replace Biologists. But Biologists Who Understand AI Systems Will Replace Those Who Only Use AI Tools


By Li Lei on June 21, 2026

When I first moved from genomics and computational biology into AI biology, I thought the biggest challenge would be learning new models.

Large language models. Graph neural networks. Foundation models. AI agents. Embedding spaces. Retrieval-augmented generation. Model evaluation. MLOps.

All of these were important. But over time, I realized something deeper:

The real challenge is not simply learning how to use AI tools.

The real challenge is learning how to think in AI systems.

This distinction matters because biology is entering a new era. AI is no longer just a convenient assistant for writing code, summarizing papers, or generating figures. It is beginning to shape how we organize knowledge, design experiments, prioritize hypotheses, analyze multimodal data, and make decisions in discovery pipelines.

We have already seen powerful examples. AlphaFold changed the way many scientists think about protein structure prediction [1]. Protein language models showed that large-scale learning from sequences can capture biological structure and function [2]. Deep learning models such as Enformer demonstrated that sequence-based models can improve gene expression prediction by modeling long-range regulatory information [3]. More broadly, deep learning has become an important part of modern genomics and genome interpretation [4,5].

But these advances also reveal something important: the future of AI biology is not only about bigger models. It is about building better scientific systems around those models.

In this new era, the biologists who thrive will not necessarily be the ones who use the most AI tools. They will be the ones who understand how AI systems work, where they fail, how to evaluate them, and how to connect them to real biological questions.

That is why I believe:

AI will not replace biologists. But biologists who understand AI systems will replace those who only use AI tools.

From using tools to understanding systems

Many scientists today are already using AI. They ask ChatGPT to polish manuscripts, write Python scripts, explain error messages, summarize literature, or draft emails. These uses are helpful. They save time. They reduce friction. They make technical work more accessible.

But using AI as a tool is only the first layer.

A tool user asks:

“Can AI help me finish this task faster?”

A systems thinker asks:

“What is the input? What is the output? What knowledge does the model have? What knowledge is missing? How do I evaluate whether the answer is correct? How does this fit into a larger scientific workflow?”

This difference may sound small, but it changes everything.

For example, if I ask an AI model to summarize papers about drought tolerance in maize, I may receive a fluent answer. But a fluent answer is not necessarily a reliable answer. A biologist who only uses the tool may accept the summary too quickly. A biologist who understands AI systems will ask: Which papers were retrieved? Are they current? Are the claims supported by experimental evidence? Are the genes discussed in the right tissue, developmental stage, and environmental context? Are we mixing evidence from Arabidopsis, rice, sorghum, and maize as if they were interchangeable?

In biology, context is not decoration. Context is the science.

A gene is not simply “important.” It is important in a genotype, tissue, cell type, developmental stage, environment, and evolutionary history. A variant is not simply “associated.” It has allele frequency, linkage disequilibrium, population structure, effect size, uncertainty, and biological plausibility. A regulatory element is not simply “predicted.” It has chromatin accessibility, transcription factor binding, conservation, activity, target gene ambiguity, and experimental validation limits.
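To make the point about context concrete, here is a minimal Python sketch. It assumes a hypothetical EvidenceClaim record whose fields (species, tissue, stage, environment, evidence type) are my own illustrative choices, not any standard schema, and the gene names and sources are placeholders rather than real findings. Because context stays attached to each claim, a query can ask specifically for experimental maize evidence, and it becomes visible when one gene's support comes from several species.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceClaim:
    """A gene-trait claim stored together with its biological context.

    Hypothetical schema for illustration only; field names are assumptions.
    """
    gene: str
    trait: str
    species: str
    tissue: str
    stage: str
    environment: str
    evidence_type: str  # "experimental" or "predicted" in this sketch
    source: str         # placeholder for a citation or dataset ID


def select(claims, species: Optional[str] = None, tissue: Optional[str] = None,
           experimental_only: bool = False):
    """Return only the claims that match the requested context."""
    kept = []
    for c in claims:
        if species is not None and c.species != species:
            continue
        if tissue is not None and c.tissue != tissue:
            continue
        if experimental_only and c.evidence_type != "experimental":
            continue
        kept.append(c)
    return kept


def species_per_gene(claims):
    """Map each gene to the set of species its evidence comes from.

    More than one species is a prompt to check orthology and context,
    not a sign that the claims are interchangeable.
    """
    by_gene = defaultdict(set)
    for c in claims:
        by_gene[c.gene].add(c.species)
    return dict(by_gene)


# Placeholder data: gene names and sources are hypothetical.
claims = [
    EvidenceClaim("gene_X", "drought tolerance", "maize", "root", "vegetative",
                  "water deficit", "experimental", "source_1"),
    EvidenceClaim("gene_X", "drought tolerance", "Arabidopsis", "leaf", "seedling",
                  "water deficit", "experimental", "source_2"),
    EvidenceClaim("gene_Y", "drought tolerance", "maize", "leaf", "flowering",
                  "water deficit", "predicted", "source_3"),
]

maize_experimental = select(claims, species="maize", experimental_only=True)
print([c.gene for c in maize_experimental])  # ['gene_X']
print(species_per_gene(claims))
```

The design is deliberately simple: context lives on every claim rather than in a separate table, so no filtering step can quietly strip it away.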

This is why biological AI requires more than prompting. It requires system-level thinking.

Biology is not just data

One common misunderstanding in AI biology is the idea that biology is simply a data problem.

More data, bigger model, better prediction.

Sometimes that is true. Often, it is incomplete.

Biological data is noisy, biased, incomplete, heterogeneous, and deeply contextual. Different data types capture different layers of life: genome sequence, chromatin accessibility, gene expression, methylation, protein structure, metabolites, phenotypes, environmental variables, clinical outcomes, and evolutionary constraints. Each layer has its own measurement errors, assumptions, and missingness.

A machine learning model can find patterns. But not every pattern is meaningful. Not every correlation is causal. Not every prediction is actionable.

This is where trained biologists remain essential.

Biologists understand experimental design. They understand confounding. They know that a beautiful heatmap can hide a batch effect. They know that a significant association can be driven by population structure. They know that a gene expression signal may reflect cell type composition rather than regulation. They know that a model trained on one species, tissue, or condition may not generalize to another.

AI can accelerate discovery, but it does not automatically understand what makes a biological conclusion trustworthy.

That judgment still comes from scientists.

The future belongs to scientists who can combine biological judgment with AI system design.

The next skill is not just coding

For the past decade, many biologists have been told: “Learn to code.”

That advice was useful. Coding opened the door to bioinformatics, genomics, data analysis, and reproducible research. It allowed biologists to work directly with large datasets rather than relying entirely on others.

But in the AI era, coding alone is no longer enough.

The next skill is understanding how biological knowledge becomes computable.

This includes questions such as:

How do we represent biological entities and relationships?

How do we connect genes, variants, traits, pathways, tissues, environments, publications, and experimental evidence?

How do we integrate structured databases with unstructured literature?

How do we build workflows where AI agents can retrieve, reason, analyze, and report?

How do we evaluate whether an AI-generated hypothesis is biologically meaningful?

How do we prevent models from producing confident but unsupported conclusions?

These are not just computer science questions. They are scientific questions.

A good AI system for biology is not just a model. It is a carefully designed connection between data, knowledge, algorithms, evaluation, and human decision-making.

That is why I believe knowledge representation will become one of the most important skills in AI biology.

Biomedical knowledge graphs already show why representation matters. They provide a way to connect entities such as genes, proteins, diseases, drugs, phenotypes, pathways, and publications into structured relationships that both humans and machines can query and reason over [6]. Graph representation learning further extends this idea by learning from the topology and semantics of biological and biomedical networks [7].
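As a rough illustration of what "structured relationships that machines can query" means, the sketch below stores facts as (subject, relation, object) triples, each tagged with a provenance string, and follows a fixed chain of relation types, for example from a variant to a gene to a trait. TinyKG, the relation names, and all entities are hypothetical. Real biomedical knowledge graphs typically rely on curated ontologies and dedicated graph databases, which this toy does not attempt to model.

```python
from collections import defaultdict


class TinyKG:
    """A toy knowledge graph of (subject, relation, object) triples.

    Every edge carries a provenance string so each answer can be traced
    back to its evidence. Illustrative only; not a real library.
    """

    def __init__(self):
        # subject -> list of (relation, object, source)
        self.out_edges = defaultdict(list)

    def add(self, subj, rel, obj, source):
        self.out_edges[subj].append((rel, obj, source))

    def neighbors(self, subj, rel):
        """Objects reachable from subj by one edge of type rel, with sources."""
        return [(obj, src) for r, obj, src in self.out_edges.get(subj, []) if r == rel]

    def paths(self, start, relations):
        """Follow a fixed sequence of relation types starting at `start`.

        Each path is a list of (node, source) steps; the start node has source None.
        """
        frontier = [[(start, None)]]
        for rel in relations:
            frontier = [
                path + [step]
                for path in frontier
                for step in self.neighbors(path[-1][0], rel)
            ]
        return frontier


# Hypothetical entities and sources, for illustration only.
kg = TinyKG()
kg.add("variant_1", "in_regulatory_region_of", "gene_X", "source_1")
kg.add("gene_X", "expressed_in", "root", "source_2")
kg.add("gene_X", "associated_with", "drought_tolerance", "source_3")
kg.add("variant_2", "in_regulatory_region_of", "gene_Y", "source_4")

chain = ["in_regulatory_region_of", "associated_with"]
for path in kg.paths("variant_1", chain):
    nodes = " -> ".join(node for node, _ in path)
    sources = [src for _, src in path[1:]]
    print(nodes, "| evidence:", sources)
# variant_1 -> gene_X -> drought_tolerance | evidence: ['source_1', 'source_3']
```

Keeping the source on every edge is the choice that matters most here: any answer produced by walking the graph arrives with the evidence trail a biologist needs in order to check it.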

In genomics, we often start with sequences. But discovery rarely ends with sequence alone. We need to connect sequence variation to gene regulation, gene regulation to cellular function, cellular function to phenotype, and phenotype to environment or disease. This chain is complex. It is full of uncertainty. But it is also where the real biological meaning lives.

AI systems that ignore this complexity may generate answers. AI systems that model this complexity may generate insight.

The danger of becoming only an AI consumer

There is a risk in the current AI wave: scientists may become passive consumers of AI outputs.

The model suggests a candidate gene.

The model ranks a variant.

The model proposes a pathway.

The model writes the interpretation.

If we are not careful, scientists may slowly lose the habit of questioning the reasoning behind the output.

That would be dangerous.

Science advances through skepticism. We ask why. We ask how. We ask what evidence supports the claim. We ask whether there is another explanation. We ask what experiment could prove us wrong.

AI should not weaken this habit. It should make it stronger.

A biologist who understands AI systems does not blindly trust the model. But she/he also does not reject it out of fear. Instead, she/he treats AI as a powerful but imperfect collaborator.

She/he asks:

What data was this model trained on?

What assumptions are built into the system?

What is the failure mode?

What kind of uncertainty is being hidden?

What evidence would increase my confidence?

What experiment should come next?

This is the mindset we need.

Not AI worship.

Not AI fear.

AI literacy with scientific discipline.

What should biologists learn now?

Not all biologists need to become machine learning engineers. But I do think more biologists need to understand the architecture of AI-enabled discovery.

At minimum, future-ready biologists should understand five things.

First, they should understand data: not only how to download it, but how it was generated and normalized, how it is biased, and where its limits lie.

Second, they should understand representation. In biology, how we represent a problem often determines what the model can learn. A sequence, a graph, a table, an image, a time series, and a knowledge graph all expose different aspects of the same biological system.

Third, they should understand models. They do not need to derive every equation, but they should know what different models are good at, what they assume, and when they are likely to fail.

Fourth, they should understand evaluation. In AI biology, a high benchmark score is not the same as biological usefulness. We need to evaluate models based on generalization, interpretability, robustness, experimental relevance, and decision value. Recent discussions of large language models in scientific discovery also emphasize that these systems should be integrated into scientific workflows with clear human goals and clear evaluation metrics [8].

Fifth, they should understand workflows. AI is most powerful when embedded into real scientific workflows: literature mining, data integration, hypothesis generation, prioritization, experiment design, and feedback from validation.

This is the shift from using AI tools to building AI-assisted scientific systems.

A personal transition

My own path into AI biology did not start from computer science. It started from population genetics, evolutionary biology, and genomics.

Population genetics trained me to think about variation, structure, uncertainty, history, and selection. Genomics trained me to work with large-scale biological data. Bioinformatics trained me to build pipelines and extract signals from complexity. AI is now teaching me to think about representation, reasoning, automation, and decision systems.

Each stage did not replace the previous one. It expanded it.

This is why I do not see AI as a departure from biology. I see it as a new language for asking biological questions.

But learning this language requires humility.

We need to admit that many AI methods are unfamiliar. We need to learn new concepts. We need to collaborate with engineers, data scientists, and machine learning experts. But we also need to remember that biological insight is not outdated. It is more important than ever.

The scientist of the future will not be defined by one discipline. She/he will be able to move between biology, computation, data infrastructure, AI models, and real-world decisions.

She/he will not simply ask, “What can this tool do?”

She/he will ask, “What kind of scientific system are we building?”

The future biologist

The future biologist will still care about genes, cells, organisms, evolution, disease, crops, ecosystems, and patients.

But she/he will also understand embeddings, knowledge graphs, agents, multimodal data, model evaluation, and feedback loops.

She/he will know how to ask good biological questions and how to design AI systems that make those questions computable.

She/he will be skeptical but not afraid.

Technical but not narrow.

Biological but not limited by traditional boundaries.

Curious enough to learn new tools, and wise enough not to be ruled by them.

AI will change biology. There is no doubt about that.

But the deepest transformation will not come from replacing scientists. It will come from changing what scientists are capable of doing.

The most valuable biologists in the AI era will not be those who simply use AI to work faster.

They will be those who understand enough biology to ask meaningful questions, enough AI to build powerful systems, and enough scientific judgment to know when the answer is real.

References and further reading

[1] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). doi:10.1038/s41586-021-03819-2.

[2] Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences 118, e2016239118 (2021). doi:10.1073/pnas.2016239118.

[3] Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods 18, 1196–1203 (2021). doi:10.1038/s41592-021-01252-x.

[4] Zou, J., Huss, M., Abid, A., Mohammadi, P., Torkamani, A. & Telenti, A. A primer on deep learning in genomics. Nature Genetics 51, 12–18 (2019). doi:10.1038/s41588-018-0295-5.

[5] Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modeling techniques for genomics. Nature Reviews Genetics 20, 389–403 (2019). doi:10.1038/s41576-019-0122-6.

[6] Nicholson, D. N. & Greene, C. S. Constructing knowledge graphs and their biomedical applications. Computational and Structural Biotechnology Journal 18, 1414–1428 (2020). doi:10.1016/j.csbj.2020.05.017.

[7] Li, M. M., Huang, K. & Zitnik, M. Graph representation learning in biomedicine and healthcare. Nature Biomedical Engineering 6, 1353–1369 (2022). doi:10.1038/s41551-022-00942-x.

[8] Zhang, Y. et al. Exploring the role of large language models in the scientific method: from hypothesis to discovery. npj Artificial Intelligence 1, Article 14 (2025). doi:10.1038/s44387-025-00019-5.

[9] Bommasani, R. et al. On the opportunities and risks of foundation models. arXiv:2108.07258 (2021).


Icebreaker Speech for Toastmasters Club: “Who Am I?”

by Li Lei in RTP, NC, June 2026


Good afternoon, everyone!

Today I am giving my icebreaker speech at the ToastWhisper Club here at Syngenta. Since many of you already know me from work, I want to begin with a simple question:

Who do you think I am?

Maybe some of you would say, “Li is an AI scientist.”
Some may say, “She works in computational biology.”
Some may say, “She is always talking about data, genes, models, and pipelines.”
And some of you may say, “She is the new colleague who is still trying to figure out where everything is in this building.”

All of those are true.

But today, I want to tell you a little bit about the person behind the job title.

I was born and raised in Zhaotong, a small city in Yunnan Province in southwest China. Yunnan is famous for its mountains, flowers, ethnic diversity, and beautiful landscapes. Zhaotong is not a big city, but it shaped me deeply. It gave me curiosity, imagination, and maybe also a little bit of stubbornness.

When I was a little girl, I had a very clear dream: I wanted to become a mathematician.

Not because I fully understood what mathematicians did every day. I did not imagine myself standing in front of a blackboard writing equations for the rest of my life. The real reason was simpler — and a little rebellious.

I heard people say, “Girls are not good at math, especially when they get older.”

I remember thinking, “Really? Who decided that?”

So naturally, I wanted to prove them wrong.

At that time, I learned about Emmy Noether, a brilliant German mathematician. She became one of my role models. She lived in a time when women faced many barriers in academia, but her work changed modern mathematics and physics. To me, she represented intelligence, courage, and quiet strength.

So I studied math very seriously. I loved the beauty of numbers and logic. Math felt like a world where every problem had a hidden door, and if you were patient enough, you could find the key.

My math grades were excellent, and I later won a silver medal in a national mathematics competition. For a young girl from a small city, that was a big encouragement. It made me believe that many limits people place on us are not always real. Sometimes they are just walls built by other people’s assumptions.

But math was not my only dream.

I also wanted to become a poet.

That was a very different dream. Math gave me structure. Poetry gave me freedom. Math helped me understand the world through logic. Poetry helped me feel the world through language.

But when I was young, I also heard people say, “Poets are usually not very happy.” Many famous poets had difficult lives. Some were lonely or depressed; some died young.

So I thought, “Well… maybe being a professional poet is a little dangerous.”

I decided poetry could stay with me as a hobby — a private garden in my heart — but maybe not as my profession.

Looking back, I find this funny. As a child, I gave up on becoming a poet because I thought it was emotionally risky. Then I became a scientist, which is also emotionally risky, just in a different way.

In science, experiments fail. Code breaks. Models do not converge. Papers get rejected. Funding is uncertain. And sometimes, after months of analysis, the data simply tells you, “No.”

So maybe scientists and poets are not so different. Both are searching for patterns. Both are trying to express something true. One uses equations and data; the other uses images and words.

Today, I am an AI scientist, a computational biologist, and a population geneticist. That may sound far away from the little girl who wanted to become a mathematician and poet. But I feel those dreams are still inside me.

As an AI scientist, I still use mathematical thinking. I work with models, algorithms, data, and uncertainty. As a computational biologist, I study life through patterns hidden in genomes and biological systems. As a population geneticist, I think about evolution — how life changes over time, how diversity emerges, and how history leaves traces in DNA.

In some way, I did not abandon math. I followed math into biology.

And I did not abandon poetry either. I still love language, stories, and the beauty of expression. That is one reason I joined Toastmasters. I want to become not only a better scientist, but also a better communicator. I want to learn how to tell stories, how to speak clearly, and how to connect ideas with people.

Because in science, having good ideas is important. But being able to communicate those ideas is equally important. A good idea that no one understands is like a beautiful poem locked in a drawer.

So, who am I?

I am a girl from Zhaotong who once dreamed of becoming a mathematician.
I am someone who still carries poetry quietly inside her.
I am a scientist who studies life through data.
I am a colleague, a learner, and now, a Toastmaster beginning a new journey.

And perhaps, like all of us, I am still becoming.

Thank you.