Seeing the invisible: AI’s role in protein sequencing

24/03/2026

Artificial intelligence is transforming the way scientists “see” proteins, turning once-enigmatic molecular structures into clear maps that guide drug discovery. IE University researcher Dr. Rubén Sánchez-García and his team are using AI to make structural biology faster, more accurate and more accessible.

Imagine trying to design a key without ever seeing the lock. That has been the challenge facing drug discovery for decades. Inside every cell, proteins act as molecular machines, carrying out nearly every biological process. Each protein’s 3D structure determines how it interacts with other molecules. When that shape changes or when the wrong molecule binds to it, disease can follow. To design a drug that fits precisely, scientists must first “see” the lock they are trying to open.

Until recently, most of these locks were invisible. Traditional techniques such as X-ray crystallography or NMR spectroscopy could reveal structures, but only after months or years of delicate lab work. Even with newer imaging tools like cryo-electron microscopy (cryo-EM), which freezes proteins in thin layers of ice and observes them with an electron beam, researchers still faced long hours of data collection and manual interpretation. Dr. Sánchez-García compares this process to “connecting ten thousand dots.” Each dot represents a data point in the electron-density map, and joining them into a coherent picture once required painstaking manual effort.

As a consequence, the field has long focused on individual pieces of the puzzle rather than the full cellular network. Dr. Sánchez-García's goal is to make structure determination scale up the way genomics did twenty years ago, moving from single genes to the entire genome, or in this case, from one protein to the whole proteome.

That slow, protein-by-protein approach is what researchers are finally breaking through. Over the past decade, artificial intelligence and new cryo-EM technologies have begun to merge, turning those invisible dots into detailed, reliable 3D structures far more quickly.

The AI revolution: how AlphaFold and Cryo-EM work together

For years, structural biologists relied on X-ray crystallography and NMR spectroscopy to capture protein structures. These methods demanded pure, stable crystals or solutions, which ruled out many complex proteins, especially those embedded in membranes. In the 2010s, that limitation began to lift with the rise of cryo-electron microscopy, or cryo-EM. By flash-freezing proteins in a thin layer of ice and imaging them with a beam of electrons, researchers could observe molecules in their natural, hydrated state without the need for crystals.

Each cryo-EM image, however, looked more like static on an old television than a picture. Scientists had to collect hundreds of thousands of these “snowstorm” images and use computational averaging to reveal the protein’s shape. The process worked, but it was slow and fragile. Even small errors in orientation or focus could blur the final 3D reconstruction.
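The effect of averaging many noisy images can be illustrated with a minimal sketch. It is a toy under strong assumptions, not a cryo-EM pipeline: the "image" is a hypothetical 32-pixel row, every noisy copy is already perfectly aligned (real particles must first have their orientations estimated), and the noise is Gaussian. All names here are illustrative.

```python
import random

def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to each pixel."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]

def average_images(images):
    """Average a list of equally sized images pixel by pixel."""
    n = len(images)
    return [sum(pixel_values) / n for pixel_values in zip(*images)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    return (sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)) ** 0.5

rng = random.Random(0)           # fixed seed so the run is repeatable
signal = [0.0, 1.0] * 16         # hypothetical 32-pixel "protein" signal
noise_sd = 5.0                   # noise far stronger than the signal

single = noisy_copy(signal, noise_sd, rng)
stack = [noisy_copy(signal, noise_sd, rng) for _ in range(2000)]
averaged = average_images(stack)

single_err = rms_error(single, signal)
averaged_err = rms_error(averaged, signal)
print(f"error of one image:      {single_err:.3f}")
print(f"error of 2000-image mean: {averaged_err:.3f}")
```

Because the noise is independent from copy to copy, the error of the mean shrinks roughly with the square root of the number of images, which is why so many particle images are needed. A misaligned copy would smear the signal instead of reinforcing it, which is the fragility described above.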

Then artificial intelligence arrived. In 2020, DeepMind’s AlphaFold demonstrated that AI could predict a protein’s shape directly from its amino acid sequence with remarkable accuracy. Suddenly, structural biologists had powerful starting models. AlphaFold provided the hypothesis, while cryo-EM supplied the experimental proof. The combination of the two created a new scientific workflow, merging prediction and validation instead of treating them as separate worlds.

Dr. Sánchez-García’s work builds right at that intersection. His group combines cryo-EM data with AI predictions, feeding small experimental details back into the modeling process so that computational results reflect the true structure. This integration allows scientists to get high-quality models using less data and in far less time.

“AlphaFold gives you a good hypothesis,” he explains, “but sometimes the prediction does not agree with experimental evidence.” By steering the AI with just a few experimental constraints, his team can refine those predictions into models that match what is seen in the microscope.

Training AI for faster, more efficient modeling

The partnership between machine learning and experimentation has redefined structural biology. Dr. Sánchez-García’s team took this union further, building deep-learning tools that accelerate one of the hardest parts of cryo-EM: the transformation of noisy 3D maps into clear atomic structures.

Once the experimental snapshots are collected, the reconstruction begins: turning thousands of faint cryo-EM images into a precise atomic model. Each image is a two-dimensional projection of a protein that can be oriented in countless ways. Rebuilding the correct three-dimensional structure is like assembling a puzzle in which every piece is blurred and rotated differently. Traditionally, this process demanded weeks of expert work to clean, align and interpret the data.
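To make the idea of a two-dimensional projection concrete, here is a small sketch. It assumes a hypothetical three-atom "molecule" and rotation about a single axis, whereas real particles can adopt any three-dimensional orientation. The function names are illustrative only.

```python
import math

def rotate_about_y(point, angle):
    """Rotate a 3D point (x, y, z) about the y axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)

def project(points, angle):
    """Rotate every point, then drop z to mimic a 2D projection image."""
    return [rotate_about_y(p, angle)[:2] for p in points]

# Hypothetical toy "molecule": one atom along each coordinate axis.
atoms = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

front_view = project(atoms, 0.0)           # looking straight down the z axis
side_view = project(atoms, math.pi / 2)    # same object after a quarter turn

for name, view in (("front", front_view), ("side", side_view)):
    print(name, [(round(x, 3), round(y, 3)) for x, y in view])
```

In the front view the atom on the z axis lands at the centre of the image, while in the side view it is the atom on the x axis that does. The same object yields different pictures, so each image's orientation has to be worked out before the images can be combined into one 3D structure.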

Dr. Sánchez-García’s lab has spent the past few years developing artificial intelligence tools that make this reconstruction step dramatically faster and more accurate. His approach starts from a simple observation: cryo-EM maps are not final pictures but “clouds” of density that hint at where atoms should be. Interpreting those clouds requires sharpening and noise reduction, a task that older mathematical filters handled poorly because they treated every region the same. Dr. Sánchez-García’s team used deep learning to teach computers what a good map actually looks like and how to enhance only the meaningful parts.

Their first system, DeepEMhancer, published in Communications Biology (2021), learned from hundreds of examples of raw and expertly processed cryo-EM maps. The neural network automatically masks out background noise and locally sharpens details, performing in one step what scientists previously did through multiple manual adjustments.

For membrane proteins, conventional processing can leave parts of the protein hidden by signals from surrounding lipids, but DeepEMhancer selectively removes this background and restores missing protein density.

On test structures, DeepEMhancer consistently improved resolution and interpretability, revealing features that had been invisible in the original data. The DeepEMhancer map produced continuous, well-defined densities even in regions where the published structure was broken, enabling additional residues to be traced. These improvements were not cosmetic; they changed how scientists could interpret the biology behind the images.

Scaling the process for practical application

Building on this foundation, Dr. Sánchez-García’s group created CryoPARES (Pose Assignment for Related Experiments via Supervision), a system designed to handle entire families of related cryo-EM datasets. In drug discovery, researchers often image the same protein bound to many different small molecules. By understanding how each molecule fits or binds with the protein, they can design more effective medicines.

The problem here is that this research requires an enormous amount of image processing and alignment. CryoPARES reduces that labor-intensive process by providing a single trained model that can predict particle orientations and quality scores for new samples in one pass.

The algorithm learns from a well-aligned reference dataset, then applies its knowledge to new ligand-bound complexes (proteins combined with a variety of different molecules), reducing processing time from hours to minutes. Its internal filters also flag poor-quality or misaligned particles before they distort the final map.
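The flagging step described above can be sketched in a few lines. This is not the CryoPARES interface: the `Particle` record, its fields, the score range and the 0.5 threshold are all hypothetical, chosen only to show how a per-particle quality score could be used to set aside doubtful particles before reconstruction.

```python
from dataclasses import dataclass

@dataclass
class Particle:
    image_id: str
    predicted_angles: tuple  # hypothetical orientation as three angles in degrees
    confidence: float        # hypothetical quality score between 0 and 1

def split_by_confidence(particles, threshold=0.5):
    """Separate particles into those kept and those flagged as unreliable."""
    kept = [p for p in particles if p.confidence >= threshold]
    flagged = [p for p in particles if p.confidence < threshold]
    return kept, flagged

# Made-up predictions standing in for the output of a trained model.
particles = [
    Particle("p001", (10.0, 45.0, 90.0), 0.92),
    Particle("p002", (200.0, 30.0, 15.0), 0.18),
    Particle("p003", (75.0, 120.0, 300.0), 0.67),
]

kept, flagged = split_by_confidence(particles)
print("kept:   ", [p.image_id for p in kept])
print("flagged:", [p.image_id for p in flagged])
```

Only the kept particles would go on to contribute to the final map. In practice the threshold would be tuned, since a stricter cut-off trades fewer usable particles for less contamination.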

Together, DeepEMhancer and CryoPARES show how AI can partner with experimental data to remove bottlenecks in structural biology. Rather than replacing scientists, these systems free them to focus on biological meaning instead of manual corrections.

As Dr. Sánchez-García describes it, his lab’s goal is “to blend computational and experimental information to make the process much faster.” This acceleration is not just about speed; it lays the groundwork for structure-guided drug discovery on an entirely new scale.

Implications for modern drug discovery

Structure-based drug discovery follows a set pattern: scientists propose a hypothesis, determine a protein structure, design or adjust a compound, test it and repeat. Each round depends on accurate structural data, and until recently, producing that data was a bottleneck in the process.

Dr. Sánchez-García explains that while the long clinical phases of drug development are difficult to shorten, the early discovery and optimization stages, which usually last one to two years, can move much faster if structural feedback arrives sooner.

CryoPARES was built for exactly that purpose. The system can reuse information from a previously solved protein to accelerate the reconstruction of new ligand-bound forms of the same target. Instead of starting alignment from scratch each time, the model recognizes familiar patterns and infers particle orientations directly. A process that once took hours or days can now run in minutes, while maintaining near-atomic accuracy.

For drug designers, seeing how a molecule actually binds to its target is the moment when theory meets reality. This new technology means that chemists and structural biologists can evaluate multiple drug candidates in parallel, comparing how each molecule fits into the binding pocket almost in real time and dramatically shortening the research period.

Projects such as the COVID Moonshot have already shown that increasing structural throughput can cut the journey from concept to pre-clinical candidate from years to months. With AI-assisted cryo-EM pipelines such as CryoPARES, this acceleration could become standard practice, turning structure determination into a fast, iterative dialogue between biology and design.

Broader impact and future opportunities

The story of structural biology’s transformation is not only about machines and models; it is also about people who can speak both languages. Dr. Sánchez-García often describes his lab as a bridge between disciplines. His group writes the algorithms that interpret the data, but their work depends on experimental partners who generate that data and on chemists who understand what the results mean for drug design. “We write the code,” he says, “but we need people who ask the biological questions.”

That philosophy extends to how he envisions the next generation of researchers. Modern structural biology, he explains, is moving from hands-on benchwork to semi-automated systems where scientists interact directly with robots, sensors and databases. The researchers who thrive in that environment will be those who combine computational fluency with biological intuition. Coding and mathematical literacy are becoming as fundamental to this field as pipettes once were.

Dr. Sánchez-García also emphasizes the need for communication across specialties. A scientist who can translate between biology and computer science can often find insights that single-discipline experts miss. IE University’s SciTech Hub reflects this approach: an environment where training in life sciences, data science and AI comes together under one roof.

This is not just a technological revolution, but an invitation to a new generation of researchers to step into a field where biology, AI and medicine meet, and where seeing the invisible may lead to the next major leap in human health.
