Genomics Has a Data Problem. These Companies Are Building the Infrastructure to Solve It.

by Rabbt | May 19, 2026 | Health, Medical Tech

Sequencing the human genome was the hard part. Turning that data into something useful at scale is where the real infrastructure build-out begins.

The Human Genome Project, launched in 1990, was declared complete in 2003 after thirteen years of work. A modern NovaSeq X instrument can sequence a human genome in under a day for roughly two hundred dollars. That compression in cost and time is genuinely significant. However, most coverage of genomics stops there, at the sequencing milestone, as if the hardware were the finish line.

It is not. Sequencing a genome generates roughly 100 gigabytes of raw data per person. The global installed base of Illumina instruments alone produced more than 480 petabases of data in 2024. That volume has created a structural problem the industry has no clean answer to: genomic data is now far easier to generate than it is to store, process, interpret, or connect to clinical decisions. The infrastructure layer between sequence generation and actionable insight is where the real competition is happening, and it is a competition most investors have not mapped correctly.
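To make that scale concrete, here is a back-of-the-envelope sketch. It assumes only two figures already cited in this piece, roughly 100 GB of raw data and roughly $200 of sequencing cost per genome, and uses decimal units (1 PB = 1,000,000 GB). The function names are illustrative, and the estimate deliberately leaves out storage, compute, and interpretation costs.

```python
# Back-of-the-envelope scale estimate for a population sequencing program.
# Assumptions (figures from the article, not vendor specs): ~100 GB raw data
# and ~$200 sequencing cost per genome. Decimal units: 1 PB = 1,000,000 GB.

GB_PER_GENOME = 100
USD_PER_GENOME = 200
GB_PER_PB = 1_000_000


def raw_storage_pb(genomes: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Raw data volume in petabytes for a given number of genomes."""
    return genomes * gb_per_genome / GB_PER_PB


def sequencing_cost_usd(genomes: int, usd_per_genome: float = USD_PER_GENOME) -> float:
    """Sequencing-only cost; excludes storage, compute, and interpretation."""
    return genomes * usd_per_genome


if __name__ == "__main__":
    for cohort in (10_000, 100_000, 1_000_000):
        print(f"{cohort:>9,} genomes: {raw_storage_pb(cohort):7.1f} PB raw, "
              f"${sequencing_cost_usd(cohort) / 1e6:,.0f}M to sequence")
```

On these assumptions, a one-million-genome program produces about 100 PB of raw data for roughly $200M of sequencing. Every downstream layer (storage, processing, interpretation) then has to absorb that volume.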

The Infrastructure Gap

Three structural forces have converged in 2025 and 2026 to make this infrastructure layer genuinely consequential. First, whole-genome sequencing costs have crossed a threshold where population-scale data generation is financially feasible. Illumina’s NovaSeq X fleet is approaching 900 installed systems globally, and the company announced in February 2026 a roadmap pushing output to 35 billion reads per run with quality scores that support molecular residual disease testing at scale. Second, pharmaceutical companies have recognized that their most expensive failure mode is late-stage clinical trial failure driven by poor patient stratification. The answer is better molecular data, earlier in the development process. Third, AI model development for biology requires training datasets of a size and quality that only large-scale sequencing infrastructure can produce.

These three forces explain why genomics is not a research story anymore. It is an infrastructure story, and the companies positioned at different layers of that stack have meaningfully different structural positions and dependency profiles.

Tempus AI: The Interpretation Layer

Ticker: TEM (NASDAQ) | Price (May 18, 2026): ~$43-45 | Market Cap: ~$7.9B | 52-Week Range: $41.73 – $104.32

Tempus AI does not manufacture sequencing instruments. It does not own the wet-lab infrastructure that generates raw genomic data. What it controls is the layer above: a de-identified multimodal database of clinical records, genomic sequences, imaging data, and pathology that is large enough to be structurally significant to pharmaceutical companies designing clinical trials.

The company’s structural position rests on a specific dependency pharmaceutical firms have developed. AI-driven clinical trial design requires training data that connects molecular subtypes to clinical outcomes. Assembling that data internally, at the scale and diversity Tempus has built, would take years and hundreds of millions of dollars for any individual pharma company. Tempus provides that infrastructure as a service, which creates a recurring revenue dependency that is different from a standard diagnostics contract.

The most significant recent signal is the May 14, 2026 expansion of Tempus’s collaboration with Bristol Myers Squibb. The partnership applies Tempus’s Lens analytical platform and multimodal real-world data to five clinical development programs across oncology and neuroscience. The structural read is that BMS is using Tempus’s dataset to pressure-test trial assumptions before committing capital to late-stage development. Additionally, Tempus completed a convertible notes offering in May 2026 and retired secured debt, which changes its near-term capital structure.

The key dependencies to watch: Tempus generates $1.36B in trailing twelve-month revenue but reported a $302M net loss. The business model requires that large pharma partnerships scale in value faster than operating costs. If two or three of the largest collaborators reduce spending or bring data operations in-house, the revenue model compresses quickly. The BMS relationship currently spans 13 community health systems and five clinical programs, which suggests depth rather than breadth; that concentration is itself a structural fragility.
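The condition that partnerships must scale in value faster than operating costs can be made concrete with a simple compounding sketch. The revenue ($1.36B) and net loss ($302M) figures come from the paragraph above. Treating total costs as revenue plus net loss is a simplification, and every growth rate in the scenarios is hypothetical rather than company guidance.

```python
# Illustrative breakeven sketch for the "revenue must outgrow costs" condition.
# Revenue and net loss are the article's figures; treating total costs as
# revenue + net loss is a simplification, and all growth rates are hypothetical.

from typing import Optional

TTM_REVENUE_M = 1360.0   # trailing-twelve-month revenue, $M (per the article)
TTM_NET_LOSS_M = 302.0   # trailing-twelve-month net loss, $M (per the article)


def net_margin(revenue: float, net_loss: float) -> float:
    """Net margin as a fraction of revenue (negative when loss-making)."""
    return -net_loss / revenue


def years_to_breakeven(revenue: float, costs: float, rev_growth: float,
                       cost_growth: float, max_years: int = 20) -> Optional[int]:
    """First year in which revenue >= costs under constant annual growth rates.

    Returns None if breakeven is not reached within max_years.
    """
    for year in range(1, max_years + 1):
        revenue *= 1 + rev_growth
        costs *= 1 + cost_growth
        if revenue >= costs:
            return year
    return None


if __name__ == "__main__":
    costs = TTM_REVENUE_M + TTM_NET_LOSS_M
    print(f"Net margin: {net_margin(TTM_REVENUE_M, TTM_NET_LOSS_M):.1%}")
    # Hypothetical scenarios: (revenue growth, cost growth)
    for rev_g, cost_g in ((0.25, 0.10), (0.15, 0.10), (0.10, 0.12)):
        years = years_to_breakeven(TTM_REVENUE_M, costs, rev_g, cost_g)
        print(f"revenue +{rev_g:.0%}/yr, costs +{cost_g:.0%}/yr -> breakeven year: {years}")
```

The three hypothetical scenarios reach breakeven in year 2, in year 5, and never, respectively. This is why the spread between partnership growth and cost growth matters more than either number on its own.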

What to watch: Whether the BMS collaboration expands to additional therapeutic areas after initial programs close. Whether Q2 2026 revenue mix shows diagnostics revenue growing relative to data licensing, which would indicate the platform is reaching clinical adoption at scale rather than remaining primarily a pharma research tool.

Illumina: The Hardware Substrate

Ticker: ILMN (NASDAQ) | Price (May 18, 2026): ~$143 | Market Cap: ~$21.6B | 52-Week Range: $78.55 – $155.53

Illumina’s structural position is simpler than most analysis makes it sound. The company controls the physical substrate through which most of the world’s genomic data passes. Approximately 80 percent of the global installed base for high-throughput DNA sequencing runs on Illumina instruments. The NovaSeq X fleet, at 890+ systems globally by end of 2025, represents the largest single concentration of sequencing infrastructure in existence. Every genome sequenced on that fleet generates consumables revenue for Illumina, which creates a predictable recurring revenue layer that instrument sales alone do not capture.

The February 2026 NovaSeq X roadmap announcement was structurally significant for a specific reason: it committed Illumina to delivering a 40 percent increase in output per run, faster turnaround, and new flow cell configurations across its entire installed base. That means the 890 existing systems get more capable through software and chemistry updates without hardware replacement. Customers who bought NovaSeq X instruments get increasing value over time, which deepens the switching cost. The Constellation mapped-read technology, expected to become commercially available in 2026, is also worth watching because it is designed to remove separate library preparation, one of the most labor-intensive steps in current short-read workflows.
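The roadmap figures in this piece can be cross-checked, and the installed-base argument quantified, with simple arithmetic. The sketch rests on two assumptions, both readings of the announcement rather than Illumina specifications. First, the 40 percent uplift is measured against pre-roadmap output, and the 35-billion-read figure is the post-roadmap output. Second, all of the roughly 890 installed systems receive the update.

```python
# Cross-check of the NovaSeq X roadmap figures and their fleet-level effect.
# Assumptions: the 40% uplift is relative to pre-roadmap output, 35 billion
# reads per run is the post-roadmap output, and all ~890 installed systems
# receive the update (no new hardware).

ROADMAP_READS_PER_RUN = 35e9
OUTPUT_UPLIFT = 0.40
INSTALLED_SYSTEMS = 890


def implied_baseline(new_output: float, uplift: float) -> float:
    """Output per run before an uplift, given the post-uplift output."""
    return new_output / (1 + uplift)


def equivalent_new_systems(installed: int, uplift: float) -> float:
    """Capacity gain expressed as a count of pre-upgrade instruments."""
    return installed * uplift


if __name__ == "__main__":
    base = implied_baseline(ROADMAP_READS_PER_RUN, OUTPUT_UPLIFT)
    gain = equivalent_new_systems(INSTALLED_SYSTEMS, OUTPUT_UPLIFT)
    print(f"Implied pre-roadmap output: {base / 1e9:.0f} billion reads per run")
    print(f"Fleet capacity gain ~ {gain:.0f} pre-upgrade instruments, no hardware shipped")
```

Under those assumptions, the implied pre-roadmap output is about 25 billion reads per run. The update would add capacity equivalent to roughly 356 pre-upgrade instruments without a single new system being installed, which is the mechanism behind the switching-cost point.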

The key dependencies: China exposure is material. Export restrictions on sequencing instruments to China have curtailed a growth market that previously accounted for a double-digit share of revenue. Illumina also faces emerging long-read competition from Oxford Nanopore and PacBio in applications where short-read accuracy is insufficient. The company’s 2026 guidance upgrade and expanded share buyback program signal management confidence, but the China variable remains unresolved.

What to watch: Commercial launch timing and early uptake metrics for Constellation. Whether the Alliance for Genomic Discovery expansion and Veritas Health preventive genomics consortium generate consumables pull-through, because these population-scale programs are precisely the use case that justifies the NovaSeq X installed base at current pricing.

Because PacBio’s workflow involves no PCR amplification step, base modifications are detected directly during sequencing. Measuring variation in polymerase kinetics as each base is incorporated removes the need for chemical modification, such as bisulfite conversion, to detect them.
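The kinetic detection idea can be illustrated with a deliberately simplified sketch. PacBio’s kinetics analysis compares interpulse durations (IPDs, the pause between successive base incorporations) in native DNA against an unmodified reference. This version reduces that comparison to a ratio of mean IPDs at a single site, using a hypothetical cutoff of 2.0. Production pipelines instead rely on a sequence-context kinetic model and statistical testing. The function names, data, and threshold here are all illustrative assumptions.

```python
# Simplified illustration of kinetic base-modification detection.
# A modified base (e.g. 6mA) tends to slow the polymerase, lengthening the
# interpulse duration (IPD) at or near that position. Real pipelines model
# expected IPDs per sequence context; here the control is just a list of IPDs
# from unmodified DNA, and the 2.0 cutoff is a hypothetical threshold.

from statistics import fmean
from typing import Sequence

MOD_THRESHOLD = 2.0  # hypothetical cutoff, not a PacBio default


def ipd_ratio(native_ipds: Sequence[float], control_ipds: Sequence[float]) -> float:
    """Mean IPD in native DNA divided by mean IPD in the unmodified control."""
    return fmean(native_ipds) / fmean(control_ipds)


def is_modified(native_ipds: Sequence[float], control_ipds: Sequence[float],
                threshold: float = MOD_THRESHOLD) -> bool:
    """Flag a site whose polymerase slowdown meets or exceeds the threshold."""
    return ipd_ratio(native_ipds, control_ipds) >= threshold


if __name__ == "__main__":
    control = [0.9, 1.0, 1.1, 1.0]          # illustrative IPDs (arbitrary units)
    unmodified_site = [1.0, 1.2, 0.9, 1.1]
    slowed_site = [2.6, 3.1, 2.8, 2.9]
    print(is_modified(unmodified_site, control), is_modified(slowed_site, control))
```

The key design point is that the signal is relative. A site is flagged because the polymerase slows there compared with unmodified DNA, not because of any absolute IPD value.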

PacBio: The Long-Read Differentiation Layer

Ticker: PACB (NASDAQ) | Price (May 18, 2026): ~$1.17 | Market Cap: ~$362M | 52-Week Range: $0.85 – $2.73

PacBio’s structural position is narrow but increasingly defensible in a specific context. HiFi long-read sequencing detects variants in genomic regions that short-read methods cannot reliably resolve, including complex structural variants, repetitive sequences, and epigenetic modifications. For rare disease research, phasing, and population genomics applications where short-read platforms produce ambiguous results, PacBio and Oxford Nanopore are the main commercially available long-read options, and PacBio competes on the combination of per-read accuracy and scale.

The company made a deliberate strategic move in early 2026: it sold its short-read sequencing intellectual property and related assets to Illumina for approximately $48.1M in net cash proceeds. That transaction narrowed PacBio’s focus to long-read applications, reduced its competitive surface with Illumina, and generated near-term liquidity. The structural read is that PacBio’s management accepted a smaller addressable market in exchange for a more defensible position within it.

The recent signal with the most structural weight is the Trillion Gene Atlas collaboration with Basecamp Research, announced March 2026. The project will generate approximately 100,000 deeply sequenced samples from more than 31 countries using PacBio’s Revio system and SPRQ-Nx chemistry. PacBio is collaborating with Anthropic, NVIDIA, and Ultima Genomics on the initiative. That is a meaningful data point: an initiative that includes two major AI developers, Anthropic and NVIDIA, chose PacBio’s long-read accuracy as its data generation layer. Additionally, PacBio launched the HiFi Solves Global Consortium with DNAstack, connecting nearly 30 institutions across 15 countries to a federated dataset of more than 10,000 HiFi whole genomes.
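A federated dataset lets institutions contribute to shared analyses without moving raw genomes. One common pattern is for each site to answer a query locally and return only summary counts. The sketch below illustrates that general pattern using allele counts. The site names and numbers are hypothetical, and the code does not describe how DNAstack or the HiFi Solves consortium actually implement their queries.

```python
# One common federated pattern: each site answers a query locally and returns
# only summary counts, so raw genomes never leave the institution.
# Site names and counts are hypothetical.

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SiteSummary:
    site: str
    allele_count: int    # copies of the variant allele observed at this site
    allele_number: int   # total alleles genotyped (2 per diploid genome)


def pooled_allele_frequency(summaries: Iterable[SiteSummary]) -> float:
    """Combine per-site counts into one allele frequency without pooling raw data."""
    summaries = list(summaries)
    total_ac = sum(s.allele_count for s in summaries)
    total_an = sum(s.allele_number for s in summaries)
    if total_an == 0:
        raise ValueError("no genotyped alleles across sites")
    return total_ac / total_an


if __name__ == "__main__":
    responses = [
        SiteSummary("site_a", allele_count=12, allele_number=2_000),
        SiteSummary("site_b", allele_count=3, allele_number=800),
        SiteSummary("site_c", allele_count=5, allele_number=1_200),
    ]
    print(f"Pooled allele frequency: {pooled_allele_frequency(responses):.4f}")
```

Summing counts, rather than averaging per-site frequencies, weights each institution by the number of alleles it genotyped. A small site therefore cannot skew the pooled estimate.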

The key dependencies: PacBio’s 2025 full-year revenue was $160M against significant operating losses. The SPRQ-Nx chemistry update, which targets a per-genome cost below $300, is the single most important near-term variable. Commercial availability of SPRQ-Nx in 2026 determines whether PacBio can reach the price point at which population-scale adoption becomes viable. The company has committed to platform support through 2032, but managing the balance sheet through the transition period is a genuine constraint.
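The sub-$300 target matters mainly relative to short-read pricing. The sketch below compares sequencing-only cost for a hypothetical 100,000-genome cohort at two prices: the roughly $200 NovaSeq X figure cited at the top of this piece, and PacBio’s $300 target. The cohort size is an assumption, and the comparison ignores library preparation, labor, compute, and storage.

```python
# Sequencing-only cost comparison for a hypothetical population cohort.
# Per-genome prices are the article's figures (~$200 short-read, <$300 target
# for SPRQ-Nx); labor, compute, and storage are excluded.

SHORT_READ_USD = 200
LONG_READ_TARGET_USD = 300


def cohort_cost(genomes: int, usd_per_genome: float) -> float:
    """Total sequencing-only cost for a cohort at a flat per-genome price."""
    return genomes * usd_per_genome


def long_read_premium(short_usd: float, long_usd: float) -> float:
    """How many times more a long-read genome costs than a short-read genome."""
    return long_usd / short_usd


if __name__ == "__main__":
    n = 100_000  # hypothetical cohort size
    short = cohort_cost(n, SHORT_READ_USD)
    long_ = cohort_cost(n, LONG_READ_TARGET_USD)
    print(f"Short-read: ${short / 1e6:.0f}M, long-read at target: ${long_ / 1e6:.0f}M")
    print(f"Premium: {long_read_premium(SHORT_READ_USD, LONG_READ_TARGET_USD):.1f}x "
          f"(${(long_ - short) / 1e6:.0f}M extra)")
```

Even at the target price, long reads carry a 1.5x per-genome premium on these numbers ($30M versus $20M for the cohort). At that price point, the case for long reads still rests on the variants short reads miss, not on cost parity.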

What to watch: SPRQ-Nx full commercial availability date and initial per-genome pricing. PacBio’s Q2 2026 consumables revenue as an indicator of whether the Vega benchtop system is gaining traction in clinical research settings beyond academic genome centers.

Company Comparison: Genomics Infrastructure Stack

Company	Role in Stack	Structural Position	Key Dependency	What to Watch
Tempus AI (TEM)	AI interpretation layer; multimodal clinical data platform	Controls the largest de-identified multimodal genomic and clinical dataset in the U.S. Pharma partnerships lock in recurring data revenue.	Continued pharma spending on AI-driven trial design. Revenue concentration risk across a small number of large collaborators.	Whether the BMS collaboration expands to additional disease areas; Q2 2026 revenue mix between data licensing and diagnostics.
Illumina (ILMN)	Sequencing hardware and consumables; short-read infrastructure backbone	Controls roughly 80% of the global installed base for high-throughput sequencing instruments. The NovaSeq X fleet of 890+ systems is the physical substrate most genomic data runs through.	China export restrictions limiting a key growth market. Emerging long-read competition from Oxford Nanopore and PacBio.	Commercial launch of Constellation mapped-read technology; pace of NovaSeq X consumables revenue growth as the installed base expands.
PacBio (PACB)	Long-read sequencing technology; HiFi accuracy layer for complex variants	Holds a differentiated position in long-read accuracy where short-read methods fail. Sold short-read IP to Illumina and sharpened focus on rare disease and population genomics.	Achieving commercial scale. Current revenue is $160M annually against significant operating losses. SPRQ-Nx chemistry cost reduction is critical.	SPRQ-Nx full commercial availability and per-genome cost hitting sub-$300; uptake of the Trillion Gene Atlas collaboration as a signal of production-scale demand.

The Honest Tension

The structural argument here is that the value in the genomics ecosystem will increasingly sit in the layers above sequencing hardware: data curation, interpretation, and clinical integration. However, that argument rests on the assumption that proprietary data moats are durable. They may not be. Several scenarios could compress the advantage each of these companies currently holds.

Illumina’s installed-base advantage erodes if long-read sequencing costs reach parity with short-read. Oxford Nanopore’s third-generation sequencing technology is advancing, and PacBio’s per-genome costs are falling. If long-read platforms deliver their accuracy advantage at a price point hospitals and research centers can afford, the 890-system NovaSeq X fleet stops being a moat and becomes a legacy installed base.

Tempus’s data advantage depends on continued pharma willingness to pay for external data rather than building internal data infrastructure. Several large pharmaceutical companies have announced or are rumored to be building internal real-world evidence teams. If that trend accelerates, Tempus’s revenue model faces compression from both sides: more internal competition from pharma clients and more external competition from other data platform companies. The structural position is real. Its durability over a five-year horizon is a genuine open question.

Rabbt Intelligence Note

A structured Research File on Tempus AI would map the company’s pharma collaboration revenue against its multimodal dataset scale. It would flag customer concentration as the condition most likely to shift this picture if any single large partnership does not renew. The Relationship Graph would show how Tempus sits between pharmaceutical R&D budgets and clinical health systems, a structural position that most coverage reduces to a data company rather than recognizing it as the dependency layer it actually is.

A Research File on Illumina would map the NovaSeq X consumables trajectory against China export restriction exposure. It would flag the Constellation launch as the most significant near-term trigger for change in Illumina’s short-read competitive moat.

For PacBio, the Relationship Graph would show the Trillion Gene Atlas partnership connecting three separate partner dependencies: Anthropic, NVIDIA, and Ultima Genomics. None of them is secure if PacBio’s balance sheet deteriorates before SPRQ-Nx reaches commercial scale.

The open question: if long-read sequencing costs reach parity with short-read within two years, which layer of this stack captures the economic value of that shift?
