Where is our data going?

That is the question that derails more enterprise AI projects than any technical limitation. The proof of concept works. The use cases are approved. The budget is allocated. And then information security, legal, or compliance ask where the prompts, completions, and training data actually end up. The answer, that they end up on shared infrastructure operated by a third party, is enough to halt the project entirely in regulated industries.

The adoption gap is a trust gap

The numbers on enterprise AI adoption look impressive at the headline level. Azure OpenAI now serves over 230,000 organisations globally, and 80 percent of Fortune 500 companies use Azure AI Foundry (Source: Microsoft Q4 FY2025 earnings). McKinsey’s 2025 Global Survey on AI found that adoption continues to accelerate, with organisations increasingly embedding AI into core operations.

But dig beneath those headlines and a different picture emerges. Cloudera’s 2025 survey of nearly 1,500 enterprise IT leaders across 14 countries found that data privacy is the number one barrier to scaling AI, cited by 53 percent of respondents. Integration with legacy systems (40 percent) and high implementation costs (39 percent) followed. The World Quality Report 2025 from OpenText and Capgemini paints an even starker picture: while 90 percent of organisations are pursuing generative AI, only 15 percent have achieved enterprise-scale deployment. Data privacy risks (67 percent), integration complexity (64 percent), and reliability concerns (60 percent) are the top blockers.

The pattern is consistent across every major survey. Organisations want to use AI. They have approved budgets and identified use cases. But the moment production data is involved, the conversation shifts from “can we?” to “should we?” And for many, that shift stalls the project entirely.

Private LLM deployments resolve this tension architecturally rather than contractually.

What “private” actually means

The term gets used loosely, so it is worth being precise. There are three distinct deployment models for enterprise LLMs, each with different implications for data control, cost, and complexity.

The first is a shared API with contractual protections. This is the standard Azure OpenAI Service. Your prompts and completions are processed on shared infrastructure. Microsoft’s data processing terms state that your data is not used to train models and is not accessible to other customers. For many organisations, this is sufficient. The protection is contractual and compliance-certified (SOC 2, ISO 27001, HIPAA eligible), and for general-purpose, non-sensitive workloads, it is the fastest path to value.

The second is Azure OpenAI with private endpoints and network isolation. The same managed service, but deployed within your Virtual Network (VNet). Traffic between your application and the model endpoint never traverses the public internet. Combined with Azure Private Link, managed identity authentication, and customer-managed encryption keys, this gives you network-level isolation while Microsoft still manages the model infrastructure. This is where the majority of regulated enterprises land.

The third is self-hosted models on dedicated Azure infrastructure. Open-source models (Llama, Mistral, Phi) deployed on your own Azure compute, typically using Azure Machine Learning managed endpoints or Azure Kubernetes Service. You control the model weights, the serving infrastructure, and every aspect of the data pipeline. Maximum control, maximum operational responsibility.

The right choice depends on your data sensitivity, volume, customisation needs, and team capability. Most organisations I work with start with the second option and selectively add self-hosted models for specific use cases where fine-tuning or cost optimisation at extreme scale justifies the operational overhead.

Five reasons organisations move to private deployments

1. Data sovereignty is non-negotiable

For organisations handling financial data, health records, legal documents, or proprietary intellectual property, the question is not whether the API provider’s terms are acceptable. The question is whether data should leave the controlled network boundary at all.

Azure OpenAI with private endpoints keeps data within your Azure subscription. No prompts, completions, or embeddings traverse the public internet. Combined with Azure’s regional data residency guarantees, this satisfies even the most demanding compliance teams. Organisations operating under GDPR, the UK Data Protection Act, FCA regulations, or NHS Data Security requirements can deploy AI that processes sensitive data without architectural compromise.

2. Cost predictability at scale

Public API pricing follows a pay-per-token model. At low volumes, this is efficient. At enterprise scale, the maths changes quickly.

Azure OpenAI’s provisioned throughput model lets you purchase dedicated capacity in Provisioned Throughput Units (PTUs). Instead of paying per token, you pay for guaranteed throughput. For sustained high-volume workloads, this typically reduces per-token costs by 50 to 80 percent compared to pay-as-you-go pricing. The cost becomes predictable, budgetable, and decoupled from the exponential growth in usage that successful AI adoption drives.

For organisations processing millions of tokens daily across multiple use cases, the financial case for provisioned or dedicated capacity is compelling. The pattern is well-documented: once adoption takes hold across business units, enterprise AI usage scales exponentially, and pay-per-token costs scale with it. Provisioned throughput turns that unpredictable variable expense into a fixed, plannable infrastructure line item.
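The crossover logic can be sketched in a few lines. The prices below are illustrative placeholders, not Azure list prices, and real PTU sizing also depends on model and throughput requirements; the point is only that one cost curve is linear in usage and the other is flat.

```python
# Sketch: compare pay-as-you-go vs provisioned-throughput cost at a given
# daily token volume. All prices here are illustrative placeholders,
# not Azure list prices.

def monthly_cost_payg(tokens_per_day: float, price_per_1k_tokens: float) -> float:
    """Pay-as-you-go: cost scales linearly with token volume (30-day month)."""
    return tokens_per_day * 30 / 1000 * price_per_1k_tokens

def monthly_cost_provisioned(ptus: int, price_per_ptu_month: float) -> float:
    """Provisioned throughput: flat fee for reserved capacity, independent of usage."""
    return ptus * price_per_ptu_month

if __name__ == "__main__":
    daily_tokens = 2_000_000  # a sustained enterprise workload
    payg = monthly_cost_payg(daily_tokens, price_per_1k_tokens=0.01)
    fixed = monthly_cost_provisioned(ptus=1, price_per_ptu_month=300.0)
    print(f"pay-as-you-go: ${payg:,.0f}/month, provisioned: ${fixed:,.0f}/month")
```

Once sustained daily volume pushes the linear curve above the flat fee, provisioned capacity wins, and every additional token of growth widens the gap.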

3. Consistent performance without noisy neighbours

Shared API endpoints are subject to rate limiting, throttling, and variable latency during peak demand. If your AI is embedded in a customer-facing application, an internal decision-support tool, or a real-time workflow, inconsistent response times erode user trust and adoption.

Dedicated capacity eliminates the noisy-neighbour problem. Your throughput is guaranteed regardless of what other customers on the platform are doing. For latency-sensitive applications, this is not a nice-to-have. It is a hard requirement.

4. Full control over the AI stack

Private deployments give you architectural control that shared services cannot. This includes fine-tuning models on your proprietary data to improve domain-specific accuracy. It includes building RAG pipelines that connect to your internal document stores, databases, and knowledge bases without routing data through external services. It includes configuring custom content filters that align with your organisation’s specific risk profile rather than a one-size-fits-all default. And it includes implementing custom logging, monitoring, and audit trails that integrate with your existing observability stack.

This level of control is particularly important for organisations building AI-augmented products or services where the model behaviour needs to align precisely with business requirements.

5. Regulatory trajectory is clear

The EU AI Act, the UK’s AI Safety Institute framework, and sector-specific regulations are all moving towards requiring organisations to demonstrate how AI systems reach their outputs. Audit trails, bias monitoring, explainability, and human oversight are becoming compliance requirements, not optional best practices.

Private deployments integrate with Microsoft Purview for data governance, Defender for Cloud for security monitoring, and Azure Monitor for operational observability. This gives compliance teams the logging and audit infrastructure they need to satisfy regulatory requirements, both current and emerging.

The architecture of a private LLM deployment on Azure

A production-grade private LLM deployment on Azure typically consists of several connected layers.

At the network layer, the Azure OpenAI resource is deployed with a private endpoint inside your VNet. DNS resolution routes to the private IP. Network Security Groups restrict inbound and outbound traffic. If you are running a hub-and-spoke network topology, the private endpoint lives in a spoke VNet peered to your central hub.

At the identity layer, Azure Managed Identity authenticates your applications to the OpenAI resource. No API keys stored in configuration files or environment variables. Role-Based Access Control (RBAC) governs which applications, users, and service principals can invoke the model.
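In practice, granting that access is an RBAC assignment rather than a key exchange. A sketch with az CLI, where all resource names and IDs are placeholders:

```shell
# Grant an application's managed identity access to the Azure OpenAI
# resource via RBAC, so no API key ever needs to be stored.
# All names and IDs below are placeholders.
az role assignment create \
  --assignee "<app-managed-identity-principal-id>" \
  --role "Cognitive Services OpenAI User" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<aoai-resource>"

# Optionally disable key-based auth on the resource entirely, so only
# Entra ID tokens are accepted:
az resource update \
  --ids "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<aoai-resource>" \
  --set properties.disableLocalAuth=true
```

Disabling local auth is the step that makes the "no API keys" posture enforceable rather than merely conventional.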

At the data layer, your RAG pipeline connects to Azure AI Search (formerly Cognitive Search) with its own private endpoint. Documents are chunked, embedded, and indexed within your network boundary. The search index, the embedding model, and the completion model all communicate over private connections.
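The chunking step in that ingestion path is conceptually simple. A minimal sketch, using character counts for simplicity; production pipelines typically chunk by tokens and respect sentence or heading boundaries:

```python
# Minimal sketch of the chunking step in a RAG ingestion pipeline: split a
# document into fixed-size, overlapping chunks before embedding and indexing.
# Sizes are in characters for simplicity.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Return overlapping chunks; the overlap preserves context that would
    otherwise be lost at chunk boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk is then embedded and written to the search index, all over the private connections described above.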

At the application layer, your front-end applications call the model through an Azure API Management (APIM) gateway or a direct SDK connection. APIM provides rate limiting, request logging, and cost allocation across business units.
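As an illustration, an APIM inbound policy fragment along these lines throttles callers per subscription and tags requests for cost allocation. The limits and header name are placeholders:

```xml
<!-- Illustrative APIM inbound policy fragment: throttle each subscription
     and tag requests for per-business-unit cost allocation.
     Limits and the header name are placeholders. -->
<inbound>
  <base />
  <rate-limit-by-key calls="100" renewal-period="60"
                     counter-key="@(context.Subscription.Id)" />
  <set-header name="x-business-unit" exists-action="override">
    <value>@(context.Subscription.Name)</value>
  </set-header>
</inbound>
```

Routing every caller through the gateway also gives you one place to log prompts and completions for the audit trail discussed earlier.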

At the observability layer, Azure Monitor, Application Insights, and custom dashboards track token consumption, latency, error rates, and content safety events. Diagnostic logs flow to Log Analytics or a SIEM for compliance retention.

This is not trivial to set up correctly. But it is a well-understood pattern with mature tooling, and once established, it provides a secure, scalable foundation for every AI use case your organisation deploys.

When private deployment does not make sense

Private deployments add cost and complexity. They are not the right choice for every organisation or every use case.

If your AI use cases are experimental and low-volume, the standard Azure OpenAI API with data processing agreements is sufficient and dramatically simpler. If your data is not regulated or commercially sensitive, the shared service provides the same model quality with lower operational overhead. If your team does not have Azure networking and identity management skills, the private endpoint configuration can become a source of ongoing operational friction.

The question is not “should we go private?” It is “do our data sensitivity, volume, and governance requirements justify the additional infrastructure complexity?” For many organisations, the answer starts as no and becomes yes as AI moves from experimentation to production.

Private LLM Readiness Assessment

Evaluate five dimensions to determine whether your organisation should consider a private LLM deployment on Azure:

  1. Data Sensitivity & Compliance: How sensitive is the data your AI needs to process?
  2. Usage Volume & Cost Trajectory: How many tokens are you processing, and where is that heading?
  3. Customisation Requirements: Do you need fine-tuning, custom system prompts, or domain-specific RAG?
  4. Latency & Reliability Needs: How critical is consistent, low-latency AI response for your use cases?
  5. AI Governance & Auditability: Can you audit every prompt, completion, and model decision?
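One way to turn the five dimensions into a rough recommendation is a simple score. This is a hypothetical sketch, not an official scoring model; the thresholds are illustrative:

```python
# Hypothetical scoring sketch for the five readiness dimensions above.
# Each dimension is rated 1 (low) to 3 (high); the thresholds are
# illustrative, not an official scoring model.

DIMENSIONS = (
    "data_sensitivity", "usage_volume", "customisation",
    "latency_reliability", "governance",
)

def readiness_recommendation(scores: dict[str, int]) -> str:
    if set(scores) != set(DIMENSIONS) or not all(1 <= v <= 3 for v in scores.values()):
        raise ValueError("rate each of the five dimensions from 1 to 3")
    total = sum(scores.values())  # ranges from 5 to 15
    if total >= 12:
        return "private deployment: strong candidate"
    if total >= 8:
        return "private endpoints for sensitive workloads; shared API elsewhere"
    return "shared API with data processing agreements is likely sufficient"
```

A high score on data sensitivity alone can override the total in practice: regulated data usually forces the private option regardless of volume.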

The Osmos acquisition signals where Fabric fits

In January 2026, Microsoft acquired Osmos, an agentic AI data engineering platform (Source: Microsoft Blog, January 2026). The acquisition is strategically significant because it embeds autonomous data preparation directly into Microsoft Fabric. Osmos agents generate production-grade PySpark notebooks that handle ingestion, transformation, and validation, reducing data engineering effort by over 50 percent according to Microsoft’s own assessment.

This matters for private LLM deployments because the biggest bottleneck is rarely the model. It is getting enterprise data into a state where the model can use it effectively. RAG pipelines depend on clean, well-structured, properly chunked data. Fabric with Osmos-powered agents accelerates the data preparation that feeds your private LLM, creating a shorter path from raw enterprise data to AI-ready assets within your controlled environment.

Combined with Fabric IQ, the new semantic workload announced at Ignite 2025, organisations can define ontologies that give AI agents a structured understanding of business entities and relationships. The model does not just process text. It reasons over your business concepts, grounded in data that never leaves your Azure subscription.

Making it practical

For organisations considering a private LLM deployment, the approach that works is incremental rather than all-at-once.

Start by deploying Azure OpenAI with a private endpoint for your highest-sensitivity use case. This establishes the network architecture, identity patterns, and operational processes without attempting to migrate everything at once.

Build your RAG pipeline against a focused document set. Prove that retrieval-augmented generation delivers more accurate, grounded responses than the base model alone. Measure the quality improvement.

Add provisioned throughput when daily token consumption makes the economics favourable. The crossover point depends on your specific usage pattern, but as a rough guide, sustained consumption above one to two million tokens per day typically justifies provisioned capacity.

Extend to additional use cases once the infrastructure and operational patterns are established. Each subsequent deployment is faster because the networking, identity, and observability foundations are already in place.

This is the Build phase of our Design, Build, Manage methodology. The design work identifies the use cases and architectural requirements. The build work establishes the private infrastructure and proves value with targeted deployments. The manage work ensures ongoing performance, cost optimisation, and compliance as AI usage scales across the organisation.

Key takeaways

Private LLM deployments on Azure are not about restricting AI adoption. They are about removing the barriers that prevent AI from reaching production. When data security, cost predictability, performance consistency, and regulatory compliance are addressed architecturally, the “should we?” question becomes “when do we start?”

The organisations moving fastest with enterprise AI are the ones that resolved the trust question early. They built the private infrastructure, proved value with targeted use cases, and scaled from a position of confidence rather than caution.

The technology is mature. The patterns are well-established. The remaining question is whether your architecture is ready to support what your AI strategy demands.


Sources

  1. Microsoft. Q4 FY2025 Earnings: Azure OpenAI serves 230,000+ organisations; 80% of Fortune 500 use Azure AI Foundry. 2025.
  2. McKinsey & Company. “The State of AI: Global Survey.” November 2025.
  3. Cloudera. “The Future of Enterprise AI Agents.” Survey of 1,484 enterprise IT leaders across 14 countries. April 2025.
  4. OpenText & Capgemini. “World Quality Report 2025.” Survey of 2,000+ senior executives across 22 countries. November 2025.
  5. Microsoft. “Microsoft announces acquisition of Osmos to accelerate autonomous data engineering in Fabric.” January 2026.
  6. Microsoft. “Fabric IQ: The Semantic Foundation for Enterprise AI.” Ignite 2025 announcement. November 2025.
  7. Microsoft. Azure OpenAI Service documentation: Private endpoints, provisioned throughput, and data processing. 2025-2026.

Ready to architect your private AI deployment? Take our AI Maturity Assessment to identify your primary constraint. In under 3 minutes, you’ll receive a personalised report with specific recommendations for your situation.

Take the Assessment →