Unified Environment Infrastructure

A single execution layer shared by every stage of the model lifecycle:
data curation for supervised fine-tuning (SFT), reinforcement learning (RL), and benchmark evaluation.

Core Components

Breakdown of the operational modules in the diagram.

Gen Code & Data Pipeline

This pipeline drives every stage. For SFT, it generates synthetic trajectories for data curation. For RL, it executes the actions chosen by the policy. For evaluation, it runs benchmark tasks against the environment.
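As a rough illustration of one driver serving three stages, the sketch below runs the same episode loop and only changes how the result is labeled. Every name here (Stage, ToyEnvironment, run_episode, the purpose strings) is hypothetical and not taken from the actual pipeline; the environment is a stand-in that merely echoes actions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    SFT = "sft"
    RL = "rl"
    EVAL = "eval"


@dataclass
class Step:
    action: str
    observation: str


class ToyEnvironment:
    """Hypothetical stand-in for the Windows container; it only echoes actions."""

    def step(self, action: str) -> str:
        return f"done:{action}"


# Assumed labels for what each stage does with a finished episode.
PURPOSE: Dict[Stage, str] = {
    Stage.SFT: "candidate_training_trajectory",
    Stage.RL: "policy_rollout",
    Stage.EVAL: "benchmark_attempt",
}


def run_episode(stage: Stage, env: ToyEnvironment,
                choose_action: Callable[[int], str], num_steps: int = 3) -> dict:
    """Run the same interaction loop for any stage and tag the result."""
    steps: List[Step] = []
    for i in range(num_steps):
        action = choose_action(i)
        steps.append(Step(action, env.step(action)))
    return {"stage": stage.value, "purpose": PURPOSE[stage], "steps": steps}


if __name__ == "__main__":
    episode = run_episode(Stage.RL, ToyEnvironment(), lambda i: f"click_{i}")
    print(episode["purpose"], len(episode["steps"]))
```

The point of the design is that only choose_action and the downstream handling differ between stages; the environment interaction itself is shared.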

Batch Service Orchestration

This layer uses Azure Batch to run large-scale parallel workloads across all lifecycle stages. It scales resources dynamically, whether the job is a nightly benchmark or a week-long RL training run.
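To make the scaling idea concrete, here is a minimal sketch of how a target node count might be derived from queue depth, using the dedicated and spot node types listed under Scaling Limits. It is plain Python arithmetic, not an Azure Batch API call or autoscale formula, and the function name, parameters, and caps are all assumptions chosen for illustration.

```python
import math
from typing import Dict


def plan_nodes(pending_tasks: int, tasks_per_node: int,
               dedicated_nodes: int, max_spot_nodes: int) -> Dict[str, int]:
    """Keep the dedicated nodes always on and cover the remainder with spot nodes."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    # Nodes required to run every pending task at once.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Spot nodes cover whatever the dedicated nodes cannot, up to the cap.
    spot = min(max(needed - dedicated_nodes, 0), max_spot_nodes)
    return {"dedicated": dedicated_nodes, "spot": spot}


if __name__ == "__main__":
    print(plan_nodes(pending_tasks=40, tasks_per_node=4,
                     dedicated_nodes=2, max_spot_nodes=10))
```

With an empty queue the plan falls back to the dedicated nodes alone, which matches the "always-on" role described for them.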

Windows Docker Containers (dockur/windows)

Architecture highlight: these containers run full Windows environments, so the system can verify visual outcomes (e.g., "is the text bold?"). That verification is critical for both reward calculation and benchmark scoring.
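One way to picture a single verification feeding both uses is sketched below. It assumes each check reduces to a pass/fail result; the CheckResult type, the binary reward, and the pass-rate score are illustrative assumptions, not the system's actual scoring rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one visual check, e.g. whether the target text is bold."""
    task_id: str
    passed: bool


def reward(result: CheckResult) -> float:
    """Assumed binary reward for RL: 1.0 if the check passed, else 0.0."""
    return 1.0 if result.passed else 0.0


def benchmark_score(results: List[CheckResult]) -> float:
    """Assumed benchmark metric: fraction of tasks whose check passed."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


if __name__ == "__main__":
    checks = [CheckResult("bold-text", True), CheckResult("table-border", False)]
    print([reward(c) for c in checks], benchmark_score(checks))
```

Because both functions consume the same CheckResult, a checker written once can serve training and evaluation without diverging.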

Feedback Loop & Storage

Azure Storage captures the results of each run. The feedback loop records successful trajectories as training data, calculates reward signals for the RL policy, and stores output documents for performance analysis.
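The routing described above could look roughly like the following. The path layout, stage names, and function name are hypothetical; the sketch only shows the rule that output documents are always kept, trajectories are kept when the run succeeded, and reward records are written for RL runs.

```python
from typing import List


def artifacts_to_store(run_id: str, stage: str, succeeded: bool) -> List[str]:
    """Return the (hypothetical) storage paths a finished run would write to."""
    # Output documents are kept for every run so performance can be analyzed.
    paths = [f"outputs/{stage}/{run_id}/document"]
    # Only successful SFT trajectories become training data.
    if stage == "sft" and succeeded:
        paths.append(f"trajectories/{run_id}.json")
    # RL runs record a reward signal whether or not the task succeeded.
    if stage == "rl":
        paths.append(f"rewards/{run_id}.json")
    return paths


if __name__ == "__main__":
    print(artifacts_to_store("run-001", "sft", succeeded=True))
```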

Environment Specifications

The technical foundation powering the containers.

Container Base

Windows-in-Docker configuration using KVM acceleration. Optimized to run a full GUI environment inside a container for headed automation.

Office LTSC 2024

Persistent, volume-licensed Office runtime optimized for air-gapped or regulated automation. Provides static versions of Word, Excel, and PowerPoint with policy-controlled VBA execution.

Scaling Limits

Supports dynamic scaling via Azure Batch. Mix of Dedicated (always-on) and Scaled (spot) nodes to handle variable dispatch loads.

Security & Storage

Ephemeral file systems ensure session data is wiped post-execution. Logs and artifacts are securely offloaded to Azure Storage.
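The ephemeral-then-offload pattern in the last item can be sketched with Python's standard library. This is an analogy under stated assumptions: a temporary directory stands in for the container's file system, and a local archive folder stands in for Azure Storage. The function name and file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def run_session(archive_dir: Path, session_id: str) -> Tuple[Path, Path]:
    """Work in a scratch directory, keep only the log, and let the rest be wiped."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix=f"session-{session_id}-") as scratch:
        workdir = Path(scratch)
        log_file = workdir / "session.log"
        log_file.write_text(f"session {session_id} finished\n")
        (workdir / "scratch.tmp").write_text("intermediate state\n")
        # Offload the artifact that must survive before the scratch space is removed.
        kept = archive_dir / f"{session_id}.log"
        shutil.copy2(log_file, kept)
    # Leaving the 'with' block deletes the scratch directory and everything in it.
    return kept, workdir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        kept_log, scratch_dir = run_session(Path(base) / "archive", "demo")
        print(kept_log.name, scratch_dir.exists())
```

The copy happens inside the with block on purpose: once the block exits, nothing in the scratch directory can be recovered.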