Unified Environment Infrastructure

A universal execution layer powering the complete model lifecycle:
Data Curation (SFT), Reinforcement Learning (RL), and Benchmark Evaluation.

System Overview

The Lifecycle Engine

This infrastructure provides the "ground truth" for model development. Tasks run in full Windows environments inside containers, so a single setup supports the entire pipeline, from initial data synthesis to final benchmarking.

Supported Modes:

1. SFT Curation: Generating trial-and-error trajectories and running quality checks.
2. RL Training: Computing environment rewards from execution states.
3. Benchmarking: Comparing output documents against golden datasets.

[Figure: Architecture diagram showing the flow from Gen Code to Azure Storage]

Core Components

Breakdown of the operational modules in the diagram.

Gen Code & Data Pipeline

The driver of the operation. During SFT, it generates synthetic trajectories for data curation. In RL, it executes policy-based actions. For Evaluation, it runs benchmark tasks against the environment.
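
As a rough sketch of how one driver might serve all three modes, the snippet below dispatches a task by lifecycle mode. The names Mode and run_task, and the fields of the returned dictionary, are hypothetical and chosen only for illustration; they are not taken from the actual system.

```python
from enum import Enum


class Mode(Enum):
    """Lifecycle modes described above (names are illustrative)."""
    SFT = "sft"
    RL = "rl"
    EVAL = "eval"


def run_task(mode: Mode, task: str) -> dict:
    """Describe what the driver would do with `task` in each mode."""
    if mode is Mode.SFT:
        # Data curation: generate a synthetic trajectory for later filtering.
        return {"task": task, "action": "generate_trajectory", "output": "training_data"}
    if mode is Mode.RL:
        # RL: act according to the current policy and collect a reward.
        return {"task": task, "action": "execute_policy", "output": "reward"}
    if mode is Mode.EVAL:
        # Evaluation: run the benchmark task and record a score.
        return {"task": task, "action": "run_benchmark", "output": "score"}
    raise ValueError(f"unsupported mode: {mode}")
```

Keeping the mode as an explicit argument means the same environment code path is exercised in all three stages; only the handling of the result differs.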

Batch Service Orchestration

Leverages Azure Batch to handle massive parallelism across all lifecycle stages. It dynamically scales resources whether running a nightly benchmark or a week-long RL training run.

Windows Docker Containers (dockur/windows)

Architecture Highlight: These containers run full Windows environments. This allows the system to verify visual outcomes (e.g., "is the text bold?"), which is critical for both reward calculation and benchmark scoring.
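
To make the "is the text bold?" check concrete, here is a minimal sketch of a binary reward computed from formatting state read back after a task runs. RunState and bold_reward are hypothetical names, and the sketch assumes the formatting state has already been extracted from the document by some other step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunState:
    """Hypothetical formatting state of one text run, read back after execution."""
    text: str
    bold: bool


def bold_reward(runs: list[RunState], target: str) -> float:
    """Return 1.0 if every run containing `target` is bold, otherwise 0.0."""
    matches = [r for r in runs if target in r.text]
    if not matches:
        return 0.0  # Target text is missing, so the task is treated as failed.
    return 1.0 if all(r.bold for r in matches) else 0.0
```

The same function could serve as an RL reward or as a pass/fail benchmark check, since both depend only on the observed end state.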

Feedback Loop & Storage

Azure Storage captures the results: successful trajectories for training data, reward signals for the RL policy, and output documents for performance analysis.

Environment Specifications

The technical foundation powering the containers.

Container Base

Windows-in-Docker configuration using KVM acceleration. Optimized to run a full GUI environment inside a container for headed automation.
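
As an illustration of what a KVM-accelerated launch could look like, the sketch below builds a docker run argument list. The flags follow the basic invocation described in the dockur/windows README; the image name, port mapping, and the helper name windows_container_argv are assumptions that should be checked against the actual deployment.

```python
def windows_container_argv(image: str = "dockurr/windows", web_port: int = 8006) -> list[str]:
    """Build a `docker run` argument list for a KVM-backed Windows container.

    The default image name and port are assumptions; adjust to your setup.
    """
    return [
        "docker", "run", "--rm",
        "--device=/dev/kvm",       # Pass through KVM for hardware acceleration.
        "--device=/dev/net/tun",   # Network device used for guest networking.
        "--cap-add", "NET_ADMIN",  # Capability needed to configure that network.
        "-p", f"{web_port}:8006",  # Host port mapped to the container's web viewer.
        "--stop-timeout", "120",   # Give the guest time to shut down cleanly.
        image,
    ]
```

The list can be passed to subprocess.run on a host that exposes /dev/kvm; without KVM access the container cannot use hardware acceleration.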

Office LTSC 2024

Perpetual, volume-licensed Office runtime suited to air-gapped or regulated automation. Provides fixed-feature versions of Word, Excel, and PowerPoint with policy-controlled VBA execution.

Scaling Limits

Supports dynamic scaling via Azure Batch. A mix of Dedicated (always-on) and Scaled (Spot, preemptible) nodes handles variable dispatch loads.
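
One way to reason about the dedicated/Spot mix is a simple sizing rule: keep a fixed floor of dedicated nodes and cover the remaining backlog with Spot nodes, up to a cap. The function below is a sketch of that rule only; plan_pool and all of its parameters are hypothetical, and it does not call the Azure Batch API.

```python
import math


def plan_pool(pending_tasks: int, tasks_per_node: int,
              dedicated_floor: int, max_nodes: int) -> tuple[int, int]:
    """Return (dedicated, spot) node counts for the current backlog."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    # Nodes needed to drain the backlog in one wave.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Never go below the dedicated floor or above the overall cap.
    total = min(max(needed, dedicated_floor), max_nodes)
    dedicated = min(dedicated_floor, total)
    spot = total - dedicated
    return dedicated, spot
```

For example, plan_pool(100, 4, 5, 50) returns (5, 20): 25 nodes are needed, 5 stay dedicated, and the other 20 are Spot. With an empty queue the pool falls back to the dedicated floor.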

Security & Storage

Ephemeral file systems ensure session data is wiped post-execution. Logs and artifacts are securely offloaded to Azure Storage.
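
The wipe-after-offload pattern can be sketched with a context manager that archives a session's files and then deletes the scratch directory. This is an illustration under stated assumptions: ephemeral_session is a hypothetical helper, and a local archive directory stands in for Azure Storage.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def ephemeral_session(archive_dir: Path) -> Iterator[Path]:
    """Yield a scratch directory; archive its files, then delete it."""
    workdir = Path(tempfile.mkdtemp(prefix="session-"))
    try:
        yield workdir
        # Offload artifacts before the wipe. A local directory stands in
        # for Azure Storage here (assumption for illustration). This step
        # runs only if the session body finished without raising.
        archive_dir.mkdir(parents=True, exist_ok=True)
        for item in workdir.iterdir():
            if item.is_file():
                shutil.copy2(item, archive_dir / item.name)
    finally:
        # The scratch directory is removed whether or not the session succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

A real deployment would likely also offload logs from failed sessions; this sketch skips that for brevity.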