Environments Module
The envs module provides Gymnasium-compatible reinforcement learning environments for infrastructure management.
Base Environment
Base infrastructure management environment for gymnasium and stable-baselines3.
This module provides a base class for infrastructure maintenance environments that:
Is fully compatible with gymnasium and stable-baselines3
Uses the updated Simulator with model dependency system
Provides abstract methods for model creation (no models defined in base)
Supports all observability modes and reward schemes
Includes comprehensive action and observation space handling
The base class assumes all inheriting environments will define appropriate models (dynamics, cost, budget, hierarchy, metadata) as needed.
Example
Creating a custom environment by inheriting from BaseInfraEnv:
class MyCustomEnv(BaseInfraEnv):
    def _create_models(self):
        dynamics = MyDynamicsModel()
        cost = MyCostModel()
        budget = MyBudgetModel()
        # Hierarchy and metadata models are optional, so None is returned for both.
        return dynamics, cost, budget, None, None

    def _compute_reward(self, sim_info):
        return -sim_info['total_cost']
Classes
BaseInfraEnv : Abstract base class for infrastructure environments
- class infralib.envs.base.BaseInfraEnv(n_components: int, max_steps: int = 365, observability: str = 'full', action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]
Abstract base class for infrastructure maintenance environments.
This base class provides the core functionality for infrastructure maintenance RL environments while requiring subclasses to define their own models and reward functions. It is fully compatible with gymnasium and stable-baselines3.
The environment handles:
- Action and observation space definition
- Simulator integration with model dependencies
- Episode management (reset, step, termination)
- Multiple observability modes (full, partial, noisy)
- Rendering capabilities
Subclasses must implement:
- _create_models(): Define dynamics, cost, budget, and optional hierarchy/metadata
- _compute_reward(): Define reward function based on simulation info
- Parameters:
n_components (int) – Number of infrastructure components to simulate
max_steps (int, default 365) – Maximum number of steps per episode
observability ({'full', 'partial', 'noisy'}, default 'full') – Type of state observability
action_type ({'multi_discrete', 'discrete', 'box'}, default 'multi_discrete') – Format of action space
render_mode (str, optional) – Rendering mode ('human', 'rgb_array', or None)
rich_display (bool, default False) – Enable rich terminal displays during simulation
seed (int, optional) – Random seed for reproducibility
- action_space
Gymnasium action space
- Type:
gym.Space
- observation_space
Gymnasium observation space
- Type:
gym.Space
Notes
This class is designed to work seamlessly with stable-baselines3 and other modern RL libraries. All environments created by inheriting from this class will pass gymnasium’s env_checker.
The action space supports multiple formats:
- 'multi_discrete': Separate action per component [4, 4, …, 4]
- 'discrete': Single action encoding all components (4^n_components)
- 'box': Continuous actions (for advanced use cases)
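The relationship between the 'discrete' and 'multi_discrete' formats can be illustrated with a base-4 encoding. The sketch below is only an illustration: the helper names are hypothetical, and the digit order (component 0 as the least significant digit) is an assumption, since the ordering BaseInfraEnv actually uses is not documented here.

```python
def decode_discrete_action(index: int, n_components: int, n_actions: int = 4) -> list[int]:
    """Split a flat action index into one action per component.

    Hypothetical helper: treats the index as a base-n_actions number in which
    component 0 is the least significant digit (an assumed convention).
    """
    if not 0 <= index < n_actions ** n_components:
        raise ValueError(f"index must be in [0, {n_actions ** n_components})")
    actions = []
    for _ in range(n_components):
        index, digit = divmod(index, n_actions)
        actions.append(digit)
    return actions


def encode_multi_discrete(actions: list[int], n_actions: int = 4) -> int:
    """Inverse of decode_discrete_action under the same assumed digit order."""
    index = 0
    for digit in reversed(actions):
        if not 0 <= digit < n_actions:
            raise ValueError(f"each action must be in [0, {n_actions})")
        index = index * n_actions + digit
    return index
```

With 3 components the flat space has 4^3 = 64 actions, which shows why 'discrete' becomes impractical as n_components grows while 'multi_discrete' keeps one 4-way choice per component.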
Examples
>>> class SimpleEnv(BaseInfraEnv):
...     def _create_models(self):
...         dynamics = MarkovDynamics(n_states=10)
...         cost = SimpleCost()
...         budget = FixedBudget(initial_budget=5000)
...         return dynamics, cost, budget, None, None
...
...     def _compute_reward(self, sim_info):
...         return -sim_info['total_cost'] - sim_info['failures'] * 100
...
>>> env = SimpleEnv(n_components=5)
>>> obs, info = env.reset()
>>> action = env.action_space.sample()
>>> obs, reward, terminated, truncated, info = env.step(action)
- __init__(n_components: int, max_steps: int = 365, observability: str = 'full', action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]
- reset(seed: int | None = None, options: dict[str, Any] | None = None) tuple[ndarray, dict[str, Any]][source]
Reset the environment to start a new episode.
- step(action: int | ndarray) tuple[ndarray, float, bool, bool, dict[str, Any]][source]
Take a step in the environment.
- Parameters:
action (int or np.ndarray) – Action to take, format depends on action_type
- Returns:
(observation, reward, terminated, truncated, info)
- Return type:
tuple[ndarray, float, bool, bool, dict[str, Any]]
- Raises:
RuntimeError – If called on terminated/truncated environment
ValueError – If action format is invalid
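Because step() raises RuntimeError once an episode has terminated or been truncated, a rollout loop should stop as soon as either flag is set. The following is a minimal sketch of such a loop. run_episode and CountdownEnv are hypothetical names introduced for illustration; CountdownEnv is a stand-in that only mimics the reset/step return signature documented above, so the snippet runs without infralib installed.

```python
from typing import Any, Callable


def run_episode(env: Any, policy: Callable[[Any], Any], max_iterations: int = 10_000) -> tuple[float, int]:
    """Roll out one episode and return (total_reward, steps_taken)."""
    obs, info = env.reset()
    total_reward = 0.0
    steps = 0
    for _ in range(max_iterations):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        # Stop before calling step() again on a finished episode.
        if terminated or truncated:
            break
    return total_reward, steps


class CountdownEnv:
    """Hypothetical stand-in environment that truncates after max_steps."""

    def __init__(self, max_steps: int = 3):
        self.max_steps = max_steps
        self._t = 0

    def reset(self, seed=None, options=None):
        self._t = 0
        return [0.0], {}

    def step(self, action):
        self._t += 1
        truncated = self._t >= self.max_steps
        # Constant reward of -1.0 per step; never terminates early.
        return [float(self._t)], -1.0, False, truncated, {}
```

A call such as run_episode(CountdownEnv(3), lambda obs: 0) exercises the loop with a constant policy; with a real environment the policy would typically be env.action_space.sample or a trained agent's predict method.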
- infralib.envs.base.make_env_from_config(env_class, config_path: str, **kwargs) BaseInfraEnv[source]
Create environment from configuration file.
- Parameters:
env_class (class) – Environment class that inherits from BaseInfraEnv
config_path (str) – Path to YAML configuration file
**kwargs – Additional keyword arguments to override config
- Returns:
Configured environment instance
- Return type:
BaseInfraEnv
Simple Environment
Simple infrastructure management environments using config files.
This module provides simple implementations of infrastructure maintenance environments that can be configured via YAML config files and CSV component data files. These environments use standard models and are designed for ease of use and educational purposes.
The environments support:
Configuration-based setup from YAML and CSV files
Both POMDP (partial observability) and MDP (full observability) variants
Stable-baselines3 compatibility
Multiple reward schemes and termination conditions
Rich terminal displays and rendering
Example
Using configuration files to create environments:
# Create POMDP environment
env = SimpleInfraEnv.from_config(
config_path='config.yaml',
components_path='components.csv'
)
# Create MDP environment
env = SimpleInfraMDPEnv.from_config(
config_path='config.yaml',
components_path='components.csv'
)
Classes
SimpleInfraEnv : POMDP-style infrastructure environment
SimpleInfraMDPEnv : MDP-style infrastructure environment with component margins
Functions
load_config_data : Load parameters from config and component files
- infralib.envs.simple.load_config_data(config_path: str, components_path: str) dict[str, Any][source]
Load configuration from YAML and CSV files.
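- Parameters:
config_path (str) – Path to YAML configuration file
components_path (str) – Path to CSV components data file
- Returns:
Dictionary containing all loaded parameters
- Return type:
dict[str, Any]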
Examples
>>> params = load_config_data('config.yaml', 'components.csv')
>>> print(f"Budget: {params['initial_budget']}")
>>> print(f"Components: {len(params['component_types'])}")
- class infralib.envs.simple.SimpleInfraEnv(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'cost_penalty', max_steps: int = 100, observability: str = 'partial', action_type: str = 'discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]
Bases:
BaseInfraEnv
Simple POMDP infrastructure environment with configuration support.
This environment simulates infrastructure maintenance under partial observability where components can only be observed through inspections. The environment uses configuration files to define component properties and model parameters.
Features:
- POMDP formulation with inspection-based observations
- Configuration-based setup from YAML/CSV files
- Multiple reward schemes and termination conditions
- Support for component types with different characteristics
- Rich terminal displays and basic rendering
- Parameters:
config_path (str, optional) – Path to YAML configuration file
components_path (str, optional) – Path to CSV components data file
reward_scheme ({'cost_penalty', 'survival', 'condition'}, default 'cost_penalty') – Reward function to use
max_steps (int, default 100) – Maximum episode length
observability ({'full', 'partial', 'noisy'}, default 'partial') – Observation mode (partial recommended for POMDP)
action_type ({'multi_discrete', 'discrete'}, default 'discrete') – Action space format
render_mode (str, optional) – Rendering mode
rich_display (bool, default False) – Enable rich terminal status displays
- failure_thresholds
Failure thresholds per component
- Type:
np.ndarray
Notes
This environment is designed for training RL agents on infrastructure maintenance problems with realistic component degradation and cost models. The POMDP formulation requires agents to balance exploration (inspections) with exploitation (maintenance actions).
Actions are:
- 0: Do nothing
- 1: Inspect component
- 2: Repair component
- 3: Replace component
Observations include last inspection results, time since inspections, and remaining budget information.
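When logging or debugging, it can help to give the four action codes listed above readable names. The sketch below does this with an IntEnum. MaintenanceAction and describe_actions are hypothetical names, not part of infralib, and the codes are assumed to apply per component (with action_type='discrete' a single integer may instead encode all components at once).

```python
from enum import IntEnum


class MaintenanceAction(IntEnum):
    """Hypothetical labels for the per-component action codes 0 to 3."""

    DO_NOTHING = 0
    INSPECT = 1
    REPAIR = 2
    REPLACE = 3


def describe_actions(actions) -> list[str]:
    """Map a sequence of per-component action codes to readable names."""
    return [MaintenanceAction(int(a)).name for a in actions]
```

Because IntEnum members compare equal to plain integers, such labels can be mixed freely with the raw codes, for example when building a multi_discrete action by hand.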
Examples
>>> env = SimpleInfraEnv.from_config('config.yaml', 'components.csv')
>>> obs, info = env.reset()
>>> action = env.action_space.sample()
>>> obs, reward, terminated, truncated, info = env.step(action)
>>> # Check environment with stable-baselines3
>>> from stable_baselines3.common.env_checker import check_env
>>> check_env(env, warn=True)
- __init__(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'cost_penalty', max_steps: int = 100, observability: str = 'partial', action_type: str = 'discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]
- classmethod from_config(config_path: str, components_path: str, **kwargs) SimpleInfraEnv[source]
Create environment from configuration files.
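- Parameters:
config_path (str) – Path to YAML configuration file
components_path (str) – Path to CSV components data file
**kwargs – Additional keyword arguments, such as reward_scheme or max_steps
- Returns:
Configured environment instance
- Return type:
SimpleInfraEnv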
Examples
>>> env = SimpleInfraEnv.from_config(
...     'config.yaml', 'components.csv',
...     reward_scheme='survival', max_steps=200
... )
- class infralib.envs.simple.SimpleInfraMDPEnv(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'margin', max_steps: int = 100, action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]
Bases:
BaseInfraEnv
Simple MDP infrastructure environment with component margins.
This environment provides full observability of component states and focuses on the margin between current state and failure threshold. This formulation is easier to learn for many RL algorithms as it provides direct state information.
Features:
- MDP formulation with full state observability
- Component margins as primary observation
- Configuration-based setup from YAML/CSV files
- Margin-based reward functions
- Support for component types with different failure thresholds
- Parameters:
config_path (str, optional) – Path to YAML configuration file
components_path (str, optional) – Path to CSV components data file
reward_scheme ({'margin', 'weighted_margin', 'binary'}, default 'margin') – Reward function to use
max_steps (int, default 100) – Maximum episode length
action_type ({'multi_discrete', 'discrete'}, default 'multi_discrete') – Action space format (multi_discrete recommended for MDP)
render_mode (str, optional) – Rendering mode
rich_display (bool, default False) – Enable rich terminal status displays
- failure_thresholds
Failure thresholds per component type
- Type:
np.ndarray
Notes
The MDP formulation uses component margins as the primary state representation:
margin = (current_state - failure_threshold) / (max_state - failure_threshold)
This normalization makes the state space more uniform across component types and focuses learning on the critical region near failure thresholds.
Observations include:
- Component margins (normalized to [-1, 1] range)
- Normalized remaining budget
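The margin formula in these notes can be sketched in a few lines of plain Python. component_margin is a hypothetical helper, not the library's implementation. It assumes that higher state values mean better condition, which is what the sign of the formula suggests, and it clips the result to [-1, 1] as one way to obtain the documented range; whether infralib clips or bounds the state some other way is not stated here.

```python
def component_margin(current_state: float, failure_threshold: float, max_state: float) -> float:
    """Normalized distance between a component's state and its failure threshold.

    Returns 1.0 at max_state, 0.0 exactly at the threshold, and a negative
    value once the state has dropped below the threshold.
    """
    span = max_state - failure_threshold
    if span <= 0:
        raise ValueError("max_state must be greater than failure_threshold")
    margin = (current_state - failure_threshold) / span
    # Assumed clipping step so the result stays within [-1, 1].
    return max(-1.0, min(1.0, margin))
```

For instance, with a failure threshold of 40 and a maximum state of 100, a component at state 100 has margin 1.0, one at state 40 has margin 0.0, and one at state 10 has margin -0.5.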
Examples
>>> env = SimpleInfraMDPEnv.from_config('config.yaml', 'components.csv')
>>> obs, info = env.reset()
>>> print(f"Component margins: {obs[:-1]}")  # All but last element
>>> print(f"Budget remaining: {obs[-1]}")  # Last element
- __init__(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'margin', max_steps: int = 100, action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]
- classmethod from_config(config_path: str, components_path: str, **kwargs) SimpleInfraMDPEnv[source]
Create MDP environment from configuration files.
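- Parameters:
config_path (str) – Path to YAML configuration file
components_path (str) – Path to CSV components data file
**kwargs – Additional keyword arguments, such as reward_scheme
- Returns:
Configured MDP environment instance
- Return type:
SimpleInfraMDPEnv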
Examples
>>> env = SimpleInfraMDPEnv.from_config(
...     'config.yaml', 'components.csv',
...     reward_scheme='weighted_margin'
... )