Environments Module

The envs module provides Gymnasium-compatible reinforcement learning environments for infrastructure management.

Base Environment

Base infrastructure management environment for gymnasium and stable-baselines3.

This module provides a base class for infrastructure maintenance environments that:

- Is fully compatible with gymnasium and stable-baselines3
- Uses the updated Simulator with model dependency system
- Provides abstract methods for model creation (no models defined in base)
- Supports all observability modes and reward schemes
- Includes comprehensive action and observation space handling

The base class assumes all inheriting environments will define appropriate models (dynamics, cost, budget, hierarchy, metadata) as needed.

Example

Creating a custom environment by inheriting from BaseInfraEnv:

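from infralib.envs.base import BaseInfraEnv

class MyCustomEnv(BaseInfraEnv):
    def _create_models(self):
        # MyDynamicsModel, MyCostModel and MyBudgetModel are placeholders
        # for user-defined model classes.
        dynamics = MyDynamicsModel()
        cost = MyCostModel()
        budget = MyBudgetModel()
        # Order: dynamics, cost, budget, hierarchy, metadata (last two optional).
        return dynamics, cost, budget, None, None

    def _compute_reward(self, sim_info):
        # Lower total cost yields a higher (less negative) reward.
        return -sim_info['total_cost']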

Classes

BaseInfraEnv : Abstract base class for infrastructure environments

class infralib.envs.base.BaseInfraEnv(n_components: int, max_steps: int = 365, observability: str = 'full', action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]

Bases: Env, ABC

Abstract base class for infrastructure maintenance environments.

This base class provides the core functionality for infrastructure maintenance RL environments while requiring subclasses to define their own models and reward functions. It is fully compatible with gymnasium and stable-baselines3.

The environment handles:

- Action and observation space definition
- Simulator integration with model dependencies
- Episode management (reset, step, termination)
- Multiple observability modes (full, partial, noisy)
- Rendering capabilities

Subclasses must implement:

- _create_models(): Define dynamics, cost, budget, and optional hierarchy/metadata
- _compute_reward(): Define reward function based on simulation info

Parameters:

n_components (int) – Number of infrastructure components to simulate
max_steps (int, default 365) – Maximum number of steps per episode
observability ({'full', 'partial', 'noisy'}, default 'full') – Type of state observability
action_type ({'multi_discrete', 'discrete', 'box'}, default 'multi_discrete') – Format of action space
render_mode (str, optional) – Rendering mode (‘human’, ‘rgb_array’, None)
rich_display (bool, default False) – Enable rich terminal displays during simulation
seed (int, optional) – Random seed for reproducibility

n_components

Number of components in the system

Type: int

simulator

Infrastructure simulator instance

Type: Simulator

current_step

Current episode step counter

Type: int

action_space

Gymnasium action space

Type: gym.Space

observation_space

Gymnasium observation space

Type: gym.Space

Notes

This class is designed to work seamlessly with stable-baselines3 and other modern RL libraries. All environments created by inheriting from this class will pass gymnasium’s env_checker.

The action space supports multiple formats:

- ‘multi_discrete’: Separate action per component [4, 4, …, 4]
- ‘discrete’: Single action encoding all components (4^n_components)
- ‘box’: Continuous actions (for advanced use cases)
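The relationship between the ‘multi_discrete’ and ‘discrete’ formats can be illustrated with a base-4 encoding. This is a minimal sketch, not the library's implementation: the function names `encode_actions` and `decode_action` are hypothetical, and the digit order (component 0 as the least significant digit) is an assumption that may differ from what BaseInfraEnv uses internally.

```python
from typing import List

N_ACTIONS = 4  # per-component action count, as in the [4, 4, ..., 4] layout


def encode_actions(per_component: List[int], n_actions: int = N_ACTIONS) -> int:
    """Pack one action per component into a single integer (base-n_actions digits)."""
    index = 0
    # Assumption: component 0 is the least significant digit.
    for action in reversed(per_component):
        if not 0 <= action < n_actions:
            raise ValueError(f"action {action} outside [0, {n_actions})")
        index = index * n_actions + action
    return index


def decode_action(index: int, n_components: int, n_actions: int = N_ACTIONS) -> List[int]:
    """Unpack a single integer back into one action per component."""
    if not 0 <= index < n_actions ** n_components:
        raise ValueError(f"index {index} outside [0, {n_actions ** n_components})")
    actions = []
    for _ in range(n_components):
        index, action = divmod(index, n_actions)
        actions.append(action)
    return actions
```

With three components there are 4^3 = 64 joint actions, so the single-integer form grows quickly with n_components. This is one reason a per-component format can be easier to work with for larger systems.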

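Examples

>>> from infralib.envs.base import BaseInfraEnv
>>> # MarkovDynamics, SimpleCost and FixedBudget are assumed to be model classes imported elsewhere.
>>> class SimpleEnv(BaseInfraEnv):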
...     def _create_models(self):
...         dynamics = MarkovDynamics(n_states=10)
...         cost = SimpleCost()
...         budget = FixedBudget(initial_budget=5000)
...         return dynamics, cost, budget, None, None
...
...     def _compute_reward(self, sim_info):
...         return -sim_info['total_cost'] - sim_info['failures'] * 100
...
>>> env = SimpleEnv(n_components=5)
>>> obs, info = env.reset()
>>> action = env.action_space.sample()
>>> obs, reward, terminated, truncated, info = env.step(action)

metadata: dict[str, Any] = {'render_fps': 4, 'render_modes': ['human', 'rgb_array']}

__init__(n_components: int, max_steps: int = 365, observability: str = 'full', action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ndarray, dict[str, Any]][source]

Reset the environment to start a new episode.

Parameters:

seed (int, optional) – Random seed for the episode
options (dict, optional) – Additional options including ‘initial_states’

Returns:

(observation, info) where observation is the initial state observation and info contains environment metadata

Return type:

tuple

step(action: int | ndarray) → tuple[ndarray, float, bool, bool, dict[str, Any]][source]

Take a step in the environment.

Parameters:

action (int or np.ndarray) – Action to take, format depends on action_type

Returns:

(observation, reward, terminated, truncated, info)

Return type:

tuple

Raises:

RuntimeError – If called on terminated/truncated environment
ValueError – If action format is invalid
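The step contract above (a five-element return value, and a RuntimeError when stepping a finished episode) can be exercised with a small stand-in. The sketch below is illustrative only: `StubEnv` and `run_episode` are hypothetical names, the stub uses plain Python lists instead of arrays, and its reward and observations are placeholders rather than anything produced by the real Simulator.

```python
import random
from typing import Any, Dict, List, Optional, Tuple


class StubEnv:
    """Hypothetical stand-in that mimics the documented reset/step signatures."""

    def __init__(self, n_components: int = 3, max_steps: int = 5) -> None:
        self.n_components = n_components
        self.max_steps = max_steps
        self.current_step = 0
        self._rng = random.Random()
        self._done = True  # stepping is invalid until reset() is called

    def reset(self, seed: Optional[int] = None) -> Tuple[List[int], Dict[str, Any]]:
        self._rng = random.Random(seed)
        self.current_step = 0
        self._done = False
        return [0] * self.n_components, {}

    def step(self, action: List[int]) -> Tuple[List[int], float, bool, bool, Dict[str, Any]]:
        if self._done:
            raise RuntimeError("step() called on a finished episode; call reset() first")
        if len(action) != self.n_components:
            raise ValueError("expected one action per component")
        self.current_step += 1
        obs = [self._rng.randint(0, 9) for _ in range(self.n_components)]
        reward = -float(sum(action))  # placeholder cost: higher action codes cost more
        terminated = False  # this stub never reaches a terminal state
        truncated = self.current_step >= self.max_steps
        self._done = terminated or truncated
        return obs, reward, terminated, truncated, {}


def run_episode(env: StubEnv, seed: int = 0) -> Tuple[int, float]:
    """Roll out one episode, stopping as soon as either end flag is set."""
    obs, info = env.reset(seed=seed)
    steps, total = 0, 0.0
    done = False
    while not done:
        action = [0] * env.n_components  # "do nothing" for every component
        obs, reward, terminated, truncated, info = env.step(action)
        steps += 1
        total += reward
        done = terminated or truncated
    return steps, total
```

Checking both `terminated` and `truncated` before the next call to `step` is what keeps a rollout loop from hitting the RuntimeError described above.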

render() → ndarray | str | None[source]

Render the environment state.

Returns:

Rendered output depending on render_mode

Return type:

np.ndarray or str or None

close()[source]

Clean up environment resources.

infralib.envs.base.make_env_from_config(env_class, config_path: str, **kwargs) → BaseInfraEnv[source]

Create environment from configuration file.

Parameters:

env_class (class) – Environment class that inherits from BaseInfraEnv
config_path (str) – Path to YAML configuration file
**kwargs – Additional keyword arguments to override config

Returns:

Configured environment instance

Return type:

BaseInfraEnv
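The override behaviour described for **kwargs can be pictured as a dictionary merge in which keyword arguments take precedence over values read from the file. This is a sketch under that assumption, not the function's actual code: `merge_config` is a hypothetical helper, and the dictionary below stands in for an already-parsed YAML file.

```python
from typing import Any, Dict


def merge_config(file_config: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return constructor arguments where keyword overrides win over file values."""
    merged = dict(file_config)  # copy so the loaded config is not mutated
    merged.update(overrides)
    return merged


# Values as they might appear after parsing a YAML file. The keys mirror the
# BaseInfraEnv constructor parameters; the real file layout is not documented here.
file_config = {"n_components": 5, "max_steps": 365, "observability": "full"}
env_kwargs = merge_config(file_config, max_steps=100)
```

Here `env_kwargs["max_steps"]` is 100 while the other entries keep their file values, and `file_config` itself is left unchanged.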

Simple Environment

Simple infrastructure management environments using config files.

This module provides simple implementations of infrastructure maintenance environments that can be configured via YAML config files and CSV component data files. These environments use standard models and are designed for ease of use and educational purposes.

The environments support:

Configuration-based setup from YAML and CSV files
Both POMDP (partial observability) and MDP (full observability) variants
Stable-baselines3 compatibility
Multiple reward schemes and termination conditions
Rich terminal displays and rendering

Example

Using configuration files to create environments:

from infralib.envs.simple import SimpleInfraEnv, SimpleInfraMDPEnv

# Create POMDP environment
env = SimpleInfraEnv.from_config(
    config_path='config.yaml',
    components_path='components.csv'
)

# Create MDP environment
env = SimpleInfraMDPEnv.from_config(
    config_path='config.yaml',
    components_path='components.csv'
)

Classes

SimpleInfraEnv : POMDP-style infrastructure environment

SimpleInfraMDPEnv : MDP-style infrastructure environment with component margins

Functions

load_config_data : Load parameters from config and component files

infralib.envs.simple.load_config_data(config_path: str, components_path: str) → dict[str, Any][source]

Load configuration from YAML and CSV files.

Parameters:

config_path (str) – Path to YAML configuration file
components_path (str) – Path to CSV components data file

Returns:

Dictionary containing all loaded parameters

Return type:

dict
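The loading step described above can be sketched with the standard library. Everything specific here is an assumption made for illustration: the CSV column names (`component_type`, `failure_threshold`), the sample rows, and the helper name `build_params` are hypothetical. The YAML side is represented by a plain dictionary, since parsing YAML requires a third-party package.

```python
import csv
import io
from typing import Any, Dict

# Hypothetical CSV content; the real column names are not documented here.
COMPONENTS_CSV = """component_type,failure_threshold
bridge,3
road,2
pipe,4
"""


def build_params(config: Dict[str, Any], components_csv: str) -> Dict[str, Any]:
    """Combine already-parsed config values with per-component CSV rows."""
    rows = list(csv.DictReader(io.StringIO(components_csv)))
    params = dict(config)  # copy so the caller's dictionary is not mutated
    params["component_types"] = [row["component_type"] for row in rows]
    params["failure_thresholds"] = [int(row["failure_threshold"]) for row in rows]
    return params


# 'initial_budget' stands in for a value read from the YAML file.
params = build_params({"initial_budget": 5000}, COMPONENTS_CSV)
```

The resulting dictionary has one entry per CSV row in `component_types`, which is the shape the `len(params['component_types'])` call in the documented example relies on.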

Examples

>>> from infralib.envs.simple import load_config_data
>>> params = load_config_data('config.yaml', 'components.csv')
>>> print(f"Budget: {params['initial_budget']}")
>>> print(f"Components: {len(params['component_types'])}")

class infralib.envs.simple.SimpleInfraEnv(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'cost_penalty', max_steps: int = 100, observability: str = 'partial', action_type: str = 'discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]

Bases: BaseInfraEnv

Simple POMDP infrastructure environment with configuration support.

This environment simulates infrastructure maintenance under partial observability where components can only be observed through inspections. The environment uses configuration files to define component properties and model parameters.

Features:

- POMDP formulation with inspection-based observations
- Configuration-based setup from YAML/CSV files
- Multiple reward schemes and termination conditions
- Support for component types with different characteristics
- Rich terminal displays and basic rendering

Parameters:

config_path (str, optional) – Path to YAML configuration file
components_path (str, optional) – Path to CSV components data file
reward_scheme ({'cost_penalty', 'survival', 'condition'}, default 'cost_penalty') – Reward function to use
max_steps (int, default 100) – Maximum episode length
observability ({'full', 'partial', 'noisy'}, default 'partial') – Observation mode (partial recommended for POMDP)
action_type ({'multi_discrete', 'discrete'}, default 'discrete') – Action space format
render_mode (str, optional) – Rendering mode
rich_display (bool, default False) – Enable rich terminal status displays
seed (int, optional) – Random seed for reproducibility

params

Loaded configuration parameters

Type: dict

failure_thresholds

Failure thresholds per component

Type: np.ndarray

component_types

Component type names

Type: list

Notes

This environment is designed for training RL agents on infrastructure maintenance problems with realistic component degradation and cost models. The POMDP formulation requires agents to balance exploration (inspections) with exploitation (maintenance actions).

Actions are:

- 0: Do nothing
- 1: Inspect component
- 2: Repair component
- 3: Replace component
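When writing policies or tests against these integer codes, a named enumeration can make the intent easier to read. The enum below is a hypothetical convenience and is not part of the documented API. Only the integer values come from the list above.

```python
from enum import IntEnum


class MaintenanceAction(IntEnum):
    """Action codes as listed above; the enum name itself is hypothetical."""

    DO_NOTHING = 0
    INSPECT = 1
    REPAIR = 2
    REPLACE = 3


# One action per component, e.g. for a 'multi_discrete' action space.
actions = [
    MaintenanceAction.INSPECT,
    MaintenanceAction.DO_NOTHING,
    MaintenanceAction.REPLACE,
]
as_ints = [int(a) for a in actions]
```

Because `IntEnum` members compare equal to plain integers, values such as `MaintenanceAction.REPAIR` can be used wherever the integer code 2 is expected.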

Observations include the last inspection results, the time since the last inspection, and the remaining budget.
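The bookkeeping behind inspection-based observations can be sketched as follows. This is an illustration of the idea, not the environment's code: `InspectionMemory` is a hypothetical class, and it assumes that only the Inspect action (code 1) reveals a component's state. Whether repair or replacement also reveals state is not specified above.

```python
from typing import List, Optional

INSPECT = 1  # action code for "Inspect component"


class InspectionMemory:
    """Tracks what an agent could know when states are only seen on inspection."""

    def __init__(self, n_components: int) -> None:
        # None means the component has never been inspected.
        self.last_seen: List[Optional[int]] = [None] * n_components
        self.steps_since: List[int] = [0] * n_components

    def update(self, actions: List[int], true_states: List[int]) -> None:
        """Record inspected states and age the counters of uninspected components."""
        for i, (action, state) in enumerate(zip(actions, true_states)):
            if action == INSPECT:
                self.last_seen[i] = state
                self.steps_since[i] = 0
            else:
                self.steps_since[i] += 1
```

A component that is never inspected keeps `last_seen` at `None` while its counter grows. This is the kind of staleness an agent has to weigh against the cost of inspecting.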

Examples

>>> from infralib.envs.simple import SimpleInfraEnv
>>> env = SimpleInfraEnv.from_config('config.yaml', 'components.csv')
>>> obs, info = env.reset()
>>> action = env.action_space.sample()
>>> obs, reward, terminated, truncated, info = env.step(action)

>>> # Check environment with stable-baselines3
>>> from stable_baselines3.common.env_checker import check_env
>>> check_env(env, warn=True)

__init__(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'cost_penalty', max_steps: int = 100, observability: str = 'partial', action_type: str = 'discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]

classmethod from_config(config_path: str, components_path: str, **kwargs) → SimpleInfraEnv[source]

Create environment from configuration files.

Parameters:

config_path (str) – Path to YAML configuration file
components_path (str) – Path to CSV components data file
**kwargs – Additional keyword arguments to override defaults

Returns:

Configured environment instance

Return type:

SimpleInfraEnv

Examples

>>> env = SimpleInfraEnv.from_config(
...     'config.yaml', 'components.csv',
...     reward_scheme='survival', max_steps=200
... )

class infralib.envs.simple.SimpleInfraMDPEnv(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'margin', max_steps: int = 100, action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]

Bases: BaseInfraEnv

Simple MDP infrastructure environment with component margins.

This environment provides full observability of component states and focuses on the margin between current state and failure threshold. This formulation is easier to learn for many RL algorithms as it provides direct state information.

Features:

- MDP formulation with full state observability
- Component margins as primary observation
- Configuration-based setup from YAML/CSV files
- Margin-based reward functions
- Support for component types with different failure thresholds

Parameters:

config_path (str, optional) – Path to YAML configuration file
components_path (str, optional) – Path to CSV components data file
reward_scheme ({'margin', 'weighted_margin', 'binary'}, default 'margin') – Reward function to use
max_steps (int, default 100) – Maximum episode length
action_type ({'multi_discrete', 'discrete'}, default 'multi_discrete') – Action space format (multi_discrete recommended for MDP)
render_mode (str, optional) – Rendering mode
rich_display (bool, default False) – Enable rich terminal status displays
seed (int, optional) – Random seed for reproducibility

params

Loaded configuration parameters

Type: dict

failure_thresholds

Failure thresholds per component type

Type: np.ndarray

max_states

Maximum component state value

Type: int

Notes

The MDP formulation uses component margins as the primary state representation:

margin = (current_state - failure_threshold) / (max_state - failure_threshold)

This normalization makes the state space more uniform across component types and focuses learning on the critical region near failure thresholds.

Observations include:

- Component margins (normalized to [-1, 1] range)
- Normalized remaining budget
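A worked instance of the margin formula may help when interpreting observation values. The function below is a direct transcription of the formula in these notes. The name `component_margin` and the guard against a zero denominator are additions made for this sketch.

```python
def component_margin(current_state: float, failure_threshold: float, max_state: float) -> float:
    """Normalized distance from the failure threshold, per the documented formula."""
    if max_state <= failure_threshold:
        raise ValueError("max_state must exceed failure_threshold")
    return (current_state - failure_threshold) / (max_state - failure_threshold)


# Example with max_state = 10 and failure_threshold = 4:
at_max = component_margin(10, 4, 10)        # 1.0
at_threshold = component_margin(4, 4, 10)   # 0.0
below = component_margin(1, 4, 10)          # -0.5
```

The margin is 1 at max_state, 0 at the failure threshold, and negative below it. The formula alone can drop below -1 when the threshold lies above the midpoint of the state range (for example, state 0 with threshold 8 and max_state 10 gives -4). The stated [-1, 1] range therefore presumably relies on clipping or on threshold constraints that are not shown here.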

Examples

>>> from infralib.envs.simple import SimpleInfraMDPEnv
>>> env = SimpleInfraMDPEnv.from_config('config.yaml', 'components.csv')
>>> obs, info = env.reset()
>>> print(f"Component margins: {obs[:-1]}")  # All but last element
>>> print(f"Budget remaining: {obs[-1]}")    # Last element

__init__(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'margin', max_steps: int = 100, action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)[source]

classmethod from_config(config_path: str, components_path: str, **kwargs) → SimpleInfraMDPEnv[source]

Create MDP environment from configuration files.

Parameters:

config_path (str) – Path to YAML configuration file
components_path (str) – Path to CSV components data file
**kwargs – Additional keyword arguments to override defaults

Returns:

Configured MDP environment instance

Return type:

SimpleInfraMDPEnv

Examples

>>> env = SimpleInfraMDPEnv.from_config(
...     'config.yaml', 'components.csv',
...     reward_scheme='weighted_margin'
... )