Environments Module

The envs module provides Gymnasium-compatible reinforcement learning environments for infrastructure management.

Base Environment

Base infrastructure management environment for gymnasium and stable-baselines3.

This module provides a base class for infrastructure maintenance environments that:

  • Is fully compatible with gymnasium and stable-baselines3

  • Uses the updated Simulator with model dependency system

  • Provides abstract methods for model creation (no models defined in base)

  • Supports all observability modes and reward schemes

  • Includes comprehensive action and observation space handling

The base class assumes all inheriting environments will define appropriate models (dynamics, cost, budget, hierarchy, metadata) as needed.

Example

Creating a custom environment by inheriting from BaseInfraEnv:

class MyCustomEnv(BaseInfraEnv):
    def _create_models(self):
        dynamics = MyDynamicsModel()
        cost = MyCostModel()
        budget = MyBudgetModel()
        return dynamics, cost, budget, None, None

    def _compute_reward(self, sim_info):
        return -sim_info['total_cost']

Classes

BaseInfraEnv : Abstract base class for infrastructure environments

class infralib.envs.base.BaseInfraEnv(n_components: int, max_steps: int = 365, observability: str = 'full', action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)

Bases: Env, ABC

Abstract base class for infrastructure maintenance environments.

This base class provides the core functionality for infrastructure maintenance RL environments while requiring subclasses to define their own models and reward functions. It is fully compatible with gymnasium and stable-baselines3.

The environment handles:

  • Action and observation space definition

  • Simulator integration with model dependencies

  • Episode management (reset, step, termination)

  • Multiple observability modes (full, partial, noisy)

  • Rendering capabilities

Subclasses must implement:

  • _create_models(): Define dynamics, cost, budget, and optional hierarchy/metadata

  • _compute_reward(): Define reward function based on simulation info

Parameters:
  • n_components (int) – Number of infrastructure components to simulate

  • max_steps (int, default 365) – Maximum number of steps per episode

  • observability ({'full', 'partial', 'noisy'}, default 'full') – Type of state observability

  • action_type ({'multi_discrete', 'discrete', 'box'}, default 'multi_discrete') – Format of action space

  • render_mode (str, optional) – Rendering mode ('human', 'rgb_array', or None)

  • rich_display (bool, default False) – Enable rich terminal displays during simulation

  • seed (int, optional) – Random seed for reproducibility

n_components

Number of components in the system

Type:

int

simulator

Infrastructure simulator instance

Type:

Simulator

current_step

Current episode step counter

Type:

int

action_space

Gymnasium action space

Type:

gym.Space

observation_space

Gymnasium observation space

Type:

gym.Space

Notes

This class is designed to work seamlessly with stable-baselines3 and other modern RL libraries. All environments created by inheriting from this class will pass gymnasium’s env_checker.

The action space supports multiple formats:

  • 'multi_discrete': Separate action per component [4, 4, …, 4]

  • 'discrete': Single action encoding all components (4^n_components)

  • 'box': Continuous actions (for advanced use cases)
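The relationship between the 'multi_discrete' and 'discrete' formats can be illustrated with a base-4 encoding. This is a minimal sketch, assuming four actions per component and that component 0 is the least significant digit. The helper names encode_action and decode_action are hypothetical and not part of infralib, and the library's actual encoding may differ.

```python
N_ACTIONS = 4  # actions per component, per the [4, 4, ..., 4] layout above


def encode_action(per_component, n_actions=N_ACTIONS):
    """Pack one action per component into a single integer (hypothetical helper)."""
    index = 0
    # Assumption: component 0 is the least significant base-n_actions digit.
    for position, action in enumerate(per_component):
        if not 0 <= action < n_actions:
            raise ValueError(f"action {action} out of range")
        index += action * n_actions ** position
    return index


def decode_action(index, n_components, n_actions=N_ACTIONS):
    """Unpack a single integer into one action per component (hypothetical helper)."""
    if not 0 <= index < n_actions ** n_components:
        raise ValueError(f"index {index} out of range")
    actions = []
    for _ in range(n_components):
        # Peel off one base-n_actions digit per component.
        index, action = divmod(index, n_actions)
        actions.append(action)
    return actions
```

With five components the flat encoding already has 4**5 = 1024 actions, which is one reason a per-component format can be easier to work with as n_components grows.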

Examples

>>> class SimpleEnv(BaseInfraEnv):
...     def _create_models(self):
...         dynamics = MarkovDynamics(n_states=10)
...         cost = SimpleCost()
...         budget = FixedBudget(initial_budget=5000)
...         return dynamics, cost, budget, None, None
...
...     def _compute_reward(self, sim_info):
...         return -sim_info['total_cost'] - sim_info['failures'] * 100
...
>>> env = SimpleEnv(n_components=5)
>>> obs, info = env.reset()
>>> action = env.action_space.sample()
>>> obs, reward, terminated, truncated, info = env.step(action)

metadata: dict[str, Any] = {'render_fps': 4, 'render_modes': ['human', 'rgb_array']}

reset(seed: int | None = None, options: dict[str, Any] | None = None) → tuple[ndarray, dict[str, Any]]

Reset the environment to start a new episode.

Parameters:
  • seed (int, optional) – Random seed for the episode

  • options (dict, optional) – Additional options, including 'initial_states'

Returns:

(observation, info) where observation is the initial state observation and info contains environment metadata

Return type:

tuple

step(action: int | ndarray) → tuple[ndarray, float, bool, bool, dict[str, Any]]

Take a step in the environment.

Parameters:

action (int or np.ndarray) – Action to take, format depends on action_type

Returns:

(observation, reward, terminated, truncated, info)

Return type:

tuple

Raises:

render() → ndarray | str | None

Render the environment state.

Returns:

Rendered output depending on render_mode

Return type:

np.ndarray or str or None

close()

Clean up environment resources.

infralib.envs.base.make_env_from_config(env_class, config_path: str, **kwargs) → BaseInfraEnv

Create environment from configuration file.

Parameters:
  • env_class (class) – Environment class that inherits from BaseInfraEnv

  • config_path (str) – Path to YAML configuration file

  • **kwargs – Additional keyword arguments to override config

Returns:

Configured environment instance

Return type:

BaseInfraEnv
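How keyword arguments might take precedence over values from the configuration file can be sketched as a dictionary merge. This is an illustration only: merge_config is a hypothetical helper, the dictionary stands in for a parsed YAML file, and the precedence rules actually used by make_env_from_config are not documented here.

```python
def merge_config(file_config, **overrides):
    """Return constructor arguments, letting keyword overrides win (hypothetical helper)."""
    merged = dict(file_config)  # copy so the loaded config is not mutated
    merged.update(overrides)    # explicit keyword arguments replace file values
    return merged


# Stand-in for values parsed from a YAML file; keys mirror BaseInfraEnv parameters.
file_config = {"n_components": 5, "max_steps": 365, "observability": "full"}
env_kwargs = merge_config(file_config, max_steps=100)
```

Under this assumption, env_kwargs carries max_steps=100 while the other values come from the file unchanged.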

Simple Environment

Simple infrastructure management environments using config files.

This module provides simple implementations of infrastructure maintenance environments that can be configured via YAML config files and CSV component data files. These environments use standard models and are designed for ease of use and educational purposes.

The environments support:

  • Configuration-based setup from YAML and CSV files

  • Both POMDP (partial observability) and MDP (full observability) variants

  • Stable-baselines3 compatibility

  • Multiple reward schemes and termination conditions

  • Rich terminal displays and rendering

Example

Using configuration files to create environments:

# Create POMDP environment
env = SimpleInfraEnv.from_config(
    config_path='config.yaml',
    components_path='components.csv'
)

# Create MDP environment
env = SimpleInfraMDPEnv.from_config(
    config_path='config.yaml',
    components_path='components.csv'
)

Classes

SimpleInfraEnv : POMDP-style infrastructure environment

SimpleInfraMDPEnv : MDP-style infrastructure environment with component margins

Functions

load_config_data : Load parameters from config and component files

infralib.envs.simple.load_config_data(config_path: str, components_path: str) → dict[str, Any]

Load configuration from YAML and CSV files.

Parameters:
  • config_path (str) – Path to YAML configuration file

  • components_path (str) – Path to CSV components data file

Returns:

Dictionary containing all loaded parameters

Return type:

dict
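The general idea of combining run-level settings with per-component rows can be sketched with the standard library. This is a sketch under stated assumptions: combine_config is a hypothetical helper, the CSV column names 'type' and 'failure_threshold' are invented for illustration, and a plain dictionary stands in for the parsed YAML file. The real file schema is not specified in this reference.

```python
import csv
import io


def combine_config(config, components_csv):
    """Merge a parsed config dict with per-component CSV rows (hypothetical helper)."""
    rows = list(csv.DictReader(io.StringIO(components_csv)))
    params = dict(config)  # copy so the caller's config is not mutated
    # Column names 'type' and 'failure_threshold' are assumptions for illustration.
    params["component_types"] = [row["type"] for row in rows]
    params["failure_thresholds"] = [int(row["failure_threshold"]) for row in rows]
    return params


config = {"initial_budget": 5000}  # stand-in for the parsed YAML file
components_csv = "type,failure_threshold\nbridge,7\nroad,8\n"
params = combine_config(config, components_csv)
```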

Examples

>>> params = load_config_data('config.yaml', 'components.csv')
>>> print(f"Budget: {params['initial_budget']}")
>>> print(f"Components: {len(params['component_types'])}")

class infralib.envs.simple.SimpleInfraEnv(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'cost_penalty', max_steps: int = 100, observability: str = 'partial', action_type: str = 'discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)

Bases: BaseInfraEnv

Simple POMDP infrastructure environment with configuration support.

This environment simulates infrastructure maintenance under partial observability where components can only be observed through inspections. The environment uses configuration files to define component properties and model parameters.

Features:

  • POMDP formulation with inspection-based observations

  • Configuration-based setup from YAML/CSV files

  • Multiple reward schemes and termination conditions

  • Support for component types with different characteristics

  • Rich terminal displays and basic rendering

Parameters:
  • config_path (str, optional) – Path to YAML configuration file

  • components_path (str, optional) – Path to CSV components data file

  • reward_scheme ({'cost_penalty', 'survival', 'condition'}, default 'cost_penalty') – Reward function to use

  • max_steps (int, default 100) – Maximum episode length

  • observability ({'full', 'partial', 'noisy'}, default 'partial') – Observation mode (partial recommended for POMDP)

  • action_type ({'multi_discrete', 'discrete'}, default 'discrete') – Action space format

  • render_mode (str, optional) – Rendering mode

  • rich_display (bool, default False) – Enable rich terminal status displays

params

Loaded configuration parameters

Type:

dict

failure_thresholds

Failure thresholds per component

Type:

np.ndarray

component_types

Component type names

Type:

list

Notes

This environment is designed for training RL agents on infrastructure maintenance problems with realistic component degradation and cost models. The POMDP formulation requires agents to balance exploration (inspections) with exploitation (maintenance actions).

Actions are:

  • 0: Do nothing

  • 1: Inspect component

  • 2: Repair component

  • 3: Replace component
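The action codes listed above can be given readable names in user code. The sketch below uses the standard library's IntEnum. MaintenanceAction is a hypothetical class defined here for illustration and is not something infralib is known to provide.

```python
from enum import IntEnum


class MaintenanceAction(IntEnum):
    """Named constants for the four action codes listed above (hypothetical class)."""
    DO_NOTHING = 0
    INSPECT = 1
    REPAIR = 2
    REPLACE = 3


# IntEnum members compare equal to plain integers, so they can be used
# wherever an integer action code is expected.
per_component = [MaintenanceAction.INSPECT, MaintenanceAction.DO_NOTHING]
```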

Observations include the most recent inspection result for each component, the time since each component was last inspected, and the remaining budget.
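The bookkeeping behind inspection-based observations can be sketched as follows. This is an illustration, not the library's implementation: update_observation is a hypothetical helper, and it assumes that only an inspection (action code 1) reveals a component's true state and resets its time-since-inspection counter.

```python
INSPECT = 1  # action code for "Inspect component" from the list above


def update_observation(last_seen, steps_since, true_states, actions):
    """Advance the inspection memory by one step (hypothetical helper)."""
    new_seen, new_since = [], []
    for seen, since, state, action in zip(last_seen, steps_since, true_states, actions):
        if action == INSPECT:
            # Assumption: an inspection reveals the true state exactly.
            new_seen.append(state)
            new_since.append(0)
        else:
            # Uninspected components keep their stale reading and age by one step.
            new_seen.append(seen)
            new_since.append(since + 1)
    return new_seen, new_since


last_seen, steps_since = update_observation(
    last_seen=[0, 0], steps_since=[3, 3], true_states=[4, 6], actions=[1, 0]
)
```

Here the first component is inspected, so its reading is refreshed, while the second component's reading stays stale and its counter increases.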

Examples

>>> env = SimpleInfraEnv.from_config('config.yaml', 'components.csv')
>>> obs, info = env.reset()
>>> action = env.action_space.sample()
>>> obs, reward, terminated, truncated, info = env.step(action)
>>> # Check environment with stable-baselines3
>>> from stable_baselines3.common.env_checker import check_env
>>> check_env(env, warn=True)

classmethod from_config(config_path: str, components_path: str, **kwargs) → SimpleInfraEnv

Create environment from configuration files.

Parameters:
  • config_path (str) – Path to YAML configuration file

  • components_path (str) – Path to CSV components data file

  • **kwargs – Additional keyword arguments to override defaults

Returns:

Configured environment instance

Return type:

SimpleInfraEnv

Examples

>>> env = SimpleInfraEnv.from_config(
...     'config.yaml', 'components.csv',
...     reward_scheme='survival', max_steps=200
... )

class infralib.envs.simple.SimpleInfraMDPEnv(config_path: str | None = None, components_path: str | None = None, reward_scheme: str = 'margin', max_steps: int = 100, action_type: str = 'multi_discrete', render_mode: str | None = None, rich_display: bool = False, seed: int | None = None)

Bases: BaseInfraEnv

Simple MDP infrastructure environment with component margins.

This environment provides full observability of component states and focuses on the margin between current state and failure threshold. This formulation is easier to learn for many RL algorithms as it provides direct state information.

Features:

  • MDP formulation with full state observability

  • Component margins as primary observation

  • Configuration-based setup from YAML/CSV files

  • Margin-based reward functions

  • Support for component types with different failure thresholds

Parameters:
  • config_path (str, optional) – Path to YAML configuration file

  • components_path (str, optional) – Path to CSV components data file

  • reward_scheme ({'margin', 'weighted_margin', 'binary'}, default 'margin') – Reward function to use

  • max_steps (int, default 100) – Maximum episode length

  • action_type ({'multi_discrete', 'discrete'}, default 'multi_discrete') – Action space format (multi_discrete recommended for MDP)

  • render_mode (str, optional) – Rendering mode

  • rich_display (bool, default False) – Enable rich terminal status displays

params

Loaded configuration parameters

Type:

dict

failure_thresholds

Failure thresholds per component type

Type:

np.ndarray

max_states

Maximum component state value

Type:

int

Notes

The MDP formulation uses component margins as the primary state representation:

    margin = (current_state - failure_threshold) / (max_state - failure_threshold)

This normalization makes the state space more uniform across component types and focuses learning on the critical region near failure thresholds.

Observations include:

  • Component margins (normalized to [-1, 1] range)

  • Normalized remaining budget
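The margin formula in these notes can be worked through in a few lines. This is a minimal sketch: component_margin is a hypothetical helper, and the clipping step is an assumption added here so that the result stays within the documented [-1, 1] range, since the formula alone can fall below -1 for states far from the threshold.

```python
def component_margin(current_state, failure_threshold, max_state):
    """Margin per the formula above, clipped to [-1, 1] (clipping is an assumption)."""
    if max_state == failure_threshold:
        raise ValueError("max_state must differ from failure_threshold")
    margin = (current_state - failure_threshold) / (max_state - failure_threshold)
    # Assumption: values outside the documented range are clipped, not rescaled.
    return max(-1.0, min(1.0, margin))


# A component at state 8 with failure threshold 6 and maximum state 10.
example_margin = component_margin(8, 6, 10)
```

For these illustrative numbers the margin is (8 - 6) / (10 - 6) = 0.5, and a component exactly at its failure threshold has a margin of 0.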

Examples

>>> env = SimpleInfraMDPEnv.from_config('config.yaml', 'components.csv')
>>> obs, info = env.reset()
>>> print(f"Component margins: {obs[:-1]}")  # All but last element
>>> print(f"Budget remaining: {obs[-1]}")    # Last element

classmethod from_config(config_path: str, components_path: str, **kwargs) → SimpleInfraMDPEnv

Create MDP environment from configuration files.

Parameters:
  • config_path (str) – Path to YAML configuration file

  • components_path (str) – Path to CSV components data file

  • **kwargs – Additional keyword arguments to override defaults

Returns:

Configured MDP environment instance

Return type:

SimpleInfraMDPEnv

Examples

>>> env = SimpleInfraMDPEnv.from_config(
...     'config.yaml', 'components.csv',
...     reward_scheme='weighted_margin'
... )