# muster release v0.1.0
## Changed
- Session duration reduced from 90 days to 30 days. The refresh token TTL now matches Dex's `absoluteLifetime` (720h). Previously, muster's 90-day refresh token outlived Dex's 30-day session, causing confusing failures when auto-refresh silently stopped working after day 30. Users who were logging in once every ~2 months will now need to re-authenticate every 30 days.
- `muster auth status` now shows session expiry. Instead of `Refresh: Available`, the output now shows `Session: ~29 days remaining (auto-refresh)`, giving users a concrete estimate of when re-authentication will be required.
- Access token TTL is now explicitly set to 30 minutes (matching Dex's `idTokens` expiry) instead of relying on the library default of 1 hour.
- Session duration is now configurable via `oauth.server.sessionDuration` in `config.yaml` (default: `720h` / 30 days).
- Kubernetes event emission is now disabled by default (alpha feature). Use the `--enable-events` flag on `muster serve` or set `events: true` in `config.yaml` to opt in.
- Switch CI to `push-to-registries-multiarch` (`architect-orb@6.14.0`) with amd64-only on branches for faster PR feedback and full multi-arch on release tags. Chart tests now run before publishing to the app catalog.
- Update Dockerfile to multi-stage build with native cross-compilation support for multi-architecture images.
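Both new settings land in `config.yaml`; a minimal sketch (only the keys named above come from the release notes, the surrounding structure is assumed):

```yaml
oauth:
  server:
    sessionDuration: 720h   # refresh token TTL (default: 30 days)
events: true                # opt in to Kubernetes event emission (alpha)
```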
Note: The Server-Side Meta-Tools Migration below is a breaking change that will be released as part of the next major version. External integrations should prepare for this change.
## Breaking Changes
### Server-Side Meta-Tools Migration
Meta-tools (`list_tools`, `call_tool`, `describe_tool`, etc.) have moved from the agent to the aggregator server. This is a fundamental architectural change.

**What Changed:**
| Component | Before | After |
|---|---|---|
| Agent | Exposed 11 meta-tools + bridged to aggregator | Transport bridge only (OAuth shim + stdio↔HTTP) |
| Aggregator | Exposed 36+ core tools directly | Exposes ONLY meta-tools - no direct tool access |
| Tool Access | Direct tool calls to aggregator | All tool calls go through the `call_tool` meta-tool |
**What Continues Working (Transparent Migration):**
- CLI commands (`muster list`, `muster get`, etc.) - client wraps calls automatically
- Agent REPL (`muster agent --repl`) - uses same client with transparent wrapping
- BDD test scenarios - test client wraps calls automatically
- MCP native protocol methods (`tools/list`, `resources/list`) - not affected

**What Breaks (Requires Update):**
- External integrations calling tools directly via HTTP
- Custom clients connecting directly to aggregator

**Migration for External Clients:**
```jsonc
// Before: Direct tool call
{"method": "tools/call", "params": {"name": "core_service_list", "arguments": {}}}

// After: Wrap through call_tool
{"method": "tools/call", "params": {
  "name": "call_tool",
  "arguments": {"name": "core_service_list", "arguments": {}}
}}
```
**Benefits:**
- OAuth-capable clients can connect directly to the server without the agent
- Simpler agent architecture (~200 lines vs ~700 lines)
- Consistent tool visibility across all clients
- Centralized meta-tool logic

See ADR-010 for design details.

**Known External Integrations Affected:**
- Any HTTP clients calling the aggregator directly
- Custom MCP clients not using `muster agent`
- CI/CD pipelines with direct tool calls

**Recommended Migration Timeline:**
- Review your integration code for direct tool calls
- Update to wrap calls through the `call_tool` meta-tool
- Test with the new Muster version before deploying
## Changed
- MCPServer CRD State Exposes Auth Required - The MCPServer CRD now shows an `Auth Required` state when a remote server returns 401 Unauthorized (#337)
  - Before: a 401 response mapped to `Connected` (hiding the auth requirement)
  - After: a 401 response shows as `Auth Required` in the CRD state
  - This gives operators clear visibility into which servers need authentication
  - CLI output updated: `muster list mcpserver` now shows the `Auth Required` state
  - SESSION column values updated: `OK` → `Authenticated`, `Required` → `Pending Auth`
  - Column header renamed: `AUTH` → `SESSION` to match `muster auth status` output
## Added
- Reconciliation Framework - Automatic synchronization between resource definitions (CRDs/YAML) and running services
  - Supports both Kubernetes mode (using controller-runtime informers) and filesystem mode (using fsnotify)
  - Auto-detects operating mode based on environment
  - Configurable per-resource-type enable/disable
  - Work queue with deduplication and exponential backoff
  - Status tracking and API for observability
  - See ADR 007 for design details
- StateChangeBridge - Real-time sync of runtime state changes to CRD status subresources
  - Subscribes to orchestrator service state changes
  - Triggers reconciliation to update CRD status when services start/stop/crash
## Changed
- BREAKING: Consolidated OAuth Configuration Naming - OAuth configuration structure has been reorganized for clarity (#324)
  - Before: `aggregator.oauth` (client/proxy) + `aggregator.oauthServer` (server protection)
  - After: `aggregator.oauth.mcpClient` (MCP client/proxy) + `aggregator.oauth.server` (server protection)
  - Both OAuth roles now live under a single `oauth` section with explicit `mcpClient`/`server` sub-sections
  - The `mcpClient` name makes it clear this is for authenticating TO remote MCP servers
  - CLI flags renamed: `--oauth` → `--oauth-mcp-client`, `--oauth-public-url` → `--oauth-mcp-client-public-url`
  - Helm values updated: `muster.oauth.*` → `muster.oauth.mcpClient.*`, `muster.oauthServer.*` → `muster.oauth.server.*`
  - CIMD configuration moved to nested structure: `cimdPath`/`cimdScopes` → `cimd.path`/`cimd.scopes`
  - Migration: Update configuration files and Helm values to use the new structure
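A minimal before/after sketch of the renamed layout (only the renamed keys are from the release notes; the example values are illustrative):

```yaml
# Before (illustrative values)
aggregator:
  oauth:            # MCP client/proxy settings
    cimdPath: /etc/muster/cimd
    cimdScopes: [openid]
  oauthServer: {}   # server protection

# After
aggregator:
  oauth:
    mcpClient:      # authenticating TO remote MCP servers
      cimd:
        path: /etc/muster/cimd
        scopes: [openid]
    server:         # protecting the aggregator itself
      sessionDuration: 720h
```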
- BREAKING: CRD Status Field Changes - Status fields have been redesigned for session-aware tool availability
  - MCPServerStatus: Removed `availableTools` (session-dependent), added `lastConnected` and `restartCount`
  - ServiceClassStatus: Replaced `available`/`requiredTools`/`missingTools`/`toolAvailability` with `valid`/`validationErrors`/`referencedTools`
  - WorkflowStatus: Replaced `available`/`requiredTools`/`missingTools`/`stepValidation` with `valid`/`validationErrors`/`referencedTools`/`stepCount`
  - Tool availability is now computed per-session at runtime, not stored in CRs
  - Existing CRs will have stale status fields that will be updated on first reconciliation
- Added Chart annotations to support OCI repositories.
## Fixed
- Helm CiliumNetworkPolicy: Fixed incorrect values path for OAuth storage check (now uses `.Values.muster.oauth.server.storage`)
## Added
- Remote MCP Server Support for Kubernetes Environments
  - Added comprehensive support for the `stdio`, `streamable-http`, and `sse` transport protocols
  - Enhanced CRD Schema: Updated `MCPServerSpec` to support all MCP server types
    - Added new config for `streamable-http` and `sse`: `url`, `headers`, and `timeout` fields
    - Added mutual exclusion validation and required field validation using kubebuilder annotations
  - New CLI Commands: Added subcommands to use the new type system
    - `muster create mcpserver <name> --type stdio` for local MCP servers
    - `muster create mcpserver <name> --type streamable-http` for HTTP remote servers
    - `muster create mcpserver <name> --type sse` for SSE remote servers
  - Updated Examples: Enhanced example files to demonstrate both local and remote configurations
  - Kubernetes Deployment Ready: Enables deployment patterns where the Muster aggregator runs in cluster and connects to MCP servers deployed as separate Kubernetes services
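Putting the new fields together, a remote server definition might look like this (the `apiVersion` and exact field placement are our assumptions; `type`, `url`, `headers`, and `timeout` are the fields named above):

```yaml
kind: MCPServer
apiVersion: muster.giantswarm.io/v1alpha1  # assumed group/version, for illustration
metadata:
  name: example-remote
spec:
  type: streamable-http
  url: http://mcp-example.tools.svc.cluster.local:8080/mcp
  timeout: 30s
  headers:
    Authorization: Bearer <token>
```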
- Systemd Socket Activation Support
  - Added a `muster.socket` unit file for socket-activated systemd deployment
  - Modified `muster.service` to use socket activation on localhost:8090
  - Updated `scripts/setup-systemd.sh` and `scripts/dev-restart.sh` to handle socket activation
  - Added a new dependency, `github.com/coreos/go-systemd`, to handle socket activation
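Socket activation splits the listen step out of the service; a sketch of what such a `muster.socket` unit looks like (illustrative, the shipped unit file may differ):

```ini
[Unit]
Description=muster aggregator socket

[Socket]
# systemd owns the listener; the service is started on first connection
ListenStream=127.0.0.1:8090

[Install]
WantedBy=sockets.target
```

On the Go side, the `github.com/coreos/go-systemd` activation package retrieves such inherited listeners via `activation.Listeners()`.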
- Service Health Monitoring
  - Added health checks for MCP servers using the `tools/list` JSON-RPC method
  - Added health checks for port forwards by testing TCP connectivity
  - Health checks run every 30 seconds for all running services
  - Health status is reported through the StateStore and displayed in the TUI
  - Created a `ServiceHealthChecker` interface for extensible health checking
- Improved State Reconciliation
  - Implemented a proper `ReconcileState()` method that syncs TUI state with the StateStore
  - Updates service statuses, ports, PIDs, and error states from the centralized store
  - Synchronizes cluster health information from the K8sStateManager
  - Ensures UI consistency after startup and state changes
- K8s Connections as Services
  - Kubernetes connections are now modeled as services in the dependency graph
  - K8s connection health monitoring is now handled by dedicated K8s connection services
  - Unified service management architecture - all services (K8s, port forwards, MCPs) follow the same lifecycle
  - K8s connections can be stopped/restarted like other services with proper cascade handling
  - Cascading stop functionality: stopping a service automatically stops all dependent services
  - K8s connection health monitoring with automatic service lifecycle management
  - Port forwards now depend on their Kubernetes context being authenticated and healthy
  - The kubernetes MCP server depends on the management cluster connection
  - When K8s connections become unhealthy, dependent services are automatically stopped
  - Manual stop (x key) now uses cascading stop to cleanly shut down dependent services
  - New `StartServicesDependingOn` method in ServiceManager to restart services when dependencies recover
  - New `orchestrator` package that manages application state and service lifecycle for both TUI and non-TUI modes
  - New `HealthStatusUpdate` and `ReportHealth` for proper health status reporting
  - Health-aware startup: Services now wait for their K8s dependencies to be healthy before starting
- Added a comprehensive dependency management system for services
  - Services now track why they were stopped (manual vs dependency cascade)
  - Automatically restart services when their dependencies recover
  - Ensure correct startup order based on dependency graph
  - Prevent manually stopped services from auto-restarting
- Phase 1 of Issue #45: Message Handling Architecture Improvements
  - Added correlation ID support to `ManagedServiceUpdate` for tracing related messages and cascading effects
  - Implemented configurable buffer strategies for TUI message channels:
    - `BufferActionDrop`: Drop messages when the buffer is full
    - `BufferActionBlock`: Block until space is available
    - `BufferActionEvictOldest`: Remove the oldest message to make room for new ones
  - Added priority-based buffer strategies to handle different message types differently
  - Introduced `BufferedChannel` with metrics tracking (messages sent, dropped, blocked, evicted)
  - Enhanced the orchestrator with correlation tracking for health checks and cascading operations
  - Updated the service manager to use the new correlation ID system for better debugging
  - Added comprehensive test coverage for buffer strategies and correlation tracking
- Phase 2 of Issue #45: State Consolidation
  - Implemented a centralized `StateStore` as the single source of truth for all service states
  - Added `ServiceStateSnapshot` for complete state information with correlation tracking
  - Introduced state change subscriptions with `StateSubscription` for reactive updates
  - Enhanced the `ServiceReporter` interface with a `GetStateStore()` method for direct state access
  - Updated `TUIReporter` and `ConsoleReporter` to use centralized state management
  - Migrated `ServiceManager` from local state tracking to the centralized `StateStore`
  - Added comprehensive metrics tracking for state changes and subscription performance
  - Implemented a state change event system with old/new state tracking
  - Added support for filtering services by type and state
  - Maintained full backwards compatibility while eliminating state duplication
- Phase 3 of Issue #45: Structured Event System
  - Implemented a comprehensive event hierarchy with semantic event types:
    - `ServiceStateEvent` for service lifecycle changes with old/new state tracking
    - `HealthEvent` for cluster health status updates
    - `DependencyEvent` for cascade start/stop operations
    - `UserActionEvent` for user-initiated actions
    - `SystemEvent` for system-level operations
  - Added an `EventBus` interface with publish/subscribe functionality
  - Implemented a flexible event filtering system with composable filters:
    - Filter by event type, source, severity, correlation ID
    - Combine filters with AND/OR logic for complex subscriptions
  - Created an `EventBusAdapter` for backwards compatibility with the existing `ServiceReporter` interface
  - Added comprehensive event metrics tracking (published, delivered, dropped events)
  - Implemented both handler-based and channel-based event subscriptions
  - Added event severity levels (trace, debug, info, warn, error, fatal) for better categorization
  - Enhanced correlation tracking with event metadata support
  - Provided thread-safe concurrent event publishing and subscription management
  - Added extensive test coverage for all event types and bus functionality
- Phase 4 of Issue #45: Testing and Polish
- Added comprehensive integration tests covering end-to-end event flows
- Implemented performance monitoring utilities with `PerformanceMonitor` and metrics tracking
- Created an event batching system with `EventBatchProcessor` for high-volume scenarios
- Built `OptimizedEventBus` with configurable performance optimizations
- Added an object pooling system with `EventPoolManager` to reduce GC pressure
- Implemented extensive error recovery testing, including panic handling
- Added memory usage monitoring and subscription cleanup verification
- Created comprehensive documentation covering architecture, usage, and best practices
- Fixed race conditions in event bus concurrent access patterns
- Enhanced thread safety across all components with proper synchronization
- Provided migration guides and troubleshooting documentation
- Achieved high test coverage with robust integration and unit tests
- Improved Dependency Management for Service Restarts
- When restarting a service, its dependencies are now automatically restarted if they’re not active
- This ensures services always have their requirements satisfied (e.g., restarting Grafana MCP will also restart its port forward if needed)
- Dependencies are restarted regardless of their stop reason to guarantee service requirements
- Clear manual stop reason when restarting a service to allow proper dependency management
- Implemented Issue #46: Improved State Management Between TUI and Orchestrator
- Phase 1: Unified State Management
- Added helper methods to TUI Model to use StateStore as single source of truth
- Implemented state reconciliation on TUI startup to ensure consistency
- Updated TUI controller to use StateStore instead of directly updating model maps
- Eliminated state duplication between TUI Model and StateStore
- Phase 2: Message Sequencing
- Added sequence numbers to `ManagedServiceUpdate` for proper message ordering
- Implemented a `MessageBuffer` for handling out-of-order messages
- Added a global sequence counter with atomic operations for thread safety
- Phase 3: Enhanced Correlation Tracking
- Added a `CascadeInfo` type for tracking cascade relationships between services
- Added a `StateTransition` type for tracking state changes with full context
- Enhanced the StateStore to record state transitions and cascade operations automatically
- Updated the orchestrator to record cascade operations for better observability
- Phase 4: Improved Error Handling
- Added retry logic for critical updates that are dropped due to buffer overflow
- Implemented a `BackpressureNotificationMsg` for user notifications about dropped messages
- Added configurable retry attempts with exponential backoff
- Enhanced TUIReporter with retry queue processing and user feedback
- Comprehensive Documentation Suite
- Added Architecture Overview documenting system design, components, and principles
- Created Quick Start Guide for new users to get up and running quickly
- Added Troubleshooting Guide with common issues and solutions
- Enhanced development documentation with recent architectural improvements
- Documented dependency management, state management, and message flow in detail
- Configurable Namespace for CR Discovery
  - Added a `namespace` configuration option to `config.yaml` for Kubernetes CR discovery
  - Allows specifying which namespace to use for MCPServer, ServiceClass, and Workflow resources
  - Defaults to `"default"` when not specified
  - Enables muster to work properly in multi-namespace Kubernetes environments
## Changed
- Aggregator Config
- Drop the “Enabled” field (always enabled in modes where it’s used)
- Service Manager Refactoring
- ServiceManager now accepts an optional KubeManager parameter for K8s connection services
- Added support for K8s connection services in the service lifecycle management
- Improved service stop handling to report “Stopping” state before closing channels
- Orchestrator Improvements
- Removed old health monitoring methods in favor of K8s connection services
- Updated dependency graph to use service labels for K8s connections (e.g., “k8s-mc-mymc” instead of “k8s:context-name”)
- Improved service restart logic to properly handle dependencies
- Dependency graph now includes K8sConnection nodes as fundamental dependencies
- Service manager’s StopServiceWithDependents method handles cascading stops
- Health check failures trigger automatic cleanup of dependent services
- Non-TUI mode now uses the orchestrator for health monitoring and dependency management
- TUI mode no longer performs its own health checks - the orchestrator handles all health monitoring and the TUI only displays results
- Proper separation of concerns: orchestrator manages health checks and service lifecycle, TUI only displays status
- Orchestrator now performs initial health check before starting services
- Refactored TUI message handling system
- Introduced specialized controller/dispatcher for better separation of concerns
- Controllers now focus on single responsibilities
- Better error handling and logging throughout the message flow
- Improved startup behavior - the UI now shows loading state until all clusters are fully loaded
- Port forwards no longer start before K8s health checks pass - orchestrator now checks K8s health before starting dependent services
- `ManagedServiceUpdate` now includes `CorrelationID`, `CausedBy`, and `ParentID` fields for tracing
- `TUIReporter` now uses configurable buffered channels instead of simple channels
- Service state updates now include correlation information in logs
- Orchestrator operations (stop/restart) now generate and track correlation IDs
- Removed the unused `DependsOnServices` field from `MCPServerDefinition` - MCP servers never depend on other MCP servers
- Enhanced `RestartService` to use the new `startServiceWithDependencies` method for dependency-aware restarts
- Updated `handleServiceStateUpdate` to properly restart services with their dependencies
- Improved Service Monitoring
  - Fixed `monitorAndStartServices` to respect `StopReasonDependency` - services stopped due to dependency failure won't be restarted until their dependencies are restored
  - Added automatic restart of dependent services when a dependency becomes healthy again
  - Added a 1-second delay before restarting services to ensure ports are properly released
## Fixed
- Exit CLI on standalone server failure
- When the mcp-aggregator service (server) fails, the CLI now terminates gracefully
- Port Forwarding State Issue
- Fixed issue where port forwarding services would get stuck in “Stopping” state
- ServiceManager now properly reports the “Stopping” state before closing the stop channel
- Port forwarding processes correctly transition to “Stopped” state
- Code Cleanup
  - Removed the commented-out `mcpServerProcess` struct that was marked for deletion
  - Removed the duplicate `updatePortForwardFromSnapshot` and `updateMcpServerFromSnapshot` methods
  - Cleaned up unused code and improved code organization
- Dependency-Related Fixes
- Fixed issue where MCP servers would restart even when their port forward dependencies were stopped
- Services with `StopReasonDependency` now properly wait for their dependencies to be restored
- When a service becomes healthy, its dependent services that were stopped due to dependency failure are automatically restarted
- Fixed “address already in use” errors by adding proper restart delay
- Fixed spurious error logs when stopping MCP servers
- Suppressed expected “file already closed” errors that occurred when stopping MCP server processes
- Added proper error handling for both stdout and stderr pipe closures during shutdown
- These were harmless errors but created unnecessary noise in the logs
- Fixed cascade stops not triggering when K8s connections fail
- When a K8s connection transitions to Failed state (e.g., due to network issues), all dependent services (port forwards and MCP servers) are now properly stopped
- This prevents orphaned services from continuing to run when their underlying K8s connection is no longer healthy
- Services will automatically restart when the K8s connection recovers
- Set the config directory early to avoid bugs when handling the empty string (those should be fixed by this change as well)
## Documentation
- Added comprehensive documentation about dependency graph implementation
- Enhanced dependency management documentation with detailed examples
- Added explanation of dependency rules and startup/restart behavior
- Documented the relationship between stop reasons and automatic recovery
- Created comprehensive architecture documentation covering all major components
- Added troubleshooting guide with detailed debugging techniques
- Created quick start guide for new users
- Updated development guide with recent architectural improvements
- Documented the entire dependency management system with visual diagrams
- Updated outdated documentation sections
- Removed obsolete “Package Design for Shared Core Logic” section from development.md
- Updated development.md to reference the unified service architecture
- Fixed test examples in development.md to match current implementation
- Updated README.md prerequisites to remove mcp-proxy requirement
- Clarified non-TUI mode behavior in README.md
- Rewrote the MCP Integration Notes in README.md to reflect the YAML configuration system
## Technical Details
- New helper functions: `NewManagedServiceUpdate()`, `WithCause()`, `WithError()`, `WithServiceData()`
- New types: `BufferStrategy`, `BufferedChannel`, `ChannelMetrics`, `ChannelStats`
- Backwards compatibility maintained for existing interfaces
- All existing tests updated and new comprehensive test suite added