orion-error Documentation

orion-error is the Rust implementation of the WuKong error governance model.

The most important framing to keep in mind is a three-part split:

  • contract channel — stable identity, category, retryability, visibility
  • diagnostic channel — detail, source chain, operation context, key fields
  • adaptive output — HTTP / RPC / CLI / log projections generated by policy

In this crate, those ideas map to:

  • #[derive(OrionError)] for stable semantic identities
  • StructError<R> as the unified runtime carrier
  • source_err(...) for first entry and semantic-boundary wrapping
  • conv_err() for reason remapping without rebuilding the error story
  • report() / identity_snapshot() / exposure(...) for boundary output

Suggested reading order: start with the user guide to learn concepts and usage, then check the developer guide for public API contracts and release details.


User Guide

Document | Description
Why orion-error | Error governance motivation and examples
Tutorial | Getting started tutorial
Protocol Contract | Exposure projection contract
Report / Exposure Boundary | Diagnostic vs exposure boundary
Logging | Logging integration
Comparison with thiserror | Differences and coexistence
Ecosystem Comparison | anyhow / thiserror / color-eyre / orion-error
Large-Scale Error Governance Manifesto | WuKong model, governance principles, and industrial validation
Design Constraints | Known design constraints

Developer Guide

Document | Description
API Contract | Public API boundaries
Compatibility Migration | Migration from older APIs
Public Surface Grading | Layered export grading
Release Checklist | Pre-release checks
Performance Benchmarks | Allocation benchmarks

When docs and implementation conflict, src/, tests/, examples/ are authoritative.

Why orion-error

orion-error is not about prettier error printing. It is about making errors in Rust services governable, traceable, exposable, and evolvable as structured contracts.

Basic error handling answers “how does this function return failure?” Larger services need stronger answers:

  • How does an error carry the environment in which it happened?
  • How do lower-level technical failures become stable upper-layer semantics without losing diagnostics?
  • How can debugging see the error chain across layers?
  • How can logs stay useful without logging the same failure everywhere?
  • How should the same error be shown differently to users, operators, developers, and protocol clients?

orion-error keeps errors structured as they cross layers instead of reducing them to strings.


1. Diagnostics

1.1 Low-level errors often miss critical environment

Low-level errors usually describe the technical failure, not the business environment.

#![allow(unused)]
fn main() {
let content = std::fs::read_to_string(path)?;
}

If this fails, std::io::Error may say:

No such file or directory

But debugging often needs to know:

  • which path was read
  • which operation was running
  • which tenant, order, request, or component was involved
  • whether the file was config, an order record, cache, or temporary data
  • whether the failure should be classified as config, system, or validation failure

Weak approach: add context only to logs

#![allow(unused)]
fn main() {
match std::fs::read_to_string(path) {
    Ok(content) => Ok(content),
    Err(err) => {
        log::error!("read config failed, path={path}, error={err}");
        Err(err)
    }
}
}

This splits diagnostics between logs and the error value. The caller still receives an error without structured context.

Better approach: attach the context to the error value

#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;

let ctx = OperationContext::doing("load config")
    .with_field("path", path.display().to_string())
    .with_meta("component.name", "config_loader");

let content = std::fs::read_to_string(path)
    .source_err(AppReason::system_error(), "read config failed")
    .with_context(&ctx)?;
}

Here:

  • source_err(...) brings std::io::Error into the structured error system.
  • AppReason::system_error() is the stable upper-layer reason.
  • "read config failed" is this layer’s explanation.
  • OperationContext carries fields such as path and component.name.
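
The same idea can be sketched with only the standard library. This is not the orion-error API: ContextError, its fields, and the reason string are hypothetical stand-ins that show the context travelling inside the error value while the original std::io::Error stays reachable through source().

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;

/// Hypothetical stand-in for a structured error: a stable reason,
/// this layer's explanation, key/value context, and the original cause.
#[derive(Debug)]
struct ContextError {
    reason: &'static str,
    detail: String,
    fields: Vec<(String, String)>,
    source: io::Error,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.reason, self.detail)?;
        for (key, value) in &self.fields {
            write!(f, " {key}={value}")?;
        }
        Ok(())
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Reads a config file and, on failure, returns the context inside the
/// error value instead of only writing it to a log line.
fn load_config(path: &str) -> Result<String, ContextError> {
    fs::read_to_string(path).map_err(|source| ContextError {
        reason: "sys.io_error", // illustrative reason string
        detail: "read config failed".to_string(),
        fields: vec![
            ("operation".to_string(), "load config".to_string()),
            ("path".to_string(), path.to_string()),
        ],
        source,
    })
}
```

A caller that receives this error can read the path and operation directly from the value, and can still walk down to the io::Error when it needs the technical cause.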

1.2 Technical failures must be abstracted without losing diagnostics

Two common approaches are both poor:

  1. Drop the lower-level error.
  2. Expose the lower-level error directly and let higher layers depend on the repository’s technical choice.

The better approach is: at layer boundaries, convert the lower-level failure into the current layer’s stable error semantics while preserving source, detail, and context for diagnostics.

Weak approach: leak implementation errors upward

#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), sqlx::Error> {
    repository::insert_order(order).await?;
    Ok(())
}
}

Now the service/API layer knows the repository uses sqlx.

Weak approach: remove the root cause

#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), StoreError> {
    if repository::insert_order(order).await.is_err() {
        return Err(StoreReason::Unavailable.to_err());
    }

    Ok(())
}
}

This hides implementation details, but also removes the original cause.

Better approach: convert at the layer boundary and keep the source

#![allow(unused)]
fn main() {
use orion_error::prelude::*;

async fn write_order(order: Order) -> Result<(), StructError<StoreReason>> {
    repository::insert_order(&order)
        .await
        .source_err(StoreReason::Unavailable, "insert order failed")
        .with_field("order_id", order.id.to_string())
        .with_meta("component.name", "order_store")?;

    Ok(())
}
}

The caller sees StructError<StoreReason>, not sqlx::Error, while the original database error remains available as internal source.

If the upper layer only remaps reason type, use conv_err():

#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), StructError<AppReason>> {
    write_order(order).await.conv_err()?;
    Ok(())
}
}

If the upper layer creates a new semantic boundary, use source_err(...):

#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), StructError<AppReason>> {
    write_order(order)
        .await
        .source_err(AppReason::system_error(), "submit order failed")?;

    Ok(())
}
}
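
The difference between the two choices can be illustrated with a small std-only model. LayerError, remap, and wrap are hypothetical names chosen for this sketch and do not describe how conv_err() or source_err(...) are implemented: remap changes only the reason type, while wrap pushes the old error one level down under a new reason and detail.

```rust
/// Hypothetical minimal carrier: a typed reason, a detail message, and
/// the messages of the layers beneath it (outermost first).
#[derive(Debug, Clone, PartialEq)]
struct LayerError<R> {
    reason: R,
    detail: String,
    chain: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
enum StoreReason {
    Unavailable,
}

#[derive(Debug, Clone, PartialEq)]
enum AppReason {
    Store(StoreReason),
    System,
}

impl From<StoreReason> for AppReason {
    fn from(reason: StoreReason) -> Self {
        AppReason::Store(reason)
    }
}

impl<R> LayerError<R> {
    /// Remap only the reason type; detail and chain are untouched.
    fn remap<R2: From<R>>(self) -> LayerError<R2> {
        LayerError {
            reason: R2::from(self.reason),
            detail: self.detail,
            chain: self.chain,
        }
    }

    /// Open a new semantic boundary: the old error becomes one more
    /// entry in the chain under a new reason and detail.
    fn wrap<R2>(self, reason: R2, detail: &str) -> LayerError<R2> {
        let mut chain = vec![self.detail];
        chain.extend(self.chain);
        LayerError { reason, detail: detail.to_string(), chain }
    }
}

fn store_failure() -> LayerError<StoreReason> {
    LayerError {
        reason: StoreReason::Unavailable,
        detail: "insert order failed".to_string(),
        chain: vec!["connection timed out".to_string()],
    }
}
```

In this model, remapping keeps the chain at its original depth, while wrapping adds one layer; that is the practical question to ask when choosing between a reason remap and a new boundary.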

1.3 Debugging needs an error chain, not an isolated message

Real failures often travel through multiple layers:

HTTP handler
  -> service
    -> repository
      -> database / filesystem / remote API

A final message such as:

submit order failed

is not enough. Debugging needs the path:

submit order failed
  caused by: insert order failed
  caused by: database request failed
  caused by: connection timed out

The chain answers:

  • what the original technical failure was
  • which layers interpreted it
  • where new semantic boundaries were introduced
  • what context each layer added
  • how the external error relates to the internal root cause

#![allow(unused)]
fn main() {
async fn adapter_call(req: Request) -> Result<Response, StructError<AdapterReason>> {
    client.send(req)
        .await
        .source_err(AdapterReason::RemoteUnavailable, "remote call failed")
}

async fn load_quote(id: QuoteId) -> Result<Quote, StructError<ServiceReason>> {
    adapter_call(Request::quote(id))
        .await
        .source_err(ServiceReason::QuoteLoadFailed, "load quote failed")
        .with_field("quote_id", id.to_string())?;

    todo!("map response")
}
}

This preserves service semantics, adapter semantics, the lower source, and structured fields.
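
For readers who want to see how such a chain is rendered, here is a minimal sketch that relies only on std::error::Error::source(). The Layer type and the helper names are hypothetical; any error type that reports its cause through source() can be walked the same way.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical layered error: a message plus an optional lower-level cause.
#[derive(Debug)]
struct Layer {
    message: String,
    cause: Option<Box<Layer>>,
}

impl Layer {
    fn new(message: &str) -> Self {
        Layer { message: message.to_string(), cause: None }
    }

    /// Wrap `self` as the cause of a new, higher-level layer.
    fn wrapped_by(self, message: &str) -> Self {
        Layer { message: message.to_string(), cause: Some(Box::new(self)) }
    }
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.message)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_deref().map(|cause| cause as &(dyn Error + 'static))
    }
}

/// Render an error and every `source()` beneath it, one line per layer.
fn render_chain(err: &(dyn Error + 'static)) -> String {
    let mut lines = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        lines.push(format!("  caused by: {cause}"));
        current = cause.source();
    }
    lines.join("\n")
}

fn sample_failure() -> Layer {
    Layer::new("connection timed out")
        .wrapped_by("database request failed")
        .wrapped_by("insert order failed")
        .wrapped_by("submit order failed")
}
```

Calling render_chain(&sample_failure()) yields the four-line "caused by" path shown earlier in this section.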


2. Operations

Good logging is boundary logging, not more logging

Logging often becomes noisy because each layer emits its own error! line:

#![allow(unused)]
fn main() {
log::error!("repository insert failed: {err}");
log::error!("service submit failed: {err}");
log::error!("http request failed: {err}");
}

This duplicates failures and forces operators to reconstruct the chain manually.

The better model is: the error carries identity, reason, detail, context, and source chain; the boundary logs one structured projection.

#![allow(unused)]
fn main() {
async fn handle_submit(order: Order) -> Result<HttpResponse, StructError<AppReason>> {
    submit_order(order)
        .await
        .source_err(AppReason::system_error(), "handle submit order failed")?;

    Ok(HttpResponse::ok())
}
}

At the handler, worker, or task boundary:

#![allow(unused)]
fn main() {
let report = err.report();
let exposure = err.exposure(&policy);
}

The business code does not need to concatenate log strings at every layer. The boundary can log one structured view containing identity, reason, detail, context, and chain.
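
A std-only sketch of the pattern, assuming a hypothetical Failure type and a plain Vec<String> as the log sink: the inner layers only add their explanation to the error, and the handler is the single place that writes a log line.

```rust
/// Hypothetical error that accumulates one message per layer.
#[derive(Debug)]
struct Failure {
    layers: Vec<String>,
}

impl Failure {
    fn new(message: &str) -> Self {
        Failure { layers: vec![message.to_string()] }
    }

    /// Add this layer's explanation without emitting a log line.
    fn because(mut self, message: &str) -> Self {
        self.layers.insert(0, message.to_string());
        self
    }
}

fn repository_insert() -> Result<(), Failure> {
    Err(Failure::new("connection timed out"))
}

fn service_submit() -> Result<(), Failure> {
    repository_insert().map_err(|e| e.because("insert order failed"))
}

/// The boundary is the only place that writes to the log sink.
fn handle_request(log: &mut Vec<String>) -> u16 {
    match service_submit().map_err(|e| e.because("submit order failed")) {
        Ok(()) => 200,
        Err(failure) => {
            log.push(format!("request failed: {}", failure.layers.join(" <- ")));
            500
        }
    }
}
```

Three layers took part in the failure, yet the sink receives one entry that still names all of them.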

OperationContext logging

OperationContext provides structured log methods that automatically include current fields and metadata:

#![allow(unused)]
fn main() {
use orion_error::OperationContext;

let ctx = OperationContext::doing("order_processing")
    .with_field("order_id", "123")
    .with_meta("component.name", "order_service");

ctx.info("start");
ctx.warn("slow upstream");
ctx.error("final failure");
}

For lifecycle-scoped logging, use with_auto_log():

#![allow(unused)]
fn main() {
let mut ctx = OperationContext::doing("sync_user")
    .with_auto_log()
    .with_field("user_id", "42");

do_sync()?;
ctx.mark_suc();
}

If the scope drops without mark_suc() or mark_cancel(), a failure log is emitted automatically. See LOGGING.md for details.

The principle: sparse lifecycle logs + boundary error projection, not repetitive error! at every layer.


3. Presentation

One error needs different views for different audiences

An error is consumed by more than one audience:

  • End users need safe, understandable, actionable messages.
  • Operators / SREs need component, environment, classification, retry, and impact hints.
  • Developers need source chain, detail, context, and lower-level cause.
  • Protocol clients need stable code, field shape, and retry hints.
  • Logs / monitoring / alerting need structured fields, not long strings.

If there is only one error string, it is hard to satisfy all of them well.

For example:

database connection failed: timeout from sqlx pool

This is:

  • too technical for end users
  • too thin for developers
  • unstable for protocol clients
  • not structured enough for logs

orion-error separates internal structure from external presentation:

  • internal: reason, ErrorIdentity.code, detail, context, source chain
  • user-facing: safe and actionable exposure
  • operator-facing: component, operation, category, retryability, severity
  • developer-facing: report and full chain
  • protocol-facing: exposure projection

#![allow(unused)]
fn main() {
let report = err.report();
let exposed = err.exposure(&DefaultExposurePolicy::default());
}

Audience | Needs | Projection
User | safe message, action hint | exposure view
Operator / SRE | component, operation, retryable, severity | exposure snapshot / log JSON
Developer | source chain, detail, context | report
Protocol client | stable code, stable fields, retry hint | HTTP/RPC/CLI error JSON
Test / regression | stable structure | stable snapshot

This is the key difference between orion-error and a pure display-oriented tool: it keeps one structured error, then projects the right view at the right boundary.


Summary

orion-error is for systems where errors are contracts, not just return values.

It helps you:

  1. Preserve failure environment: attach path, tenant, order id, operation, and component context.
  2. Abstract technical details without losing diagnostics: convert lower failures into stable layer reasons while preserving source and context.
  3. Keep the cross-layer error chain: let debugging see how a low-level failure became the final boundary error.
  4. Log effectively and sparingly: carry structure in the error and log once at the boundary.
  5. Project the right view for each audience: users, operators, developers, protocol clients, logs, and tests need different output.

In one sentence:

orion-error keeps one structured error across layers, then projects the right view at the right boundary.

Tutorial

This document describes the primary usage paths of orion-error, based on the current source code, tests, and examples/.

Installation

[dependencies]
orion-error = "0.8.0"

Optional features:

[dependencies]
# Pick the features you need and list them in a single entry;
# a TOML table cannot repeat the orion-error key.
orion-error = { version = "0.8.0", features = ["serde", "tracing", "serde_json"] }

Default features: derive, log.

Import Conventions

Prefer one of these two approaches:

Application code (default):

#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;
}

Architecture boundaries — explicit layered imports:

#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::conversion::*;    // cross-layer conversion
use orion_error::protocol::*;      // boundary output
use orion_error::interop::*;       // std::error::Error bridge
}

1-Minute Example

#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;

#[derive(Debug, Clone, PartialEq, OrionError)]
enum AppReason {
    #[orion_error(identity = "biz.invalid")]
    Invalid,
    #[orion_error(transparent)]
    General(UnifiedReason),
}

fn load_config(path: &str) -> Result<String, StructError<AppReason>> {
    let ctx = OperationContext::doing("load_config")
        .with_field("path", path)
        .with_meta("component.name", "config_loader");

    std::fs::read_to_string(path)
        .source_err(AppReason::system_error(), "read failed")
        .doing("read file")
        .with_context(&ctx)
}
}

This covers the four core points:

  • Domain reason defined with OrionError
  • Error entry via source_err(reason, detail) (unified entry)
  • Semantic context via doing(...)
  • Diagnostic fields and metadata on OperationContext

1. Defining Reason

1.1 Domain Reason

New code should use #[derive(OrionError)]:

#![allow(unused)]
fn main() {
use orion_error::{OrionError, UnifiedReason};

#[derive(Debug, Clone, PartialEq, OrionError)]
enum OrderReason {
    #[orion_error(identity = "biz.order_not_found")]
    OrderNotFound,
    #[orion_error(identity = "biz.insufficient_funds")]
    InsufficientFunds,
    #[orion_error(transparent)]
    General(UnifiedReason),
}
}

OrionError generates: Display, DomainReason, ErrorCode, ErrorIdentityProvider.

1.2 Universal Reason

UnifiedReason is the built-in universal reason classification. Common constructors:

  • UnifiedReason::validation_error(), UnifiedReason::business_error()
  • UnifiedReason::system_error(), UnifiedReason::network_error(), UnifiedReason::timeout_error()
  • UnifiedReason::core_conf(), UnifiedReason::logic_error()

1.3 Delegate Constructors

If your domain reason has a transparent UnifiedReason variant, all UnifiedReason constructors are generated automatically:

AppReason::system_error()          // instead of AppReason::from(UnifiedReason::system_error())
AppReason::validation_error()

2. Constructing StructError

2.1 Direct Construction

#![allow(unused)]
fn main() {
use orion_error::prelude::*;

let err = StructError::from(UnifiedReason::validation_error())
    .with_detail("field `email` is required");
}

2.2 Builder

#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;

let ctx = OperationContext::doing("validate input");

let err = StructError::builder(UnifiedReason::validation_error())
    .detail("field `email` is required")
    .context_ref(&ctx)
    .finish();
}

2.3 Attaching Source

#![allow(unused)]
fn main() {
use orion_error::prelude::*;

let err = StructError::from(UnifiedReason::system_error())
    .with_detail("read config failed")
    .with_source(std::io::Error::other("disk offline"));
}

Preferred APIs: with_source(...), builder.source(...). These auto-route between StdError and StructError source types.

3. Using Context

OperationContext carries runtime context:

#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;

let ctx = OperationContext::doing("place_order")
    .with_field("order_id", "A-1001")
    .with_field("user_id", "42")
    .with_meta("component.name", "order_service");

let result: Result<(), StructError<UnifiedReason>> =
    Err(StructError::from(UnifiedReason::system_error()));

let result = result
    .doing("check inventory")
    .with_context(&ctx);

assert!(result.is_err());
}

Common field types:

  • with_field(...) — human-readable diagnostic entries (appears in Display output)
  • with_meta(...) — machine-oriented metadata (serialization only)

4. Error Entry and Cross-Layer Conversion

4.1 source_err(reason, detail) — Unified Entry

Works for both raw std::error::Error and already-structured StructError sources:

#![allow(unused)]
fn main() {
use orion_error::prelude::*;

let err = std::fs::read_to_string("config.toml")
    .source_err(UnifiedReason::system_error(), "read config failed")
    .unwrap_err();
}

Supported source types: std::io::Error, anyhow::Error (with anyhow feature), serde_json::Error (with serde_json feature), toml::de::Error / toml::ser::Error (with toml feature), and custom RawStdError types via raw_source(...).

4.2 conv_err() — Cross-Layer Reason Remap

When the upstream error is already structured and you only need to change the reason type:

#![allow(unused)]
fn main() {
use derive_more::From;
use orion_error::conversion::ConvErr;
use orion_error::conversion::ToStructError;
use orion_error::prelude::*;

#[derive(Debug, Clone, PartialEq, From, OrionError)]
enum RepoReason {
    #[orion_error(transparent)]
    General(UnifiedReason),
}

#[derive(Debug, Clone, PartialEq, From, OrionError)]
enum ServiceReason {
    #[orion_error(transparent)]
    Repo(RepoReason),
}

fn lower_layer_call() -> Result<(), StructError<RepoReason>> {
    Err(RepoReason::system_error().to_err().with_detail("read config failed"))
}

fn upper_layer_call() -> Result<(), StructError<ServiceReason>> {
    lower_layer_call().conv_err()?;
    Ok(())
}
}

Requires ServiceReason: From<RepoReason>.

5. Error Objects Summary

Object | Purpose | Entry Point
StructError<R> | Runtime carrier | Propagation
DiagnosticReport | Human diagnostics | err.report()
ErrorProtocolSnapshot | Protocol projection | err.exposure(&policy)

Standard Error interop: as_std(), into_std(), into_boxed_std(), into_dyn_std().

6. Stable Identity and Protocol Projection

6.1 Stable Identity

Each error variant has a permanent machine-readable name:

#![allow(unused)]
fn main() {
use orion_error::{OrionError, StructError};
use orion_error::reason::ErrorIdentityProvider;
use orion_error::protocol::DefaultExposurePolicy;
use orion_error::UnifiedReason;

#[derive(Debug, PartialEq, OrionError)]
enum ApiReason {
    #[orion_error(identity = "biz.invalid_input")]
    InvalidInput,
    #[orion_error(transparent)]
    General(UnifiedReason),
}

assert_eq!(ApiReason::InvalidInput.stable_code(), "biz.invalid_input");
assert_eq!(ApiReason::InvalidInput.error_category().as_str(), "biz");
}

Stable identity never changes — unlike display text, numeric codes, or Rust paths.

The identity prefix (biz, sys, conf, logic) also determines the default ExposurePolicy behaviour.
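
As an illustration of the prefix convention, the following sketch derives a category from a stable code. The Category enum and category_of function are hypothetical helpers written for this example; in the crate the category comes from error_category() on the reason.

```rust
/// Categories named in the text; the enum itself is a hypothetical stand-in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Biz,
    Sys,
    Conf,
    Logic,
}

/// Derive the category from the prefix of a stable code such as
/// "biz.invalid_input". Returns None for an unknown or missing prefix.
fn category_of(stable_code: &str) -> Option<Category> {
    let (prefix, rest) = stable_code.split_once('.')?;
    if rest.is_empty() {
        return None;
    }
    match prefix {
        "biz" => Some(Category::Biz),
        "sys" => Some(Category::Sys),
        "conf" => Some(Category::Conf),
        "logic" => Some(Category::Logic),
        _ => None,
    }
}
```

Because the category is read from the code itself, renaming a Rust variant or rewording its display text leaves both the code and the category unchanged.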

6.2 Protocol Projection

The same error produces different JSON shapes for different protocol boundaries:

use orion_error::{OrionError, StructError};
use orion_error::protocol::DefaultExposurePolicy;
use orion_error::UnifiedReason;

#[derive(Debug, PartialEq, OrionError)]
enum ApiReason {
    #[orion_error(identity = "biz.invalid_input")]
    InvalidInput,
    #[orion_error(transparent)]
    General(UnifiedReason),
}

let err = StructError::from(ApiReason::system_error())
    .with_detail("disk offline at /dev/sda");

let proto = err.exposure(&DefaultExposurePolicy);

// HTTP response — minimal, safe for external clients
proto.to_http_error_json();

// Log output — full context for debugging
proto.to_log_error_json();

// RPC response — hides internal detail
proto.to_rpc_error_json();

// CLI output — human-readable summary
proto.to_cli_error_json();

7. Testing

#![allow(unused)]
fn main() {
use orion_error::dev::testing::assert_err_identity;
use orion_error::prelude::SourceErr;
use orion_error::reason::ErrorCategory;
use orion_error::reason::UnifiedReason;

let err = std::fs::read_to_string("config.toml")
    .source_err(UnifiedReason::system_error(), "read config failed")
    .unwrap_err();

assert_err_identity(&err, "sys.io_error", ErrorCategory::Sys);
}

Test helpers: assert_err_code(), assert_err_category(), assert_err_identity(), assert_err_operation(), assert_err_path().

8. Best Practices

  • Define domain reasons with #[derive(OrionError)]
  • Use source_err(reason, detail) as the unified error entry point
  • Use conv_err() for cross-layer reason conversion
  • Use identity_snapshot() for stable identity inspection
  • Use exposure(...) for protocol boundary output
  • Use explicit interop APIs when entering std::error::Error ecosystem

Protocol Contract

1. Three-Layer Structure

  1. Stable identity: ErrorIdentity
  2. Exposure decision: ExposureDecision
  3. Output projections: HTTP / CLI / log / RPC / user debug

Roles:

  • StructError<R> — runtime propagation
  • ErrorIdentity — stable identification
  • DiagnosticReport — human diagnostics
  • ErrorProtocolSnapshot — identity + decision + report assembly

2. Stable Identity

ErrorIdentity

Fields:

  • code — stable machine key
  • category — stable classification
  • reason — stable human summary
  • detail — variable description (not a key)
  • position — source location
  • path — stable path projection

Entry points:

  • StructError::identity_snapshot()
  • assert_err_code(…) — asserts stable code string, not numeric error_code()
  • assert_err_category(…)
  • assert_err_identity(…)

3. Exposure

protocol::ExposureDecision

Fields:

  • http_status
  • visibility
  • default_hints
  • retryable

Default policy (DefaultExposurePolicy):

Category | http_status | visibility
Biz | 400 | Public
Conf / Logic / Sys | 500 | Internal

sys.network_error, sys.timeout → retryable = true. All others → retryable = false.
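
The default rules above are small enough to restate as a function. This is a sketch with hypothetical types (Category, Visibility, Decision, decide), not the crate's DefaultExposurePolicy; default_hints is left out because the text does not list its default values.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Biz,
    Conf,
    Logic,
    Sys,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    Public,
    Internal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Decision {
    http_status: u16,
    visibility: Visibility,
    retryable: bool,
}

/// Mirror of the default rules: Biz is a public 400, everything else an
/// internal 500; only the two listed sys codes are retryable.
fn decide(category: Category, code: &str) -> Decision {
    let (http_status, visibility) = match category {
        Category::Biz => (400, Visibility::Public),
        Category::Conf | Category::Logic | Category::Sys => (500, Visibility::Internal),
    };
    let retryable = matches!(code, "sys.network_error" | "sys.timeout");
    Decision { http_status, visibility, retryable }
}
```

Keeping the decision in one function makes it straightforward to snapshot-test the policy separately from any JSON rendering.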

Entry points:

  • ExposurePolicy::decide(…)
  • StructError::exposure(…)
  • StructError::into_exposure(…)

4. ErrorProtocolSnapshot

Fields:

  • identity
  • decision
  • report (read-only via report())

Entry points:

  • StructError::exposure(…)
  • StructError::into_exposure(…)

Use cases: test snapshot, gateway reprojection, unified protocol output, debug summary.

5. HTTP Projection

Requires serde_json feature.

JSON fields: status, code, category, message, visibility, hints

Rules:

  • Public → message uses detail
  • Internal → message uses stable reason

Entry: ErrorProtocolSnapshot::to_http_error_json()
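
The message rule can be sketched as a single match. Exposed and http_message are hypothetical names for this example, and the sample strings are illustrative; the real projection is produced by to_http_error_json().

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    Public,
    Internal,
}

/// Hypothetical slice of an exposure snapshot: just what the rule needs.
struct Exposed<'a> {
    visibility: Visibility,
    reason: &'a str,
    detail: &'a str,
}

/// Pick the HTTP `message`: variable detail for public errors,
/// the stable reason for internal ones.
fn http_message<'a>(err: &Exposed<'a>) -> &'a str {
    match err.visibility {
        Visibility::Public => err.detail,
        Visibility::Internal => err.reason,
    }
}
```

With this rule an internal failure whose detail reads "disk offline at /dev/sda" reaches the client only as its stable reason, so implementation details stay out of the response body.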

6. CLI Projection

Requires serde_json feature.

JSON fields: code, category, summary, detail, visibility, hints

Rules:

  • summary uses compact render
  • detail uses verbose render

Entry: ErrorProtocolSnapshot::to_cli_error_json()

7. Log Projection

Requires serde_json feature.

JSON fields: code, category, reason, detail, path, visibility, hints, root_metadata, context, source_frames

Rules:

  • Full context preserved
  • Full root_metadata preserved
  • Full source_frames preserved

Entry: ErrorProtocolSnapshot::to_log_error_json()

8. RPC Projection

Requires serde_json feature.

JSON fields: status, code, category, reason, detail, visibility, hints, retryable

Rules:

  • detail only visible when Public
  • retryable from exposure decision

Entry: ErrorProtocolSnapshot::to_rpc_error_json()

9. User Debug Summary

render_user_debug(…) is a human-readable debug summary, not a machine protocol.

Entry: ErrorProtocolSnapshot::render_user_debug(), .render_user_debug_redacted(…)

Use cases: local debugging, sample output, manual troubleshooting.

Not: HTTP message, stable JSON schema.

10. DiagnosticReport

DiagnosticReport does not require ErrorIdentityProvider. Suitable for text rendering, redaction, human diagnostics.

Entry: StructError::report(), StructError::into_report()

Recommended:

  1. Runtime propagation → StructError<R>
  2. Stable identification → identity_snapshot()
  3. Unified output → exposure(…)
  4. Protocol output → projection API
  5. Human summary → render_user_debug(…)

Avoid:

  • Using Display text as protocol key
  • Using CLI text as machine protocol
  • Using raw detail as stable assertion

Report / Exposure Boundary

This document describes the responsibility boundary between DiagnosticReport and ErrorProtocolSnapshot.

Current State

category and code have been removed from DiagnosticReport. Identity data now lives exclusively in ErrorProtocolSnapshot.identity. All exposure bridge methods on DiagnosticReport (exposure_identity, http_status, visibility, default_hints, decision, exposure, to_exposure_json) have been deleted.

StructError<T>::report() only requires DomainReason, not ErrorIdentityProvider.

1. Object Roles

Object | Responsibility
StructError<R> | Runtime propagation, source chain, context attachment
DiagnosticReport | Human diagnostic view, redaction, text rendering
ErrorProtocolSnapshot | Identity + exposure decision + report, user debug, protocol JSON projection

Human diagnostics:

#![allow(unused)]
fn main() {
let report = err.report();
let text = report.render();
}

Protocol/projection:

#![allow(unused)]
fn main() {
let proto = err.exposure(&policy);
let debug = proto.render_user_debug();
let http = proto.to_http_error_json()?;
}

3. Principles

  • DiagnosticReport stays a diagnostic object.
  • ErrorProtocolSnapshot is the sole exposure/projection closure.
  • StructError routes runtime errors into either report or protocol layer.

In short:

  • Need text diagnostics → report()
  • Need exposure / JSON projection → exposure(…)

4. From DiagnosticReport to Protocol

If the caller starts from an existing DiagnosticReport (not StructError):

#![allow(unused)]
fn main() {
let proto = ErrorProtocolSnapshot::from_report_skeleton(report, identity, &policy);
}

But if full projection data (root metadata, source frames, path) is needed, prefer StructError::exposure(...).

5. Summary

The current design keeps DiagnosticReport focused on diagnostics while ErrorProtocolSnapshot handles all exposure and projection concerns. The two paths are independent and should not be mixed.

Logging

orion-error logging capabilities are built around OperationContext and OperationScope.

1. Feature

[dependencies]
orion-error = { version = "0.8.0", features = ["log"] }
# or
orion-error = { version = "0.8.0", features = ["tracing"] }

Default features include log.

Behavior:

  • log only: uses log macros
  • tracing enabled: prefers tracing
  • Both enabled: prefers tracing

2. Basic Usage

#![allow(unused)]
fn main() {
use orion_error::OperationContext;

let ctx = OperationContext::doing("order_processing")
    .with_field("order_id", "123")
    .with_field("amount", "100.0")
    .with_meta("component.name", "order_service");

ctx.info("start");
ctx.debug("payload prepared");
ctx.warn("slow upstream");
ctx.error("final failure");
ctx.trace("verbose trace");
}

Aliases: log_info, log_debug, log_warn, log_error, log_trace.

3. Automatic Result Logging

#![allow(unused)]
fn main() {
use orion_error::OperationContext;

let mut ctx = OperationContext::doing("sync_user")
    .with_auto_log()
    .with_field("user_id", "42");

do_sync()?;
ctx.mark_suc();
}

Default result is Fail. If with_auto_log() is enabled but neither mark_suc() nor mark_cancel() is called before drop, a failure log is emitted.

4. OperationScope

OperationScope is a guard for scoped lifecycle management.

#![allow(unused)]
fn main() {
use orion_error::OperationContext;

let mut ctx = OperationContext::doing("sync_user").with_auto_log();

{
    let mut scope = ctx.scope();
    scope.with_field("user_id", "42");
    validate()?;
    scope.mark_success();
}
}

Methods:

  • scope() — default failure; must call mark_success() explicitly
  • scoped_success() — default success; use mark_failure() or cancel() to override
  • mark_success() — mark as success
  • mark_failure() — revert to failure
  • cancel() — mark as cancelled

5. When to Use scoped_success()

scoped_success() is suitable when:

  • The scope already handles failure branches internally
  • Failure is explicitly handled via mark_failure()
  • The code does not use ? to return early

Example:

#![allow(unused)]
fn main() {
let mut ctx = OperationContext::doing("process_order").with_auto_log();

{
    let mut scope = ctx.scoped_success();
    let ok = validate_order();
    if !ok {
        scope.mark_failure();
    }
}
}

Not recommended:

let mut scope = ctx.scoped_success();
validate()?;

Because scoped_success() defaults to success on creation. If ? returns early, the scope is still marked as success on drop.

For fallible flows with early returns, prefer:

#![allow(unused)]
fn main() {
let mut scope = ctx.scope();
validate()?;
scope.mark_success();
}
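
The pitfall is easy to reproduce with a plain Drop guard. The Scope type below is a hypothetical model of the behaviour described here, not the crate's OperationScope; it records its outcome into a shared Vec when it is dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical scope guard: records its outcome into `log` when dropped.
struct Scope {
    name: &'static str,
    success: bool,
    log: Log,
}

impl Scope {
    /// Starts as failure; success must be marked explicitly.
    fn failing_by_default(name: &'static str, log: Log) -> Self {
        Scope { name, success: false, log }
    }

    /// Starts as success; an early return leaves it that way.
    fn succeeding_by_default(name: &'static str, log: Log) -> Self {
        Scope { name, success: true, log }
    }

    fn mark_success(&mut self) {
        self.success = true;
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        let outcome = if self.success { "suc" } else { "fail" };
        self.log.borrow_mut().push(format!("{} {}", self.name, outcome));
    }
}

fn validate() -> Result<(), String> {
    Err("invalid input".to_string())
}

fn safe_flow(log: Log) -> Result<(), String> {
    let mut scope = Scope::failing_by_default("safe", log);
    validate()?; // early return: the guard drops while still marked failed
    scope.mark_success();
    Ok(())
}

fn risky_flow(log: Log) -> Result<(), String> {
    let _scope = Scope::succeeding_by_default("risky", log);
    validate()?; // early return: the guard drops still marked as success
    Ok(())
}
```

Both functions return Err, but only the failure-by-default guard records a failure; the success-by-default guard reports success for an operation that did not complete.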

6. op_context! Macro

#![allow(unused)]
fn main() {
use orion_error::op_context;

let ctx = op_context!("load_config").with_auto_log().with_field("path", "config.toml");
}

This macro expands module_path!() at the call site, adding more accurate module paths to automatic result logs.

7. Best Practices

  • Use doing(...) to name operations
  • Use with_field(...) / with_meta(...) for chained construction
  • Use record_field(...) / record_meta(...) only when a mutable reference already exists
  • Use with_auto_log() only on scopes that need result logging
  • For fallible logic with ?, prefer scope() + mark_success()
  • Use scoped_success() only when failure paths are explicitly handled

Ecosystem Comparison: orion-error vs anyhow / thiserror / color-eyre

Scope: anyhow / thiserror / color-eyre / orion-error


1. Positioning

Dimension | anyhow | thiserror | color-eyre | orion-error
Positioning | Quick error handling | Standard error derive | Diagnostic error reporting | Structured error governance framework
Target users | App developers (rapid prototyping) | Library authors | App developers (diagnostics) | Large multi-team projects
Problem domain | Reduce error handling boilerplate | Reduce Error impl boilerplate | Improve error diagnostic output | Unified error modeling → runtime propagation → boundary protocol projection
Abstraction level | Type erasure | Type-safe enum | Type erasure + diagnostics | Generic structured carrier

2. Core Capabilities

Error Definition

Capability | anyhow | thiserror | color-eyre | orion-error
Custom error types | Not directly | #[derive(Error)] | Not directly | #[derive(OrionError)]
Generic error type | Box<dyn Error> | User-defined enum | Box<dyn Error> | StructError<T: DomainReason>
Stable identity | No | No | No | stable_code() + ErrorCategory
Numeric ErrorCode | No | Via #[error(...)] | No | Built-in error_code()
Display / source | Auto | Auto | Auto | Auto (OrionError derive)

Runtime Propagation

Capability | anyhow | thiserror | color-eyre | orion-error
Context attachment | .context() / .with_context() | No | .section() / .note() | OperationContext (doing/at/path + KV + metadata)
Context path | Single-layer context | No | Single-layer | Multi-layer nested path via target_path()
Custom metadata | No (message only) | No | Section trait | ErrorMetadata (typed KV, not in Display)
Source chain | Standard chain | Standard chain | Standard + SpanTrace | Dual-channel (Std/Struct) + rich SourceFrame
Cross-type conversion | anyhow!() macro | #[from] | eyre!() macro | source_err() / conv_err()

Boundary Output

| Capability | anyhow | thiserror | color-eyre | orion-error |
| --- | --- | --- | --- | --- |
| Human diagnostics | .display_chain() | No | Colored output | report().render() + RedactPolicy |
| Protocol JSON (HTTP/RPC) | No | No | No | exposure() to_*_error_json() |
| Stable snapshot | No | No | No | StableErrorSnapshot + versioned schema |
| Exposure policy | No | No | No | ExposurePolicy (status/visibility/hints/retryable) |
| Redaction | No | No | Limited | RedactPolicy trait |

std::error::Error Ecosystem

| Capability | anyhow | thiserror | color-eyre | orion-error |
| --- | --- | --- | --- | --- |
| Implements StdError | Yes | Yes | Yes | Explicit bridge (as_std() / into_std()) |
| dyn Error compatible | Natively | Natively | Natively | Lossy (OwnedDynStdStructError) |
| Third-party interop | .context() / anyhow!() | #[from] | .sections() / eyre!() | source_err() / raw_source() |

3. Coexistence Strategy

| Layer | Recommended |
| --- | --- |
| Outside boundary (3rd-party libs, FFI) | thiserror / standard Error trait |
| Entering structured system | orion-error source_err() |
| Business layer propagation | orion-error StructError<R> |
| Cross-layer (repo → service → handler) | orion-error conv_err() |
| Boundary output | orion-error exposure() |
| Quick prototyping / glue code | anyhow (supported via anyhow feature) |
| Terminal diagnostics | orion-error report().render() or color-eyre |

4. When to Use What

Choose orion-error

  • Multi-layer Rust backend services (repo → service → handler → protocol)
  • External HTTP/RPC/gRPC interfaces with unified error responses
  • Microservices with stable error codes and monitoring classification
  • Multi-team projects needing consistent error conventions
  • Persistent/versioned error snapshots

Choose alternatives

  • Single-file scripts or CLI tools → anyhow
  • Low-level libraries exposing std::error::Error → thiserror
  • Terminal applications needing pretty error output → color-eyre
  • Projects with only 1-2 layers, no structured context needed → thiserror + anyhow

Comparison with thiserror

orion-error and thiserror are not mutually exclusive, but their positioning differs.

Positioning

thiserror: Define standard Rust error types, serving the std::error::Error ecosystem.

orion-error: Define runtime structured error carriers, managing context, source frames, snapshots, and protocol projections.

Capability Comparison

| Capability | thiserror | orion-error |
| --- | --- | --- |
| Define standard error types | Strong | Not primary goal |
| Domain reason derive | Needs extra identity | OrionError is recommended |
| Runtime structured context | No | Yes |
| Source frame tracking | No | Yes |
| Stable code / category | No | Yes |
| Snapshot / report / projection | No | Yes |

When to Use thiserror

  • Exposing standard std::error::Error types
  • Library APIs requiring standard error types

The Wukong Error Governance Model: Stable Contracts, Reliable Diagnostics, and Adaptive Output

This article discusses one question: how an industrial-grade system can bring highly variable failures into a structure that is governable, diagnosable, and evolvable.

If you only want the main thread, start with these four sections:

  1. Core Tension: why error governance is fundamentally about convergence vs. diagnostics.
  2. Our Approach: The Wukong Error Governance Model: how the model governs errors through stable contracts, reliable diagnostics, and adaptive output.
  3. Error Governance in Rust: how to implement the model with orion-error.
  4. Industrial Validation: WarpParse: how a high-throughput ETL system validates the approach.

This article has three layers. Read as needed:


Error Handling Is the Boundary Between Prototypes and Industrial Systems

A prototype only needs to prove that the happy path works. An industrial system must remain operable, diagnosable, recoverable, and evolvable under non-ideal conditions.

Systems do not live in ideal conditions for long. Inputs change. Dependencies degrade. Networks jitter. Configurations drift. Data accumulates dirty state. Business rules evolve. Execution paths branch dynamically based on users, environment, state, and policy. The happy path is not the whole system. Failure, degradation, retry, rollback, compensation, and manual intervention are also part of the lifecycle.

So an error is not just “an unexpected string outside normal logic”. It is information the system must carry when it continues operating under imperfect conditions, restores state, decides external responses, and supports diagnosis.

Many projects treat error handling early on as “each function’s own business”. Each function decides how to express failure, and that decision gets remade in the next function, the next module, and the next boundary.

That can work in small systems: short call chains, few boundaries, shared memory of context among participants. But once errors start crossing team boundaries, subsystem boundaries, service boundaries, protocol boundaries, or long-term compatibility boundaries without a unified shape, the failure path becomes ungovernable:

  • The same failure becomes a string in module A, an enum in module B, and a panic in module C.
  • Each layer rebuilds JSON at the boundary, but the structure is inconsistent.
  • Troubleshooting finds scattered messages in logs, but no complete error path.
  • Refactoring avoids touching error types because no one knows which upper layers depend on string content.

These problems do not always come from a single function being “badly written”. They can also come from missing tools, weak team conventions, historical drift, turnover, or unclear boundary ownership. The key point is this: once an error must travel across boundaries, be consumed by multiple roles, and remain compatible over time, it is no longer just local control flow.

Error governance defines how a system preserves information after failure, carries it across layers, exposes it externally, supports diagnosis, and evolves it over time. It is not decoration around business logic. It is the information architecture for when business logic fails.


The Industry Has Been Exploring Error Handling for a Long Time

Professional engineers and industry practice already agree that error handling matters. The hard part is not whether to handle errors, but how: without letting errors swallow business code, without letting failures collapse into ungovernable strings, while still giving callers stable decisions and giving troubleshooters enough detail.

Different language designs show that this problem has never had a single answer.

  • C relies mainly on return codes, errno, and conventions. Direct and cheap, but error information fragments easily and callers often miss checks.
  • Java makes exceptions the primary path and distinguishes checked from unchecked exceptions. It strengthens propagation, but also brings exploding exception hierarchies, blurry boundary semantics, and over-catching.
  • Go emphasizes explicit error returns, making failure visible in the call path. But without team discipline, errors easily become layer after layer of wrapped strings.
  • Rust uses Result<T, E>, ?, enums, and the type system to make errors part of ordinary control flow. But classification, context, boundary exposure, and diagnostic policy still require engineering design.

Each design makes tradeoffs. Language mechanisms can lower the cost of error handling, but they do not replace error governance itself. In large systems, the real problem is not choosing exceptions, return codes, or Result. It is deciding how failure information is classified, preserved, transformed, exposed, and observed across the system.

For engineering teams, error handling spans type design, call-chain propagation, logging and observability, protocol output, user experience, operational policy, and long-term compatibility. If each part is handled independently, the cost eventually surfaces in troubleshooting, refactoring, and cross-boundary collaboration.

So error handling cannot rely only on personal experience or local habits. It needs a methodology that can be discussed, executed, and evolved.

To this day, the industry still has no unified cross-language, cross-framework, cross-domain model for error governance. But strong projects have converged on useful practices in different directions:

  • Stable error codes
  • Structured diagnostics
  • Centralized boundary policy
  • State-oriented error presentation
  • Observable failure signals
  • User-facing repair hints

These practices show that error governance is not one API problem. It is a set of engineering constraints built around failure information.

Evidence from Strong Projects

| Project | Practice | Lesson |
| --- | --- | --- |
| gRPC | Cross-language RPC failures converge to standard status codes | Stable classification lets callers retry, degrade, alert, and map user responses |
| PostgreSQL | Stable SQLSTATE codes instead of depending on message text | Machine contracts and human prose should be separate |
| Kubernetes | Readiness, failure reason, and conditions are written into status | Errors can become queryable, automatable system state |
| Terraform | Diagnostics carry severity, summary, detail, and attribute path | Errors should identify location, cause, and repair direction |
| rustc | Error codes, source location, labels, notes, and help shape the diagnostic experience | Diagnostics themselves are part of product quality |
| Envoy | Access log response flags express stable failure reasons | Boundary-layer errors should be searchable, aggregatable, alertable, and analyzable |

These projects differ in form, but point in the same direction: strong error handling designs the failure path as a stable information system. Machines can classify it, humans can diagnose it, internal detail is preserved, external exposure is policy-driven, and the same error serves both the current request and later troubleshooting, monitoring, and evolution.


Core Tension

Any error governance model must resolve one fundamental tension:

Convergence vs. diagnostics

  • Callers need stable, finite classifications, otherwise they cannot make governance decisions such as retry, degrade, alert, or return a user response.
  • Troubleshooters need complete, detail-preserving information, otherwise they cannot identify root cause.

Both demands are valid, but they naturally pull in different directions.

If errors expose too much technical detail to callers, upper layers start depending on the exact failure shapes of databases, network libraries, filesystems, and third-party SDKs. System boundaries are pierced by implementation detail. Refactoring the lower layer then forces contract changes.

If errors preserve only upper-layer business classifications, troubleshooting loses the critical path: what the original failure was, which component it occurred in, which layers it passed through, what each layer added, and why it was finally mapped to the external response.

So the central problem is not “whether to wrap errors”. It is this: how to make errors converge for governance while remaining faithful for diagnostics.

Inadequate Solutions

| Strategy | For callers | For troubleshooters |
| --- | --- | --- |
| Throw only technical exceptions | Not governable | Full information |
| Throw only business errors | Governable | Root cause lost |
| Pure string chaining | Not governable | Human-readable, but not structurally queryable |
| Typed wrapping with preserved cause | Some local governance | Cause chain preserved, but classification and boundary policy still need extra rules |
| Swallow the error | Clean surface | All information lost |

Pure string chains only concatenate prose. Typed wrapping with preserved cause information (cause chains, typed wrapping, errors.Is/errors.As) supports some structured querying, but it solves only one part of the problem: how causes are preserved and queried. It does not automatically solve stable error identity, classification boundaries, exposure policy, or mappings to governance actions.
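
To make that limit concrete, here is a minimal sketch using only the Rust standard library. The types `DbTimeout` and `RepoFailed` and the helpers `cause_chain` and `has_db_timeout` are hypothetical names invented for this illustration. The sketch shows what a typed cause chain can answer (which causes exist, and whether a given type is among them) and, by omission, what it cannot: nothing in it says which stable identity the failure has, whether it is retryable, or how it should be exposed.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical low-level failure, standing in for a database driver error.
#[derive(Debug)]
struct DbTimeout;

impl fmt::Display for DbTimeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "database timeout")
    }
}

impl Error for DbTimeout {}

// Hypothetical wrapper that keeps the lower-level error as its cause.
#[derive(Debug)]
struct RepoFailed {
    source: DbTimeout,
}

impl fmt::Display for RepoFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "repository failure")
    }
}

impl Error for RepoFailed {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

// Walk the standard cause chain and collect each layer's message, outermost first.
fn cause_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}

// Structured query over the chain, comparable in spirit to errors.As in Go.
fn has_db_timeout(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if e.downcast_ref::<DbTimeout>().is_some() {
            return true;
        }
        current = e.source();
    }
    false
}
```

Calling `cause_chain` on a `RepoFailed` value yields the two messages in order, and `has_db_timeout` finds the root cause by type. Any retry or exposure decision would still have to be written somewhere else.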

When one single representation is forced to satisfy both governance and diagnostics, one side usually loses: either callers get information too scattered for automation, or troubleshooters get too little information and must grep logs and reproduce failures manually.

Our Approach: The Wukong Error Governance Model

This article calls the methodology the Wukong Error Governance Model. What it “subdues” is not mythological demons, but the wildly varying failures inside industrial systems: using stable contracts to make errors legible, reliable diagnostics to preserve root causes and context, adaptive output to provide the right stable view for each audience, and production observation to keep the model evolving.

The name comes from the image of Wukong defeating monsters on the journey west. The software equivalent is to give errors names, classes, traceability, and control, instead of letting them spread as scattered strings, implicit wrapping, and boundary leakage.

The core method is simple: separate stable contracts and reliable diagnostics into two channels, then generate adaptive output per audience.

Internal error model = contract channel + diagnostic channel
Adaptive output = audience-specific projections generated from the internal model under policy

The contract channel contains stable error identity, stable classification, and policy semantics. It serves governance decisions such as retry, degrade, alert, HTTP/RPC/CLI mapping, user hints, and SLA accounting. It should be finite, stable, documentable, and testable.

The diagnostic channel contains the diagnostic chain, context, and detail. It serves root-cause analysis: lower-level causes, traversed layers, the current operation, important fields, component, and environment. It can be richer and more dynamic, but it should not become the stable contract exposed to external callers.

The contract and diagnostic channels describe the information structure carried by errors internally. HTTP responses, RPC errors, CLI output, log records, metric labels, and debug reports are adaptive output views generated at different boundaries. Output views should not leak back and distort the internal error model.

| Channel | Contains | Serves | Stability requirement |
| --- | --- | --- | --- |
| Contract channel | Stable error identity, stable classification, category, retryable, exposure level | Callers, gateways, monitoring, operations policy, protocol clients | High; should be documented and enforced by tests |
| Diagnostic channel | Cause chain, operation context, key fields, dynamic detail, lower-level errors | Developers, SREs, troubleshooting tools, logging systems | May change, but must remain detailed, reliable, and traceable |

category is a fixed governance dimension for stable classification, such as business, system, configuration, or logic error. It helps distinguish ownership domains quickly and supports alert routing, log aggregation, and boundary policy. retryable, exposure level, HTTP status, and log level are not part of error prose. They are boundary decisions derived from stable identity, classification, and environment policy.

Four concepts must stay distinct:

  • identity is the long-term stable error key used by protocols, monitoring, alerts, documentation, and compatibility.
  • reason is the domain classification expression in code, usually an enum, sealed class, tagged union, or exception subtype.
  • category is the coarse governance dimension used for routing and ownership decisions.
  • policy is the boundary decision rule that derives output views from identity, category, and environment policy.

In short: identity is the contract primary key, reason is the code-level expression, category is the governance dimension, and policy is the boundary action. Do not substitute message text for identity, do not substitute category for identity, and do not scatter policy decisions across handlers.

The relation between reason and identity needs to be understood precisely. reason is the in-process type used by code to express domain classification. identity is the cross-boundary, cross-version error key. One reason variant usually provides one stable identity. External protocols, monitoring, and alerts should depend on identity.code, not on a Rust enum name, a Java class name, or Display prose. When errors cross semantic domains, upper layers should not expose lower-layer reason types, but they may preserve lower-layer errors in the diagnostic chain.
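
The four concepts can be sketched in a few lines of plain Rust. Everything here is an assumption made for illustration: the `OrderReason` enum, the `Category` set, the `config.invalid` identity, and the `retryable` function are not taken from any crate. The point is only the shape: the reason is an in-process enum, each variant provides one stable identity string, category is a coarse dimension, and the policy decision reads the identity rather than the enum name.

```rust
// Coarse governance dimension (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Business,
    System,
    Config,
}

// Reason: the in-process domain classification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OrderReason {
    NotFound,
    StoreTimeout,
    BadConfig,
}

impl OrderReason {
    // Identity: the cross-boundary key. Renaming a variant must not change it.
    fn identity(self) -> &'static str {
        match self {
            OrderReason::NotFound => "order.not_found",
            OrderReason::StoreTimeout => "system.timeout",
            OrderReason::BadConfig => "config.invalid",
        }
    }

    // Category: coarse routing and ownership dimension.
    fn category(self) -> Category {
        match self {
            OrderReason::NotFound => Category::Business,
            OrderReason::StoreTimeout => Category::System,
            OrderReason::BadConfig => Category::Config,
        }
    }
}

// Policy: one boundary rule keyed by identity, kept in a single place.
fn retryable(identity: &str) -> bool {
    matches!(identity, "system.timeout")
}
```

Because `retryable` takes the identity string, renaming `StoreTimeout` during a refactor changes nothing for protocol clients or alert rules.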

| Component | Meaning | Example |
| --- | --- | --- |
| Stable error identity | Machine-readable error key for long-term compatibility | order.not_found, system.timeout |
| Stable classification | Finite categories for governance decisions | business error, config error, system error, timeout, rate limit |
| Governance attributes | Auxiliary decision fields derived from identity and classification | category, retryable, exposure level, HTTP status |
| Diagnostic chain | Preserved cause/source path across layers | service failure -> repository failure -> database timeout |
| Context | Structured environment of the current operation: where, for whom, doing what | operation, tenant, path, order_id, component |
| Detail | The current layer’s explanation of this failure | read config failed, upstream returned 503 |

The same error serves both needs through different views, so callers and troubleshooters no longer have to trade away each other’s requirements.

The contract channel should have low cardinality, low dynamism, and long-term stability. It should not include tenant IDs, file paths, SQL, HTTP bodies, third-party message text, user input, or specific field values. Those belong in the diagnostic channel as detail, context, or source chain. The more stable the contract channel is, the more reliable monitoring aggregation, alert routing, protocol compatibility, and automated decisions become. The richer and more reliable the diagnostic channel is, the more effective troubleshooting and repair become.

Different information has different stability. Design and testing should treat it accordingly:

| Information | Stability | Suitable as an external contract |
| --- | --- | --- |
| Error identity code | Highest | Yes |
| Category | High | Usually yes |
| Reason variant | Medium-high | Internal code contract |
| Retryable / visibility / HTTP status | Medium | Policy contract |
| Detail | Low | No |
| Context fields | Low to medium | Usually no |
| Source chain | Low | No |

So tests should prioritize identity and policy results, not exact detail strings. Documentation should promise stable identity and classification semantics, not specific diagnostic wording.

Bricks Are Not the Building

A common objection is this: Java exceptions plus error codes, Go sentinel errors plus wrapping, and Rust enums plus cause chains already cover most of what the Wukong model wants. Why add another model?

Because those mechanisms are bricks, not the building. The gap appears in three places.

Error identity is unstable. In the Java ecosystem, exception types often act as routing signals: catch (OrderNotFoundException e). But exception types follow inheritance structure and can change during refactoring. Error codes become optional side fields. Callers first do instanceof, then read getErrorCode(). The code is not the routing key. The Wukong model makes stable error identity a first-class member of the contract channel. Boundary policy routes by identity, decoupled from inheritance. Whether you use enums, error code strings, or tagged unions, the identity itself must be stable, documentable, and enforced by tests.

The classification space grows naturally without bound. Exception-based designs encourage “one class per failure”: SubmitDependencyUnavailableException, InvalidStateException, and so on. The number of classes grows with business failure patterns, with no mechanism forcing convergence into a finite classification space. If exception types also carry classification semantics, classification stops being stable. Every new exception class can affect every boundary that routes by exception hierarchy. The Wukong model constrains the classification space (R) to a finite set. Adding a new variant becomes an intentional compatibility evolution, not casual class proliferation.

Contracts and diagnostics share one channel and constrain each other. Cause chains, structured wrapping, and errors.Is/errors.As solve diagnostic preservation: how root causes are kept and queried. They do not decide whether to retry or degrade, which HTTP status to map, whether to expose the error to users or only to logs. If governance actions are derived ad hoc from exception types, fields, or local judgments, they spread into every handler, every catch block, and every errors.Is call. The Wukong model separates contract information from diagnostic information and makes governance decisions come from centralized policy rather than local code.

So the key difference is not which language feature you use. It is whether your error architecture has four load-bearing walls: stable error identity that survives type refactors, a finite classification space governed by compatibility rules, diagnostics preserved across layers, and centralized boundary policy instead of repeated decisions in handlers. Without those structures, any language’s error handling turns into an ungovernable thicket, even when the underlying bricks are excellent.


Design Principles

The Wukong Error Governance Model needs five principles working together: a unified carrier, a stable contract channel, a faithful diagnostic channel, centralized boundary output, and explicit bridges to external ecosystems.

Code snippets in this section are language-agnostic pseudocode intended only to show the methodological shape.

Principle 1: Use a Unified Carrier for Contract and Diagnostic Information

Your own cross-layer propagation paths should use one structured model.

Anti-pattern:

// Module A returns io::Error
fn read_file() -> io::Result<Data>

// Module B returns a custom enum
fn validate() -> Result<Data, ValidationError>

// Module C returns a string
fn process() -> Result<Data, String>

Callers must learn a new error shape for every path. When composing multiple functions, the caller must classify, rebuild diagnostics, and decide boundary output format all over again.

Preferred shape:

read_file() -> Result<Data, StructuredError<ErrorClass>>
validate() -> Result<Data, StructuredError<ErrorClass>>
process() -> Result<Data, StructuredError<ErrorClass>>

Only a unified carrier can hold stable classification and diagnostics at the same time. What varies is the classification space and the context. Different layers may define different classification spaces, but cross-layer propagation must have clear convergence or boundary-conversion rules.

A unified carrier does not mean third-party libraries, the standard library, framework exceptions, or protocol errors must all become one type. It means the propagation paths your team controls should use one structured model internally, and bridges to external ecosystems should be explicit.
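
As a rough illustration of the preferred shape, the following std-only Rust sketch defines a hypothetical `StructuredError<R>` carrier. The type, its fields, and the `field` / `caused_by` builder methods are assumptions made for this example and are not the orion-error API. The reason `R` plays the role of the contract channel, while detail, context, and source play the role of the diagnostic channel. The standard library's `io::Error` enters through an explicit conversion at the edge.

```rust
use std::error::Error;

// Hypothetical unified carrier. `reason` is the classification (contract channel);
// `detail`, `context`, and `source` make up the diagnostic channel.
#[derive(Debug)]
struct StructuredError<R> {
    reason: R,
    detail: String,
    context: Vec<(String, String)>,
    source: Option<Box<dyn Error + Send + Sync>>,
}

impl<R> StructuredError<R> {
    fn new(reason: R, detail: impl Into<String>) -> Self {
        Self { reason, detail: detail.into(), context: Vec::new(), source: None }
    }

    // Attach one structured key/value pair to the diagnostic channel.
    fn field(mut self, key: &str, value: &str) -> Self {
        self.context.push((key.to_string(), value.to_string()));
        self
    }

    // Keep the lower-level error as the cause instead of flattening it to text.
    fn caused_by(mut self, source: impl Error + Send + Sync + 'static) -> Self {
        self.source = Some(Box::new(source));
        self
    }
}

// Hypothetical classification space shared by this layer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorClass {
    InvalidInput,
    Io,
}

// The std error is bridged explicitly where it enters the structured system.
fn read_file(path: &str) -> Result<String, StructuredError<ErrorClass>> {
    std::fs::read_to_string(path).map_err(|e| {
        StructuredError::new(ErrorClass::Io, "read file failed")
            .field("path", path)
            .caused_by(e)
    })
}

fn validate(data: &str) -> Result<&str, StructuredError<ErrorClass>> {
    if data.is_empty() {
        return Err(StructuredError::new(ErrorClass::InvalidInput, "empty input"));
    }
    Ok(data)
}
```

Both functions now fail with the same shape, so a caller composing them classifies once on `reason` and never has to rebuild diagnostics.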

Principle 2: Keep the Contract Channel Stable

Error classification contracts should evolve under backward-compatible rules.

Classification is a contract because callers depend on it to make governance decisions. “Stable” does not mean you can never add classifications. It means the machine key and semantics of existing classifications should not change casually.

The error identity is the machine primary key of the contract channel. In practice it is often a stable string, numeric code, or protocol field such as business.not_found or system.timeout. Callers, gateways, monitoring, alerts, and documentation should depend on that identity, not on message text.

Compatibility rules for classification contracts:

  • You may add new identities or classifications to represent new business or system failures.
  • You should not delete committed public identities. If one must be retired, keep a compatibility mapping or versioned migration path.
  • You should not change the meaning of an existing identity, such as turning business.not_found from “resource does not exist” into “permission denied”.
  • You should not let one identity produce contradictory governance actions at different boundaries, such as retryable in one place and non-retryable in another.
  • You may change wording, diagnostic detail, context fields, and lower-level source chains, as long as identity and classification semantics remain intact.

| Should remain stable | May change |
| --- | --- |
| Stable error identity | Diagnostic detail |
| Classification semantics | Error wording |
| Category (business / system / config) | Specific technical detail |

Stable classification has another benefit: it becomes the shared interface between humans and systems. Operations rules, gateway status mappings, API documentation, and response specs all depend on stable identity and classification semantics rather than message text. Enums, exception types, numeric codes, or tagged unions are only implementation forms for expressing that contract.

Granularity must be constrained. A new stable error identity usually needs at least one of the following:

  • Callers need a different governance action such as retry, degrade, stop retrying, or manual intervention.
  • A boundary needs a different protocol status, public error code, user message, or repair hint.
  • Monitoring, alerts, SLA reporting, or operations reporting need independent aggregation.
  • SREs, business owners, or rule developers need to track it as a distinct failure class.
  • The semantics are stable over time and do not depend on the current database, SDK, network library, or implementation detail.

The following usually do not justify a new stable identity:

  • Only the wording differs.
  • Only the field name, file name, tenant, path, line/column, or sample content differs.
  • Only the lower-level library error type differs, but the governance action is the same.
  • It only exists to make logs more detailed.

Those dynamic differences belong in the diagnostic channel. Otherwise the classification space keeps expanding until the contract channel degenerates into another form of log text.
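
A small sketch of that granularity rule, using only `std::io`. The `classify` function and the three identity strings are hypothetical. Several library-level error kinds that call for the same governance action collapse into one identity, while the path and the original message, which differ on every call, go into the detail string.

```rust
use std::io;

// Map a lower-level I/O failure to (stable identity, diagnostic detail).
// The identity strings are hypothetical examples.
fn classify(err: &io::Error, path: &str) -> (&'static str, String) {
    let identity = match err.kind() {
        io::ErrorKind::NotFound => "config.not_found",
        // Different library-level kinds, same governance action: one identity.
        io::ErrorKind::TimedOut
        | io::ErrorKind::ConnectionReset
        | io::ErrorKind::ConnectionAborted => "system.io_unavailable",
        _ => "system.io_failed",
    };
    // Dynamic differences (path, original message) stay in the diagnostic detail.
    (identity, format!("{path}: {err}"))
}
```

A timeout on one path and a connection reset on another share an identity, so monitoring aggregates them together, yet each keeps its own detail for troubleshooting.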

Principle 3: Preserve the Diagnostic Channel Across Layers

As errors propagate internally, layers should add information without damaging the existing diagnostic chain.

Anti-pattern:

repository() -> Result<Data, RepoError> {
    // database connection failed, returns RepositoryConnectionFailed
}

service() -> Result<Data, ServiceError> {
    data = repository()?  // lower-level specifics are discarded
    return data
}

Preferred shape:

repository() -> Result<Data, StructuredError<RepositoryClass>> {
    // database connection failed, original database error preserved
}

service() -> Result<Data, StructuredError<ServiceClass>> {
    data = repository()
        .source_err(ServiceDependencyFailed, "load repository data failed")
    return data
}

The information preserved by each layer forms a complete error chain, allowing diagnosis to trace from the final error back to the original root cause.

There are two different operations here:

  • If the current layer only converges lower-level classification into the upper layer’s classification space and does not introduce a new semantic boundary, it should preserve the existing diagnostic chain without inventing a new error narrative.
  • If the current layer expresses a new failure meaning, it should keep the lower-level error as a cause and add the current layer’s explanation.

The deciding factor is the semantic domain, not the number of stack frames.

If upper and lower layers belong to the same semantic domain, the conversion is usually only classification convergence. For example, a database driver, a query executor, and a repository helper all live in the data-access domain. They may converge a lower connection failure into RepositoryConnectionFailed while preserving the original database error and context, without adding business narrative at every level.

If the error crosses a semantic domain or an architectural responsibility boundary, a new semantic boundary should be introduced. For example, once a data-access failure enters the order service, the upper layer should not care about “database connection failed”. It cares about “load order draft failed” or “submit-order dependency unavailable”. At that point the service layer should add its own semantic meaning and keep the lower data-access error as a cause.

Useful diagnostic questions:

  • Is this layer hiding implementation details from the layer above?
  • Does this layer have new business meaning, user intent, or operation goals?
  • Will this layer change governance action, such as mapping a low-level timeout into business dependency unavailable?
  • If the lower implementation is replaced in the future, should the upper-layer error contract stay the same?

If the answer to any of these is yes, this is usually a semantic boundary. If it is only module splitting, a helper, or technical layering within one domain, then classification convergence and diagnostic preservation are enough. Redaction and formatting can happen later at output boundaries.
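
The two operations can be told apart in code. The sketch below is a simplified, std-only model with hypothetical names (`StructuredError`, `converge`, `wrap`, and the three reason enums); it is not the orion-error implementation. `converge` remaps the reason and leaves the diagnostic chain untouched, which fits layers inside one semantic domain. `wrap` introduces a new reason and appends the current layer's explanation, which fits a semantic boundary.

```rust
// Hypothetical carrier: one reason plus a diagnostic chain of layer explanations,
// stored root cause first.
#[derive(Debug)]
struct StructuredError<R> {
    reason: R,
    chain: Vec<String>,
}

impl<R> StructuredError<R> {
    fn new(reason: R, detail: &str) -> Self {
        Self { reason, chain: vec![detail.to_string()] }
    }

    // Classification convergence: remap the reason, add no new narrative.
    fn converge<R2>(self, f: impl FnOnce(R) -> R2) -> StructuredError<R2> {
        StructuredError { reason: f(self.reason), chain: self.chain }
    }

    // Semantic boundary: new reason plus this layer's own explanation,
    // with the lower-level chain kept as the cause.
    fn wrap<R2>(self, reason: R2, detail: &str) -> StructuredError<R2> {
        let mut chain = self.chain;
        chain.push(detail.to_string());
        StructuredError { reason, chain }
    }
}

// Hypothetical classification spaces for three layers.
#[derive(Debug, PartialEq)]
enum DriverReason {
    ConnectRefused,
}

#[derive(Debug, PartialEq)]
enum RepoReason {
    ConnectionFailed,
}

#[derive(Debug, PartialEq)]
enum ServiceReason {
    DependencyUnavailable,
}

// Driver and repository share the data-access domain: convergence only.
fn repository_error(low: StructuredError<DriverReason>) -> StructuredError<RepoReason> {
    low.converge(|_| RepoReason::ConnectionFailed)
}

// Entering the order service crosses a semantic boundary: wrap with new meaning.
fn service_error(repo: StructuredError<RepoReason>) -> StructuredError<ServiceReason> {
    repo.wrap(ServiceReason::DependencyUnavailable, "load order draft failed")
}
```

After both steps the chain holds exactly two entries, the original driver detail and the service explanation, with no filler added by the repository helper.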

Principle 4: Centralize Output at Boundaries

Boundary exposure policy should be defined centrally, not re-decided at each boundary point.

Structured errors carry both the contract and diagnostic channels as they propagate internally. At the boundary, the boundary layer reads the stable error identity from the contract channel and passes it to a unified policy to generate output views.

StructuredError<ErrorClass>
    -> error_identity
    -> exposure_policy
    -> HTTP response / RPC error / CLI output / log record / metric label

Anti-pattern:

// handler A
match err {
    NotFound => HttpResponse(404, "not found"),
    Timeout => HttpResponse(503, "try again"),
}

// handler B
match err {
    NotFound => HttpResponse(404, "resource missing"),
    Timeout => HttpResponse(504, "gateway timeout"),
}

The two handlers expose the same error inconsistently.

Preferred shape:

// central policy definition
policy.status(error_identity) {
    match error_identity {
        "business.not_found" => 404
        "system.timeout" => 503
        _ => 500
    }
}

// all boundaries use the same policy
render_error_response(err, policy)

Centralized policy must cover more than HTTP status. It should also cover:

  • Public error codes and user-visible messages
  • HTTP/RPC/CLI format mapping
  • Log levels and structured log fields
  • Whether to trigger alerts or count toward SLA
  • Whether callers should retry, degrade, or stop retrying
  • Diagnostic redaction and exposure level
  • Metric labels and aggregation dimensions

If those decisions are scattered across handlers, workers, and controllers, the same error identity will produce contradictory output across boundaries and the contract channel will stop being stable.
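
One way to picture a policy that covers more than HTTP status is a single table keyed by identity, with every boundary rendering through it. This is a std-only sketch; `Exposure`, `exposure_for`, `render_http`, `render_cli`, and the public messages are hypothetical and chosen only for illustration.

```rust
// Hypothetical policy result: every boundary decision for one identity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Exposure {
    http_status: u16,
    retryable: bool,
    alert: bool,
    public_message: &'static str,
}

// The single policy table, keyed by stable identity.
fn exposure_for(identity: &str) -> Exposure {
    match identity {
        "business.not_found" => Exposure {
            http_status: 404,
            retryable: false,
            alert: false,
            public_message: "resource not found",
        },
        "system.timeout" => Exposure {
            http_status: 503,
            retryable: true,
            alert: true,
            public_message: "service temporarily unavailable",
        },
        _ => Exposure {
            http_status: 500,
            retryable: false,
            alert: true,
            public_message: "internal error",
        },
    }
}

// HTTP boundary: status code plus a JSON body built from the policy.
fn render_http(identity: &str) -> (u16, String) {
    let e = exposure_for(identity);
    let body = format!(
        "{{\"code\":\"{identity}\",\"message\":\"{}\",\"retryable\":{}}}",
        e.public_message, e.retryable
    );
    (e.http_status, body)
}

// CLI boundary: same policy, different projection.
fn render_cli(identity: &str) -> String {
    let e = exposure_for(identity);
    format!("error[{identity}]: {}", e.public_message)
}
```

Since both renderers call `exposure_for`, a timeout cannot be 503 in one handler and 504 in another unless the table itself says so.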

Principle 5: Bridge External Ecosystems Explicitly

Crossing into external ecosystems such as logging systems, standard error interfaces, or third-party libraries should be explicit.

Anti-pattern:

// the caller unknowingly degrades the error into a plain string
handle(error_as_text)  // structured information erased

Preferred shape:

// explicit choice to enter an external ecosystem
plain_error = err.to_plain_error()
log_record = err.to_log_record(redaction_policy)

Explicit bridges ensure that loss of structure, redaction, or degradation is intentional rather than accidental. Each bridge function should have a clear bridge contract:

  • Who is the target consumer: user, protocol client, logging system, monitoring system, third-party library, or standard error interface?
  • What is preserved: stable error identity, classification, cause-chain summary, operation context, key fields, retryable, visibility?
  • What is dropped: internal implementation types, sensitive fields, excessively long lower-level errors, dynamically formatted text that is not stable to parse?
  • What is redacted: tokens, secrets, personal data, tenant-isolation data, internal topology, SQL fragments, or payload bodies?
  • How does degradation work: if the target ecosystem accepts only strings or ordinary exceptions, which fields are compressed into text and which are lost entirely?

Different bridge targets need different contracts. Logging should keep error identity, classification, context, key fields, and a cause summary. External responses should expose only the public error code, public message, and repair hint. Bridging to a standard error interface may preserve only text and the source chain. The point is not “carry everything everywhere”. The point is that every output is auditable, testable, and predictable.
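
The bridge contract can be made visible in code. In the std-only sketch below, `StructuredError`, `PlainError`, `to_plain_error`, and `to_log_record` are hypothetical names that mirror the pseudocode above, and the redaction rule (a list of field keys to mask) is an assumption. The standard-error bridge keeps only identity and detail as text, while the logging bridge keeps identity, detail, and fields, masking the keys the caller names.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical in-process structured error.
struct StructuredError {
    identity: &'static str,
    detail: String,
    fields: Vec<(&'static str, String)>,
}

// Plain standard error produced by the bridge: only text survives.
#[derive(Debug)]
struct PlainError(String);

impl fmt::Display for PlainError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for PlainError {}

impl StructuredError {
    // Bridge to the std::error::Error ecosystem. Explicit and lossy:
    // identity and detail are compressed into text, fields are dropped.
    fn to_plain_error(&self) -> Box<dyn Error + Send + Sync> {
        Box::new(PlainError(format!("{}: {}", self.identity, self.detail)))
    }

    // Bridge to logging. Keeps identity, detail, and fields,
    // masking every field whose key is listed in `redact_keys`.
    fn to_log_record(&self, redact_keys: &[&str]) -> String {
        let mut record = format!("identity={} detail={:?}", self.identity, self.detail);
        for (key, value) in &self.fields {
            let shown = if redact_keys.contains(key) { "<redacted>" } else { value.as_str() };
            record.push_str(&format!(" {key}={shown}"));
        }
        record
    }
}
```

Each bridge has a name and a documented loss, so a reviewer can see at the call site that structure is being given up and which fields are masked.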

Boundary Safety and Cross-Process Output

Structured errors preserve more information internally, so boundary output must handle governance, safety, and privacy together. The diagnostic channel may be rich, but that does not mean everything may leave the current trust domain.

Trust domains should be handled in layers:

| Trust domain | Recommended payload |
|---|---|
| In-process | Full structured error, source chain, detail, context |
| Service-to-service / cross-process | Protocol snapshot, identity, category, public message, retryable, correlation ID |
| User boundary | Stable error code, public message, necessary repair hint, correlation ID |
| Observability systems | Redacted diagnostic summary, key context, aggregation labels |
| Support bundles / debug reports | Richer diagnostics under permission control, with redaction and lifecycle constraints |

Different views should have different exposure levels:

  • User responses should include only stable error code, public message, required repair hints, and correlation ID.
  • Protocol clients should receive machine-usable governance fields such as identity, category, and retryable, but not the internal source chain.
  • Logs and reports should preserve redacted context, cause summaries, and key diagnostic fields.
  • Support bundles or debug reports may include richer diagnostics, but only with permission, redaction, and lifecycle controls.

Across services, processes, or message queues, you should usually avoid shipping the full internal error object directly. A safer approach is to transmit a protocol snapshot: stable identity, category, public message, retryable, correlation ID, or trace ID. The full diagnostic chain stays inside the service that produced the error and is correlated through logs, traces, and reports.

That avoids two problems: leaking internal implementation details to external callers, and making downstream systems depend on upstream internal error types, source chains, or library shapes. At cross-process boundaries, the error contract should be a protocol contract, not a direct serialization of in-process diagnostic structure.
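A protocol snapshot of this kind might look like the following sketch. All type and field names here are hypothetical and chosen only to mirror the fields listed above. The sketch uses no serialization library; the point is which fields exist on the outgoing type at all.

```rust
/// Hypothetical in-process error. `detail` and `source_chain` stay inside
/// the producing service and have no counterpart on the outgoing types.
#[derive(Debug, Clone)]
pub struct InternalError {
    pub identity: &'static str,
    pub category: &'static str,
    pub public_message: &'static str,
    pub retryable: bool,
    pub detail: String,
    pub source_chain: Vec<String>,
}

/// What crosses a service or process boundary.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProtocolSnapshot {
    pub identity: String,
    pub category: String,
    pub public_message: String,
    pub retryable: bool,
    pub correlation_id: String,
}

/// What reaches the user boundary (repair hints omitted for brevity).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserResponse {
    pub code: String,
    pub message: String,
    pub correlation_id: String,
}

impl InternalError {
    /// Project the contract channel into a snapshot; diagnostics are not copied.
    pub fn to_protocol_snapshot(&self, correlation_id: &str) -> ProtocolSnapshot {
        ProtocolSnapshot {
            identity: self.identity.to_string(),
            category: self.category.to_string(),
            public_message: self.public_message.to_string(),
            retryable: self.retryable,
            correlation_id: correlation_id.to_string(),
        }
    }
}

impl ProtocolSnapshot {
    /// Narrow the snapshot further for the user boundary.
    pub fn to_user_response(&self) -> UserResponse {
        UserResponse {
            code: self.identity.clone(),
            message: self.public_message.clone(),
            correlation_id: self.correlation_id.clone(),
        }
    }
}
```

Because the snapshot type has no field for detail or source chain, leaking them across the boundary requires changing the protocol type, which is a reviewable change rather than an accident.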


Three Error Propagation Modes

The five principles above define the static structure of error governance: a unified carrier, a stable contract channel, a faithful diagnostic channel, centralized boundary output, and explicit external bridging. The three propagation modes below describe the dynamic lifecycle of an error: how it first enters the structured system, how it changes across semantic domains, and how it is finally projected at a boundary. Principles answer “what the structure should look like”. Modes answer “how it moves at runtime”. They are complementary views of the same methodology.

Error propagation is not just mechanically throwing upward. In an industrial system, an error goes through three actions: first entry, cross-layer conversion, and boundary output.

First Entry

When a raw failure such as I/O, parse, or network error enters the structured system for the first time, three things must happen together:

  1. Choose the classification (business vs. system vs. configuration)
  2. Add the current layer’s explanation (detail)
  3. Preserve the raw error as the lower-level cause

The diagnostic concepts divide responsibilities like this: source/cause answers what the root problem was, context answers where, for whom, and while doing what, and detail answers how the current layer interprets the failure.
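As a rough illustration of first entry, the sketch below performs the three actions in one place using only the standard library. `Class`, `Structured`, and `load_config` are hypothetical names for this example, not orion-error types.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical classification space for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Class {
    Business,
    System,
    Configuration,
}

/// Hypothetical structured error: classification, detail, and raw cause.
#[derive(Debug)]
pub struct Structured {
    pub class: Class,
    pub detail: String,
    source: Box<dyn Error>,
}

impl fmt::Display for Structured {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.class, self.detail)
    }
}

impl Error for Structured {
    /// The raw failure stays reachable as the lower-level cause.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// First entry: the raw io::Error becomes a structured error exactly once.
pub fn load_config(path: &str) -> Result<String, Structured> {
    std::fs::read_to_string(path).map_err(|raw: io::Error| Structured {
        class: Class::Configuration,                           // 1. choose the classification
        detail: format!("read config file {} failed", path),   // 2. current-layer explanation
        source: Box::new(raw),                                 // 3. preserve the raw cause
    })
}
```

The raw `io::Error` is never turned into a string here, so a later diagnostic view can still downcast the source and inspect its kind.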

Cross-Layer Conversion

The upper layer converges lower-level classifications into its own classification space. If it is only remapping classification, it preserves all diagnostics. If it needs a new semantic boundary, it wraps the lower-level error as a cause. The deciding factor is whether the current layer is a new semantic boundary.

A semantic boundary is not the same as an architectural layer boundary. Crossing a function, file, helper, or adapter does not automatically require a new error narrative. Crossing business intent, user operation, governance action, or implementation-hiding boundaries usually does. Over-wrapping fills the source chain with repetitive “failed to process” messages. Under-wrapping leaks low-level technical failures into upper-layer contracts. The criterion is semantic responsibility, not call-stack depth.

A quick decision table helps:

| Current action | Add new stable identity? | Add new source frame? | Mental model |
|---|---|---|---|
| Only converge lower classification into upper classification | No, only remap | No | Reason convergence, e.g. conv_err() |
| Express new business or architectural semantics | Yes | Yes | New semantic boundary, e.g. source_err(...) |
| Only add current operation path or fields | No | No | Context enrichment, e.g. doing(...) / with_context(...) |

The point of this table is to avoid both extremes: wrapping every layer into a new story, or directly exposing cross-domain low-level technical failures to upper layers.
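The three actions in the table can be told apart by what they change on the carrier. The sketch below uses a hypothetical `Carrier<R>` with `remap`, `wrap`, and `with_field` methods. These are illustrative stand-ins for the mental model, not the signatures of conv_err(), source_err(...), or with_context(...).

```rust
use std::fmt::Debug;

/// Hypothetical carrier: reason is the contract, the rest is diagnostics.
#[derive(Debug)]
pub struct Carrier<R> {
    pub reason: R,
    pub detail: String,
    pub context: Vec<(String, String)>,
    /// Lower-layer frames, nearest first.
    pub source_frames: Vec<String>,
}

impl<R: Debug> Carrier<R> {
    pub fn new(reason: R, detail: &str) -> Self {
        Carrier {
            reason,
            detail: detail.to_string(),
            context: Vec::new(),
            source_frames: Vec::new(),
        }
    }

    /// Reason convergence: only the classification changes.
    pub fn remap<R2>(self, convert: impl FnOnce(R) -> R2) -> Carrier<R2> {
        Carrier {
            reason: convert(self.reason),
            detail: self.detail,
            context: self.context,
            source_frames: self.source_frames,
        }
    }

    /// New semantic boundary: new reason and detail, old error becomes a frame.
    pub fn wrap<R2>(self, reason: R2, detail: &str) -> Carrier<R2> {
        let mut source_frames = vec![format!("{:?}: {}", self.reason, self.detail)];
        source_frames.extend(self.source_frames);
        Carrier {
            reason,
            detail: detail.to_string(),
            context: self.context,
            source_frames,
        }
    }

    /// Context enrichment: neither identity nor source frames change.
    pub fn with_field(mut self, key: &str, value: &str) -> Self {
        self.context.push((key.to_string(), value.to_string()));
        self
    }
}

#[derive(Debug, PartialEq)]
pub enum RepoReason { ConnectionFailed }

#[derive(Debug, PartialEq)]
pub enum StoreReason { Unavailable }

#[derive(Debug, PartialEq)]
pub enum OrderReason { SubmitDependencyUnavailable }
```

Only `wrap` grows the source chain. If a layer calls it without introducing new semantics, the chain fills with frames that repeat the same story.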

Boundary Output

At a system boundary such as an HTTP handler, RPC endpoint, CLI entry, or log-writing point, choose the output format, apply exposure policy, and emit the result.
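One way to keep that decision centralized is a single policy function keyed by identity, with every output view derived from it. The sketch below is an assumption-laden illustration: `BoundaryDecision`, `decide`, `http_view`, and `cli_view` are hypothetical, and the status codes and messages are example values rather than orion-error defaults.

```rust
/// Hypothetical result of applying boundary policy to one identity.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BoundaryDecision {
    pub http_status: u16,
    pub retryable: bool,
    pub public_message: &'static str,
}

/// The only place that maps identity to governance decisions.
pub fn decide(identity: &str) -> BoundaryDecision {
    match identity {
        "order.submit_dependency_unavailable" => BoundaryDecision {
            http_status: 503,
            retryable: true,
            public_message: "The service is temporarily unavailable.",
        },
        "order.invalid_state" => BoundaryDecision {
            http_status: 409,
            retryable: false,
            public_message: "The order cannot be submitted in its current state.",
        },
        // Unknown identities fall back to a conservative default.
        _ => BoundaryDecision {
            http_status: 500,
            retryable: false,
            public_message: "Internal error.",
        },
    }
}

/// HTTP view: status and public message come from the shared decision.
pub fn http_view(identity: &str) -> (u16, &'static str) {
    let decision = decide(identity);
    (decision.http_status, decision.public_message)
}

/// CLI view: same decision, different rendering.
pub fn cli_view(identity: &str) -> String {
    let decision = decide(identity);
    let hint = if decision.retryable { "retry later" } else { "do not retry" };
    format!("error[{}]: {} ({})", identity, decision.public_message, hint)
}
```

Because both views call `decide`, an HTTP handler and a CLI entry cannot disagree about whether the same identity is retryable.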

A Full Propagation Example

Here is the full path of one “submit order” failure across the three modes.

First, a database failure enters the structured system in the repository layer. The repository chooses a stable classification in the data-access domain, preserves the database error as source, and adds operation context.

repository.insert_order(order) -> Result<(), StructuredError<RepositoryClass>> {
    db.insert(order)
        .on_error(source_error) {
            return StructuredError {
                identity: "repository.connection_failed",
                class: RepositoryConnectionFailed,
                detail: "insert order failed",
                context: {
                    operation: "insert_order",
                    order_id: order.id,
                    component: "order_repository"
                },
                source: source_error
            }
        }
}

Second, the service layer crosses into the business semantic domain. It does not expose “database connection failed” to upper layers. Instead it expresses a business failure, dependency unavailable during order submission, while preserving the repository error as the cause.

service.submit_order(order) -> Result<(), StructuredError<ServiceClass>> {
    repository.insert_order(order)
        .on_error(repo_error) {
            return StructuredError {
                identity: "order.submit_dependency_unavailable",
                class: SubmitDependencyUnavailable,
                detail: "submit order failed",
                context: {
                    operation: "submit_order",
                    order_id: order.id,
                    tenant: order.tenant
                },
                source: repo_error
            }
        }
}

Third, the HTTP handler reaches the boundary. It does not reinterpret the error. It hands the error to centralized policy to generate output.

handler.post_orders(req) -> HttpResponse {
    result = service.submit_order(req.order)

    if result is error {
        err = result.error
        identity = err.identity

        log_record = policy.to_log_record(err)
        metrics.record(policy.metric_labels(identity))

        return HttpResponse {
            status: policy.http_status(identity),
            body: policy.public_body(identity),
            retry_after: policy.retry_after(identity)
        }
    }
}

At the boundary, the contract channel presents order.submit_dependency_unavailable, which drives status code, user message, retry hint, and metric labels. The diagnostic channel still preserves service detail, repository detail, context, and the original database error. Callers do not need database details, but troubleshooters can still trace root cause.

The Lifecycle of Error Governance

Runtime propagation is only part of governance. Once stable classifications reach production, they must be observed and evolved:

detect -> classify -> enrich -> propagate -> project -> observe -> review/evolve

  • detect: capture the raw technical error or business failure where it occurs.
  • classify: choose the stable identity and classification in the current semantic domain.
  • enrich: add detail, context, and source without polluting the contract channel.
  • propagate: preserve the diagnostic chain across layers and create a new semantic boundary where needed.
  • project: generate output views at HTTP/RPC/CLI/log/metric boundaries under policy.
  • observe: inspect error distribution and governance effectiveness through logs, metrics, traces, and reports.
  • review/evolve: merge, retire, or add error identities based on production feedback, and update policy and documentation.

The review/evolve step matters. Classification is not one-time modeling. If one identity carries too many different governance actions over time, classification is too coarse. If a group of identities differs only in wording while governance action stays the same, classification is too fine. Once a system reaches L2 (stable classification, see Governance Levels), production observation should feed back into the classification contract and recalibrate it.


Governance Levels

Error governance maturity has four levels:

L0: No governance

  • Error types are scattered: std::io::Error, String, Box<dyn Error>, custom enums, all mixed together
  • Boundary output is manually concatenated strings
  • Troubleshooting depends on grepping logs

L1: Unified carrier

  • Internal cross-layer paths return the same structured model
  • A basic cause chain exists, but may still be dropped across layers
  • There is still no stable classification contract, and the same failure may be categorized differently in different modules
  • This is only unified expression and propagation, not governance yet

L2: Stable classification

  • The classification contract is stable and documented
  • Boundary output uses unified policy
  • Cause chains are preserved across layers
  • Tests assert error identities rather than message text

L3: Governance-driven

  • Error classifications map directly to governance actions such as retry, degrade, alert, and SLA handling
  • Boundary policy is configurable and may vary by environment
  • Error metrics enter monitoring systems
  • New error types require review before being added

Most teams live between L0 and L1. The step from L1 to L2 is the most underestimated one. Switching return types to a unified carrier is not enough. The team also needs shared semantics around which failures share one identity, which classifications imply retry, and which errors may be exposed externally. Java exception mechanisms can help a project reach L1, but they do not automatically provide stable classification, unified boundary policy, or testing constraints.

Moving from L1 to L2 requires:

  • Standardizing the classification contract: stable identities, classification semantics, category, and governance meaning.
  • Refactoring existing errors: migrate scattered strings, technical exceptions, and temporary enums into stable classifications.
  • Establishing boundary policy: unify HTTP/RPC/CLI/log/metric output rules.
  • Establishing test rules: assert identity and governance decisions rather than message text.
  • Establishing review habits: when adding a new error, discuss semantic ownership rather than only whether the code compiles.

Once you reach L2, tests should not stop at “an error was returned”. A more valuable test matrix includes:

  • Whether identity is stable and independent from message text
  • Whether category, retryable, visibility, and HTTP status match policy
  • Whether the source chain preserves the lower-level root cause
  • Whether context includes the key fields needed to locate the problem
  • Whether exposure is correctly redacted and user responses do not leak internal detail
  • Whether HTTP/RPC/CLI/log/metric views are consistent for the same error identity

The move from L1 to L2 is not just a local refactor. It changes team collaboration. Error classification stops being an individual implementation detail and becomes a shared engineering language.

L3 means error governance has entered organizational process. New error types need review because every new stable identity can affect alerts, retries, SLA handling, user wording, protocol compatibility, and operations dashboards. At that stage, changes to error classification should be managed like API changes: naming conventions, compatibility rules, policy mappings, test coverage, and retirement or migration paths.


When the Model Does Not Fit

  1. Small projects, prototypes, scripts. If there are few boundaries, a short lifecycle, and errors are handled locally, layered governance adds little value.
  2. Extremely performance-sensitive paths. Structured error paths carry costs such as allocation, cause chains, context collection, and serialization. In statically typed languages, generics or templates may also add compile time and code size.
  3. Errors do not cross layers. If all errors are fully handled inside one layer, the benefit approaches zero.

Interim Summary

Up to this point, we have covered the general methodology: why error governance matters, what the core tension is, and how the Wukong model organizes failure information. The next section moves into the Rust implementation.


Error Governance in Rust

Rust is well suited to structured error governance, but it does not perform governance automatically. Result<T, E>, enums, ?, and traits solve syntax for error expression and propagation. Stable error identity, semantic boundaries, diagnostic preservation, boundary output, and bridge contracts still require engineering design.

orion-error turns the Wukong model into Rust infrastructure:

Result<T, StructError<R>>

R                 -> reason / identity / category in the contract channel
StructError<R>    -> runtime carrier for detail / context / source chain
ExposurePolicy    -> boundary output policy
report / interop  -> diagnostics and external ecosystem bridges

R is the error classification contract of the current semantic domain. Different bounded contexts, architectural layers, or business domains may define their own Reason types. Cross-domain propagation expresses semantic boundaries through explicit conversion.

The diagram below shows how an error enters the structured system from a low-level failure, crosses semantic domains, and finally reaches boundary output. Keep three points in mind:

  • Internal propagation uses the unified carrier StructError<R>.
  • One error carries both the contract and diagnostic channels.
  • Boundary layers generate output views from policy instead of reinterpreting the error.

flowchart TB
    raw["Raw failure<br/>IO / DB / Network / Parser"]
    repo["Repository layer<br/>StructError&lt;RepositoryReason&gt;"]
    service["Service layer<br/>StructError&lt;OrderReason&gt;"]
    boundary["System boundary<br/>HTTP / RPC / CLI / Worker"]

    raw -->|"First entry<br/>source_err(reason, detail)"| repo
    repo -->|"Cross semantic domain<br/>source_err(new_reason, detail)"| service
    service -->|"Boundary output<br/>exposure(policy)"| boundary

    subgraph governance["Contract channel: stable, finite, testable"]
        identity["identity<br/>order.submit_dependency_unavailable"]
        category["category<br/>biz / sys / conf / logic"]
        policy["policy<br/>status / retry / visibility / hints"]
    end

    subgraph diagnostic["Diagnostic channel: faithful, traceable"]
        detail["detail<br/>submit order failed"]
        context["context<br/>operation / order_id / tenant / component"]
        source["source chain<br/>service -> repository -> database"]
    end

    service -.carries.-> identity
    service -.carries.-> category
    service -.carries.-> detail
    service -.carries.-> context
    service -.carries.-> source

    identity --> policy
    category --> policy
    policy --> boundary

    boundary --> user["External response<br/>stable error code + public message"]
    boundary --> log["Logs / Report<br/>diagnostic summary + redacted context"]
    boundary --> metric["Metrics / Alerting<br/>identity + category"]

The key idea in the diagram is that errors are not flattened into strings during internal propagation. The boundary layer generates three output views from policy: user response, logs/reports, and metrics/alerts.

Mapping the Five Principles to Rust and orion-error

| Methodology principle | Rust / orion-error implementation | Effect |
|---|---|---|
| Unified carrier for contract and diagnostics | Use Result<T, StructError<R>> for internal cross-layer propagation | Callers face one error shape, with classification space parameterized by R |
| Stable contract channel | Domain reasons define stable identity, category, and classification semantics | Callers, monitoring, and protocol boundaries depend on identity, not wording |
| Faithful diagnostic channel across layers | Use detail, context, and source chain to preserve lower causes and current-layer explanation | Upper layers may converge classification without losing root cause |
| Centralized boundary output | Use exposure policy to decide HTTP/RPC/CLI/log/metric output centrally | Avoid every handler building its own response and redaction rules |
| Explicit bridges to external ecosystems | Use report, redacted render, standard-error interop, protocol JSON, and similar explicit conversion paths | Every degradation, redaction, or exposure step has a clear contract |

Design Rule 1: Define Reason by Semantic Domain

Each semantic domain should define its own reason type instead of putting every system error into one giant global enum.

#[derive(Debug, Clone, OrionError)]
enum RepositoryReason {
    #[orion_error(identity = "repository.connection_failed")]
    ConnectionFailed,

    #[orion_error(identity = "repository.write_failed")]
    WriteFailed,

    #[orion_error(transparent)]
    General(UnifiedReason),
}

#[derive(Debug, Clone, OrionError)]
enum OrderReason {
    #[orion_error(identity = "order.submit_dependency_unavailable")]
    SubmitDependencyUnavailable,

    #[orion_error(identity = "order.invalid_state")]
    InvalidState,

    #[orion_error(transparent)]
    General(UnifiedReason),
}

That matches the stable classification contract described earlier: repository.connection_failed belongs to the data-access semantic domain, while order.submit_dependency_unavailable belongs to the business domain. The same lower-level failure may trigger both at different boundaries, but they should not be merged into one classification.

Design Rule 2: Build Structured Errors at First Entry

When ordinary I/O, database, network, or parse errors first enter the governance system, three things must happen together: choose the current-layer classification, provide detail, and preserve the lower-level source.

fn insert_order(order: &Order) -> Result<(), StructError<RepositoryReason>> {
    let ctx = OperationContext::doing("insert_order")
        .with_field("order_id", order.id.to_string())
        .with_meta("component.name", "order_repository");

    db_insert(order)
        .source_err(RepositoryReason::ConnectionFailed, "insert order failed")
        .map_err(|err| err.with_context(ctx))?;

    Ok(())
}

Do not convert the lower-level error into a string. The lower-level error is the source, "insert order failed" is repository-layer detail, and order_id plus component.name are context.

Design Rule 3: Create a New Boundary When Crossing Semantic Domains

Within one semantic domain, classification convergence should only convert reasons and should not create a new error story. When crossing into a new business semantic domain, create a new semantic boundary and preserve the lower structured error as the source.

fn submit_order(order: &Order) -> Result<(), StructError<OrderReason>> {
    let ctx = OperationContext::doing("submit_order")
        .with_field("order_id", order.id.to_string())
        .with_field("tenant", order.tenant.to_string());

    insert_order(order)
        .source_err(
            OrderReason::SubmitDependencyUnavailable,
            "submit order failed",
        )
        .map_err(|err| err.with_context(ctx))?;

    Ok(())
}

The service layer does not expose the repository’s connection failure directly to the handler. It expresses a business failure instead: dependency unavailable during order submission. The repository error still remains in the source chain.

Design Rule 4: Boundaries Only Output; They Do Not Reinterpret Errors

HTTP handlers, RPC endpoints, CLI entries, and worker boundaries should not rebuild error semantics. They should pass structured errors to centralized policy to generate responses, logs, metrics, and debug reports.

fn handle_submit(req: Request) -> HttpResponse {
    match submit_order(&req.order) {
        Ok(()) => HttpResponse::ok(),
        Err(err) => {
            let snapshot = err.exposure(&DefaultExposurePolicy);
            log_error(err.report());
            HttpResponse::from(snapshot)
        }
    }
}

The boundary has multiple output views: redacted exposure for users, full report for developers and SREs, and stable identity plus category for monitoring.

Design Rule 5: Test Error Identity, Not Error Wording

After reaching L2, tests should enforce stable error identity and governance decisions, not exact message text. Error wording may improve, be translated, or be redacted. Identity and classification semantics are the long-term contract.

let err = submit_order(&order).unwrap_err();

assert_eq!(
    err.identity_snapshot().code,
    "order.submit_dependency_unavailable"
);

let exposed = err.exposure(&DefaultExposurePolicy);
assert_eq!(exposed.decision.http_status, 503);

These tests force the team to maintain a stable classification contract: new errors need identities, identity changes must consider compatibility, and boundary policy must have explicit expectations.

For a runnable end-to-end example, see orion-error/examples/order_case.rs. It defines reasons separately for parsing, user, storage, and order-service layers. Lower-level failures enter the structured system once, diagnostics are preserved through reason convergence, and boundary output is unified at the end. That example mainly demonstrates the conv_err() convergence path. If an upper layer needs a new business semantic boundary, it should follow this article and use source_err(...) to keep the lower structured error as the source.


Industrial Validation: WarpParse

orion-error is the Rust infrastructure implementation of the Wukong Error Governance Model. But infrastructure still needs validation in real industrial systems: high throughput, long call chains, many roles, many boundaries, and strong observability requirements. That is where you learn whether error governance is truly usable.

WarpParse is the core high-throughput log parsing and ETL engine in the Orion stack. According to the Linux single-machine benchmark in wp-examples/benchmark/report/report_linux.md, WarpParse 0.12.0 achieved an EPS multiplier range of 1.56x-20.30x for pure parsing and 1.34x-17.90x for parse-plus-transform against Vector-VRL 0.49.0, across five log categories (Nginx, AWS ELB, Firewall, APT Threat, Mixed Log) and three topologies (File -> BlackHole, TCP -> BlackHole, TCP -> File).

The benchmark proves industrial intensity: high throughput, multiple formats, multiple topologies, parsing and transformation together. It does not by itself prove error governance quality. The value of error governance has to be judged by whether failure paths can be located, classified, projected, and automated.

So WarpParse validates this methodology not by throughput numbers alone, but by whether complex failure paths can be expressed stably:

  • Can rule errors point to file, line, column, and field?
  • Can configuration errors block rollout instead of triggering system-failure paging?
  • Can data-quality errors be aggregated separately without polluting system-error metrics?
  • Can runtime failures be distinguished into retryable, non-retryable, and manual-intervention-required?
  • Do user view, operator view, and debug view all come from the same stable error identity?

In such a system, if a rule syntax failure returns only a string like this:

unexpected token at line 12

the rule developer still has to open the rule file, find the location, guess which field failed, and decide whether the problem is syntax or sample mismatch. The system also cannot reliably distinguish configuration errors, data-quality issues, and runtime system failures from text alone.

With Wukong-style governance, the same failure becomes structured information:

identity : rule.syntax
category : config
detail   : unexpected token in extractor expression
context  : {
  rule_file      : "rules/nginx.wpl",
  line           : 12,
  column         : 18,
  field          : "request_time",
  expected_token : "identifier",
  actual_token   : ")"
}
policy   : block rule activation, show repair hint, do not page SRE

This is the key validation point in WarpParse: rule developers get precise location and repair clues, the runtime gets stable error identity and governance policy, and operations can count and alert configuration, data, and system failures separately. The higher the throughput, the more you need structured failure paths. Otherwise, stronger processing capacity only amplifies error spread and troubleshooting cost.
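To make the split between diagnostic context and governance policy concrete, here is a small sketch of the rule.syntax case. It is not WarpParse code: `Category`, `GovernanceAction`, `RuleSyntaxContext`, and the actions chosen for data and system failures are assumptions made for illustration. Only the config row follows the policy line shown above.

```rust
/// Hypothetical failure categories, mirroring config / data / system.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Config,
    Data,
    System,
}

/// Hypothetical governance action derived from the category alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GovernanceAction {
    pub block_rule_activation: bool,
    pub show_repair_hint: bool,
    pub page_sre: bool,
}

/// Contract channel: policy depends on category, never on message text.
pub fn action_for(category: Category) -> GovernanceAction {
    match category {
        // Matches the policy line above: block activation, hint, no paging.
        Category::Config => GovernanceAction {
            block_rule_activation: true,
            show_repair_hint: true,
            page_sre: false,
        },
        // Assumed: data-quality failures are counted, not escalated.
        Category::Data => GovernanceAction {
            block_rule_activation: false,
            show_repair_hint: false,
            page_sre: false,
        },
        // Assumed: system failures page operations.
        Category::System => GovernanceAction {
            block_rule_activation: false,
            show_repair_hint: false,
            page_sre: true,
        },
    }
}

/// Diagnostic channel: the key fields a rule developer needs.
#[derive(Debug, Clone)]
pub struct RuleSyntaxContext {
    pub rule_file: String,
    pub line: u32,
    pub column: u32,
    pub field: String,
    pub expected_token: String,
    pub actual_token: String,
}

impl RuleSyntaxContext {
    /// Rule developer view: location plus a repair clue.
    pub fn repair_hint(&self) -> String {
        format!(
            "{}:{}:{}: field `{}`: expected {}, found `{}`",
            self.rule_file, self.line, self.column,
            self.field, self.expected_token, self.actual_token
        )
    }
}
```

The hint text can be reworded or translated freely, because the decision not to page SRE is attached to the category rather than to the wording.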

WarpParse Error Governance Structure

WarpParse error handling covers the full path of rule development, rule validation, runtime parsing, pipeline execution, boundary output, and operational observation. Read the diagram in three layers: failure sources, contract/diagnostic carriers, and output views.

flowchart TB
    sample["Sample logs<br/>Nginx / ELB / Firewall / APT / Mixed"]
    rule["WPL rules<br/>field extraction / type conversion / enrichment"]
    check["Rule validation<br/>syntax / sample / schema"]
    engine["Parsing runtime<br/>high-throughput parse / transform"]
    pipeline["ETL pipeline<br/>input -> parse -> transform -> output"]
    boundary["System boundary<br/>CLI / API / worker / report"]

    sample --> check
    rule --> check
    check -->|"rule accepted"| engine
    engine --> pipeline
    pipeline --> boundary

    subgraph failure["Failure sources"]
        syntax["Rule syntax error"]
        mismatch["Sample mismatch"]
        typeerr["Type conversion failure"]
        dirty["Dirty data / anomalous field"]
        runtime["Runtime I/O / backpressure / resource issue"]
    end

    syntax --> check
    mismatch --> check
    typeerr --> engine
    dirty --> engine
    runtime --> pipeline

    subgraph governance_wp["Contract channel"]
        wp_identity["Stable error identity<br/>rule.syntax / parse.mismatch / transform.type / runtime.io"]
        wp_category["category<br/>config / data / system"]
        wp_policy["policy<br/>abort? / skip? / alert? / retryable?"]
    end

    subgraph diagnostic_wp["Diagnostic channel"]
        wp_rule_ctx["Rule context<br/>rule file / line / field / pattern"]
        wp_sample_ctx["Sample context<br/>sample id / input slice / expected field"]
        wp_runtime_ctx["Runtime context<br/>source / sink / batch / offset / component"]
        wp_source["source chain<br/>parser -> engine -> pipeline"]
    end

    check -.produces.-> wp_identity
    engine -.produces.-> wp_identity
    pipeline -.produces.-> wp_identity

    check -.preserves.-> wp_rule_ctx
    check -.preserves.-> wp_sample_ctx
    engine -.preserves.-> wp_rule_ctx
    engine -.preserves.-> wp_source
    pipeline -.preserves.-> wp_runtime_ctx

    wp_identity --> wp_policy
    wp_category --> wp_policy
    wp_policy --> boundary

    boundary --> user_view["Rule developer view<br/>error location + repair hint"]
    boundary --> ops_view["Operations view<br/>metrics + alerting + failure classification"]
    boundary --> debug_view["Debug view<br/>redacted context + source chain"]

The core point of this diagram is that WarpParse’s high-performance parsing and its error governance must coexist. orion-error provides the governance infrastructure, and WarpParse validates that the methodology works in an industrial high-throughput ETL system.


Engineering Reuse for AI

The Wukong Error Governance Model should not stop at documentation. A more effective approach is to organize the methodology, design principles, crate or library usage rules, example code, anti-patterns, and migration guidance into reusable engineering skills. In the Orion ecosystem, these skills are maintained in the orion-skills repository: https://github.com/galaxio-labs/orion-skills

AI can use those skills to produce project-level error design documents. One example is Warp Insight’s error-handling system design: https://github.com/wp-labs/warp-insight/blob/main/doc/design/foundation/error-handling-system.md . The value of such documents is not merely listing error types. It is letting AI and human engineers reason about classification, propagation, boundary output, observability, and migration using the same governance model.

That changes AI from “generate a few ad hoc error-handling snippets” into “work inside a defined governance model”. In a new project, AI first identifies error boundaries, semantic domains, stable identities, diagnostic chains, and boundary output, then proposes a governance plan. During implementation, it uses orion-error according to convention, and turns reason definitions, source preservation, context attachment, exposure policy, and test assertions into code.

Skills turn error governance from “prompt the AI with experience” into “give the AI a set of engineering constraints”:

  • Planning stage: identify whether the current system is at L0, L1, or L2, then design the classification contract and migration path.
  • Design stage: split semantic domains and define reason, identity, category, and governance attributes.
  • Implementation stage: choose the correct API for first entry, cross-layer convergence, semantic-boundary wrapping, and boundary output.
  • Review stage: check whether source was lost, whether code depends on error wording, whether handlers rebuild responses repeatedly, and whether tests for stable identities are missing.
  • Migration stage: gradually converge string errors, temporary enums, and generic wrapping into stable contract and diagnostic structures.

Error handling spans architecture, protocols, observability, tests, and team conventions. A single prompt is rarely enough to make it consistent. Once the methodology and library constraints are distilled into skills, AI can reuse the same engineering judgment across projects and generate implementations that stay consistent and maintainable.


Appendix: Language Mechanisms and Ecosystem Adoption

The methodology itself is language-agnostic, but implementation cost differs by language. Two dimensions matter:

  • Language expressiveness: how naturally the language can express stable classification, structured carriers, cause chains, and boundary output.
  • Ecosystem adoption cost: how much organizational and migration effort it takes to adopt the governance model within the existing ecosystem.

High affinity does not mean low adoption cost. Rust’s type system fits this model very well, but the error ecosystem has multiple established paths. Go’s type expressiveness is weaker, but explicit error returns are highly uniform, so introducing a lightweight classification discipline may actually cost less organizationally.

In any language, implementation should answer the same questions:

  • Where does stable error identity live, and can it stay compatible across versions?
  • How is the diagnostic chain preserved, and can root cause be lost during cross-layer conversion?
  • Where do context and detail live, and do they pollute governance classification?
  • Where is boundary policy centralized, and are HTTP/RPC/CLI/log/metric outputs consistent?
  • When entering logs, protocols, standard error interfaces, or third-party frameworks, what is preserved, redacted, or dropped?

Rust — Native Fit

Rust combines four important properties: algebraic types (enum) express classification, match provides exhaustive checking, generics parameterize carriers with type safety, and no exception mechanism dominates control flow. Errors are returned as values, which composes naturally with structured carriers.

But real-world Rust adoption is not trivial. The ecosystem has long had multiple orientations such as failure, error-chain, anyhow, thiserror, and eyre: some optimized for fast propagation, some for diagnostics, some for domain error definition. Teams still need to decide which layers use structured governance errors, which boundaries allow fast aggregation, and which identities become long-term contracts.
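The properties above can be sketched with the standard library alone. This is a minimal illustration, not orion-error code: RepoReason, Carrier, is_retryable, and find_row are hypothetical names chosen for the example.

```rust
use std::fmt;

// Classification as a closed enum: the set of reasons is finite and named.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RepoReason {
    ConnectionFailed,
    RowNotFound,
}

impl fmt::Display for RepoReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RepoReason::ConnectionFailed => write!(f, "connection failed"),
            RepoReason::RowNotFound => write!(f, "row not found"),
        }
    }
}

// A generic carrier: the same struct works for any reason type.
#[derive(Debug)]
struct Carrier<R> {
    reason: R,
    detail: String,
}

// Exhaustive match: adding a variant to RepoReason stops this from compiling
// until the new case is handled.
fn is_retryable(reason: RepoReason) -> bool {
    match reason {
        RepoReason::ConnectionFailed => true,
        RepoReason::RowNotFound => false,
    }
}

// Errors are ordinary return values; there is no hidden control flow.
fn find_row(id: u32) -> Result<u32, Carrier<RepoReason>> {
    if id == 0 {
        return Err(Carrier {
            reason: RepoReason::RowNotFound,
            detail: format!("id={}", id),
        });
    }
    Ok(id)
}

fn main() {
    let err = find_row(0).unwrap_err();
    println!("{} ({}), retryable: {}", err.reason, err.detail, is_retryable(err.reason));
}
```

The sketch only shows why the language fits the model. Which layers use such a carrier, and which identities become long-term contracts, remains a team decision.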

TypeScript — High Affinity

type AppErrorClass =
  | { kind: "not_found"; id: string }
  | { kind: "system_error" };

Union types and discriminated unions are a natural fit for error classification. Libraries such as neverthrow and Either in fp-ts provide return-value-style error handling. The weakness is runtime type information. Across processes, packages, or JSON boundaries, you still need explicit runtime tags, schemas, or protocol fields to preserve stable identity and classification.

Swift — High Affinity

Algebraic types (enums with associated values) fit error classification well. Result<T, E> is built into Swift 5.0+. The community also has real practice using Result instead of throws.

C# — Needs Mapping into the Exception Ecosystem

Generics are strong and preserve runtime type information, but the ecosystem is exception-centric. There is no native discriminated union, though libraries such as OneOf can simulate it. The more natural mapping is not to force everything into Result, but to use exception hierarchies for classification, inner exceptions for cause chains, and ASP.NET Core middleware for centralized policy.

Java — Needs Mapping into Framework Conventions

Java has generic erasure and an exception-dominant ecosystem. But it has mature cause-chain support, and Spring’s @ControllerAdvice, filters, and interceptors already provide common centralized-boundary patterns. Java 17+ sealed classes, records, and pattern matching also make finite classification much more natural than before.

The key mapping is this: each semantic domain defines its own sealed class, and domains do not inherit from one another. This mirrors Rust’s separate enums such as RepositoryReason and OrderReason. When crossing domains, construct a new domain exception and preserve the old one as the cause, rather than upcasting everything into one shared base type.

// data-access semantic domain (simplified)
public sealed abstract class RepositoryError extends RuntimeException
    permits RepositoryError.ConnectionFailed, ... {

    public abstract String identity();  // e.g. "repository.connection_failed"
    public abstract String category();  // e.g. "system"
    public abstract boolean retryable();

    private DiagnoseContext ctx;
    protected RepositoryError(String detail, Throwable cause, DiagnoseContext ctx) {
        super(detail, cause);
        this.ctx = ctx;
    }

    public static final class ConnectionFailed extends RepositoryError {
        public ConnectionFailed(String detail, Throwable cause, DiagnoseContext ctx) { super(detail, cause, ctx); }
        public String identity() { return "repository.connection_failed"; }
        public String category() { return "system"; }
        public boolean retryable() { return true; }
    }
}

When crossing domains, the service layer constructs OrderError and keeps RepositoryError as the cause:

catch (RepositoryError e) {
    throw new OrderError.DependencyUnavailable("submit order failed", e, ctx);
}

The resulting cause chain is OrderError.DependencyUnavailable -> RepositoryError.ConnectionFailed -> SQLException. The boundary layer then routes centrally with @ControllerAdvice: @ExceptionHandler(OrderError.class) chooses status codes and response bodies from identity().

| Concept | Rust | Java |
| --- | --- | --- |
| Semantic-domain classification | enum RepositoryReason | sealed class RepositoryError |
| Contract channel | Reason variants + identity string | Overridden identity() / category() / retryable() on subclasses |
| Diagnostic channel | Detail / context / source inside StructError | getMessage() / context record / getCause() |
| Unified carrier | StructError<R> generic carrier | Not possible: the JLS forbids generic classes from extending Throwable |
| Cross-domain conversion | One source_err(...) call | Explicit try/catch constructing a new exception |

Java’s hard constraint is that the JLS forbids generic classes from extending Throwable, so one generic StructuredError<R> carrier cannot unify all domains. But the architectural idea remains the same: independent semantic domains, stable error identity as the primary key, diagnostics preserved via causes, and centralized routing at the boundary.

C++ — Technically Feasible, No Ecosystem Convention

Templates preserve type information. std::expected (C++23) offers a Result-like mechanism, and libraries such as Boost.Outcome provide richer modeling. But C++ has long supported several parallel error paths: exceptions, error codes, expected, Outcome, custom status types, and more. The model is technically feasible, but organizational unification cost is high.

Go — Requires Stronger Team Discipline

The error interface only requires Error() string by default. Structured information has to be added through custom error types, errors.Is/errors.As, and wrapping. Go is not incapable of error governance. Its default ecosystem path is simply optimized for lightweight wrapping, so governance constraints have to be designed deliberately by the team.

Comparison Across Two Dimensions

| Language | Language expressiveness | Ecosystem adoption cost | Main reason |
| --- | --- | --- | --- |
| Rust | High | Medium | Type system fits well, but the error ecosystem has multiple established paths |
| Swift | High | Medium | enum and Result fit naturally, but throws remains an important ecosystem path |
| TypeScript | Medium-high | Medium | Discriminated unions are convenient, but runtime schemas or tags are still needed |
| C# | Medium | Medium | Generics and middleware are strong, but the ecosystem is exception-centric and DUs are simulated |
| Java | Medium | Medium | Cause chains and framework boundaries are mature; sealed classes improve finite classification |
| C++ | Medium-high | High | Type capability is strong, but error handling paths are fragmented and hard to standardize |
| Go | Low-medium | Medium-low | Type expression is weaker, but explicit error returns are uniform and lightweight conventions spread easily |

This table describes implementation friction, not language quality. What really determines error-governance quality is usually not the language itself, but whether the team established stable error identity, diagnostic preservation, boundary policy, and evolution rules.


Conclusion

Error handling is the boundary between a prototype and an industrial system. A prototype only proves that the happy path runs. An industrial system must remain operable, diagnosable, recoverable, and evolvable under input drift, dependency degradation, configuration drift, abnormal data, evolving rules, and unstable runtime conditions.

The WuKong Error Governance Model proposed here splits failure information into two channels:

  • Contract channel: stable error identity, stable classification, category, retryable, and exposure level, used for caller decisions, protocol output, monitoring and alerting, SLA accounting, and long-term compatibility.
  • Diagnostic channel: cause chain, context, detail, and lower-level errors, used for troubleshooting, debugging, rule repair, runtime observation, and system evolution.

These two channels resolve the core tension: errors must converge at the governance level, or automated decisions become impossible; errors must remain faithful at the diagnostic level, or root causes become invisible. A mature error system cannot stop at “pretty wrapping”, nor can it rely on language mechanisms alone. It must explicitly define how stable identities evolve, how diagnostic chains survive across layers, how boundary policy is centralized, and what is preserved or discarded when bridging into external ecosystems.

In Rust, orion-error turns this model into reusable infrastructure: StructError<R> carries contract and diagnostic information, domain reasons provide stable classification contracts, source chains and context preserve diagnostic paths, and exposure/report/interop provide boundary output and bridging. orion-error/examples/order_case.rs gives a small runnable example. WarpParse provides industrial validation: in a high-throughput ETL system, error governance directly affects rule-development experience, runtime observability, boundary output quality, and long-term operations cost.

Error governance is not a side effect of exception syntax, nor a local optimization of log formatting. It is one of the information architectures of an industrial system. Only when failure paths also have stable classification, complete diagnostics, centralized output, and evolvable contracts does a system truly move from “it runs” to “it can keep running”.

Error Governance and AI Programming: orion-error’s Structured Path

Why Error Handling Becomes a Bottleneck

In industrial software, fault localization, bug fixing, and avoidable rework consume a large share of engineering effort. Studies and industry reports commonly place finding and fixing bugs plus avoidable rework at 40-60% of total effort. That figure does not mean that error-handling code itself accounts for 40-60% of the code or effort. Error governance is concerned with what happens when failure occurs: whether classification is stable, context is preserved, boundary output is consistent, and the diagnostic path is complete.

Useful data points:

  • Hamill and Goseva-Popstojanova, in a NASA fault-fix effort study, cite a Cambridge University report that developers spend about 50% of their time finding and fixing bugs; the same passage cites Boehm/Basili’s 40-50% effort on avoidable rework.
  • Capers Jones, in an ASQ / Software Quality Professional article, summarizes that finding and fixing bugs often exceeds 60% of total software effort.
  • Cabral and Marques, in a field study of 32 Java/.NET applications, show that exception-handling code itself is much smaller: about 5% on average for Java, about 3% on average for .NET, and up to about 7%.

So this article is not about “writing more error-handling code.” It is about using structured error mechanisms to reduce the cost of fault localization, boundary governance, and cross-layer diagnostics. Rust has no exception mechanism and no try-catch; every ? is a propagation decision. Without structure, those decisions become harder to govern as the codebase grows.

The problem is not simply “more code.” The problem is that these decisions lack structure. A typical error-handling decision tree looks like this:

What error can this call return? -> Should I intercept it or propagate it?
If I intercept it, what category should the new error use?
Should I preserve the original error?
Am I at a boundary?
What is the correct format for the caller?

Every layer repeats these decisions, and different developers often answer them differently. The larger the codebase, the more fragmented error handling becomes.

The Core Tension

Every error-governance approach has to handle one tension:

  • Convergence: concrete technical errors need to be abstracted into a small, stable set of upper-layer categories, otherwise callers cannot govern retry, fallback, alerting, or user-facing output.
  • Diagnostics: the convergence process must not lose the information needed for troubleshooting.

In code, the tension often looks like this:

// Converges, but loses diagnostics.
Err(AppError::SystemError)

// Preserves some detail, but gives up governance.
Err(anyhow::format_err!("concrete error: {e}"))

orion-error handles this by separating the two dimensions: classification converges into a reason, while diagnostics stay in the source chain and context.
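The separation can be illustrated without the library. The following is a minimal std-only sketch under assumed names (AppReason, AppError, parse_quantity are hypothetical): the reason is a small stable value, while the original error stays reachable through the standard source() chain.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

// Small, stable classification that callers branch on (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
enum AppReason {
    InvalidInput,
}

// Illustrative carrier: the reason converges, the cause stays attached.
#[derive(Debug)]
struct AppError {
    reason: AppReason,
    detail: String,
    cause: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.reason, self.detail)
    }
}

impl Error for AppError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        // Drop the Send + Sync bounds to match the trait's return type.
        let cause: &(dyn Error + 'static) = &*self.cause;
        Some(cause)
    }
}

fn parse_quantity(raw: &str) -> Result<u32, AppError> {
    raw.parse::<u32>().map_err(|e: ParseIntError| AppError {
        reason: AppReason::InvalidInput,
        detail: format!("quantity {:?} is not a number", raw),
        cause: Box::new(e),
    })
}

fn main() {
    let err = parse_quantity("abc").unwrap_err();
    // Governance code branches on the reason; diagnostics read the cause.
    let cause = err.source().expect("cause is preserved");
    println!("{} <- {}", err, cause);
}
```

Retry, alerting, and user-facing decisions only ever look at reason, so the wording of the underlying ParseIntError can change without breaking callers.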

What This Means for AI Programming

Structure as Prompt

AI models, especially LLMs, generate code by following patterns. When error handling is structured, the model has a clearer pattern to follow.

Unstructured pattern:

// The model has to guess which error type matters here.
fn load_config() -> Result<Config, Box<dyn Error>> {
    let text = std::fs::read_to_string("config.toml")?;
    let cfg = toml::from_str(&text)?;
    Ok(cfg)
}

The model has to infer what is inside Box<dyn Error> and how callers should handle it.

Structured pattern:

fn load_config() -> Result<Config, StructError<ConfigReason>> {
    let text = std::fs::read_to_string("config.toml")
        .source_err(ConfigReason::ReadFailed, "read config file")
        .doing("load config")?;
    let cfg = toml::from_str(&text)
        .source_err(ConfigReason::ParseFailed, "parse config")
        .doing("parse config")?;
    Ok(cfg)
}

The reason variant is explicit, so both the model and the developer choose from a constrained classification space. The source_err + doing pattern is easier to generate, inspect, and review than free-form string wrapping.
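To show why a chained wrap-then-annotate pattern is easy to generate and review, here is a std-only sketch of how such extension methods could be built. It is not orion-error's implementation: Carrier, WrapExt::wrap_as, and DuringExt::during are hypothetical stand-ins for the roles that source_err and doing play.

```rust
use std::error::Error;

// Hypothetical carrier with a reason, a detail, an operation label, and the raw cause.
#[derive(Debug)]
struct Carrier<R> {
    reason: R,
    detail: String,
    operation: Option<String>,
    cause: Box<dyn Error + Send + Sync>,
}

// First entry: wrap any std error under an explicit reason.
trait WrapExt<T> {
    fn wrap_as<R>(self, reason: R, detail: &str) -> Result<T, Carrier<R>>;
}

impl<T, E: Error + Send + Sync + 'static> WrapExt<T> for Result<T, E> {
    fn wrap_as<R>(self, reason: R, detail: &str) -> Result<T, Carrier<R>> {
        self.map_err(|e| Carrier {
            reason,
            detail: detail.to_string(),
            operation: None,
            cause: Box::new(e),
        })
    }
}

// Annotation: record what the current layer was doing.
trait DuringExt<T, R> {
    fn during(self, operation: &str) -> Result<T, Carrier<R>>;
}

impl<T, R> DuringExt<T, R> for Result<T, Carrier<R>> {
    fn during(self, operation: &str) -> Result<T, Carrier<R>> {
        self.map_err(|mut e| {
            e.operation = Some(operation.to_string());
            e
        })
    }
}

#[derive(Debug, PartialEq)]
enum ConfigReason {
    ParseFailed,
}

fn parse_port(raw: &str) -> Result<u16, Carrier<ConfigReason>> {
    raw.parse::<u16>()
        .wrap_as(ConfigReason::ParseFailed, "port is not a valid u16")
        .during("load config")
}

fn main() {
    let err = parse_port("eighty").unwrap_err();
    println!("{:?} / {} / {:?} / {}", err.reason, err.detail, err.operation, err.cause);
}
```

Every fallible call follows the same two-step shape, so a reviewer only has to check the chosen reason and the operation label.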

Constrained Classification Space

UnifiedReason provides built-in categories such as validation, system, network, timeout, and config. Common technical failures get default categories first; domain-specific failures can then add project-specific reasons. This means many error paths do not need a new classification scheme from scratch.

For project-specific reasons, the transparent-variant pattern is stable:

#[derive(Debug, Clone, PartialEq, OrionError)]
enum AppReason {
    #[orion_error(identity = "biz.xxx")]
    SpecificError,
    #[orion_error(transparent)]
    General(UnifiedReason),
}

This gives code generation a stable template: business variants need a stable identity, and common failures reuse UnifiedReason through a transparent variant. The actual business semantics still need human review.
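What "transparent" means can be shown by writing the delegation by hand. This is a sketch only: CommonReason stands in for a built-in classification, and the identity strings are invented for the example rather than taken from the crate.

```rust
// Stand-in for a built-in common classification (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CommonReason {
    System,
    Timeout,
}

impl CommonReason {
    fn identity(self) -> &'static str {
        match self {
            CommonReason::System => "sys.system_error",
            CommonReason::Timeout => "sys.timeout",
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum AppReason {
    OrderNotFound,
    General(CommonReason),
}

impl AppReason {
    fn identity(self) -> &'static str {
        match self {
            // Business variants own their stable identity.
            AppReason::OrderNotFound => "biz.order_not_found",
            // "Transparent": the wrapper adds nothing and forwards to the inner reason.
            AppReason::General(inner) => inner.identity(),
        }
    }
}

// Lets common reasons flow into the domain enum without manual wrapping.
impl From<CommonReason> for AppReason {
    fn from(reason: CommonReason) -> Self {
        AppReason::General(reason)
    }
}

fn main() {
    let reason: AppReason = CommonReason::Timeout.into();
    println!("{} / {}", reason.identity(), AppReason::OrderNotFound.identity());
}
```

A derive macro can generate this match mechanically, which is why the template is stable enough for generated code to follow.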

Boundary Projection Becomes Centralized

Protocol-boundary output is one of the easiest places for generated code to get things wrong:

  • exposing internal detail directly to users
  • choosing the wrong HTTP status
  • producing inconsistent shapes across protocols

orion-error’s ExposurePolicy centralizes that decision:

impl ExposurePolicy for MyPolicy {
    fn http_status(&self, identity: &ErrorIdentity) -> u16 {
        match identity.code.as_str() {
            "biz.not_found" => 404,
            "biz.invalid" => 400,
            _ => 500,
        }
    }
    // visibility, retryable, and hints have defaults.
}

Boundary projection no longer has to be hand-written in every handler. Generated code can follow one policy invocation pattern; exact status codes, visibility, retryability, and hints remain team-defined and reviewable.
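The "one policy invocation pattern" can be sketched with a simplified trait. BoundaryPolicy, ShopPolicy, and to_http are hypothetical; the real ExposurePolicy has a richer signature, and this only illustrates default methods plus a single projection function that all handlers share.

```rust
// Hypothetical, simplified policy trait with overridable defaults.
trait BoundaryPolicy {
    fn http_status(&self, _identity: &str) -> u16 {
        500
    }
    fn retryable(&self, _identity: &str) -> bool {
        false
    }
}

struct ShopPolicy;

impl BoundaryPolicy for ShopPolicy {
    // Only the status mapping is overridden; retryable keeps its default.
    fn http_status(&self, identity: &str) -> u16 {
        match identity {
            "biz.not_found" => 404,
            "biz.invalid" => 400,
            _ => 500,
        }
    }
}

// Every handler calls this one function instead of choosing a status itself.
fn to_http(policy: &dyn BoundaryPolicy, identity: &str) -> (u16, String) {
    let body = format!(
        "{{\"code\":\"{}\",\"retryable\":{}}}",
        identity,
        policy.retryable(identity)
    );
    (policy.http_status(identity), body)
}

fn main() {
    let (status, body) = to_http(&ShopPolicy, "biz.not_found");
    println!("{} {}", status, body);
}
```

Changing a status code or a retry rule then touches the policy implementation only, not each handler.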

Test Paths Become Easier to Infer

Structured errors also make test assertions more direct:

// Easier for generated code to produce, and easier for reviewers to check.
let err = function_that_fails().unwrap_err();
assert_err_identity(&err, "biz.not_found", ErrorCategory::Biz);
assert_err_operation(&err, "load config");

Instead of:

// Requires guessing exact display text.
let err = function_that_fails().unwrap_err();
assert!(err.to_string().contains("not found"));

Deeper Impact on AI Programming

From Code Generation to Decision Structuring

Most AI programming tools operate at the level of generating code snippets. Structured error governance turns part of the decision into data: when an error path is represented by an enum variant rather than a free-form string, the model is no longer only writing prose; it is choosing from a constrained set. That choice can still be wrong, but it is easier to constrain with types, tests, and review.

Error-Path Coverage

Generated code often underinvests in error paths. Error paths appear less frequently than happy paths in training data and examples. A structured system turns error paths into repeatable patterns such as source_err + doing + conv_err. Once the model recognizes that a call can fail, there is a clearer API path to follow, and the result is easier to cover with tests.

Cross-Layer Consistency

In multi-person codebases, different developers often handle the same kind of failure differently. Generated code can make this worse because the model may produce different styles in different contexts. Structured governance moves consistency requirements into shared definitions: reason enums and exposure policies. Both human-written and generated code then work under the same constraints.

LLMs and Error Governance

| Traditional error handling | Constraint quality | Structured error governance |
| --- | --- | --- |
| Free-form strings | Hard to constrain | Enum variants |
| Ad-hoc classification | Hard to review | Fixed classification space |
| Local error-message decisions | Inconsistent | Repeatable API patterns |
| Boundary output decided per handler | Fragmented | Centralized policy |

The reliability benefit is not that the model becomes “smart enough” to understand every failure. The benefit is that the task is shifted away from free-form generation and toward choosing from constrained options that can be checked by types, tests, and review.

Limits

  1. Up-front modeling cost remains. Reason categories and exposure policies still require human design.
  2. Small projects may not need this. A short script or prototype is often better served by thiserror or anyhow.
  3. Business semantics are still hard. Choosing between validation_error and business_error still requires domain judgment.

Summary

orion-error’s structured error model fits AI-assisted programming because both benefit from the same principle: turn implicit decisions into explicit structure. Implicit decisions rely on context interpretation, where both humans and models make mistakes. Explicit structure gives the system enums, source chains, context, policies, and tests.

This is one possible direction for Rust error governance: not a smarter error type by itself, but a system that makes error-handling decisions more predictable, enumerable, and reviewable for both humans and AI-assisted tools.

Design Constraints

Cross-StructError From Conversion: Orphan Rule Limitation

Problem

Cross-layer error conversion (StructError<ParseReason> → StructError<OrderReason>) requires an explicit .conv_err() call. A blanket From impl that would let ? convert automatically is blocked by Rust’s orphan rule.

// Desired but impossible:
fn place_order() -> Result<OrderDraft, StructError<OrderReason>> {
    let draft = parse_order()?;  // expected auto From<ParseError> → OrderError
    Ok(draft)
}

// Actual:
fn place_order() -> Result<OrderDraft, StructError<OrderReason>> {
    let draft = parse_order().conv_err()?;  // explicit conversion
    Ok(draft)
}

Root Cause

Rust’s orphan rule prohibits implementing From<Foreign<Local>> for Foreign<Local2> from a downstream crate:

impl From<orion_error::StructError<UserLocalReason>>   // Foreign<Local>
    for orion_error::StructError<UserLocalReason2>      // Foreign<Local2>

  • From is a std trait (foreign).
  • StructError is a foreign type (from orion-error).
  • UserLocalReason and UserLocalReason2 are local types, but they appear only as generic arguments of a foreign type, so they do not count as local anchors.

The orphan rule requires at least one local anchor: either the trait or one of the types in the impl header must be defined in the current crate. Neither From nor StructError<_> satisfies this when the impl is written in a downstream crate.

Attempted Workarounds

| Approach | Result |
| --- | --- |
| Direct impl From<StructError<A>> for StructError<B> in downstream | ❌ orphan rule |
| Derive attribute upcast_from(SubReason) on target type | ❌ orphan rule |
| Derive attribute upcast_to(MainReason) on source type | ❌ orphan rule |
| Make ? auto-convert across reasons | ❌ can’t use From |
| newtype struct AppError(StructError<T>) | ✅ works, but changes every return type |

Conclusion

.conv_err() is the recommended path. The newtype wrapper can technically bypass the orphan rule, but the cost (wrapping every function return type) far outweighs the benefit of saving one explicit call. Rust’s orphan rule is a core guarantee for ecosystem compatibility and is unlikely to change for this use case in the foreseeable future.

orion-error 0.8.0 Architecture

This document describes the ideal design architecture of orion-error 0.8.0: the design constraints behind the public API, the core error flow, and the governance goals. Struct snippets are conceptual models, not exact source snapshots; the implementation in src/ remains the source of truth for precise fields.

The Problem

In large Rust services, error handling faces five unmet needs:

  1. Convergence without loss. Lower-layer technical errors must be abstracted into upper-layer stable semantics — but the original cause (source chain, detail, context) must remain available for diagnostics.
  2. Cross-layer propagation. An error passes through multiple layers (handler → service → repository → database). Each layer needs to attach its own context without discarding what came before.
  3. Boundary projection. The same error must be presented differently to different audiences: end users (safe message), operators (component + retryability), protocol clients (stable code + structure), and developers (full chain).
  4. Governable identity. Errors need stable, machine-readable identities that survive refactoring, across HTTP/RPC/log/CLI boundaries.
  5. Structured carrier. Errors carry detail, source chain, operation context, and metadata — all as structured fields, not string concatenation.

Existing libraries solve a subset:

| Library | Strengths | Leaves open |
| --- | --- | --- |
| thiserror | Local error enum modeling, Display + From generation | Cross-layer propagation, context attachment, protocol projection |
| anyhow | Application-level error unification, context() | Stable identity, protocol output, fine-grained category routing |
| color-eyre | Rich diagnostic reports | Same as anyhow: no protocol or identity layer |

orion-error targets the gap: governance at scale — what happens when errors travel through 3–5 layers and must emerge at a protocol boundary with stable structure.


Core Insight: Reason/Carrier Separation

The central design decision: separate the error’s semantic classification (reason) from its propagation mechanism (carrier).

// Reason = what kind of error
enum AppReason {
    InvalidInput,
    OrderNotFound,
    General(UnifiedReason),
}

// Carrier = how it propagates
let err: StructError<AppReason> = AppReason::OrderNotFound
    .to_err()
    .with_detail("order #42 not found")
    .with_source(db_error)
    .with_context(ctx);

Why separate?

If reason and carrier are combined — as in typical thiserror enum usage — every piece of runtime machinery (context attachment, source tracking, protocol projection) must be reimplemented for each enum. The carrier (StructError<T>) implements it once.

The reason stays thin — a DomainReason marker trait requiring only PartialEq + Display + Debug + Send + Sync + 'static. The carrier does the rest.

pub trait DomainReason: PartialEq + Display + Debug + Send + Sync + 'static {}

| Constraint | Reason |
| --- | --- |
| Display + Debug | Errors must be printable for diagnostics and logging. |
| PartialEq | Enables assertion in tests. |
| Send + Sync | Required for StructError to cross async task boundaries. |
| 'static | Enables type erasure via dyn Error and storage in SourceFrame. |

Error Flow

raw std error ──→ .source_err(reason, detail) ──→ first entry into structured system
                                                        │
                                                  conv_err()
                                              (reason remap)
                                                        │
                              report / exposure / display_chain

1. Entry: source_err(reason, detail)

The unified entry point. Works for both raw std::error::Error and already-structured StructError sources:

let result = std::fs::read_to_string("config.toml")
    .source_err(AppReason::system_error(), "read config failed")?;

  • The raw error is stored as a source frame, preserving its Display and Debug output.
  • The reason becomes the error’s stable classification.
  • The detail provides layer-specific explanation.

2. Cross-layer conversion: conv_err()

When the upstream error is already StructError<R1> and only the reason type needs to change:

fn upper_layer() -> Result<(), StructError<UpperReason>> {
    lower_layer().conv_err()?;
    Ok(())
}

Requires UpperReason: From<LowerReason>. All detail, context, source chain, and metadata survive the conversion.

A blanket From<StructError<R1>> for StructError<R2> is blocked by Rust’s orphan rule (neither From nor StructError is local to the user’s crate). An explicit trait method is the intended path.
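The remapping contract described above (only the reason changes type, everything else survives) can be sketched with a plain generic struct. Carrier and its conv method are hypothetical stand-ins, not the crate's types; the From bound mirrors the UpperReason: From<LowerReason> requirement.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ParseReason {
    BadField,
}

#[derive(Debug, Clone, PartialEq)]
enum OrderReason {
    InvalidOrder,
}

// The upper layer declares how lower-layer reasons map into its own space.
impl From<ParseReason> for OrderReason {
    fn from(reason: ParseReason) -> Self {
        match reason {
            ParseReason::BadField => OrderReason::InvalidOrder,
        }
    }
}

// Hypothetical carrier with the fields that must survive a remap.
#[derive(Debug, PartialEq)]
struct Carrier<R> {
    reason: R,
    detail: String,
    context: Vec<String>,
}

impl<R> Carrier<R> {
    // Only the reason changes type; detail and context are moved over untouched.
    fn conv<R2: From<R>>(self) -> Carrier<R2> {
        Carrier {
            reason: R2::from(self.reason),
            detail: self.detail,
            context: self.context,
        }
    }
}

fn parse_order(raw: &str) -> Result<u32, Carrier<ParseReason>> {
    raw.parse::<u32>().map_err(|_| Carrier {
        reason: ParseReason::BadField,
        detail: format!("bad quantity: {}", raw),
        context: vec!["parse order".to_string()],
    })
}

fn place_order(raw: &str) -> Result<u32, Carrier<OrderReason>> {
    // Explicit remap first, then `?` propagates the already-converted error.
    let qty = parse_order(raw).map_err(|e| e.conv::<OrderReason>())?;
    Ok(qty)
}

fn main() {
    println!("{:?}", place_order("x"));
}
```

Within a single crate this generic method compiles freely; the orphan restriction only applies to a From impl between two instantiations of a carrier defined in another crate.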

3. First entry vs. cross-layer distinction

| Method | Semantics | Source preservation |
| --- | --- | --- |
| source_err(reason, detail) | Creates a new semantic boundary | Wraps as unstructured or structured source |
| conv_err() | Only remaps reason type | Preserves all detail, context, source, metadata |

Core Types

StructError<T: DomainReason>

The universal runtime carrier. Conceptually, it stores the reason and the runtime propagation state behind a small carrier:

pub struct StructError<T: DomainReason> {
    imp: Box<StructErrorImpl<T>>,
}

Box is used to keep StructError small (pointer-sized), as it is expected to be returned through Result frequently.

StructErrorImpl<T>

Holds the data needed for error propagation. Simplified model:

struct StructErrorImpl<T> {
    reason: T,
    detail: Option<String>,
    position: Option<String>,
    context: Option<Arc<Vec<OperationContext>>>,
    source_payload: Option<InternalSourcePayload>,
}

Key decisions:

  • context: Option<Arc<Vec<...>>>: lazy allocation, so errors without context need no extra heap allocation for it. Arc makes cloning the context chain cheap.
  • Box<StructErrorImpl<T>>: StructError itself stays small (one pointer), minimizing Result size.
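Both decisions can be checked with std::mem::size_of and Arc. The types below are hypothetical and only loosely follow the simplified model above; exact byte counts depend on the target, so the sketch prints them instead of asserting specific numbers.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical payload, loosely following the simplified model above.
#[allow(dead_code)]
struct Inner {
    reason: u16,
    detail: Option<String>,
    position: Option<String>,
    context: Option<Arc<Vec<String>>>,
}

// Boxed carrier: one pointer on the stack, payload on the heap.
#[allow(dead_code)]
struct Boxed {
    imp: Box<Inner>,
}

fn main() {
    // The boxed carrier is pointer-sized no matter how large the payload grows.
    println!("inline payload:    {} bytes", size_of::<Inner>());
    println!("boxed carrier:     {} bytes", size_of::<Boxed>());
    println!("Result<u8, Inner>: {} bytes", size_of::<Result<u8, Inner>>());
    println!("Result<u8, Boxed>: {} bytes", size_of::<Result<u8, Boxed>>());

    // Cloning the context only bumps a reference count; the Vec is shared.
    let ctx = Arc::new(vec!["load config".to_string()]);
    let shared = Arc::clone(&ctx);
    println!("context owners: {}", Arc::strong_count(&shared));
}
```

The trade-off is one heap allocation per error in exchange for a small Result on the success path, which is the common case.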

OperationContext

Carries runtime context. Conceptually it describes what the current layer was doing, what it was accessing, which diagnostic fields were attached, and whether operation logging should be emitted:

pub struct OperationContext {
    action: Option<String>,
    locator: Option<String>,
    fields: Vec<(String, String)>,
    path: Vec<String>,
    metadata: ErrorMetadata,
    result: OperationResult,
    exit_log: bool,
}

  • doing(...) — what operation was running (“load config”, “validate order”)
  • at(...) — what resource was being accessed (“config.toml”, “order #42”)
  • with_field(...) — human-readable diagnostic fields
  • with_meta(...) — machine-oriented metadata (serialization only)
  • success() / fail() / cancel() and logging helpers — record operation outcome with little call-site code

SourceFrame

Represents one element in the source chain. Simplified model:

pub struct SourceFrame {
    pub index: usize,
    pub message: SmolStr,
    pub display: Option<SmolStr>,
    pub debug: Option<SmolStr>,
    pub type_name: Option<SmolStr>,
    pub error_code: Option<i32>,
    pub reason: Option<SmolStr>,
    pub path: Option<SmolStr>,
    pub detail: Option<SmolStr>,
    pub metadata: ErrorMetadata,
    pub is_root_cause: bool,
    pub context_fields: Vec<(SmolStr, SmolStr)>,
}

String fields use SmolStr (zero-allocation for short strings) for fast clone in source chain traversal.


Consumption Paths

Three independent consumption paths, each returning a different view of the same error:

report() → DiagnosticReport

Human-readable diagnostics. Only requires DomainReason.

let report: DiagnosticReport = err.report();
println!("{}", report.render());

Output:

reason: system error
detail: read config failed
context:
  [0] place_order [user_id: 42]

exposure(&policy) → ErrorProtocolSnapshot

Protocol-boundary projection. Requires ErrorIdentityProvider (provided by #[derive(OrionError)]).

let proto = err.exposure(&MyPolicy);
let http_json = proto.to_http_error_json()?;   // {"status": 500, "code": "sys.io_error", ...}
let log_json = proto.to_log_error_json()?;     // full structured log output
let cli_json = proto.to_cli_error_json()?;     // operator-facing summary
let rpc_json = proto.to_rpc_error_json()?;     // upstream-facing protocol

The ExposurePolicy trait controls the decision:

| Method | Default | Override frequency |
| --- | --- | --- |
| http_status() | 500 | Most common |
| visibility() | Internal (Biz → Public) | Common |
| retryable() | false | Occasional |
| default_hints() | [] | Rare |

Visibility controls which error information reaches the external caller:

| | Public | Internal |
| --- | --- | --- |
| HTTP message | Uses detail | Uses reason (hides detail) |
| RPC detail | Exposed | null |
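The visibility table can be read as two small projection functions. The sketch below uses hypothetical names (Visibility, http_message, rpc_detail) and plain strings; it mirrors the table, not the crate's actual projection code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Visibility {
    Public,
    Internal,
}

// What an external HTTP caller sees as the message.
fn http_message<'a>(visibility: Visibility, reason: &'a str, detail: &'a str) -> &'a str {
    match visibility {
        Visibility::Public => detail,
        // Internal errors fall back to the stable reason text and hide the detail.
        Visibility::Internal => reason,
    }
}

// What the RPC payload carries in its detail field; None corresponds to null.
fn rpc_detail(visibility: Visibility, detail: &str) -> Option<&str> {
    match visibility {
        Visibility::Public => Some(detail),
        Visibility::Internal => None,
    }
}

fn main() {
    let reason = "system error";
    let detail = "connect to db-7 failed";
    println!("{}", http_message(Visibility::Internal, reason, detail));
    println!("{:?}", rpc_detail(Visibility::Internal, detail));
}
```

Because the detail never reaches the caller for Internal errors, layers can keep writing precise diagnostic detail without worrying about leaking it.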

display_chain() → formatted string

Source chain expansion for debugging. No trait requirement beyond DomainReason.

system error
  -> Info: read config failed
  -> Caused by:
      1. outer source
      2. inner source
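A chain expansion of this kind can be built on the standard Error::source() links. The following sketch is an assumption about the general technique, not the crate's display_chain() code; Layer and render_chain are hypothetical, and the output format is simplified.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error layer that optionally wraps a lower-level cause.
#[derive(Debug)]
struct Layer {
    message: &'static str,
    cause: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_deref()
    }
}

// Walks the source() links and numbers each cause, outermost first.
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    let mut index = 1;
    while let Some(cause) = current {
        out.push_str(&format!("\n  {}. {}", index, cause));
        current = cause.source();
        index += 1;
    }
    out
}

fn main() {
    let inner = Layer { message: "inner source", cause: None };
    let outer = Layer { message: "outer source", cause: Some(Box::new(inner)) };
    let top = Layer { message: "system error", cause: Some(Box::new(outer)) };
    println!("{}", render_chain(&top));
}
```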

identity_snapshot() → ErrorIdentity

Stable identity inspection without protocol projection:

let id = err.identity_snapshot();
assert_eq!(id.code, "sys.io_error");

UnifiedReason

UnifiedReason is the built-in universal reason classification. It covers the common error categories found in most services:

| Category | Code range | Examples |
| --- | --- | --- |
| Business | 100-105 | validation_error, not_found |
| Infrastructure | 200-204 | system_error, network_error, timeout |
| Configuration | 300-301 | core_conf, external_error |

Designed as a catch-all for errors that don’t need a domain-specific reason. Domain enums typically include it as a transparent variant:

#[derive(OrionError)]
enum AppReason {
    #[orion_error(identity = "biz.invalid")]
    Invalid,
    #[orion_error(transparent)]
    General(UnifiedReason),
}

The #[orion_error(transparent)] attribute delegates stable_code(), error_category(), and Display to the inner UnifiedReason.


Explicit StdError Bridge

StructError<T> does not implement std::error::Error. This is intentional:

  1. Prevents accidental type erasure. If StructError implemented StdError, calling code could unintentionally erase the reason type with .into() or Box<dyn Error>, losing structured identity.
  2. Keeps boundary crossing explicit. When interop with StdError ecosystem is needed, the conversion is explicit:

let std_ref: StdStructRef<'_, AppReason> = err.as_std();
let owned: OwnedStdStructError<AppReason> = err.into_std();
let dyn_owned: OwnedDynStdStructError = err.into_dyn_std();
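The idea of an explicit bridge can be sketched with a newtype. Everything here is hypothetical (Carrier, StdBridge, into_std, legacy_api) and far simpler than the crate's bridge types; it only shows why a carrier that does not implement std::error::Error forces a visible conversion step.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
enum Reason {
    Timeout,
}

// Hypothetical carrier that deliberately does NOT implement std::error::Error.
#[derive(Debug)]
struct Carrier<R> {
    reason: R,
    detail: String,
}

// Explicit wrapper that opts into the std error ecosystem.
#[derive(Debug)]
struct StdBridge<R>(Carrier<R>);

impl<R: fmt::Debug> fmt::Display for StdBridge<R> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.0.reason, self.0.detail)
    }
}

impl<R: fmt::Debug> Error for StdBridge<R> {}

impl<R> Carrier<R> {
    // The only way into `dyn Error` is this visible call.
    fn into_std(self) -> StdBridge<R> {
        StdBridge(self)
    }
}

fn legacy_api() -> Result<(), Box<dyn Error>> {
    let err = Carrier {
        reason: Reason::Timeout,
        detail: "upstream took too long".to_string(),
    };
    // `Err(err)?` would not compile here: Carrier is not a std error,
    // so the typed reason cannot be erased by accident.
    Err(Box::new(err.into_std()))
}

fn main() {
    if let Err(e) = legacy_api() {
        println!("{}", e);
    }
}
```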

Derive Macro

#[derive(OrionError)] generates the core trait implementations:

| Trait | Purpose | Source |
| --- | --- | --- |
| Display | Human-readable error message | From message attribute, or auto-generated from identity |
| DomainReason | Carrier compatibility | Empty marker impl |
| ErrorCode | Legacy numeric compatibility code | From code attribute, or default 500 |
| ErrorIdentityProvider | Stable code + category | From identity and category attributes |

Attributes

| Attribute | Required? | Generates |
| --- | --- | --- |
| identity = "biz.foo" | Yes (unless transparent) | stable_code() returns "biz.foo" |
| category = Biz | No (inferred from identity prefix) | error_category() returns specified category |
| transparent | Alternative to identity | Delegates all methods to inner type |
| message = "..." | No (auto-generated from identity) | Custom Display output |
| code = ... | No (default 500) | Legacy numeric error_code() |

Protocol outputs, log aggregation, and monitoring should use ErrorIdentity.code / stable_code() as the stable identity. ErrorCode is a numeric compatibility layer, not the recommended primary key for new external contracts.

Transparent Variant Constructor Delegation

When an enum has a transparent variant wrapping UnifiedReason, all UnifiedReason constructors are generated as methods on the enum:

#[derive(OrionError)]
enum AppReason {
    #[orion_error(transparent)]
    General(UnifiedReason),
}

// Generated automatically:
AppReason::system_error()   // instead of AppReason::General(UnifiedReason::system_error())
AppReason::validation_error()
AppReason::not_found_error()
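Written by hand, the generated constructors would look roughly like the sketch below. GeneralReason is a hypothetical stand-in for UnifiedReason, and the bodies are an assumption about the shape of the output, not the macro's actual expansion.

```rust
// Hypothetical stand-in for the built-in reason type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GeneralReason {
    System,
    Validation,
    NotFound,
}

impl GeneralReason {
    pub fn system_error() -> Self { GeneralReason::System }
    pub fn validation_error() -> Self { GeneralReason::Validation }
    pub fn not_found_error() -> Self { GeneralReason::NotFound }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AppReason {
    General(GeneralReason),
}

// One forwarding constructor per inner constructor, each wrapping the result
// in the transparent variant.
impl AppReason {
    pub fn system_error() -> Self { AppReason::General(GeneralReason::system_error()) }
    pub fn validation_error() -> Self { AppReason::General(GeneralReason::validation_error()) }
    pub fn not_found_error() -> Self { AppReason::General(GeneralReason::not_found_error()) }
}
```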

Third-Party Error Integration

Third-party error types enter the structured system through source_err(). Supported types:

| Type | Feature | Mechanism |
|---|---|---|
| std::io::Error | Built-in (no feature) | Direct UnstructuredSource impl |
| serde_json::Error | serde_json | Direct UnstructuredSource impl |
| anyhow::Error | anyhow | Attempts structured recovery, falls back to unstructured |
| toml::de::Error | toml | Direct UnstructuredSource impl |
| Custom types | | Opt-in via RawStdError + raw_source() |

The opt-in design (RawStdError) prevents silent structured-to-unstructured downgrade:

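// Explicit opt-in for a custom error type.
impl RawStdError for MyError {}

let result: Result<(), MyError> = Err(MyError);
// The trailing ? propagates the wrapped error, so nothing is bound here.
result
    .map_err(raw_source)
    .source_err(AppReason::system_error(), "my operation failed")?;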
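The opt-in pattern itself can be sketched without the crate. OptInRaw, Raw, and raw below are hypothetical analogues of RawStdError, RawSource, and raw_source; their real definitions may differ. A type that has not implemented the marker trait cannot pass through raw at all, which is what prevents a silent downgrade.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical marker trait: implementing it is the explicit opt-in.
pub trait OptInRaw: Error + Send + Sync + 'static {}

// Hypothetical wrapper produced only for opted-in types.
#[derive(Debug)]
pub struct Raw(pub Box<dyn Error + Send + Sync>);

// Only types that implement the marker can be wrapped.
pub fn raw<E: OptInRaw>(err: E) -> Raw {
    Raw(Box::new(err))
}

#[derive(Debug)]
pub struct MyError;

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "my error")
    }
}

impl Error for MyError {}
impl OptInRaw for MyError {}

// Mirrors the map_err(raw_source) step, then renders the wrapped source.
pub fn describe(result: Result<(), MyError>) -> Result<(), String> {
    result.map_err(raw).map_err(|r| format!("wrapped: {}", r.0))
}
```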

Design Evolution

Naming: UvsReason → CommonReason → UnifiedReason

The built-in reason type went through three names:

  • UvsReason — original name, meaning unclear to new users
  • CommonReason — intermediate rename, but “common” sounded like “ordinary” rather than “unified”
  • UnifiedReason — final name, reflecting its role: concrete errors converge (are unified) into this classification

The deprecated pub type UvsReason = UnifiedReason; alias is retained for migration compatibility.

Variant name: Uvs → General

The transparent variant in domain enums was renamed to General:

// Before
Uvs(UnifiedReason),

// After
General(UnifiedReason),

General communicates “this is the catch-all for non-domain-specific errors” more clearly than the opaque Uvs.

Consumption path convergence: snapshot is not the main path

The orion-error 0.8.0 architecture centers on report(), exposure(), display_chain(), and identity_snapshot().

Stable machine identity is provided by identity_snapshot(). HTTP/RPC/CLI/log boundary output is handled by exposure() and ErrorProtocolSnapshot. Human diagnostics are handled by report(). This avoids making users learn a separate snapshot type hierarchy while preserving stable identity and protocol projection.

API naming: exposure

The name exposure(...) was chosen for consistency with report(). The shorter name reflects the intent: expose this error at a boundary according to a policy, without making users first learn an internal snapshot model.


Feature Gating

| Feature | Enables | Default |
|---|---|---|
| derive | Proc-macro derive macros (OrionError, ErrorCode, ErrorIdentityProvider) | Yes |
| log | OperationContext log methods (ctx.info(), .debug(), .warn(), .error()) and Drop auto-logging | Yes |
| tracing | Tracing integration (preferred over log when both are enabled) | No |
| serde | Serialize / Deserialize on core types | No |
| serde_json | Protocol JSON projection methods (to_http_error_json(), etc.) | No |
| anyhow | anyhow::Error interop with structured source recovery | No |
| toml | toml::de::Error / toml::ser::Error interop | No |

Project Structure

src/
  lib.rs              — Crate root, re-exports, layered modules
  core/
    domain.rs         — DomainReason trait
    reason.rs         — ErrorCode trait, ErrorCategory enum, ErrorIdentityProvider trait
    universal.rs      — UnifiedReason enum (built-in classification)
    error/
      carrier.rs      — StructError<T>, StructErrorImpl<T>
      builder.rs      — StructErrorBuilder<T>
      identity.rs     — ErrorIdentity struct, identity_snapshot()
      source_chain.rs — SourceFrame, source payload infrastructure
      std_bridge.rs   — StdStructRef, OwnedStdStructError, OwnedDynStdStructError
    context/
      types.rs        — OperationContext, OperationScope
      convert.rs      — ContextAdd trait
    metadata.rs       — ErrorMetadata, MetadataValue
    report/
      diagnostic.rs   — DiagnosticReport, redaction
      protocol.rs     — ErrorProtocolSnapshot, ExposurePolicy, Visibility
  traits/
    contextual.rs     — ErrorWith trait
    conversion.rs     — ConvErr, ConvStructError, ToStructError
    source_err.rs     — SourceErr, RawStdError, RawSource
  testing.rs          — Test assertion helpers
docs/
  en/book.toml        — English mdBook config
  en/src/             — English mdBook source
  zh/book.toml        — Chinese mdBook config
  zh/src/             — Chinese mdBook source
  index.html          — Language selector copied to site root
site/
  en/                 — Generated English book
  zh/                 — Generated Chinese book

Constraints

Orphan Rule

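A blanket From<StructError<R1>> for StructError<R2> cannot be provided. Inside this crate, such an impl would overlap with the standard library’s reflexive impl<T> From<T> for T whenever R1 and R2 are the same type; in a user’s crate, the orphan rule forbids it because neither From (std) nor StructError (this crate) is local. The explicit conv_err() method is the intended path:

// conv_err() remaps the reason type and returns a Result, so no ? is applied before binding.
let result: Result<(), StructError<UpperReason>> = lower_result.conv_err();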
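A std-only sketch of the explicit-conversion pattern follows. Carrier, ConvExt, and conv_reason are hypothetical names, and the sketch assumes the reason mapping is expressed as From between reason types; the crate's actual trait bounds may differ.

```rust
// Hypothetical carrier, generic over the reason type.
#[derive(Debug, PartialEq)]
pub struct Carrier<R> {
    pub reason: R,
    pub detail: String,
}

impl<R> Carrier<R> {
    // A named method avoids any clash with the reflexive From<T> for T impl.
    pub fn conv<R2: From<R>>(self) -> Carrier<R2> {
        Carrier { reason: R2::from(self.reason), detail: self.detail }
    }
}

// Extension trait so the conversion can be called directly on a Result.
pub trait ConvExt<T, R> {
    fn conv_reason<R2: From<R>>(self) -> Result<T, Carrier<R2>>;
}

impl<T, R> ConvExt<T, R> for Result<T, Carrier<R>> {
    fn conv_reason<R2: From<R>>(self) -> Result<T, Carrier<R2>> {
        self.map_err(|e| e.conv())
    }
}

#[derive(Debug, PartialEq)]
pub enum LowerReason {
    Timeout,
}

#[derive(Debug, PartialEq)]
pub enum UpperReason {
    Unavailable,
}

// Only the reason mapping is user-defined; the detail travels unchanged.
impl From<LowerReason> for UpperReason {
    fn from(r: LowerReason) -> Self {
        match r {
            LowerReason::Timeout => UpperReason::Unavailable,
        }
    }
}
```

Since LowerReason and UpperReason are local to the user's crate, the From impl between them is allowed, while the carrier-level conversion stays in the library.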

Send + Sync

DomainReason requires Send + Sync. This is necessary for StructError to be sent across thread and async task boundaries and to be captured by anyhow::Error or Box<dyn Error + Send + Sync>. For single-threaded use, this is a small but unavoidable constraint.
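The effect of the bound can be sketched with std only. Reason and Carrier are hypothetical stand-ins for DomainReason and StructError; the sketch shows that a Send + Sync supertrait makes every carrier usable across threads, for any reason type.

```rust
use std::fmt::Debug;
use std::thread;

// Hypothetical analogue of the reason trait, with the same auto-trait bounds.
pub trait Reason: Debug + Send + Sync + 'static {}

#[derive(Debug)]
pub struct Carrier<R: Reason> {
    pub reason: R,
}

fn assert_send_sync<T: Send + Sync>() {}

// Compiles for every R only because Reason requires Send + Sync.
pub fn carrier_is_thread_safe<R: Reason>() {
    assert_send_sync::<Carrier<R>>();
}

#[derive(Debug, PartialEq)]
pub enum AppReason {
    Invalid,
}

impl Reason for AppReason {}

// The error is created on a worker thread and handed back through join().
pub fn fail_on_worker() -> Result<(), Carrier<AppReason>> {
    thread::spawn(|| Err(Carrier { reason: AppReason::Invalid }))
        .join()
        .expect("worker panicked")
}
```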

0.8 API Contract

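Last updated: 2026-05-01

This document pins down the public API contract of orion-error 0.8.x. It describes the currently committed main path, layered modules, feature-gated APIs, stable snapshots, and protocol JSON boundaries.

If this document conflicts with src/, tests/, or examples/, the code and tests are authoritative, and this document should be corrected to match.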

1. Root Exports

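The crate root only commits to keeping the minimal main-path entry points:

  • StructError
  • OperationContext
  • UnifiedReason
  • The derive macros, when the derive feature is enabled:
    • OrionError
    • ErrorCode
    • ErrorIdentityProvider

The root does not commit to re-exporting reason traits, protocol types, report types, interop types, or test helpers. Their official home is the layered modules.

ErrorCode exists as a derive macro name and as the legacy numeric code capability; the stable machine key for external protocols, logs, snapshots, and monitoring is ErrorIdentity.code / stable_code().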

2. Prelude

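orion_error::prelude::* is the recommended import for new business code. It currently commits to containing:

  • StructError
  • ErrorWith
  • ConvErr
  • SourceErr
  • OrionError, when the derive feature is enabled

The prelude holds only the minimal set needed for the main propagation path. Protocol, report, interop, and test helpers should be imported from their respective layered modules.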

3. Layered Modules

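The layered modules are the official home of non-root types and traits.

  • runtime: runtime propagation carrier and context (StructError, StructErrorBuilder, OperationContext, OperationScope, WithContext, ErrorMetadata)
  • runtime::source: source observation model (SourceFrame, SourcePayloadKind, SourcePayloadRef)
  • conversion: main-path conversion traits (SourceErr, ErrorWith, ConvErr, ConvStructError, ToStructError)
  • reason: reason traits, categories, and built-in reasons (DomainReason, ErrorCode, ErrorIdentityProvider, ErrorCategory, UnifiedReason, ConfErrReason)
  • report: human diagnostics and redaction (DiagnosticReport, RedactPolicy)
  • protocol: protocol / exposure projection (DefaultExposurePolicy, ExposurePolicy, ExposureDecision, ErrorProtocolSnapshot, Visibility)
  • interop: interoperability with the standard error ecosystem (StdStructRef, OwnedStdStructError, OwnedDynStdStructError, raw_source, RawSource, RawStdError)
  • cli: CLI output helper (print_error(...))
  • dev::testing: test assertion helpers; not part of the business main path.
  • dev::prelude: wide import for protocol/schema tests and migration verification; not part of the business main path.

bridge::* is not a public layered entry point in 0.8; the boundary with the standard error ecosystem is uniformly called interop.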

4. Source Attachment

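The recommended main path for attaching a source is:

  • StructError::with_source(...)
  • StructErrorBuilder::source(...)

Callers do not need to distinguish whether the source is a plain StdError or a lower-layer StructError<_>; the crate handles the routing internally.

The following APIs are kept as low-level entry points for maintaining old code, testing source classification, or debugging auto-routing. They are not the default recommendation for tutorials or new business code:

  • with_std_source(...)
  • with_struct_source(...)
  • StructErrorBuilder::source_std(...)
  • StructErrorBuilder::source_struct(...)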

5. Error Flow

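The currently recommended error flow decisions:

  • The upstream is a plain error entering the structured system for the first time: source_err(reason, detail)
  • The upstream is a StructError<R1> and the current layer only changes the reason type: conv_err()
  • The upstream is a StructError<R1> and the current layer establishes a new semantic boundary: always use source_err(reason, detail)
  • A cause needs to be attached to an existing StructError: with_source(...) or builder.source(...)
  • The error needs to enter the std::error::Error ecosystem: as_std(), into_std(), into_boxed_std(), or into_dyn_std()

owe(...) / owe_*() / err_wrap(...) / want(...) / with(...) are not part of the current 0.8 main API.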

6. Feature-Gated API

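Default features:

  • log
  • derive

Optional features:

  • derive: enables the root derive macro re-exports and macros such as #[derive(OrionError)].
  • log: enables log integration and the OperationContext drop logging path.
  • tracing: enables tracing integration; when both log and tracing are enabled, drop logging prefers the tracing branch.
  • serde: enables Serialize / Deserialize support for the main structures.
  • serde_json: enables the stable snapshot and protocol JSON projection methods: to_stable_snapshot_json(), to_http_error_json(), to_cli_error_json(), to_log_error_json(), to_rpc_error_json().
  • anyhow: enables the adapter that lets anyhow::Error enter source_err(...), and supports structured source recovery for the official dyn interop wrapper.
  • toml: enables the adapter that lets toml::de::Error / toml::ser::Error enter source_err(...).

Documentation examples that depend on a feature should say so explicitly or be covered by feature-gated tests.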

7. Protocol JSON

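Main entry points for protocol projection:

  • identity_snapshot()
  • exposure(...)
  • into_exposure(...)
  • ErrorProtocolSnapshot::to_http_error_json()
  • ErrorProtocolSnapshot::to_cli_error_json()
  • ErrorProtocolSnapshot::to_log_error_json()
  • ErrorProtocolSnapshot::to_rpc_error_json()

The stable input of ErrorProtocolSnapshot consists of three parts:

  • identity
  • decision
  • embedded DiagnosticReport

Stable commitments:

  • identity.code is the stable machine key for protocols, logs, monitoring, and test assertions.
  • identity.category is the stable classification.
  • The field names and meanings of ExposureDecision are stable: http_status, visibility, default_hints, retryable.
  • The top-level purpose of the HTTP / CLI / log / RPC projections is stable.

Not committed:

  • The text format of render_user_debug() is not a machine protocol.
  • The summary / rendered_detail text in JSON, meant for human troubleshooting, is not a precisely stable schema.
  • Diagnostic fields such as source_frames, debug, display, and type_name may change with the implementation.
  • Internal helper fields not locked down in docs/protocol-contract.md and tests are not part of the public protocol.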
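How an identity and a policy decision combine into a boundary projection can be sketched as follows. Only the four decision field names come from the contract above; the types, the Sys category, the Visibility variants, the status codes, and the hint text are hypothetical and do not reflect the crate's default policy.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Biz,
    Sys, // hypothetical second category
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,   // hypothetical variant names
    Internal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Identity {
    pub code: &'static str,
    pub category: Category,
}

// Field names follow the stable commitments listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Decision {
    pub http_status: u16,
    pub visibility: Visibility,
    pub default_hints: Vec<String>,
    pub retryable: bool,
}

// Hypothetical policy: the decision depends only on the stable category.
pub fn decide(identity: &Identity) -> Decision {
    match identity.category {
        Category::Biz => Decision {
            http_status: 400,
            visibility: Visibility::Public,
            default_hints: vec![String::from("check the request")],
            retryable: false,
        },
        Category::Sys => Decision {
            http_status: 500,
            visibility: Visibility::Internal,
            default_hints: Vec::new(),
            retryable: true,
        },
    }
}

// A projection uses only stable inputs: the decision and the identity code.
pub fn http_line(identity: &Identity, decision: &Decision) -> String {
    format!("{} {}", decision.http_status, identity.code)
}
```

Keeping free-form diagnostic text out of the projection inputs is what lets the machine-facing part stay stable while diagnostic fields evolve.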

8. Report And Redaction

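DiagnosticReport targets human diagnostics and does not require the reason to implement ErrorIdentityProvider.

Main entry points:

  • report()
  • into_report()
  • render()
  • render_redacted(...)
  • report_redacted(...)

Redaction applies to reports, protocol projections, and source frame diagnostic views. The stable code/category in machine protocols should not be treated as natural-language detail.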
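The split between stable identity and redactable detail can be sketched like this. Report and render_masked are hypothetical, and the simple substring masking is an assumption made for illustration, not how RedactPolicy works.

```rust
// Hypothetical report with a stable code and free-form detail.
pub struct Report {
    pub code: &'static str,
    pub detail: String,
}

// Mask secrets in the detail text only; the stable code is never rewritten.
pub fn render_masked(report: &Report, secrets: &[&str]) -> String {
    let mut detail = report.detail.clone();
    for secret in secrets {
        detail = detail.replace(*secret, "<redacted>");
    }
    format!("[{}] {}", report.code, detail)
}
```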

9. Compatibility Policy

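Current 0.8 policy:

  • Keep the main path stable.
  • Keep the observation surface available, but keep it out of the quick start.
  • Keep dev::* aimed at testing and migration verification.
  • Do not restore the 0.6 / 0.7 legacy APIs as a root or prelude main path.
  • Archived documents preserve historical context and do not represent currently recommended usage.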

Compatibility & Migration

API Renames

| Old Name | New Name | Description |
|---|---|---|
| into_as(reason, detail) | source_err(reason, detail) | Unified error entry point |
| wrap_as(reason, detail) | source_err(reason, detail) | Same, unified |
| upcast() | conv_err() | Cross-layer reason conversion |
| err_conv() | conv_err() | Same |

Old names are no longer available. If you see a compilation error, replace with the new name — parameters are unchanged.

0.7 → 0.8 Migration

0.8 removed the following 0.7 compatibility paths:

  • compat_prelude / compat_traits modules
  • ErrorOwe family of traits (owe() / owe_source() etc.)
  • ErrorWith methods want() / attach_context() / with()
  • OperationContext::with_want()

Public Surface Grading

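Last updated: 2026-04-30

This document grades the public surface of the current orion-error 0.8.x code.

The goal is not to keep deleting APIs, but to fix the following four kinds of boundary:

  1. Main-path API
  2. Observation-surface API
  3. Test / adapter entry points
  4. Compatibility-retained API

If later work aims to push this further, to 9+, this grading table should serve as the reference baseline for public API review.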

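1. Main-Path API

These APIs make up the currently recommended main path and should remain stable over the long term:

  • StructError<R>
  • OperationContext::doing(...)
  • OperationContext::at(...)
  • with_context(...)
  • with_source(...)
  • StructErrorBuilder::source(...)
  • report()
  • render()
  • identity_snapshot()
  • exposure(...)
  • source_err(...)
  • conv_err()
  • cli::print_error(...)

Characteristics:

  • The README, tutorial, and main docs describe them first
  • New business code should use them by default
  • No parallel “main path” should be introduced for the same task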

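2. Observation-Surface API

These APIs have clear value, but are better suited to diagnostics, testing, observation, and assertion helpers:

  • source_frames()
  • root_cause_frame()
  • source_payload()
  • source_payload_kind()
  • action_main()
  • locator_main()
  • target_path()
  • render_redacted(...)
  • render_user_debug()
  • render_user_debug_redacted(...)

Characteristics:

  • They are not main propagation or main construction entry points
  • The docs should state clearly that they belong to the observation / diagnostics surface
  • They should not compete with the main path for narrative space in the quick start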

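3. Test / Adapter Entry Points

These APIs mainly serve testing, schema validation, middle-layer adaptation, or protocol assembly:

  • ErrorProtocolSnapshot::from_report_skeleton(...)
  • dev::prelude::*
  • dev::testing::*
  • interop::*
  • runtime::source::*

Characteristics:

  • They are allowed to exist publicly
  • It should be clear that they are not the normal business main path
  • The docs should describe them as a secondary path
  • dev::prelude::* in particular should stay at the object-level inspection surface and not grow into a frame-level wide export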

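4. Compatibility-Retained API

These fields or projections still have real compatibility value, but their names carry historical baggage:

  • target in context / snapshot frames

Current unified position:

  • The primary runtime semantics should be understood as action / locator / path segments first
  • target continues to exist, mainly as a compat projection
  • path is the stably exported path projection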

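5. Current Conclusion

The main structural problem of orion-error is no longer “a large number of compat APIs mixed into the main path”, but rather:

  • A small number of compat projection fields are still public
  • A small number of observation / secondary paths still rely on documentation to mark them as lower tier

This means that if the next phase continues the polishing:

  • Rewriting the internal model should no longer be the priority
  • Public surface review and grade locking should come first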

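6. Follow-Up Suggestions

When moving to the next version line, evaluate in this order:

  1. Whether to keep target in frames
  2. Whether dev::prelude::* needs to be narrowed further
  3. Whether observation / adapter APIs should get clearer module or naming hints

Until there is an explicit version strategy, the more reasonable approach for now is:

  • Keep the main path stable
  • Keep the observation surface available
  • Use docs and tests to lock in the positioning of secondary / compat APIs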

Release Checklist

Steps for publishing 0.8.x.

Pre-release

  1. Confirm CHANGELOG.md, README.md, docs/ are in sync with current code.
  2. Confirm orion-error and orion-error-derive have matching versions.
  3. Run:
    • cargo fmt --all
    • cargo clippy --all-targets --all-features -- -D warnings
    • cargo test --all-features -- --test-threads=1
    • cargo test --doc --no-default-features
    • bash scripts/check-feature-matrix.sh
    • bash scripts/check-doc-code.sh
    • bash scripts/check-v3-policy.sh
  4. In a networked environment:
    • cargo package --manifest-path orion-error-derive/Cargo.toml
    • cargo package
    • cargo publish --manifest-path orion-error-derive/Cargo.toml --dry-run
    • cargo publish --dry-run

Pre-release Boundary Checks

  1. src/lib.rs root surface compile-fail doctests still pass.
  2. tests/test_layered_exports.rs, tests/test_versioned_namespaces.rs still cover current layered export boundaries.
  3. README / tutorial / reason identity guide code blocks match current source.
  4. New or migrated public surface: add tests / compile guards first, then update README / docs, then update changelog.

Publishing Order

  1. Publish orion-error-derive first.
  2. Wait for crates.io index propagation.
  3. Publish orion-error.

The GitHub Actions release workflow is already configured in this order.

Post-release

  1. Confirm both crates are visible on crates.io.
  2. Confirm the default derive feature correctly resolves orion-error-derive.
  3. Confirm docs.rs pages generate:
    • orion-error
    • orion-error-derive

StructError Heap Allocation Performance Baseline

Hardware: Apple M4 (Mac mini, 2024)
System: macOS 15, aarch64
Rust: stable 2025-04-30
Run: cargo test --release --test perf_context_allocation -- --nocapture


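Test Scenarios

Each scenario is repeated 500,000 times; the total time is measured, then the mean and throughput are computed.

| Scenario | Construction |
|---|---|
| bare | StructError::from(UnifiedReason::validation_error()) |
| with-detail | Same as above + .with_detail("port number out of range") |
| with-detail+pos | Same as above + .with_position("src/config.rs:42") |
| builder | Builder API equivalent of with-detail+pos |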

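Results

Before: context: Arc<Vec<OperationContext>>

| Scenario | Throughput | ns/iter | Total time |
|---|---|---|---|
| bare | 28 M/s | 35.9 | 17 ms |
| with-detail | 19 M/s | 53.3 | 26 ms |
| with-detail+pos | 15 M/s | 64.6 | 32 ms |
| builder | 15 M/s | 65.1 | 32 ms |

After: context: Option<Arc<Vec<OperationContext>>>

| Scenario | Throughput | ns/iter | Total time | Improvement |
|---|---|---|---|---|
| bare | 55 M/s | 18.2 | 9 ms | +97% |
| with-detail | 27 M/s | 36.6 | 18 ms | +46% |
| with-detail+pos | 20 M/s | 48.9 | 24 ms | +32% |
| builder | 20 M/s | 48.8 | 24 ms | +33% |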

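Optimization

The context field in StructErrorImpl was changed from context: Arc<Vec<OperationContext>> to context: Option<Arc<Vec<OperationContext>>>.

An empty context no longer triggers a heap allocation; the field is lazily initialized only on the first call to with_context() or ContextAdd::add_context().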
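The lazy-initialization idea can be sketched with std types. ErrImpl and Ctx are hypothetical stand-ins for StructErrorImpl and OperationContext, and the use of Arc::make_mut is an assumption about how the shared stack gets mutated.

```rust
use std::sync::Arc;

// Hypothetical stand-in for one operation context entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Ctx(pub String);

// Hypothetical stand-in for the error's inner struct.
#[derive(Debug, Default)]
pub struct ErrImpl {
    // None means no Arc has been allocated yet.
    context: Option<Arc<Vec<Ctx>>>,
}

impl ErrImpl {
    pub fn new() -> Self {
        ErrImpl { context: None }
    }

    // The Arc is allocated on the first push only.
    pub fn add_context(&mut self, ctx: Ctx) {
        let stack = self.context.get_or_insert_with(|| Arc::new(Vec::new()));
        Arc::make_mut(stack).push(ctx);
    }

    // Readers see an empty slice when nothing was ever added.
    pub fn contexts(&self) -> &[Ctx] {
        match &self.context {
            Some(stack) => stack.as_slice(),
            None => &[],
        }
    }

    pub fn has_allocated(&self) -> bool {
        self.context.is_some()
    }
}
```

Note that Arc::new(Vec::new()) still allocates for the Arc itself even though an empty Vec does not, which is the allocation the Option wrapper avoids on the bare path.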

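Analysis

  • bare (18.2 ns) now comes mainly from Box::new plus stack construction
  • with-detail costs one more String heap allocation than bare (about 18 ns)
  • with-detail+pos costs two more String heap allocations than bare (about 30 ns)
  • This matches expectations: removing one empty Arc heap allocation saves about 18 ns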
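The improvement column can be re-derived from the ns/iter figures. The helper names below are hypothetical, and the sketch assumes the reported improvement is the throughput gain, that is, before divided by after, minus one.

```rust
// Throughput gain in percent, from per-iteration times in nanoseconds.
pub fn throughput_gain_percent(before_ns: f64, after_ns: f64) -> f64 {
    (before_ns / after_ns - 1.0) * 100.0
}

// Total wall time in milliseconds for a given iteration count.
pub fn total_ms(iterations: u64, ns_per_iter: f64) -> f64 {
    iterations as f64 * ns_per_iter / 1.0e6
}
```

For the bare scenario, 35.9 ns to 18.2 ns gives a gain of about 97%, and 500,000 iterations at 35.9 ns take about 17.95 ms, in line with the table's 17 ms entry.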

Test file: tests/perf_context_allocation.rs
Optimization change: src/core/error/carrier.rs + src/core/report/diagnostic.rs

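Performance Impact of Source Debug Formatting

This benchmark measures the actual cost of the eager format!("{source:?}") in collect_source_frames and the effect of optimizing it away.

Run: cargo test --release --test perf_context_allocation -- --nocapture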

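Results

Before: eager debug: format!("{source:?}")

| Scenario | Throughput | ns/iter | Notes |
|---|---|---|---|
| bare | 56 M/s | 18.0 | baseline |
| with-std-source | 2.5 M/s | 400.9 | + io::Error |
| with-std-verbose | 1.7 M/s | 581.0 | + 256-byte io::Error |
| with-struct-src | 458 K/s | 2184.8 | + StructError (2 contexts) |
| deep-struct-src | 420 K/s | 2381.9 | + 3-level StructError chain |

After: lazy debug: None (optimized)

| Scenario | Throughput | ns/iter | Improvement |
|---|---|---|---|
| bare | 58 M/s | 17.3 | +4% (noise) |
| with-std-source | 3.9 M/s | 259.3 | +55% |
| with-std-verbose | 4.0 M/s | 252.1 | +130% |
| with-struct-src | 849 K/s | 1177.8 | +86% |
| deep-struct-src | 1.2 M/s | 821.3 | +190% |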

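Analysis

  • with-std-source goes from 400.9 to 259.3 ns; Debug formatting accounted for about 140 ns
  • with-std-verbose goes from 581.0 to 252.1 ns; the Debug cost of long messages is eliminated entirely
  • with-struct-src goes from 2184.8 to 1177.8 ns (-46%); the cost of Debug traversing the context stack disappears
  • deep-struct-src goes from 2381.9 to 821.3 ns (-65%); the deepest frames are copied directly from existing frames, with no extra formatting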

Optimization

SourceFrame.debug was changed from String to Option<String>:

// Before
pub debug: String,
// in collect_source_frames: debug: format!("{source:?}"),

// After
pub debug: Option<String>,
// in collect_source_frames: debug: None,

Redaction still supports the debug field: values explicitly set to Some(...) in tests are processed normally, and frames with None are skipped during redaction.
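A std-only sketch of the eager versus lazy trade-off follows. Frame and the three functions are hypothetical, and the assumption that the display text is still captured eagerly is mine, not something stated above.

```rust
use std::fmt::{Debug, Display};

// Hypothetical frame with an optional debug rendering.
pub struct Frame {
    pub display: String,
    pub debug: Option<String>,
}

// Lazy variant: skips the Debug formatting entirely.
pub fn collect_frame<E: Debug + Display>(source: &E) -> Frame {
    Frame { display: source.to_string(), debug: None }
}

// Eager variant: pays for Debug formatting on every collected frame.
pub fn collect_frame_eager<E: Debug + Display>(source: &E) -> Frame {
    Frame { display: source.to_string(), debug: Some(format!("{source:?}")) }
}

// Redaction touches the debug text only when it is present.
pub fn redact_debug(frame: &mut Frame, secret: &str) {
    if let Some(debug) = frame.debug.as_mut() {
        *debug = debug.replace(secret, "<redacted>");
    }
}
```

The eager variant formats text that most callers never read, which is the cost the benchmark above removes.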