orion-error Documentation
orion-error is the Rust implementation of the WuKong error governance model.
The core framing separates three concerns:
- contract channel — stable identity, category, retryability, visibility
- diagnostic channel — detail, source chain, operation context, key fields
- adaptive output — HTTP / RPC / CLI / log projections generated by policy
In this crate, those ideas map to:
- #[derive(OrionError)] for stable semantic identities
- StructError<R> as the unified runtime carrier
- source_err(...) for first entry and semantic-boundary wrapping
- conv_err() for reason remapping without rebuilding the error story
- report() / identity_snapshot() / exposure(...) for boundary output
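The split between the contract channel and the diagnostic channel can be illustrated with plain std Rust. The sketch below is not orion-error's API: Contract, Diagnostic, and SketchError are hypothetical types that only show why a projection built from the contract half cannot leak diagnostic detail.

```rust
use std::fmt;

/// Contract channel: what callers may depend on (hypothetical sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Contract {
    pub code: &'static str, // stable identity, e.g. "biz.invalid_input"
    pub retryable: bool,
    pub public: bool, // visibility; not used by this tiny projection
}

/// Diagnostic channel: detail and context for people debugging the failure.
#[derive(Debug, Clone, Default)]
pub struct Diagnostic {
    pub detail: String,
    pub fields: Vec<(String, String)>,
}

#[derive(Debug, Clone)]
pub struct SketchError {
    pub contract: Contract,
    pub diagnostic: Diagnostic,
}

impl SketchError {
    /// Boundary-style projection that reads only the contract channel,
    /// so internal detail cannot leak through it.
    pub fn public_view(&self) -> String {
        format!("code={} retryable={}", self.contract.code, self.contract.retryable)
    }
}

impl fmt::Display for SketchError {
    // Diagnostic rendering: stable code plus detail and fields.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.contract.code, self.diagnostic.detail)?;
        for (k, v) in &self.diagnostic.fields {
            write!(f, " {k}={v}")?;
        }
        Ok(())
    }
}
```

The design choice being illustrated: both halves live in one value, but each output reads only the half it is allowed to see.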
Suggested reading order: start with the user guide to learn concepts and usage, then check the developer guide for public API contracts and release details.
User Guide
| Document | Description |
|---|---|
| Why orion-error | Error governance motivation and examples |
| Tutorial | Getting started tutorial |
| Protocol Contract | Exposure projection contract |
| Report / Exposure Boundary | Diagnostic vs exposure boundary |
| Logging | Logging integration |
| Comparison with thiserror | Differences and coexistence |
| Ecosystem Comparison | anyhow / thiserror / color-eyre / orion-error |
| Large-Scale Error Governance Manifesto | WuKong model, governance principles, and industrial validation |
| Design Constraints | Known design constraints |
Developer Guide
| Document | Description |
|---|---|
| API Contract | Public API boundaries |
| Compatibility Migration | Migration from older APIs |
| Public Surface Grading | Layered export grading |
| Release Checklist | Pre-release checks |
| Performance Benchmarks | Allocation benchmarks |
When docs and implementation conflict, src/, tests/, examples/ are authoritative.
Why orion-error
orion-error is not about prettier error printing. It is about making errors in Rust services governable, traceable, exposable, and evolvable as structured contracts.
Basic error handling answers “how does this function return failure?” Larger services need stronger answers:
- How does an error carry the environment in which it happened?
- How do lower-level technical failures become stable upper-layer semantics without losing diagnostics?
- How can debugging see the error chain across layers?
- How can logs stay useful without logging the same failure everywhere?
- How should the same error be shown differently to users, operators, developers, and protocol clients?
orion-error keeps errors structured as they cross layers instead of reducing them to strings.
1. Diagnostics
1.1 Low-level errors often miss critical environment
Low-level errors usually describe the technical failure, not the business environment.
#![allow(unused)]
fn main() {
fn read_file(path: &std::path::Path) -> std::io::Result<String> {
    let content = std::fs::read_to_string(path)?;
    Ok(content)
}
}
If this fails, std::io::Error may say:
No such file or directory
But debugging often needs to know:
- which path was read
- which operation was running
- which tenant, order, request, or component was involved
- whether the file was config, an order record, cache, or temporary data
- whether the failure should be classified as config, system, or validation failure
Weak approach: add context only to logs
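#![allow(unused)]
fn main() {
fn read_config(path: &std::path::Path) -> std::io::Result<String> {
    match std::fs::read_to_string(path) {
        Ok(content) => Ok(content),
        Err(err) => {
            // Context goes to the log only; the returned error still lacks it.
            log::error!("read config failed, path={}, error={err}", path.display());
            Err(err)
        }
    }
}
}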
This splits diagnostics between logs and the error value. The caller still receives an error without structured context.
Recommended approach: attach structured context to the error
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;
use std::path::Path;

fn read_config(path: &Path) -> Result<String, StructError<AppReason>> {
    let ctx = OperationContext::doing("load config")
        .with_field("path", path.display().to_string())
        .with_meta("component.name", "config_loader");
    let content = std::fs::read_to_string(path)
        .source_err(AppReason::system_error(), "read config failed")
        .with_context(&ctx)?;
    Ok(content)
}
}
Here:
- source_err(...) brings std::io::Error into the structured error system.
- AppReason::system_error() is the stable upper-layer reason.
- "read config failed" is this layer’s explanation.
- OperationContext carries fields such as path and component.name.
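The effect of this entry step can be approximated without the crate. In this std-only sketch, Structured, IntoStructured::reason_err, and FieldExt::field are hypothetical stand-ins (not orion-error signatures) that attach a stable reason, a layer detail, and a diagnostic field while keeping the io::Error reachable through source().

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical structured error: stable reason + detail + fields, with the
/// original error kept as `source` instead of being flattened into a string.
#[derive(Debug)]
pub struct Structured {
    pub reason: &'static str,
    pub detail: String,
    pub fields: Vec<(&'static str, String)>,
    source: Box<dyn Error + Send + Sync>,
}

impl fmt::Display for Structured {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.reason, self.detail)
    }
}

impl Error for Structured {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let src: &(dyn Error + 'static) = &*self.source;
        Some(src)
    }
}

/// Entry step: turn any `Result<T, E: Error>` into a structured result.
pub trait IntoStructured<T> {
    fn reason_err(self, reason: &'static str, detail: &str) -> Result<T, Structured>;
}

impl<T, E: Error + Send + Sync + 'static> IntoStructured<T> for Result<T, E> {
    fn reason_err(self, reason: &'static str, detail: &str) -> Result<T, Structured> {
        self.map_err(|e| Structured {
            reason,
            detail: detail.to_string(),
            fields: Vec::new(),
            source: Box::new(e),
        })
    }
}

/// Attach a diagnostic field to the error side of a structured result.
pub trait FieldExt<T> {
    fn field(self, key: &'static str, value: impl Into<String>) -> Result<T, Structured>;
}

impl<T> FieldExt<T> for Result<T, Structured> {
    fn field(self, key: &'static str, value: impl Into<String>) -> Result<T, Structured> {
        self.map_err(|mut e| {
            e.fields.push((key, value.into()));
            e
        })
    }
}
```

A caller would write `std::fs::read_to_string(p).reason_err("sys.io_error", "read config failed").field("path", p)` and still be able to downcast the source back to std::io::Error.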
1.2 Technical failures must be abstracted without losing diagnostics
Two common approaches are both poor:
- Drop the lower-level error.
- Expose the lower-level error directly and let higher layers depend on the repository’s technical choice.
The better approach is: at layer boundaries, convert the lower-level failure into the current layer’s stable error semantics while preserving source, detail, and context for diagnostics.
Weak approach: leak implementation errors upward
#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), sqlx::Error> {
repository::insert_order(order).await?;
Ok(())
}
}
Now the service/API layer knows the repository uses sqlx.
Weak approach: remove the root cause
#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), StoreError> {
if repository::insert_order(order).await.is_err() {
return Err(StoreReason::Unavailable.to_err());
}
Ok(())
}
}
This hides implementation details, but also removes the original cause.
Recommended approach: abstract the reason and preserve source
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
async fn write_order(order: Order) -> Result<(), StructError<StoreReason>> {
repository::insert_order(&order)
.await
.source_err(StoreReason::Unavailable, "insert order failed")
.with_field("order_id", order.id.to_string())
.with_meta("component.name", "order_store")?;
Ok(())
}
}
The caller sees StructError<StoreReason>, not sqlx::Error, while the original database error remains available as the internal source.
If the upper layer only remaps the reason type, use conv_err():
#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), StructError<AppReason>> {
write_order(order).await.conv_err()?;
Ok(())
}
}
If the upper layer creates a new semantic boundary, use source_err(...):
#![allow(unused)]
fn main() {
async fn submit_order(order: Order) -> Result<(), StructError<AppReason>> {
write_order(order)
.await
.source_err(AppReason::system_error(), "submit order failed")?;
Ok(())
}
}
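The difference between the two paths is whether the chain grows. Below is a std-only sketch under hypothetical names (Layered, remap, wrap, chain_len, StoreReason, AppReason); it assumes remapping is driven by a From impl, as the Tutorial later states for conv_err().

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical generic carrier standing in for a structured error.
#[derive(Debug)]
pub struct Layered<R> {
    pub reason: R,
    pub detail: String,
    source: Option<Box<dyn Error + Send + Sync>>,
}

impl<R: fmt::Debug> fmt::Display for Layered<R> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.reason, self.detail)
    }
}

impl<R: fmt::Debug> Error for Layered<R> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match &self.source {
            Some(s) => {
                let s: &(dyn Error + 'static) = &**s;
                Some(s)
            }
            None => None,
        }
    }
}

impl<R: fmt::Debug + Send + Sync + 'static> Layered<R> {
    pub fn new(reason: R, detail: &str) -> Self {
        Layered { reason, detail: detail.to_string(), source: None }
    }

    /// Remap: only the reason type changes; detail and source are untouched,
    /// so the chain keeps the same depth.
    pub fn remap<R2: From<R>>(self) -> Layered<R2> {
        Layered { reason: R2::from(self.reason), detail: self.detail, source: self.source }
    }

    /// Wrap: open a new semantic boundary; the current error becomes the
    /// source of the new one, so the chain grows by one layer.
    pub fn wrap<R2>(self, reason: R2, detail: &str) -> Layered<R2> {
        let boxed: Box<dyn Error + Send + Sync> = Box::new(self);
        Layered { reason, detail: detail.to_string(), source: Some(boxed) }
    }
}

/// Number of errors in the chain, counting `err` itself.
pub fn chain_len(err: &(dyn Error + 'static)) -> usize {
    let mut n = 1;
    let mut cur = err.source();
    while let Some(e) = cur {
        n += 1;
        cur = e.source();
    }
    n
}

#[derive(Debug)]
pub enum StoreReason { Unavailable }

#[derive(Debug)]
pub enum AppReason { Store(StoreReason), SystemError }

impl From<StoreReason> for AppReason {
    fn from(r: StoreReason) -> Self { AppReason::Store(r) }
}
```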
1.3 Debugging needs an error chain, not an isolated message
Real failures often travel through multiple layers:
HTTP handler
-> service
-> repository
-> database / filesystem / remote API
A final message such as:
submit order failed
is not enough. Debugging needs the path:
submit order failed
caused by: insert order failed
caused by: database request failed
caused by: connection timed out
The chain answers:
- what the original technical failure was
- which layers interpreted it
- where new semantic boundaries were introduced
- what context each layer added
- how the external error relates to the internal root cause
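The "caused by" listing above is what walking std::error::Error::source() produces. A minimal sketch, with a hypothetical Layer type standing in for real layered errors:

```rust
use std::error::Error;
use std::fmt;

/// Minimal error layer: one message plus an optional cause (hypothetical type).
#[derive(Debug)]
pub struct Layer {
    msg: &'static str,
    cause: Option<Box<Layer>>,
}

impl Layer {
    pub fn root(msg: &'static str) -> Self {
        Layer { msg, cause: None }
    }
    /// Add an outer layer whose cause is `self`.
    pub fn wrap(self, msg: &'static str) -> Self {
        Layer { msg, cause: Some(Box::new(self)) }
    }
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_deref().map(|c| c as &(dyn Error + 'static))
    }
}

/// Walk `Error::source()` and emit one "caused by:" line per lower layer.
pub fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push_str("\ncaused by: ");
        out.push_str(&e.to_string());
        cur = e.source();
    }
    out
}
```

Building the four layers from the example (root first, then three wraps) and calling render_chain yields exactly the four lines shown above.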
Recommended approach: preserve source chain at boundaries
#![allow(unused)]
fn main() {
async fn adapter_call(req: Request) -> Result<Response, StructError<AdapterReason>> {
client.send(req)
.await
.source_err(AdapterReason::RemoteUnavailable, "remote call failed")
}
async fn load_quote(id: QuoteId) -> Result<Quote, StructError<ServiceReason>> {
adapter_call(Request::quote(id))
.await
.source_err(ServiceReason::QuoteLoadFailed, "load quote failed")
.with_field("quote_id", id.to_string())?;
todo!("map response")
}
}
This preserves service semantics, adapter semantics, the lower source, and structured fields.
2. Operations
Good logging is boundary logging, not more logging
Logging often becomes noisy because each layer emits its own error! line:
#![allow(unused)]
fn main() {
log::error!("repository insert failed: {err}");
log::error!("service submit failed: {err}");
log::error!("http request failed: {err}");
}
This duplicates failures and forces operators to reconstruct the chain manually.
The better model is: the error carries identity, reason, detail, context, and source chain; the boundary logs one structured projection.
#![allow(unused)]
fn main() {
async fn handle_submit(order: Order) -> Result<HttpResponse, StructError<AppReason>> {
submit_order(order)
.await
.source_err(AppReason::system_error(), "handle submit order failed")?;
Ok(HttpResponse::ok())
}
}
At the handler, worker, or task boundary:
#![allow(unused)]
fn main() {
let report = err.report();
let exposure = err.exposure(&policy);
}
The business code does not need to concatenate log strings at every layer. The boundary can log one structured view containing identity, reason, detail, context, and chain.
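The "enrich on the way up, log once at the boundary" discipline can be sketched in plain Rust. Failure, the two layer functions, and the Vec<String> standing in for a logger are all hypothetical.

```rust
/// Hypothetical error value carrying what the boundary needs to log.
#[derive(Debug)]
pub struct Failure {
    pub code: &'static str,
    pub steps: Vec<&'static str>, // operations, innermost first
    pub fields: Vec<(&'static str, String)>,
}

fn repository_insert(order_id: u32) -> Result<(), Failure> {
    // Lowest layer: creates the error, does NOT log.
    Err(Failure {
        code: "sys.db_unavailable",
        steps: vec!["insert order"],
        fields: vec![("order_id", order_id.to_string())],
    })
}

fn service_submit(order_id: u32) -> Result<(), Failure> {
    // Middle layer: records its own step, still does NOT log.
    repository_insert(order_id).map_err(|mut f| {
        f.steps.push("submit order");
        f
    })
}

/// Boundary: the only place that writes a log line.
pub fn handle_submit(order_id: u32, log: &mut Vec<String>) -> u16 {
    match service_submit(order_id) {
        Ok(()) => 200,
        Err(f) => {
            // Outermost operation first, e.g. "submit order / insert order".
            let path = f.steps.iter().rev().copied().collect::<Vec<_>>().join(" / ");
            let fields: Vec<String> = f.fields.iter().map(|(k, v)| format!("{k}={v}")).collect();
            log.push(format!("code={} path={} {}", f.code, path, fields.join(" ")));
            500
        }
    }
}
```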
OperationContext logging
OperationContext provides structured log methods that automatically include current fields and metadata:
#![allow(unused)]
fn main() {
use orion_error::OperationContext;
let ctx = OperationContext::doing("order_processing")
.with_field("order_id", "123")
.with_meta("component.name", "order_service");
ctx.info("start");
ctx.warn("slow upstream");
ctx.error("final failure");
}
For lifecycle-scoped logging, use with_auto_log():
#![allow(unused)]
fn main() {
let mut ctx = OperationContext::doing("sync_user")
.with_auto_log()
.with_field("user_id", "42");
do_sync()?;
ctx.mark_suc();
}
If the scope drops without mark_suc() or mark_cancel(), a failure log is emitted automatically. See LOGGING.md for details.
The principle: sparse lifecycle logs + boundary error projection, not repetitive error! at every layer.
3. Presentation
One error needs different views for different audiences
An error is consumed by more than one audience:
- End users need safe, understandable, actionable messages.
- Operators / SREs need component, environment, classification, retry, and impact hints.
- Developers need source chain, detail, context, and lower-level cause.
- Protocol clients need stable code, field shape, and retry hints.
- Logs / monitoring / alerting need structured fields, not long strings.
If there is only one error string, it is hard to satisfy all of them well.
For example:
database connection failed: timeout from sqlx pool
This is:
- too technical for end users
- too thin for developers
- unstable for protocol clients
- not structured enough for logs
Recommended approach: keep one structured error, project different views
orion-error separates internal structure from external presentation:
- internal: reason, ErrorIdentity.code, detail, context, source chain
- user-facing: safe and actionable exposure
- operator-facing: component, operation, category, retryability, severity
- developer-facing: report and full chain
- protocol-facing: exposure projection
#![allow(unused)]
fn main() {
let report = err.report();
let exposed = err.exposure(&DefaultExposurePolicy::default());
}
| Audience | Needs | Projection |
|---|---|---|
| User | safe message, action hint | exposure view |
| Operator / SRE | component, operation, retryable, severity | exposure snapshot / log JSON |
| Developer | source chain, detail, context | report |
| Protocol client | stable code, stable fields, retry hint | HTTP/RPC/CLI error JSON |
| Test / regression | stable structure | stable snapshot |
This is the key difference between orion-error and a pure display-oriented tool: it keeps one structured error, then projects the right view at the right boundary.
Summary
orion-error is for systems where errors are contracts, not just return values.
It helps you:
- Preserve failure environment: attach path, tenant, order id, operation, and component context.
- Abstract technical details without losing diagnostics: convert lower failures into stable layer reasons while preserving source and context.
- Keep the cross-layer error chain: let debugging see how a low-level failure became the final boundary error.
- Log effectively and sparingly: carry structure in the error and log once at the boundary.
- Project the right view for each audience: users, operators, developers, protocol clients, logs, and tests need different output.
In one sentence:
orion-error keeps one structured error across layers, then projects the right view at the right boundary.
Tutorial
This document describes the primary usage paths of orion-error, based on the current source code, tests, and examples/.
Installation
[dependencies]
orion-error = "0.8.0"
Optional features:
[dependencies]
# Enable only the optional features you need, in a single entry:
orion-error = { version = "0.8.0", features = ["serde", "serde_json", "tracing"] }
Default features: derive, log.
Import Conventions
Prefer one of these two approaches:
Application code (default):
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;
}
Architecture boundaries — explicit layered imports:
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::conversion::*; // cross-layer conversion
use orion_error::protocol::*; // boundary output
use orion_error::interop::*; // std::error::Error bridge
}
1-Minute Example
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;
#[derive(Debug, Clone, PartialEq, OrionError)]
enum AppReason {
#[orion_error(identity = "biz.invalid")]
Invalid,
#[orion_error(transparent)]
General(UnifiedReason),
}
fn load_config(path: &str) -> Result<String, StructError<AppReason>> {
let ctx = OperationContext::doing("load_config")
.with_field("path", path)
.with_meta("component.name", "config_loader");
std::fs::read_to_string(path)
.source_err(AppReason::system_error(), "read failed")
.doing("read file")
.with_context(&ctx)
}
}
This covers the four core points:
- Domain reason defined with OrionError
- Error entry via source_err(reason, detail) (unified entry)
- Semantic context via doing(...)
- Diagnostic fields and metadata on OperationContext
1. Defining Reason
1.1 Domain Reason
New code should use #[derive(OrionError)]:
#![allow(unused)]
fn main() {
use orion_error::{OrionError, UnifiedReason};
#[derive(Debug, Clone, PartialEq, OrionError)]
enum OrderReason {
#[orion_error(identity = "biz.order_not_found")]
OrderNotFound,
#[orion_error(identity = "biz.insufficient_funds")]
InsufficientFunds,
#[orion_error(transparent)]
General(UnifiedReason),
}
}
OrionError generates: Display, DomainReason, ErrorCode, ErrorIdentityProvider.
1.2 Universal Reason
UnifiedReason is the built-in universal reason classification. Common constructors:
- UnifiedReason::validation_error(), UnifiedReason::business_error()
- UnifiedReason::system_error(), UnifiedReason::network_error(), UnifiedReason::timeout_error()
- UnifiedReason::core_conf(), UnifiedReason::logic_error()
1.3 Delegate Constructors
If your domain reason has a transparent UnifiedReason variant, all UnifiedReason constructors are generated automatically:
AppReason::system_error() // instead of AppReason::from(UnifiedReason::system_error())
AppReason::validation_error()
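This document does not show the code the derive emits. As an assumption about its shape, a hand-written equivalent for a reason with a transparent universal variant looks roughly like this; Unified and AppReason here are hypothetical std-only stand-ins, not orion-error types.

```rust
/// Stand-in for a universal reason type (hypothetical, not UnifiedReason).
#[derive(Debug, Clone, PartialEq)]
pub enum Unified {
    System,
    Validation,
}

impl Unified {
    pub fn system_error() -> Self { Unified::System }
    pub fn validation_error() -> Self { Unified::Validation }
}

#[derive(Debug, Clone, PartialEq)]
pub enum AppReason {
    Invalid,
    General(Unified), // the "transparent" variant
}

impl From<Unified> for AppReason {
    fn from(u: Unified) -> Self { AppReason::General(u) }
}

// Delegates: each universal constructor is forwarded through `General`,
// so callers can write AppReason::system_error() directly.
impl AppReason {
    pub fn system_error() -> Self { AppReason::from(Unified::system_error()) }
    pub fn validation_error() -> Self { AppReason::from(Unified::validation_error()) }
}
```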
2. Constructing StructError
2.1 Direct Construction
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
let err = StructError::from(UnifiedReason::validation_error())
.with_detail("field `email` is required");
}
2.2 Builder
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;
let ctx = OperationContext::doing("validate input");
let err = StructError::builder(UnifiedReason::validation_error())
.detail("field `email` is required")
.context_ref(&ctx)
.finish();
}
2.3 Attaching Source
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
let err = StructError::from(UnifiedReason::system_error())
.with_detail("read config failed")
.with_source(std::io::Error::other("disk offline"));
}
Preferred APIs: with_source(...), builder.source(...). These auto-route between StdError and StructError source types.
3. Using Context
OperationContext carries runtime context:
#![allow(unused)]
fn main() {
use orion_error::runtime::OperationContext;
// One context per operation: with_field adds diagnostic entries,
// with_meta adds machine-oriented metadata.
let ctx = OperationContext::doing("place_order")
    .with_field("order_id", "A-1001")
    .with_field("user_id", "42")
    .with_meta("component.name", "order_service");
}
Attach context to an error:
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::runtime::OperationContext;
let ctx = OperationContext::doing("place_order")
.with_field("order_id", "A-1001")
.with_field("user_id", "42")
.with_meta("component.name", "order_service");
let result: Result<(), StructError<UnifiedReason>> =
Err(StructError::from(UnifiedReason::system_error()));
let result = result
.doing("check inventory")
.with_context(&ctx);
assert!(result.is_err());
}
Common field types:
- with_field(...): human-readable diagnostic entries (appear in Display output)
- with_meta(...): machine-oriented metadata (serialization only)
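The field/metadata split can be illustrated with a hypothetical context type: Display renders the operation and fields only, while a serialized view carries both. This is a std-only sketch, not OperationContext itself.

```rust
use std::fmt;

/// Hypothetical context: `fields` are human-facing, `meta` is machine-facing.
#[derive(Debug, Default)]
pub struct Ctx {
    op: &'static str,
    fields: Vec<(&'static str, String)>,
    meta: Vec<(&'static str, String)>,
}

impl Ctx {
    pub fn doing(op: &'static str) -> Self {
        Ctx { op, ..Default::default() }
    }
    pub fn field(mut self, k: &'static str, v: &str) -> Self {
        self.fields.push((k, v.to_string()));
        self
    }
    pub fn meta(mut self, k: &'static str, v: &str) -> Self {
        self.meta.push((k, v.to_string()));
        self
    }
    /// Serialized view: fields and metadata are both included.
    pub fn serialize(&self) -> Vec<String> {
        self.fields
            .iter()
            .chain(self.meta.iter())
            .map(|(k, v)| format!("{k}={v}"))
            .collect()
    }
}

impl fmt::Display for Ctx {
    // Display view: operation and fields only; metadata is left out.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "doing {}", self.op)?;
        for (k, v) in &self.fields {
            write!(f, ", {k}={v}")?;
        }
        Ok(())
    }
}
```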
4. Error Entry and Cross-Layer Conversion
4.1 source_err(reason, detail) — Unified Entry
Works for both raw std::error::Error and already-structured StructError sources:
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
let err = std::fs::read_to_string("config.toml")
.source_err(UnifiedReason::system_error(), "read config failed")
.unwrap_err();
}
Supported source types: std::io::Error, anyhow::Error (with anyhow feature), serde_json::Error (with serde_json feature), toml::de::Error / toml::ser::Error (with toml feature), and custom RawStdError types via raw_source(...).
4.2 conv_err() — Cross-Layer Reason Remap
When the upstream error is already structured and you only need to change the reason type:
#![allow(unused)]
fn main() {
use derive_more::From;
use orion_error::conversion::ConvErr;
use orion_error::conversion::ToStructError;
use orion_error::prelude::*;
#[derive(Debug, Clone, PartialEq, From, OrionError)]
enum RepoReason {
#[orion_error(transparent)]
General(UnifiedReason),
}
#[derive(Debug, Clone, PartialEq, From, OrionError)]
enum ServiceReason {
#[orion_error(transparent)]
Repo(RepoReason),
}
fn lower_layer_call() -> Result<(), StructError<RepoReason>> {
Err(RepoReason::system_error().to_err().with_detail("read config failed"))
}
fn upper_layer_call() -> Result<(), StructError<ServiceReason>> {
lower_layer_call().conv_err()?;
Ok(())
}
}
Requires ServiceReason: From<RepoReason>.
5. Error Objects Summary
| Object | Purpose | Entry Point |
|---|---|---|
| StructError<R> | Runtime carrier | Propagation |
| DiagnosticReport | Human diagnostics | err.report() |
| ErrorProtocolSnapshot | Protocol projection | err.exposure(&policy) |
Standard Error interop: as_std(), into_std(), into_boxed_std(), into_dyn_std().
6. Stable Identity and Protocol Projection
6.1 Stable Identity
Each error variant has a permanent machine-readable name:
#![allow(unused)]
fn main() {
use orion_error::{OrionError, StructError};
use orion_error::reason::ErrorIdentityProvider;
use orion_error::protocol::DefaultExposurePolicy;
use orion_error::UnifiedReason;
#[derive(Debug, PartialEq, OrionError)]
enum ApiReason {
#[orion_error(identity = "biz.invalid_input")]
InvalidInput,
#[orion_error(transparent)]
General(UnifiedReason),
}
assert_eq!(ApiReason::InvalidInput.stable_code(), "biz.invalid_input");
assert_eq!(ApiReason::InvalidInput.error_category().as_str(), "biz");
}
A stable identity is intended to stay fixed once published, whereas display text, numeric codes, and Rust paths may change as the code evolves.
The identity prefix (biz, sys, conf, logic) also determines the default ExposurePolicy behaviour.
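As a sketch of how a prefix can determine the category, assume the category is simply the text before the first '.', as the biz.invalid_input / "biz" example above suggests. Category and category_of are hypothetical names, not the crate's implementation.

```rust
/// Hypothetical categories mirroring the prefixes named above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Category {
    Biz,
    Sys,
    Conf,
    Logic,
}

/// Derive the category from the part of a stable code before the first '.'.
/// Returns None for codes without a '.' or with an unknown prefix.
pub fn category_of(code: &str) -> Option<Category> {
    match code.split_once('.')?.0 {
        "biz" => Some(Category::Biz),
        "sys" => Some(Category::Sys),
        "conf" => Some(Category::Conf),
        "logic" => Some(Category::Logic),
        _ => None,
    }
}
```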
6.2 Protocol Projection
The same error produces different JSON shapes for different protocol boundaries:
#![allow(unused)]
fn main() {
use orion_error::prelude::*;
use orion_error::{OrionError, StructError};
use orion_error::protocol::DefaultExposurePolicy;
use orion_error::UnifiedReason;
#[derive(Debug, PartialEq, OrionError)]
enum ApiReason {
    #[orion_error(identity = "biz.invalid_input")]
    InvalidInput,
    #[orion_error(transparent)]
    General(UnifiedReason),
}
let err = StructError::from(ApiReason::system_error())
    .with_detail("disk offline at /dev/sda");
let proto = err.exposure(&DefaultExposurePolicy);
// The to_*_error_json() projections require the serde_json feature.
// HTTP response: minimal, safe for external clients
let http = proto.to_http_error_json();
// Log output: full context for debugging
let log = proto.to_log_error_json();
// RPC response: hides internal detail
let rpc = proto.to_rpc_error_json();
// CLI output: human-readable summary
let cli = proto.to_cli_error_json();
}
7. Testing
#![allow(unused)]
fn main() {
use orion_error::dev::testing::assert_err_identity;
use orion_error::prelude::SourceErr;
use orion_error::reason::ErrorCategory;
use orion_error::reason::UnifiedReason;
let err = std::fs::read_to_string("config.toml")
.source_err(UnifiedReason::system_error(), "read config failed")
.unwrap_err();
assert_err_identity(&err, "sys.io_error", ErrorCategory::Sys);
}
Test helpers: assert_err_code(), assert_err_category(), assert_err_identity(), assert_err_operation(), assert_err_path().
8. Best Practices
- Define domain reasons with #[derive(OrionError)]
- Use source_err(reason, detail) as the unified error entry point
- Use conv_err() for cross-layer reason conversion
- Use identity_snapshot() for stable identity inspection
- Use exposure(...) for protocol boundary output
- Use explicit interop APIs when entering the std::error::Error ecosystem
Protocol Contract
1. Three-Layer Structure
- Stable identity: ErrorIdentity
- Exposure decision: ExposureDecision
- Output projections: HTTP / CLI / log / RPC / user debug
Roles:
- StructError<R>: runtime propagation
- ErrorIdentity: stable identification
- DiagnosticReport: human diagnostics
- ErrorProtocolSnapshot: identity + decision + report assembly
2. Stable Identity
ErrorIdentity
Fields:
- code: stable machine key
- category: stable classification
- reason: stable human summary
- detail: variable description (not a key)
- position: source location
- path: stable path projection
Entry points:
- StructError::identity_snapshot()
- assert_err_code(…): asserts the stable code string, not the numeric error_code()
- assert_err_category(…)
- assert_err_identity(…)
3. Exposure
protocol::ExposureDecision
Fields:
- http_status
- visibility
- default_hints
- retryable
Default policy (DefaultExposurePolicy):
| Category | http_status | visibility |
|---|---|---|
| Biz | 400 | Public |
| Conf / Logic / Sys | 500 | Internal |
sys.network_error, sys.timeout → retryable = true. All others retryable = false.
Entry points:
- ExposurePolicy::decide(…)
- StructError::exposure(…)
- StructError::into_exposure(…)
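The default table and the retryable rule above can be expressed as one small function. This is a std-only sketch with hypothetical Decision and decide names, not DefaultExposurePolicy's source; hints are omitted, and any code outside the biz prefix falls into the 500/Internal branch by assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Visibility {
    Public,
    Internal,
}

/// Hypothetical decision record mirroring the ExposureDecision fields
/// (default_hints omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Decision {
    pub http_status: u16,
    pub visibility: Visibility,
    pub retryable: bool,
}

/// biz codes map to 400/Public; everything else (conf, logic, sys) to
/// 500/Internal. Only the network and timeout codes are retryable.
pub fn decide(code: &str) -> Decision {
    let (http_status, visibility) = if code.starts_with("biz.") {
        (400, Visibility::Public)
    } else {
        (500, Visibility::Internal)
    };
    let retryable = matches!(code, "sys.network_error" | "sys.timeout");
    Decision { http_status, visibility, retryable }
}
```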
4. ErrorProtocolSnapshot
Fields:
- identity
- decision
- report (read-only via report())
Entry points:
- StructError::exposure(…)
- StructError::into_exposure(…)
Use cases: test snapshot, gateway reprojection, unified protocol output, debug summary.
5. HTTP Projection
Requires serde_json feature.
JSON fields: status, code, category, message, visibility, hints
Rules:
- Public → message uses detail
- Internal → message uses the stable reason
Entry: ErrorProtocolSnapshot::to_http_error_json()
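The message rule can be captured in a few lines. In this hypothetical helper, falling back to the reason when a Public error has no detail is an assumption, not documented behaviour.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Visibility {
    Public,
    Internal,
}

/// Pick the HTTP `message`: Public errors show their variable detail,
/// Internal errors show only the stable reason.
/// Assumption: a Public error without detail falls back to the reason.
pub fn http_message<'a>(visibility: Visibility, reason: &'a str, detail: Option<&'a str>) -> &'a str {
    match visibility {
        Visibility::Public => detail.unwrap_or(reason),
        Visibility::Internal => reason,
    }
}
```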
6. CLI Projection
Requires serde_json feature.
JSON fields: code, category, summary, detail, visibility, hints
Rules:
- summary uses the compact render
- detail uses the verbose render
Entry: ErrorProtocolSnapshot::to_cli_error_json()
7. Log Projection
Requires serde_json feature.
JSON fields: code, category, reason, detail, path, visibility, hints, root_metadata, context, source_frames
Rules:
- Full context preserved
- Full root_metadata preserved
- Full source_frames preserved
Entry: ErrorProtocolSnapshot::to_log_error_json()
8. RPC Projection
Requires serde_json feature.
JSON fields: status, code, category, reason, detail, visibility, hints, retryable
Rules:
- detail is only visible when Public
- retryable comes from the exposure decision
Entry: ErrorProtocolSnapshot::to_rpc_error_json()
9. User Debug Summary
render_user_debug(…) is a human-readable debug summary, not a machine protocol.
Entry: ErrorProtocolSnapshot::render_user_debug(), .render_user_debug_redacted(…)
Use cases: local debugging, sample output, manual troubleshooting.
Not: HTTP message, stable JSON schema.
10. DiagnosticReport
DiagnosticReport does not require ErrorIdentityProvider. Suitable for text rendering, redaction, human diagnostics.
Entry: StructError::report(), StructError::into_report()
11. Recommended Consumption Path
- Runtime propagation → StructError<R>
- Stable identification → identity_snapshot()
- Unified output → exposure(…)
- Protocol output → projection API
- Human summary → render_user_debug(…)
Avoid:
- Using Display text as a protocol key
- Using CLI text as a machine protocol
- Using raw detail as a stable assertion
Report / Exposure Boundary
This document describes the responsibility boundary between DiagnosticReport and ErrorProtocolSnapshot.
Current State
category and code have been removed from DiagnosticReport. Identity data now lives exclusively in ErrorProtocolSnapshot.identity. All exposure bridge methods on DiagnosticReport (exposure_identity, http_status, visibility, default_hints, decision, exposure, to_exposure_json) have been deleted.
StructError<T>::report() only requires DomainReason, not ErrorIdentityProvider.
1. Object Roles
| Object | Responsibility |
|---|---|
| StructError<R> | Runtime propagation, source chain, context attachment |
| DiagnosticReport | Human diagnostic view, redaction, text rendering |
| ErrorProtocolSnapshot | Identity + exposure decision + report, user debug, protocol JSON projection |
2. Recommended Primary Paths
Human diagnostics:
#![allow(unused)]
fn main() {
let report = err.report();
let text = report.render();
}
Protocol/projection:
#![allow(unused)]
fn main() {
let proto = err.exposure(&policy);
let debug = proto.render_user_debug();
let http = proto.to_http_error_json()?;
}
3. Principles
- DiagnosticReport stays a diagnostic object.
- ErrorProtocolSnapshot is the sole exposure/projection closure.
- StructError routes runtime errors into either the report layer or the protocol layer.
In short:
- Need text diagnostics → report()
- Need exposure / JSON projection → exposure(…)
4. From DiagnosticReport to Protocol
If the caller starts from an existing DiagnosticReport (not StructError):
#![allow(unused)]
fn main() {
let proto = ErrorProtocolSnapshot::from_report_skeleton(report, identity, &policy);
}
But if full projection data (root metadata, source frames, path) is needed, prefer StructError::exposure(...).
5. Summary
The current design keeps DiagnosticReport focused on diagnostics while ErrorProtocolSnapshot handles all exposure and projection concerns. The two paths are independent and should not be mixed.
Logging
orion-error logging capabilities are built around OperationContext and OperationScope.
1. Feature
[dependencies]
orion-error = { version = "0.8.0", features = ["log"] }
# or
orion-error = { version = "0.8.0", features = ["tracing"] }
Default features include log.
Behavior:
- log only: uses log macros
- tracing enabled: prefers tracing
- Both enabled: prefers tracing
2. Basic Usage
#![allow(unused)]
fn main() {
use orion_error::OperationContext;
let ctx = OperationContext::doing("order_processing")
.with_field("order_id", "123")
.with_field("amount", "100.0")
.with_meta("component.name", "order_service");
ctx.info("start");
ctx.debug("payload prepared");
ctx.warn("slow upstream");
ctx.error("final failure");
ctx.trace("verbose trace");
}
Aliases: log_info, log_debug, log_warn, log_error, log_trace.
3. Automatic Result Logging
#![allow(unused)]
fn main() {
use orion_error::OperationContext;
let mut ctx = OperationContext::doing("sync_user")
.with_auto_log()
.with_field("user_id", "42");
do_sync()?;
ctx.mark_suc();
}
Default result is Fail. If with_auto_log() is enabled but neither mark_suc() nor mark_cancel() is called before drop, a failure log is emitted.
4. OperationScope
OperationScope is a guard for scoped lifecycle management.
#![allow(unused)]
fn main() {
use orion_error::OperationContext;
let mut ctx = OperationContext::doing("sync_user").with_auto_log();
{
let mut scope = ctx.scope();
scope.with_field("user_id", "42");
validate()?;
scope.mark_success();
}
}
Methods:
- scope(): default failure; must call mark_success() explicitly
- scoped_success(): default success; use mark_failure() or cancel() to override
- mark_success(): mark as success
- mark_failure(): revert to failure
- cancel(): mark as cancelled
5. When to Use scoped_success()
scoped_success() is suitable when:
- The scope already handles failure branches internally
- Failure is explicitly handled via mark_failure()
- The code does not use ? to return early
Example:
#![allow(unused)]
fn main() {
let mut ctx = OperationContext::doing("process_order").with_auto_log();
{
let mut scope = ctx.scoped_success();
let ok = validate_order();
if !ok {
scope.mark_failure();
}
}
}
Not recommended:
let mut scope = ctx.scoped_success();
validate()?;
Because scoped_success() defaults to success on creation, an early return through ? still leaves the scope marked as successful when it is dropped.
For fallible flows with early returns, prefer:
#![allow(unused)]
fn main() {
let mut scope = ctx.scope();
validate()?;
scope.mark_success();
}
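The early-return pitfall follows from how Drop-based guards work. The std-only sketch below uses a hypothetical Guard whose default_ok flag plays the role of scope() (false) versus scoped_success() (true); both runs fail through ?, but only the first is recorded as a failure.

```rust
use std::cell::RefCell;

/// Hypothetical lifecycle guard: records an outcome into `log` when dropped.
pub struct Guard<'a> {
    name: &'static str,
    ok: bool,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Guard<'a> {
    pub fn new(name: &'static str, default_ok: bool, log: &'a RefCell<Vec<String>>) -> Self {
        Guard { name, ok: default_ok, log }
    }
    pub fn mark_success(&mut self) {
        self.ok = true;
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        let outcome = if self.ok { "success" } else { "failure" };
        self.log.borrow_mut().push(format!("{}: {}", self.name, outcome));
    }
}

fn validate() -> Result<(), String> {
    Err("bad input".to_string())
}

/// Fallible flow that returns early through `?` before any explicit marking.
pub fn run(default_ok: bool, log: &RefCell<Vec<String>>) -> Result<(), String> {
    let mut guard = Guard::new("sync_user", default_ok, log);
    validate()?; // early return: the guard is dropped here with its default
    guard.mark_success(); // never reached when validate() fails
    Ok(())
}
```

With a default-success guard the failed run is logged as "success", which is exactly why the failure-default form is preferred for flows that use ?.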
6. op_context! Macro
#![allow(unused)]
fn main() {
use orion_error::op_context;
let ctx = op_context!("load_config").with_auto_log().with_field("path", "config.toml");
}
This macro expands module_path!() at the call site, adding more accurate module paths to automatic result logs.
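A macro helps here because module_path!() expands where the macro is invoked, not where it is defined. A hypothetical op_ctx! macro and OpCtx type illustrate this; they are not the crate's op_context! implementation.

```rust
/// Hypothetical context that remembers which module created it.
#[derive(Debug)]
pub struct OpCtx {
    pub name: &'static str,
    pub module: &'static str,
}

/// `module_path!()` is expanded at each call site, so `module` names the
/// caller's module rather than the module that defines this macro.
macro_rules! op_ctx {
    ($name:expr) => {
        OpCtx { name: $name, module: module_path!() }
    };
}

pub mod billing {
    use super::OpCtx;

    pub fn make() -> OpCtx {
        op_ctx!("charge_card")
    }
}
```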
7. Best Practices
- Use doing(...) to name operations
- Use with_field(...) / with_meta(...) for chained construction
- Use record_field(...) / record_meta(...) only when a mutable reference already exists
- Use with_auto_log() only on scopes that need result logging
- For fallible logic with ?, prefer scope() + mark_success()
- Use scoped_success() only when failure paths are explicitly handled
Ecosystem Comparison: orion-error vs anyhow / thiserror / color-eyre
Scope: anyhow / thiserror / color-eyre / orion-error
1. Positioning
| Dimension | anyhow | thiserror | color-eyre | orion-error |
|---|---|---|---|---|
| Positioning | Quick error handling | Standard error derive | Diagnostic error reporting | Structured error governance framework |
| Target users | App developers (rapid prototyping) | Library authors | App developers (diagnostics) | Large multi-team projects |
| Problem domain | Reduce error handling boilerplate | Reduce Error impl boilerplate | Improve error diagnostic output | Unified error modeling → runtime propagation → boundary protocol projection |
| Abstraction level | Type erasure | Type-safe enum | Type erasure + diagnostics | Generic structured carrier |
2. Core Capabilities
Error Definition
| Capability | anyhow | thiserror | color-eyre | orion-error |
|---|---|---|---|---|
| Custom error types | Not directly | #[derive(Error)] | Not directly | #[derive(OrionError)] |
| Generic error type | Box<dyn Error> | User-defined enum | Box<dyn Error> | StructError<T: DomainReason> |
| Stable identity | No | No | No | stable_code() + ErrorCategory |
| Numeric ErrorCode | No | No | No | Built-in error_code() |
| Display / source | Auto | Auto | Auto | Auto (OrionError derive) |
Runtime Propagation
| Capability | anyhow | thiserror | color-eyre | orion-error |
|---|---|---|---|---|
| Context attachment | .context() / .with_context() | No | .section() / .note() | OperationContext (doing/at/path + KV + metadata) |
| Context path | Single-layer context | No | Single-layer | Multi-layer nested path via target_path() |
| Custom metadata | No (message only) | No | Section trait | ErrorMetadata (typed KV, not in Display) |
| Source chain | Standard chain | Standard chain | Standard + SpanTrace | Dual-channel (Std/Struct) + rich SourceFrame |
| Cross-type conversion | anyhow!() macro | #[from] | eyre!() macro | source_err() / conv_err() |
Boundary Output
| Capability | anyhow | thiserror | color-eyre | orion-error |
|---|---|---|---|---|
| Human diagnostics | {:?} / {:#} formatting | No | Colored output | report().render() + RedactPolicy |
| Protocol JSON (HTTP/RPC) | No | No | No | exposure() → to_*_error_json() |
| Stable snapshot | No | No | No | StableErrorSnapshot + versioned schema |
| Exposure policy | No | No | No | ExposurePolicy (status/visibility/hints/retryable) |
| Redaction | No | No | Limited | RedactPolicy trait |
std::error::Error Ecosystem
| Capability | anyhow | thiserror | color-eyre | orion-error |
|---|---|---|---|---|
| Implements StdError | No (Deref to dyn Error, into Box<dyn Error>) | Yes | No (Deref to dyn Error, into Box<dyn Error>) | Explicit bridge (as_std() / into_std()) |
| dyn Error compatible | Natively | Natively | Natively | Lossy (OwnedDynStdStructError) |
| Third-party interop | .context() / anyhow!() | #[from] | .wrap_err() / eyre!() | source_err() / raw_source() |
3. Coexistence Strategy
| Layer | Recommended |
|---|---|
| Outside boundary (3rd-party libs, FFI) | thiserror / standard Error trait |
| Entering structured system | orion-error source_err() |
| Business layer propagation | orion-error StructError<R> |
| Cross-layer (repo → service → handler) | orion-error conv_err() |
| Boundary output | orion-error exposure() |
| Quick prototyping / glue code | anyhow (supported via anyhow feature) |
| Terminal diagnostics | orion-error report().render() or color-eyre |
4. When to Use What
Choose orion-error
- Multi-layer Rust backend services (repo → service → handler → protocol)
- External HTTP/RPC/gRPC interfaces with unified error responses
- Microservices with stable error codes and monitoring classification
- Multi-team projects needing consistent error conventions
- Persistent/versioned error snapshots
Choose alternatives
- Single-file scripts or CLI tools → anyhow
- Low-level libraries exposing std::error::Error → thiserror
- Terminal applications needing pretty error output → color-eyre
- Projects with only 1-2 layers, no structured context needed → thiserror + anyhow
Comparison with thiserror
orion-error and thiserror are not mutually exclusive, but their positioning differs.
Positioning
thiserror: Define standard Rust error types, serving the std::error::Error ecosystem.
orion-error: Define runtime structured error carriers, managing context, source frames, snapshots, and protocol projections.
Capability Comparison
| Capability | thiserror | orion-error |
|---|---|---|
| Define standard error types | Strong | Not primary goal |
| Domain reason derive | Needs extra identity | OrionError is recommended |
| Runtime structured context | No | Yes |
| Source frame tracking | No | Yes |
| stable code / category | No | Yes |
| snapshot / report / projection | No | Yes |
When to Use thiserror
- Exposing standard std::error::Error types
- Library APIs requiring standard error types
The Wukong Error Governance Model: Stable Contracts, Reliable Diagnostics, and Adaptive Output
This article discusses one question: how an industrial-grade system can bring highly variable failures into a structure that is governable, diagnosable, and evolvable.
If you only want the main thread, start with these four sections:
- Core Tension: why error governance is fundamentally about convergence vs. diagnostics.
- Our Approach: The Wukong Error Governance Model: how the model governs errors through stable contracts, reliable diagnostics, and adaptive output.
- Error Governance in Rust: how to implement the model with orion-error.
- Industrial Validation: WarpParse: how a high-throughput ETL system validates the approach.
This article has three layers plus an optional appendix. Read as needed:
- Methodology (Error Handling Is the Boundary Between Prototypes and Industrial Systems -> Governance Levels): the core tension, the Wukong model, five principles, three propagation modes, and maturity levels.
- Engineering implementation (Error Governance in Rust): how orion-error turns the methodology into Rust code, including design rules and testing guidance.
- Industrial validation (Industrial Validation: WarpParse -> Engineering Reuse for AI): validation in a high-throughput ETL setting, plus how to distill the approach into reusable engineering skills for AI.
- Appendix (Appendix: Language Mechanisms and Ecosystem Adoption): implementation tradeoffs in Java, TypeScript, Go, C++, Swift, and C#. You can skip it without losing the main argument.
Error Handling Is the Boundary Between Prototypes and Industrial Systems
A prototype only needs to prove that the happy path works. An industrial system must remain operable, diagnosable, recoverable, and evolvable under non-ideal conditions.
Systems do not live in ideal conditions for long. Inputs change. Dependencies degrade. Networks jitter. Configurations drift. Data accumulates dirty state. Business rules evolve. Execution paths branch dynamically based on users, environment, state, and policy. The happy path is not the whole system. Failure, degradation, retry, rollback, compensation, and manual intervention are also part of the lifecycle.
So an error is not just “an unexpected string outside normal logic”. It is information the system must carry when it continues operating under imperfect conditions, restores state, decides external responses, and supports diagnosis.
Many projects treat error handling early on as “each function’s own business”. Each function decides how to express failure, and that decision gets remade in the next function, the next module, and the next boundary.
That can work in small systems: short call chains, few boundaries, shared memory of context among participants. But once errors start crossing team boundaries, subsystem boundaries, service boundaries, protocol boundaries, or long-term compatibility boundaries without a unified shape, the failure path becomes ungovernable:
- The same failure becomes a string in module A, an enum in module B, and a panic in module C.
- Each layer rebuilds JSON at the boundary, but the structure is inconsistent.
- Troubleshooting finds scattered messages in logs, but no complete error path.
- Refactoring avoids touching error types because no one knows which upper layers depend on string content.
These problems do not always come from a single function being “badly written”. They can also come from missing tools, weak team conventions, historical drift, turnover, or unclear boundary ownership. The key point is this: once an error must travel across boundaries, be consumed by multiple roles, and remain compatible over time, it is no longer just local control flow.
Error governance defines how a system preserves information after failure, carries it across layers, exposes it externally, supports diagnosis, and evolves it over time. It is not decoration around business logic. It is the information architecture for when business logic fails.
The Industry Has Been Exploring Error Handling for a Long Time
Professional engineers and industry practice already agree that error handling matters. The hard part is not whether to handle errors, but how: without letting errors swallow business code, without letting failures collapse into ungovernable strings, while still giving callers stable decisions and giving troubleshooters enough detail.
Different language designs show that this problem has never had a single answer.
- C relies mainly on return codes, errno, and conventions. Direct and cheap, but error information fragments easily and callers often miss checks.
- Java makes exceptions the primary path and distinguishes checked from unchecked exceptions. It strengthens propagation, but also brings exploding exception hierarchies, blurry boundary semantics, and over-catching.
- Go emphasizes explicit error returns, making failure visible in the call path. But without team discipline, errors easily become layer after layer of wrapped strings.
- Rust uses Result<T, E>, ?, enums, and the type system to make errors part of ordinary control flow. But classification, context, boundary exposure, and diagnostic policy still require engineering design.
Each design makes tradeoffs. Language mechanisms can lower the cost of error handling, but they do not replace error governance itself. In large systems, the real problem is not choosing exceptions, return codes, or Result. It is deciding how failure information is classified, preserved, transformed, exposed, and observed across the system.
For engineering teams, error handling spans type design, call-chain propagation, logging and observability, protocol output, user experience, operational policy, and long-term compatibility. If each part is handled independently, the cost eventually surfaces in troubleshooting, refactoring, and cross-boundary collaboration.
So error handling cannot rely only on personal experience or local habits. It needs a methodology that can be discussed, executed, and evolved.
To this day, the industry still has no unified cross-language, cross-framework, cross-domain model for error governance. But strong projects have converged on useful practices in different directions:
- Stable error codes
- Structured diagnostics
- Centralized boundary policy
- State-oriented error presentation
- Observable failure signals
- User-facing repair hints
These practices show that error governance is not one API problem. It is a set of engineering constraints built around failure information.
Evidence from Strong Projects
| Project | Practice | Lesson |
|---|---|---|
| gRPC | Cross-language RPC failures converge to standard status codes | Stable classification lets callers retry, degrade, alert, and map user responses |
| PostgreSQL | Stable SQLSTATE codes instead of depending on message text | Machine contracts and human prose should be separate |
| Kubernetes | Readiness, failure reason, and conditions are written into status | Errors can become queryable, automatable system state |
| Terraform | Diagnostics carry severity, summary, detail, and attribute path | Errors should identify location, cause, and repair direction |
| rustc | Error codes, source location, labels, notes, and help shape the diagnostic experience | Diagnostics themselves are part of product quality |
| Envoy | Access log response flags express stable failure reasons | Boundary-layer errors should be searchable, aggregatable, alertable, and analyzable |
These projects differ in form, but point in the same direction: strong error handling designs the failure path as a stable information system. Machines can classify it, humans can diagnose it, internal detail is preserved, external exposure is policy-driven, and the same error serves both the current request and later troubleshooting, monitoring, and evolution.
Core Tension
Any error governance model must resolve one fundamental tension:
Convergence vs. diagnostics
- Callers need stable, finite classifications, otherwise they cannot make governance decisions such as retry, degrade, alert, or return a user response.
- Troubleshooters need complete, detail-preserving information, otherwise they cannot identify root cause.
Both demands are valid, but they naturally pull in different directions.
If errors expose too much technical detail to callers, upper layers start depending on the exact failure shapes of databases, network libraries, filesystems, and third-party SDKs. System boundaries are pierced by implementation detail. Refactoring the lower layer then forces contract changes.
If errors preserve only upper-layer business classifications, troubleshooting loses the critical path: what the original failure was, which component it occurred in, which layers it passed through, what each layer added, and why it was finally mapped to the external response.
So the central problem is not “whether to wrap errors”. It is this: how to make errors converge for governance while remaining faithful for diagnostics.
Inadequate Solutions
| Strategy | For callers | For troubleshooters |
|---|---|---|
| Throw only technical exceptions | Not governable | Full information |
| Throw only business errors | Governable | Root cause lost |
| Pure string chaining | Not governable | Human-readable, but not structurally queryable |
| Typed wrapping with preserved cause | Some local governance | Cause chain preserved, but classification and boundary policy still need extra rules |
| Swallow the error | Clean surface | All information lost |
Pure string chains only concatenate prose. Typed wrapping with preserved causes (cause chains, wrapper types, Go's errors.Is/errors.As) supports some structured querying, but it solves only one part of the problem: how causes are preserved and queried. It does not automatically provide stable error identity, classification boundaries, exposure policy, or mappings to governance actions.
When one single representation is forced to satisfy both governance and diagnostics, one side usually loses: either callers get information too scattered for automation, or troubleshooters get too little information and must grep logs and reproduce failures manually.
Our Approach: The Wukong Error Governance Model
This article calls the methodology the Wukong Error Governance Model. What it “subdues” is not mythological demons, but the wildly varying failures inside industrial systems: using stable contracts to make errors legible, reliable diagnostics to preserve root causes and context, adaptive output to provide the right stable view for each audience, and production observation to keep the model evolving.
The name comes from the image of Wukong defeating monsters on the journey west. In software engineering, what we want to subdue and contain is not literal monsters, but errors themselves: giving them names, classes, traceability, and control, instead of letting them spread as scattered strings, implicit wrapping, and boundary leakage.
The core method is simple: separate stable contracts and reliable diagnostics into two channels, then generate adaptive output per audience.
Internal error model = contract channel + diagnostic channel
Adaptive output = audience-specific projections generated from the internal model under policy
The contract channel contains stable error identity, stable classification, and policy semantics. It serves governance decisions such as retry, degrade, alert, HTTP/RPC/CLI mapping, user hints, and SLA accounting. It should be finite, stable, documentable, and testable.
The diagnostic channel contains the diagnostic chain, context, and detail. It serves root-cause analysis: lower-level causes, traversed layers, the current operation, important fields, component, and environment. It can be richer and more dynamic, but it should not become the stable contract exposed to external callers.
The contract and diagnostic channels describe the information structure carried by errors internally. HTTP responses, RPC errors, CLI output, log records, metric labels, and debug reports are adaptive output views generated at different boundaries. Output views should not leak back and distort the internal error model.
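To make the split concrete, here is a minimal Rust sketch that uses only the standard library. The names (OrderReason, StructuredError, public_view, log_view) and the identity strings are hypothetical illustrations, not orion-error's API: the enum carries the contract channel, the remaining fields carry the diagnostic channel, and each output view is a projection computed from the same internal value.

```rust
use std::error::Error;
use std::io;

/// Contract channel: a finite, stable classification (hypothetical reason).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OrderReason {
    NotFound,
    DependencyUnavailable,
}

impl OrderReason {
    /// Stable identity: the machine key that callers and dashboards depend on.
    fn identity(self) -> &'static str {
        match self {
            OrderReason::NotFound => "order.not_found",
            OrderReason::DependencyUnavailable => "system.dependency_unavailable",
        }
    }
}

/// Hypothetical carrier holding both channels in one value.
#[derive(Debug)]
struct StructuredError {
    reason: OrderReason,                          // contract channel
    detail: String,                               // diagnostic channel
    context: Vec<(&'static str, String)>,         // diagnostic channel
    source: Option<Box<dyn Error + Send + Sync>>, // diagnostic channel
}

impl StructuredError {
    /// Caller-facing projection: stable identity only, no diagnostics.
    fn public_view(&self) -> String {
        format!("code={}", self.reason.identity())
    }

    /// Log projection: identity plus detail, context, and the immediate cause.
    fn log_view(&self) -> String {
        let ctx: Vec<String> = self.context.iter().map(|(k, v)| format!("{k}={v}")).collect();
        let cause = self.source.as_ref().map(|s| s.to_string()).unwrap_or_default();
        format!(
            "code={} detail=\"{}\" {} cause=\"{}\"",
            self.reason.identity(),
            self.detail,
            ctx.join(" "),
            cause
        )
    }
}

fn sample_error() -> StructuredError {
    StructuredError {
        reason: OrderReason::DependencyUnavailable,
        detail: "load order draft failed".to_string(),
        context: vec![("order_id", "42".to_string()), ("tenant", "acme".to_string())],
        source: Some(Box::new(io::Error::new(io::ErrorKind::TimedOut, "db timeout"))),
    }
}

fn main() {
    let err = sample_error();
    println!("{}", err.public_view()); // code=system.dependency_unavailable
    println!("{}", err.log_view());
}
```

Neither view is stored in the error; both are derived on demand, so adding a new output format never changes the internal model.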
| Channel | Contains | Serves | Stability requirement |
|---|---|---|---|
| Contract channel | Stable error identity, stable classification, category, retryable, exposure level | Callers, gateways, monitoring, operations policy, protocol clients | High; should be documented and enforced by tests |
| Diagnostic channel | Cause chain, operation context, key fields, dynamic detail, lower-level errors | Developers, SREs, troubleshooting tools, logging systems | May change, but must remain detailed, reliable, and traceable |
category is a fixed governance dimension for stable classification, such as business, system, configuration, or logic error. It helps distinguish ownership domains quickly and supports alert routing, log aggregation, and boundary policy. retryable, exposure level, HTTP status, and log level are not part of error prose. They are boundary decisions derived from stable identity, classification, and environment policy.
Four concepts must stay distinct:
- identity is the long-term stable error key used by protocols, monitoring, alerts, documentation, and compatibility.
- reason is the domain classification expression in code, usually an enum, sealed class, tagged union, or exception subtype.
- category is the coarse governance dimension used for routing and ownership decisions.
- policy is the boundary decision rule that derives output views from identity, category, and environment policy.
In short: identity is the contract primary key, reason is the code-level expression, category is the governance dimension, and policy is the boundary action. Do not substitute message text for identity, do not substitute category for identity, and do not scatter policy decisions across handlers.
The relation between reason and identity needs to be understood precisely. reason is the in-process type used by code to express domain classification. identity is the cross-boundary, cross-version error key. One reason variant usually provides one stable identity. External protocols, monitoring, and alerts should depend on identity.code, not on a Rust enum name, a Java class name, or Display prose. When errors cross semantic domains, upper layers should not expose lower-layer reason types, but they may preserve lower-layer errors in the diagnostic chain.
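The following std-only Rust sketch keeps the four concepts in separate places. Every name and code string in it is a hypothetical example rather than orion-error's API: OrderReason is the reason, identity() returns the stable key, category() returns the governance dimension, and policy() derives a boundary decision without ever reading message text.

```rust
/// reason: the in-process domain classification (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OrderReason {
    NotFound,
    StorageTimeout,
    InvalidConfig,
}

/// category: a coarse governance dimension used for routing and ownership.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Business,
    System,
    Config,
}

impl OrderReason {
    /// identity: the long-term key; renaming a variant must not change it.
    fn identity(self) -> &'static str {
        match self {
            OrderReason::NotFound => "order.not_found",
            OrderReason::StorageTimeout => "system.timeout",
            OrderReason::InvalidConfig => "config.invalid",
        }
    }

    fn category(self) -> Category {
        match self {
            OrderReason::NotFound => Category::Business,
            OrderReason::StorageTimeout => Category::System,
            OrderReason::InvalidConfig => Category::Config,
        }
    }
}

/// The output of policy: a boundary decision, not part of the error prose.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Decision {
    http_status: u16,
    retryable: bool,
}

/// policy: exact identity rules first, then a fallback by category.
fn policy(identity: &str, category: Category) -> Decision {
    match (identity, category) {
        ("order.not_found", _) => Decision { http_status: 404, retryable: false },
        ("system.timeout", _) => Decision { http_status: 503, retryable: true },
        (_, Category::Business) => Decision { http_status: 400, retryable: false },
        (_, Category::System | Category::Config) => Decision { http_status: 500, retryable: false },
    }
}

fn main() {
    for reason in [OrderReason::NotFound, OrderReason::StorageTimeout, OrderReason::InvalidConfig] {
        let decision = policy(reason.identity(), reason.category());
        println!("{:?} -> {} ({:?}) -> {:?}", reason, reason.identity(), reason.category(), decision);
    }
}
```

Renaming OrderReason::StorageTimeout in a refactor would leave "system.timeout", and therefore every policy rule keyed on it, unchanged.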
| Component | Meaning | Example |
|---|---|---|
| Stable error identity | Machine-readable error key for long-term compatibility | order.not_found, system.timeout |
| Stable classification | Finite categories for governance decisions | business error, config error, system error, timeout, rate limit |
| Governance attributes | Auxiliary decision fields derived from identity and classification | category, retryable, exposure level, HTTP status |
| Diagnostic chain | Preserved cause/source path across layers | service failure -> repository failure -> database timeout |
| Context | Structured environment of the current operation: where, for whom, doing what | operation, tenant, path, order_id, component |
| Detail | The current layer’s explanation of this failure | read config failed, upstream returned 503 |
The same error serves both needs through different views, so callers and troubleshooters no longer have to trade away each other’s requirements.
The contract channel should have low cardinality, low dynamism, and long-term stability. It should not include tenant IDs, file paths, SQL, HTTP bodies, third-party message text, user input, or specific field values. Those belong in the diagnostic channel as detail, context, or source chain. The more stable the contract channel is, the more reliable monitoring aggregation, alert routing, protocol compatibility, and automated decisions become. The richer and more reliable the diagnostic channel is, the more effective troubleshooting and repair become.
Different information has different stability. Design and testing should treat it accordingly:
| Information | Stability | Suitable as an external contract |
|---|---|---|
| Error identity code | Highest | Yes |
| Category | High | Usually yes |
| Reason variant | Medium-high | Internal code contract |
| Retryable / visibility / HTTP status | Medium | Policy contract |
| Detail | Low | No |
| Context fields | Low to medium | Usually no |
| Source chain | Low | No |
So tests should prioritize identity and policy results, not exact detail strings. Documentation should promise stable identity and classification semantics, not specific diagnostic wording.
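Here is a sketch of what such a test can look like in Rust, assuming a hypothetical find_order function and Reason type (not part of any library). The test pins the identity and the retry decision and intentionally leaves the detail wording free to change.

```rust
/// Hypothetical reason with a stable identity and a derived retry decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reason {
    NotFound,
    Timeout,
}

impl Reason {
    fn identity(self) -> &'static str {
        match self {
            Reason::NotFound => "order.not_found",
            Reason::Timeout => "system.timeout",
        }
    }

    fn retryable(self) -> bool {
        matches!(self, Reason::Timeout)
    }
}

#[derive(Debug)]
struct AppError {
    reason: Reason, // contract: tests assert on this
    detail: String, // diagnostics: wording is free to change
}

fn find_order(id: u64) -> Result<String, AppError> {
    if id == 42 {
        Ok(format!("order-{id}"))
    } else {
        Err(AppError {
            reason: Reason::NotFound,
            detail: format!("order {id} missing in primary store"),
        })
    }
}

#[cfg(test)]
mod contract_tests {
    use super::*;

    #[test]
    fn missing_order_keeps_its_contract() {
        let err = find_order(7).unwrap_err();
        assert_eq!(err.reason.identity(), "order.not_found");
        assert!(!err.reason.retryable());
        // No assertion on err.detail: rewording it must not break this test.
    }
}

fn main() {
    match find_order(7) {
        Ok(order) => println!("found {order}"),
        Err(e) => println!("{} ({})", e.reason.identity(), e.detail),
    }
}
```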
Bricks Are Not the Building
A common objection is this: Java exceptions plus error codes, Go sentinel errors plus wrapping, and Rust enums plus cause chains already cover most of what the Wukong model wants. Why add another model?
Because those mechanisms are bricks, not the building. The gap appears in three places.
Error identity is unstable. In the Java ecosystem, exception types often act as routing signals: catch (OrderNotFoundException e). But exception types follow inheritance structure and can change during refactoring. Error codes become optional side fields. Callers first do instanceof, then read getErrorCode(). The code is not the routing key. The Wukong model makes stable error identity a first-class member of the contract channel. Boundary policy routes by identity, decoupled from inheritance. Whether you use enums, error code strings, or tagged unions, the identity itself must be stable, documentable, and enforced by tests.
The classification space grows naturally without bound. Exception-based designs encourage “one class per failure”: SubmitDependencyUnavailableException, InvalidStateException, and so on. The number of classes grows with business failure patterns, with no mechanism forcing convergence into a finite classification space. If exception types also carry classification semantics, classification stops being stable. Every new exception class can affect every boundary that routes by exception hierarchy. The Wukong model constrains the classification space (R) to a finite set. Adding a new variant becomes an intentional compatibility evolution, not casual class proliferation.
Contracts and diagnostics share one channel and constrain each other. Cause chains, structured wrapping, and errors.Is/errors.As solve diagnostic preservation: how root causes are kept and queried. They do not decide whether to retry or degrade, which HTTP status to map, whether to expose the error to users or only to logs. If governance actions are derived ad hoc from exception types, fields, or local judgments, they spread into every handler, every catch block, and every errors.Is call. The Wukong model separates contract information from diagnostic information and makes governance decisions come from centralized policy rather than local code.
So the key difference is not which language feature you use. It is whether your error architecture has four load-bearing walls: stable error identity that survives type refactors, a finite classification space governed by compatibility rules, diagnostics preserved across layers, and centralized boundary policy instead of repeated decisions in handlers. Without those structures, any language’s error handling turns into an ungovernable thicket, even when the underlying bricks are excellent.
Design Principles
The Wukong Error Governance Model needs five principles working together: a unified carrier, a stable contract channel, a faithful diagnostic channel, centralized boundary output, and explicit bridges to external ecosystems.
Code snippets in this section are language-agnostic pseudocode intended only to show the methodological shape.
Principle 1: Use a Unified Carrier for Contract and Diagnostic Information
Your own cross-layer propagation paths should use one structured model.
Anti-pattern:
// Module A returns io::Error
fn read_file() -> io::Result<Data>
// Module B returns a custom enum
fn validate() -> Result<Data, ValidationError>
// Module C returns a string
fn process() -> Result<Data, String>
Callers must learn a new error shape for every path. When composing multiple functions, the caller must classify, rebuild diagnostics, and decide boundary output format all over again.
Preferred shape:
read_file() -> Result<Data, StructuredError<ErrorClass>>
validate() -> Result<Data, StructuredError<ErrorClass>>
process() -> Result<Data, StructuredError<ErrorClass>>
Only a unified carrier can hold stable classification and diagnostics at the same time. What varies is the classification space and the context. Different layers may define different classification spaces, but cross-layer propagation must have clear convergence or boundary-conversion rules.
A unified carrier does not mean third-party libraries, the standard library, framework exceptions, or protocol errors must all become one type. It means the propagation paths your team controls should use one structured model internally, and bridges to external ecosystems should be explicit.
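The preferred shape can be made concrete with a minimal, std-only Rust sketch (real Rust rather than the pseudocode used elsewhere in this section). StructuredError<R>, ErrorClass, and the three functions are hypothetical stand-ins, not orion-error's StructError: all three functions share one carrier, so composing them with ? needs no per-call reclassification, and each standard-library error is explicitly kept as the diagnostic source.

```rust
use std::error::Error;
use std::fs;

/// One classification space shared by the paths this sketch controls.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorClass {
    Io,
    Validation,
    Processing,
}

/// Hypothetical unified carrier: reason (contract) + detail and source (diagnostics).
#[derive(Debug)]
struct StructuredError<R> {
    reason: R,
    detail: String,
    source: Option<Box<dyn Error + Send + Sync>>,
}

impl<R> StructuredError<R> {
    fn new(reason: R, detail: impl Into<String>) -> Self {
        StructuredError { reason, detail: detail.into(), source: None }
    }

    /// Explicit bridge in: keep the external std error as the diagnostic source.
    fn with_source(mut self, source: impl Error + Send + Sync + 'static) -> Self {
        self.source = Some(Box::new(source));
        self
    }
}

fn read_file(path: &str) -> Result<String, StructuredError<ErrorClass>> {
    fs::read_to_string(path).map_err(|e| {
        StructuredError::new(ErrorClass::Io, format!("read {path} failed")).with_source(e)
    })
}

fn validate(text: &str) -> Result<Vec<i64>, StructuredError<ErrorClass>> {
    text.split_whitespace()
        .map(|tok| {
            tok.parse::<i64>().map_err(|e| {
                StructuredError::new(ErrorClass::Validation, format!("bad number {tok:?}"))
                    .with_source(e)
            })
        })
        .collect()
}

fn process(nums: &[i64]) -> Result<i64, StructuredError<ErrorClass>> {
    nums.iter().try_fold(0i64, |acc, &n| {
        acc.checked_add(n)
            .ok_or_else(|| StructuredError::new(ErrorClass::Processing, "sum overflowed"))
    })
}

/// Composition needs no reclassification: `?` just forwards the shared carrier.
fn run(path: &str) -> Result<i64, StructuredError<ErrorClass>> {
    let text = read_file(path)?;
    let nums = validate(&text)?;
    process(&nums)
}

fn main() {
    match run("missing-input.txt") {
        Ok(sum) => println!("sum = {sum}"),
        Err(e) => println!(
            "{:?}: {} (source: {:?})",
            e.reason,
            e.detail,
            e.source.as_ref().map(|s| s.to_string())
        ),
    }
}
```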
Principle 2: Keep the Contract Channel Stable
Error classification contracts should evolve under backward-compatible rules.
Classification is a contract because callers depend on it to make governance decisions. “Stable” does not mean you can never add classifications. It means the machine key and semantics of existing classifications should not change casually.
The error identity is the machine primary key of the contract channel. In practice it is often a stable string, numeric code, or protocol field such as business.not_found or system.timeout. Callers, gateways, monitoring, alerts, and documentation should depend on that identity, not on message text.
Compatibility rules for classification contracts:
- You may add new identities or classifications to represent new business or system failures.
- You should not delete committed public identities. If one must be retired, keep a compatibility mapping or versioned migration path.
- You should not change the meaning of an existing identity, such as turning business.not_found from “resource does not exist” into “permission denied”.
- You should not let one identity produce contradictory governance actions at different boundaries, such as retryable in one place and non-retryable in another.
- You may change wording, diagnostic detail, context fields, and lower-level source chains, as long as identity and classification semantics remain intact.
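Part of these rules can be enforced mechanically. The std-only Rust sketch below uses hypothetical names (Reason, COMMITTED, compatibility_violations): it keeps a registry of committed identities with their retry semantics and reports any identity that disappeared or changed its governance action. New identities pass silently, and changes in meaning still need human review.

```rust
/// Hypothetical reason enum; ALL must list every variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reason {
    NotFound,
    PermissionDenied,
    Timeout,
}

impl Reason {
    const ALL: [Reason; 3] = [Reason::NotFound, Reason::PermissionDenied, Reason::Timeout];

    fn identity(self) -> &'static str {
        match self {
            Reason::NotFound => "business.not_found",
            Reason::PermissionDenied => "business.permission_denied",
            Reason::Timeout => "system.timeout",
        }
    }

    fn retryable(self) -> bool {
        matches!(self, Reason::Timeout)
    }
}

/// Identities already promised to callers, with their committed retry semantics.
/// Entries are only ever appended here, never edited or removed.
const COMMITTED: &[(&str, bool)] = &[
    ("business.not_found", false),
    ("business.permission_denied", false),
    ("system.timeout", true),
];

/// Reports committed identities that were removed or whose governance action changed.
fn compatibility_violations() -> Vec<String> {
    let mut problems = Vec::new();
    for &(code, retryable) in COMMITTED {
        match Reason::ALL.iter().find(|r| r.identity() == code) {
            None => problems.push(format!("{code}: removed")),
            Some(r) if r.retryable() != retryable => {
                problems.push(format!("{code}: retryable changed"))
            }
            Some(_) => {}
        }
    }
    problems
}

fn main() {
    let problems = compatibility_violations();
    if problems.is_empty() {
        println!("all committed identities intact");
    } else {
        for p in &problems {
            println!("violation: {p}");
        }
    }
}
```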
| Should remain stable | May change |
|---|---|
| Stable error identity | Diagnostic detail |
| Classification semantics | Error wording |
| Category (business / system / config) | Specific technical detail |
Stable classification has another benefit: it becomes the shared interface between humans and systems. Operations rules, gateway status mappings, API documentation, and response specs all depend on stable identity and classification semantics rather than message text. Enums, exception types, numeric codes, or tagged unions are only implementation forms for expressing that contract.
Granularity must be constrained. A new stable error identity usually needs at least one of the following:
- Callers need a different governance action such as retry, degrade, stop retrying, or manual intervention.
- A boundary needs a different protocol status, public error code, user message, or repair hint.
- Monitoring, alerts, SLA reporting, or operations reporting need independent aggregation.
- SREs, business owners, or rule developers need to track it as a distinct failure class.
- The semantics are stable over time and do not depend on the current database, SDK, network library, or implementation detail.
The following usually do not justify a new stable identity:
- Only the wording differs.
- Only the field name, file name, tenant, path, line/column, or sample content differs.
- Only the lower-level library error type differs, but the governance action is the same.
- It only exists to make logs more detailed.
Those dynamic differences belong in the diagnostic channel. Otherwise the classification space keeps expanding until the contract channel degenerates into another form of log text.
Principle 3: Preserve the Diagnostic Channel Across Layers
As errors propagate internally, layers should add information without damaging the existing diagnostic chain.
Anti-pattern:
repository() -> Result<Data, RepoError> {
// database connection failed, returns RepositoryConnectionFailed
}
service() -> Result<Data, ServiceError> {
data = repository()? // lower-level specifics are discarded
return data
}
Preferred shape:
repository() -> Result<Data, StructuredError<RepositoryClass>> {
// database connection failed, original database error preserved
}
service() -> Result<Data, StructuredError<ServiceClass>> {
data = repository()
.source_err(ServiceDependencyFailed, "load repository data failed")
return data
}
The information preserved by each layer forms a complete error chain, allowing diagnosis to trace from the final error back to the original root cause.
There are two different operations here:
- If the current layer only converges lower-level classification into the upper layer’s classification space and does not introduce a new semantic boundary, it should preserve the existing diagnostic chain without inventing a new error narrative.
- If the current layer expresses a new failure meaning, it should keep the lower-level error as a cause and add the current layer’s explanation.
The deciding factor is the semantic domain, not the number of stack frames.
If upper and lower layers belong to the same semantic domain, the conversion is usually only classification convergence. For example, a database driver, a query executor, and a repository helper all live in the data-access domain. They may converge a lower connection failure into RepositoryConnectionFailed while preserving the original database error and context, without adding business narrative at every level.
If the error crosses a semantic domain or an architectural responsibility boundary, a new semantic boundary should be introduced. For example, once a data-access failure enters the order service, the upper layer should not care about “database connection failed”. It cares about “load order draft failed” or “submit-order dependency unavailable”. At that point the service layer should add its own semantic meaning and keep the lower data-access error as a cause.
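Both operations can be sketched with the standard Error::source chain. In this std-only Rust example, Layered<R>, converge, and wrap are hypothetical helpers that only loosely mirror the roles orion-error assigns to conv_err() and source_err(); they are not that crate's API. The repository converges a query-executor error without adding a layer, while the order service crosses a semantic boundary and keeps the repository error as its cause.

```rust
use std::error::Error;
use std::fmt;
use std::io;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QueryReason { Timeout }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RepoReason { ConnectionFailed }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServiceReason { DependencyUnavailable }

/// Hypothetical layered error: reason + this layer's detail + preserved cause.
#[derive(Debug)]
struct Layered<R> {
    reason: R,
    detail: String,
    source: Option<Box<dyn Error + Send + Sync>>,
}

impl<R: fmt::Debug> fmt::Display for Layered<R> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.reason, self.detail)
    }
}

impl<R: fmt::Debug> Error for Layered<R> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}

/// Same semantic domain: remap the reason only; detail and source are untouched.
fn converge<A, B>(err: Layered<A>, map: impl FnOnce(A) -> B) -> Layered<B> {
    Layered { reason: map(err.reason), detail: err.detail, source: err.source }
}

/// New semantic boundary: new reason and detail, lower error kept as the cause.
fn wrap<A, B>(lower: Layered<A>, reason: B, detail: &str) -> Layered<B>
where
    A: fmt::Debug + Send + Sync + 'static,
{
    Layered { reason, detail: detail.to_string(), source: Some(Box::new(lower)) }
}

fn query() -> Result<u64, Layered<QueryReason>> {
    let driver_err = io::Error::new(io::ErrorKind::TimedOut, "tcp read timeout");
    Err(Layered {
        reason: QueryReason::Timeout,
        detail: "query timed out".to_string(),
        source: Some(Box::new(driver_err)),
    })
}

/// Query executor and repository share the data-access domain: converge only.
fn repository() -> Result<u64, Layered<RepoReason>> {
    query().map_err(|e| converge(e, |_| RepoReason::ConnectionFailed))
}

/// The order service is a different semantic domain: wrap with its own meaning.
fn service() -> Result<u64, Layered<ServiceReason>> {
    repository().map_err(|e| wrap(e, ServiceReason::DependencyUnavailable, "load order draft failed"))
}

/// Walks the std source chain from the outermost error to the root cause.
fn chain(err: &dyn Error) -> Vec<String> {
    let mut out = vec![err.to_string()];
    let mut cur = err.source();
    while let Some(e) = cur {
        out.push(e.to_string());
        cur = e.source();
    }
    out
}

fn main() {
    if let Err(e) = service() {
        for (depth, line) in chain(&e).iter().enumerate() {
            println!("{depth}: {line}");
        }
    }
}
```

The resulting chain has three entries, not four: convergence inside the data-access domain did not invent a new layer of narrative, while the service boundary did.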
Useful diagnostic questions:
- Is this layer hiding implementation details from the layer above?
- Does this layer have new business meaning, user intent, or operation goals?
- Will this layer change governance action, such as mapping a low-level timeout into business dependency unavailable?
- If the lower implementation is replaced in the future, should the upper-layer error contract stay the same?
If the answer is yes, this is usually a semantic boundary. If it is only module splitting, a helper, or technical layering within one domain, then classification convergence and diagnostic preservation are enough. Redaction and formatting can happen later at output boundaries.
Principle 4: Centralize Output at Boundaries
Boundary exposure policy should be defined centrally, not re-decided at each boundary point.
Structured errors carry both the contract and diagnostic channels as they propagate internally. At the boundary, the boundary layer reads the stable error identity from the contract channel and passes it to a unified policy to generate output views.
StructuredError<ErrorClass>
-> error_identity
-> exposure_policy
-> HTTP response / RPC error / CLI output / log record / metric label
Anti-pattern:
// handler A
match err {
NotFound => HttpResponse(404, "not found"),
Timeout => HttpResponse(503, "try again"),
}
// handler B
match err {
NotFound => HttpResponse(404, "resource missing"),
Timeout => HttpResponse(504, "gateway timeout"),
}
The two handlers expose the same error inconsistently.
Preferred shape:
// central policy definition
policy.status(error_identity) {
match error_identity {
"business.not_found" => 404
"system.timeout" => 503
_ => 500
}
}
// all boundaries use the same policy
render_error_response(err, policy)
Centralized policy must cover more than HTTP status. It should also cover:
- Public error codes and user-visible messages
- HTTP/RPC/CLI format mapping
- Log levels and structured log fields
- Whether to trigger alerts or count toward SLA
- Whether callers should retry, degrade, or stop retrying
- Diagnostic redaction and exposure level
- Metric labels and aggregation dimensions
If those decisions are scattered across handlers, workers, and controllers, the same error identity will produce contradictory output across boundaries and the contract channel will stop being stable.
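A minimal Rust sketch of one policy table consumed by two boundaries follows. The identities, public codes, exit codes, and the Exposure struct are hypothetical; orion-error's own ExposurePolicy is not shown here. Because both renderers call exposure_policy, an identity cannot be retryable at the HTTP boundary and non-retryable at the CLI boundary.

```rust
/// Log level chosen by policy, not by each handler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    Warn,
    Error,
}

/// Everything a boundary needs, derived from one identity (hypothetical fields).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Exposure {
    http_status: u16,
    public_code: &'static str,
    public_message: &'static str,
    retryable: bool,
    log_level: LogLevel,
    alert: bool,
}

/// The single central policy table.
fn exposure_policy(identity: &str) -> Exposure {
    match identity {
        "business.not_found" => Exposure {
            http_status: 404,
            public_code: "NOT_FOUND",
            public_message: "resource not found",
            retryable: false,
            log_level: LogLevel::Warn,
            alert: false,
        },
        "system.timeout" => Exposure {
            http_status: 503,
            public_code: "UNAVAILABLE",
            public_message: "service temporarily unavailable",
            retryable: true,
            log_level: LogLevel::Error,
            alert: true,
        },
        _ => Exposure {
            http_status: 500,
            public_code: "INTERNAL",
            public_message: "internal error",
            retryable: false,
            log_level: LogLevel::Error,
            alert: true,
        },
    }
}

/// HTTP boundary: status and JSON body, both taken from the policy.
fn render_http(identity: &str) -> (u16, String) {
    let p = exposure_policy(identity);
    let body = format!(
        "{{\"code\":\"{}\",\"message\":\"{}\",\"retryable\":{}}}",
        p.public_code, p.public_message, p.retryable
    );
    (p.http_status, body)
}

/// CLI boundary: exit code and message, from the same policy.
fn render_cli(identity: &str) -> (i32, String) {
    let p = exposure_policy(identity);
    // 75 is EX_TEMPFAIL in BSD sysexits.h ("temporary failure, try again later").
    let exit_code = if p.retryable { 75 } else { 1 };
    (exit_code, format!("error[{}]: {}", p.public_code, p.public_message))
}

fn main() {
    for id in ["business.not_found", "system.timeout", "unknown.identity"] {
        let p = exposure_policy(id);
        println!(
            "{}: http={:?} cli={:?} log={:?} alert={}",
            id, render_http(id), render_cli(id), p.log_level, p.alert
        );
    }
}
```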
Principle 5: Bridge External Ecosystems Explicitly
Crossing into external ecosystems such as logging systems, standard error interfaces, or third-party libraries should be explicit.
Anti-pattern:
// the caller unknowingly degrades the error into a plain string
handle(error_as_text) // structured information erased
Preferred shape:
// explicit choice to enter an external ecosystem
plain_error = err.to_plain_error()
log_record = err.to_log_record(redaction_policy)
Explicit bridges ensure that loss of structure, redaction, or degradation is intentional rather than accidental. Each bridge function should have a clear bridge contract:
- Who is the target consumer: user, protocol client, logging system, monitoring system, third-party library, or standard error interface?
- What is preserved: stable error identity, classification, cause-chain summary, operation context, key fields, retryable, visibility?
- What is dropped: internal implementation types, sensitive fields, excessively long lower-level errors, dynamically formatted text that is not stable to parse?
- What is redacted: tokens, secrets, personal data, tenant-isolation data, internal topology, SQL fragments, or payload bodies?
- How does degradation work: if the target ecosystem accepts only strings or ordinary exceptions, which fields are compressed into text and which are lost entirely?
Different bridge targets need different contracts. Logging should keep error identity, classification, context, key fields, and a cause summary. External responses should expose only the public error code, public message, and repair hint. Bridging to a standard error interface may preserve only text and the source chain. The point is not “carry everything everywhere”. The point is that every output is auditable, testable, and predictable.
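The following std-only Rust sketch shows two bridges with different contracts, assuming context is a flat string map. The log bridge keeps identity, category, detail, and context but masks sensitive keys. The user bridge keeps only the stable code. RichError, SENSITIVE_KEYS, and both function names are hypothetical, and the sketch does not show orion-error's own bridges (report, redacted render, interop).

```rust
use std::collections::BTreeMap;

/// Hypothetical in-process error carrying both channels.
pub struct RichError {
    pub identity: &'static str,
    pub category: &'static str,
    pub detail: String,
    pub context: BTreeMap<String, String>,
}

/// Context keys this sketch's redaction policy treats as sensitive.
const SENSITIVE_KEYS: [&str; 3] = ["token", "password", "email"];

/// Log bridge: preserves identity, category, detail, and context,
/// but replaces values of sensitive keys with "***".
pub fn to_log_record(err: &RichError) -> String {
    let fields: Vec<String> = err
        .context
        .iter()
        .map(|(key, value)| {
            let shown = if SENSITIVE_KEYS.contains(&key.as_str()) {
                "***"
            } else {
                value.as_str()
            };
            format!("{key}={shown}")
        })
        .collect();
    format!(
        "identity={} category={} detail=\"{}\" {}",
        err.identity,
        err.category,
        err.detail,
        fields.join(" ")
    )
}

/// User bridge: preserves only the stable code; detail and context are dropped.
pub fn to_public_text(err: &RichError) -> String {
    format!("error code: {}", err.identity)
}

/// Example error with made-up values, used by main.
pub fn sample_error() -> RichError {
    let mut context = BTreeMap::new();
    context.insert("order_id".to_string(), "42".to_string());
    context.insert("token".to_string(), "s3cr3t".to_string());
    RichError {
        identity: "order.submit_dependency_unavailable",
        category: "sys",
        detail: "submit order failed".to_string(),
        context,
    }
}

fn main() {
    let err = sample_error();
    println!("{}", to_log_record(&err));
    println!("{}", to_public_text(&err));
}
```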
Boundary Safety and Cross-Process Output
Structured errors preserve more information internally, so boundary output must handle governance, safety, and privacy together. The diagnostic channel may be rich, but that does not mean everything may leave the current trust domain.
Trust domains should be handled in layers:
| Trust domain | Recommended payload |
|---|---|
| In-process | Full structured error, source chain, detail, context |
| Service-to-service / cross-process | Protocol snapshot, identity, category, public message, retryable, correlation ID |
| User boundary | Stable error code, public message, necessary repair hint, correlation ID |
| Observability systems | Redacted diagnostic summary, key context, aggregation labels |
| Support bundles / debug reports | Richer diagnostics under permission control, with redaction and lifecycle constraints |
Different views should have different exposure levels:
- User responses should include only stable error code, public message, required repair hints, and correlation ID.
- Protocol clients should receive machine-usable governance fields such as identity, category, and retryable, but not the internal source chain.
- Logs and reports should preserve redacted context, cause summaries, and key diagnostic fields.
- Support bundles or debug reports may include richer diagnostics, but only with permission, redaction, and lifecycle controls.
Across services, processes, or message queues, you should usually avoid shipping the full internal error object directly. A safer approach is to transmit a protocol snapshot: stable identity, category, public message, retryable, correlation ID, or trace ID. The full diagnostic chain stays inside the service that produced the error and is correlated through logs, traces, and reports.
That avoids two problems: leaking internal implementation details to external callers, and making downstream systems depend on upstream internal error types, source chains, or library shapes. At cross-process boundaries, the error contract should be a protocol contract, not a direct serialization of in-process diagnostic structure.
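The std-only Rust sketch below illustrates that split, using hypothetical type and field names. InternalError keeps detail and the source chain in-process, while snapshot copies only contract fields and a correlation ID into a ProtocolSnapshot for the wire. The hand-rolled JSON is an assumption that keeps the example dependency-free. A real service would more likely use a serialization library and its own protocol schema.

```rust
/// Hypothetical in-process error: contract fields plus diagnostics.
pub struct InternalError {
    pub identity: &'static str,
    pub category: &'static str,
    pub retryable: bool,
    pub detail: String,            // diagnostic: stays in-process
    pub source_chain: Vec<String>, // diagnostic: stays in-process
}

/// What crosses the process boundary: contract fields and a correlation ID only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProtocolSnapshot {
    pub identity: String,
    pub category: String,
    pub public_message: String,
    pub retryable: bool,
    pub correlation_id: String,
}

impl InternalError {
    /// Project the error; `public_message` would normally come from boundary policy.
    pub fn snapshot(&self, public_message: &str, correlation_id: &str) -> ProtocolSnapshot {
        ProtocolSnapshot {
            identity: self.identity.to_string(),
            category: self.category.to_string(),
            public_message: public_message.to_string(),
            retryable: self.retryable,
            correlation_id: correlation_id.to_string(),
        }
    }
}

/// Quote a string for JSON, escaping backslashes and quotes (control chars not handled).
fn json_str(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

impl ProtocolSnapshot {
    pub fn to_json(&self) -> String {
        format!(
            "{{\"identity\":{},\"category\":{},\"message\":{},\"retryable\":{},\"correlation_id\":{}}}",
            json_str(&self.identity),
            json_str(&self.category),
            json_str(&self.public_message),
            self.retryable,
            json_str(&self.correlation_id)
        )
    }
}

/// Example error with made-up values, used by main.
pub fn sample_internal_error() -> InternalError {
    InternalError {
        identity: "order.submit_dependency_unavailable",
        category: "sys",
        retryable: true,
        detail: "submit order failed".to_string(),
        source_chain: vec![
            "repository: insert order failed".to_string(),
            "db: connection refused".to_string(),
        ],
    }
}

fn main() {
    let err = sample_internal_error();
    // Full diagnostics go to local logs, correlated by the same ID.
    eprintln!("req-123 {} {:?}", err.detail, err.source_chain);
    // Only the snapshot goes over the wire.
    println!("{}", err.snapshot("try again later", "req-123").to_json());
}
```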
Three Error Propagation Modes
The five principles above define the static structure of error governance: a unified carrier, a stable contract channel, a faithful diagnostic channel, centralized boundary output, and explicit external bridging. The three propagation modes below describe the dynamic lifecycle of an error: how it first enters the structured system, how it changes across semantic domains, and how it is finally projected at a boundary. Principles answer “what the structure should look like”. Modes answer “how it moves at runtime”. They are complementary views of the same methodology.
Error propagation is not just mechanically throwing upward. In an industrial system, an error goes through three actions: first entry, cross-layer conversion, and boundary output.
First Entry
When a raw failure such as I/O, parse, or network error enters the structured system for the first time, three things must happen together:
- Choose the classification (business vs. system vs. configuration)
- Add the current layer’s explanation (detail)
- Preserve the raw error as the lower-level cause
The diagnostic concepts divide responsibilities like this: source/cause answers what the root problem was; context answers where, for whom, and while doing what; and detail answers how the current layer interprets the failure.
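The following crate-independent sketch of first entry is std-only Rust. It wraps a raw io::Error in a hypothetical Classified type that records all three pieces: a classification, the current layer's detail, and the raw error exposed through std::error::Error::source. The root_cause helper then walks the chain back to the original failure. In orion-error, source_err(...) serves this step; the names here are illustrative only.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Classification chosen at first entry (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Class {
    Business,
    System,
    Configuration,
}

/// A raw failure after first entry: classification + detail + preserved source.
#[derive(Debug)]
pub struct Classified {
    pub class: Class,
    pub detail: String,
    source: Box<dyn Error + Send + Sync + 'static>,
}

impl fmt::Display for Classified {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only this layer's explanation; the cause stays reachable via source().
        write!(f, "{:?}: {}", self.class, self.detail)
    }
}

impl Error for Classified {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let cause: &(dyn Error + 'static) = &*self.source;
        Some(cause)
    }
}

/// Follow source() links to the deepest cause and render it.
pub fn root_cause(err: &(dyn Error + 'static)) -> String {
    let mut current = err;
    while let Some(next) = current.source() {
        current = next;
    }
    current.to_string()
}

/// Stand-in for a real file read that fails.
fn read_config() -> Result<String, io::Error> {
    Err(io::Error::new(io::ErrorKind::NotFound, "app.toml missing"))
}

/// First entry: classify, explain, and keep the raw io::Error as the source.
pub fn load_config() -> Result<String, Classified> {
    read_config().map_err(|e| Classified {
        class: Class::Configuration,
        detail: "load config failed".to_string(),
        source: Box::new(e),
    })
}

fn main() {
    if let Err(err) = load_config() {
        println!("{err}");
        println!("root cause: {}", root_cause(&err));
    }
}
```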
Cross-Layer Conversion
The upper layer converges lower-level classifications into its own classification space. If it is only remapping classification, it preserves all diagnostics. If it needs a new semantic boundary, it wraps the lower-level error as a cause. The deciding factor is whether the current layer is a new semantic boundary.
A semantic boundary is not the same as an architectural layer boundary. Crossing a function, file, helper, or adapter does not automatically require a new error narrative. Crossing business intent, user operation, governance action, or implementation-hiding boundaries usually does. Over-wrapping fills the source chain with repetitive “failed to process” messages. Under-wrapping leaks low-level technical failures into upper-layer contracts. The criterion is semantic responsibility, not call-stack depth.
The table below maps the three common actions to their effects:
| Current action | Add new stable identity? | Add new source frame? | Mental model |
|---|---|---|---|
| Only converge lower classification into upper classification | No, only remap | No | Reason convergence, e.g. conv_err() |
| Express new business or architectural semantics | Yes | Yes | New semantic boundary, e.g. source_err(...) |
| Only add current operation path or fields | No | No | Context enrichment, e.g. doing(...) / with_context(...) |
The point of this table is to avoid both extremes: wrapping every layer into a new story, or directly exposing cross-domain low-level technical failures to upper layers.
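The first two rows of the table can be contrasted in a std-only Rust sketch. Carrier<R>, remap, and wrap are hypothetical stand-ins for the ideas behind conv_err() and source_err(...), not their real signatures. remap changes the reason type through From and leaves every diagnostic frame untouched. wrap adds a new frame with a new reason on top of the old story.

```rust
use std::fmt::Debug;

/// Hypothetical carrier: current reason plus diagnostic frames, outermost first.
/// Each frame is (reason label at that layer, detail).
#[derive(Debug)]
pub struct Carrier<R> {
    pub reason: R,
    pub frames: Vec<(String, String)>,
}

impl<R: Debug> Carrier<R> {
    /// First entry: one frame for the layer that classified the failure.
    pub fn new(reason: R, detail: &str) -> Self {
        let frames = vec![(format!("{reason:?}"), detail.to_string())];
        Carrier { reason, frames }
    }

    /// Reason convergence (the conv_err() idea): swap the reason, keep frames as-is.
    pub fn remap<S: From<R>>(self) -> Carrier<S> {
        Carrier { reason: S::from(self.reason), frames: self.frames }
    }

    /// New semantic boundary (the source_err(...) idea): push a new frame on top.
    pub fn wrap<S: Debug>(self, reason: S, detail: &str) -> Carrier<S> {
        let mut frames = vec![(format!("{reason:?}"), detail.to_string())];
        frames.extend(self.frames);
        Carrier { reason, frames }
    }
}

#[derive(Debug, PartialEq)]
pub enum RepoReason { ConnectionFailed }

#[derive(Debug, PartialEq)]
pub enum StorageReason { Unavailable }

#[derive(Debug, PartialEq)]
pub enum OrderReason { SubmitDependencyUnavailable }

/// Convergence rule from the lower classification space to the upper one.
impl From<RepoReason> for StorageReason {
    fn from(r: RepoReason) -> Self {
        match r {
            RepoReason::ConnectionFailed => StorageReason::Unavailable,
        }
    }
}

fn main() {
    let repo = Carrier::new(RepoReason::ConnectionFailed, "insert order failed");
    let storage: Carrier<StorageReason> = repo.remap();
    println!("remapped: {:?}, frames={:?}", storage.reason, storage.frames);
    let order = storage.wrap(OrderReason::SubmitDependencyUnavailable, "submit order failed");
    println!("wrapped: {:?}, frames={:?}", order.reason, order.frames);
}
```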
Boundary Output
At a system boundary such as an HTTP handler, RPC endpoint, CLI entry, or log-writing point, choose the output format, apply exposure policy, and emit the result.
A Full Propagation Example
Here is the full path of one “submit order” failure across the three modes.
First, a database failure enters the structured system in the repository layer. The repository chooses a stable classification in the data-access domain, preserves the database error as source, and adds operation context.
repository.insert_order(order) -> Result<(), StructuredError<RepositoryClass>> {
db.insert(order)
.on_error(source_error) {
return StructuredError {
identity: "repository.connection_failed",
class: RepositoryConnectionFailed,
detail: "insert order failed",
context: {
operation: "insert_order",
order_id: order.id,
component: "order_repository"
},
source: source_error
}
}
}
Second, the service layer crosses into the business semantic domain. It does not expose “database connection failed” to upper layers. Instead, it expresses a business failure (a dependency was unavailable during order submission) while preserving the repository error as the cause.
service.submit_order(order) -> Result<(), StructuredError<ServiceClass>> {
repository.insert_order(order)
.on_error(repo_error) {
return StructuredError {
identity: "order.submit_dependency_unavailable",
class: SubmitDependencyUnavailable,
detail: "submit order failed",
context: {
operation: "submit_order",
order_id: order.id,
tenant: order.tenant
},
source: repo_error
}
}
}
Third, the HTTP handler reaches the boundary. It does not reinterpret the error. It hands the error to centralized policy to generate output.
handler.post_orders(req) -> HttpResponse {
result = service.submit_order(req.order)
if result is error {
err = result.error
identity = err.identity
log_record = policy.to_log_record(err)
logger.write(log_record)
metrics.record(policy.metric_labels(identity))
return HttpResponse {
status: policy.http_status(identity),
body: policy.public_body(identity),
retry_after: policy.retry_after(identity)
}
}
}
At the boundary, the contract channel presents order.submit_dependency_unavailable, which drives status code, user message, retry hint, and metric labels. The diagnostic channel still preserves service detail, repository detail, context, and the original database error. Callers do not need database details, but troubleshooters can still trace root cause.
The Lifecycle of Error Governance
Runtime propagation is only part of governance. Once stable classifications reach production, they must be observed and evolved:
detect -> classify -> enrich -> propagate -> project -> observe -> review/evolve
- detect: capture the raw technical error or business failure where it occurs.
- classify: choose the stable identity and classification in the current semantic domain.
- enrich: add detail, context, and source without polluting the contract channel.
- propagate: preserve the diagnostic chain across layers and create a new semantic boundary where needed.
- project: generate output views at HTTP/RPC/CLI/log/metric boundaries under policy.
- observe: inspect error distribution and governance effectiveness through logs, metrics, traces, and reports.
- review/evolve: merge, retire, or add error identities based on production feedback, and update policy and documentation.
The review/evolve step matters because classification is not one-time modeling. If one identity comes to carry many different governance actions, the classification is too coarse. If several identities differ only in wording while the governance action stays the same, the classification is too fine. Once a team reaches L2 (stable classification, described in the governance levels below), production observation should feed back into the classification contract and recalibrate it.
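Part of the coarse-classification check can be automated. The std-only Rust sketch below assumes production logs can be reduced to (identity, governance action taken) pairs and flags identities that were handled with more than one action. The event format and function name are hypothetical. The too-fine case, where identities differ only in wording, still needs human review.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns identities observed with more than one distinct governance action,
/// i.e. candidates for splitting into finer identities. Sorted by identity.
pub fn too_coarse<'a>(events: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    let mut actions: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for &(identity, action) in events {
        actions.entry(identity).or_default().insert(action);
    }
    actions
        .into_iter()
        .filter(|(_, set)| set.len() > 1)
        .map(|(identity, _)| identity)
        .collect()
}

fn main() {
    // (identity, governance action actually taken), e.g. mined from logs.
    let observed = [
        ("runtime.io", "retry"),
        ("runtime.io", "manual_intervention"),
        ("rule.syntax", "block_activation"),
        ("rule.syntax", "block_activation"),
    ];
    println!("split candidates: {:?}", too_coarse(&observed));
}
```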
Governance Levels
Error governance maturity has four levels:
L0: No governance
- Error types are scattered: std::io::Error, String, Box<dyn Error>, and custom enums are all mixed together
- Boundary output is manually concatenated strings
- Troubleshooting depends on grepping logs
L1: Unified carrier
- Internal cross-layer paths return the same structured model
- A basic cause chain exists, but may still be dropped across layers
- There is still no stable classification contract, and the same failure may be categorized differently in different modules
- This is only unified expression and propagation, not governance yet
L2: Stable classification
- The classification contract is stable and documented
- Boundary output uses unified policy
- Cause chains are preserved across layers
- Tests assert error identities rather than message text
L3: Governance-driven
- Error classifications map directly to governance actions such as retry, degrade, alert, and SLA handling
- Boundary policy is configurable and may vary by environment
- Error metrics enter monitoring systems
- New error types require review before being added
Most teams live between L0 and L1. The step from L1 to L2 is the most underestimated one. Switching return types to a unified carrier is not enough. The team also needs shared semantics around which failures share one identity, which classifications imply retry, and which errors may be exposed externally. Java exception mechanisms can help a project reach L1, but they do not automatically provide stable classification, unified boundary policy, or testing constraints.
Moving from L1 to L2 requires:
- Standardizing the classification contract: stable identities, classification semantics, category, and governance meaning.
- Refactoring existing errors: migrate scattered strings, technical exceptions, and temporary enums into stable classifications.
- Establishing boundary policy: unify HTTP/RPC/CLI/log/metric output rules.
- Establishing test rules: assert identity and governance decisions rather than message text.
- Establishing review habits: when adding a new error, discuss semantic ownership rather than only whether the code compiles.
Once you reach L2, tests should not stop at “an error was returned”. A more valuable test matrix includes:
- Whether identity is stable and independent from message text
- Whether category, retryable, visibility, and HTTP status match policy
- Whether the source chain preserves the lower-level root cause
- Whether context includes the key fields needed to locate the problem
- Whether exposure is correctly redacted and user responses do not leak internal detail
- Whether HTTP/RPC/CLI/log/metric views are consistent for the same error identity
The move from L1 to L2 is not just a local refactor. It changes team collaboration. Error classification stops being an individual implementation detail and becomes a shared engineering language.
L3 means error governance has entered organizational process. New error types need review because every new stable identity can affect alerts, retries, SLA handling, user wording, protocol compatibility, and operations dashboards. At that stage, changes to error classification should be managed like API changes: naming conventions, compatibility rules, policy mappings, test coverage, and retirement or migration paths.
When the Model Does Not Fit
- Small projects, prototypes, scripts. If there are few boundaries, a short lifecycle, and errors are handled locally, layered governance adds little value.
- Extremely performance-sensitive paths. Structured error paths carry costs such as allocation, cause chains, context collection, and serialization. In statically typed languages, generics or templates may also add compile time and code size.
- Errors do not cross layers. If all errors are fully handled inside one layer, the benefit approaches zero.
Interim Summary
Up to this point, we have covered the general methodology: why error governance matters, what the core tension is, and how the Wukong model organizes failure information. The next section moves into the Rust implementation.
Error Governance in Rust
Rust is well suited to structured error governance, but it does not perform governance automatically. Result<T, E>, enums, ?, and traits solve syntax for error expression and propagation. Stable error identity, semantic boundaries, diagnostic preservation, boundary output, and bridge contracts still require engineering design.
orion-error turns the Wukong model into Rust infrastructure:
Result<T, StructError<R>>
R -> reason / identity / category in the contract channel
StructError<R> -> runtime carrier for detail / context / source chain
ExposurePolicy -> boundary output policy
report / interop -> diagnostics and external ecosystem bridges
R is the error classification contract of the current semantic domain. Different bounded contexts, architectural layers, or business domains may define their own Reason types. Cross-domain propagation expresses semantic boundaries through explicit conversion.
The diagram below shows how an error enters the structured system from a low-level failure, crosses semantic domains, and finally reaches boundary output. Keep three points in mind:
- Internal propagation uses the unified carrier StructError<R>.
- One error carries both the contract and diagnostic channels.
- Boundary layers generate output views from policy instead of reinterpreting the error.
flowchart TB
raw["Raw failure<br/>IO / DB / Network / Parser"]
repo["Repository layer<br/>StructError<RepositoryReason>"]
service["Service layer<br/>StructError<OrderReason>"]
boundary["System boundary<br/>HTTP / RPC / CLI / Worker"]
raw -->|"First entry<br/>source_err(reason, detail)"| repo
repo -->|"Cross semantic domain<br/>source_err(new_reason, detail)"| service
service -->|"Boundary output<br/>exposure(policy)"| boundary
subgraph governance["Contract channel: stable, finite, testable"]
identity["identity<br/>order.submit_dependency_unavailable"]
category["category<br/>biz / sys / conf / logic"]
policy["policy<br/>status / retry / visibility / hints"]
end
subgraph diagnostic["Diagnostic channel: faithful, traceable"]
detail["detail<br/>submit order failed"]
context["context<br/>operation / order_id / tenant / component"]
source["source chain<br/>service -> repository -> database"]
end
service -.carries.-> identity
service -.carries.-> category
service -.carries.-> detail
service -.carries.-> context
service -.carries.-> source
identity --> policy
category --> policy
policy --> boundary
boundary --> user["External response<br/>stable error code + public message"]
boundary --> log["Logs / Report<br/>diagnostic summary + redacted context"]
boundary --> metric["Metrics / Alerting<br/>identity + category"]
The key idea in the diagram is that errors are not flattened into strings during internal propagation. The boundary layer generates three output views from policy: user response, logs/reports, and metrics/alerts.
Mapping the Five Principles to Rust and orion-error
| Methodology principle | Rust / orion-error implementation | Effect |
|---|---|---|
| Unified carrier for contract and diagnostics | Use Result<T, StructError<R>> for internal cross-layer propagation | Callers face one error shape, with classification space parameterized by R |
| Stable contract channel | Domain reasons define stable identity, category, and classification semantics | Callers, monitoring, and protocol boundaries depend on identity, not wording |
| Faithful diagnostic channel across layers | Use detail, context, and source chain to preserve lower causes and current-layer explanation | Upper layers may converge classification without losing root cause |
| Centralized boundary output | Use exposure policy to decide HTTP/RPC/CLI/log/metric output centrally | Avoid every handler building its own response and redaction rules |
| Explicit bridges to external ecosystems | Use report, redacted render, standard-error interop, protocol JSON, and similar explicit conversion paths | Every degradation, redaction, or exposure step has a clear contract |
Design Rule 1: Define Reason by Semantic Domain
Each semantic domain should define its own reason type instead of putting every system error into one giant global enum.
```rust
// Assumes the OrionError derive and UnifiedReason are imported from orion-error.
#[derive(Debug, Clone, OrionError)]
enum RepositoryReason {
    #[orion_error(identity = "repository.connection_failed")]
    ConnectionFailed,
    #[orion_error(identity = "repository.write_failed")]
    WriteFailed,
    #[orion_error(transparent)]
    General(UnifiedReason),
}

#[derive(Debug, Clone, OrionError)]
enum OrderReason {
    #[orion_error(identity = "order.submit_dependency_unavailable")]
    SubmitDependencyUnavailable,
    #[orion_error(identity = "order.invalid_state")]
    InvalidState,
    #[orion_error(transparent)]
    General(UnifiedReason),
}
```
That matches the stable classification contract described earlier: repository.connection_failed belongs to the data-access semantic domain, while order.submit_dependency_unavailable belongs to the business domain. The same lower-level failure may trigger both at different boundaries, but they should not be merged into one classification.
Design Rule 2: Build Structured Errors at First Entry
When ordinary I/O, database, network, or parse errors first enter the governance system, three things must happen together: choose the current-layer classification, provide detail, and preserve the lower-level source.
```rust
fn insert_order(order: &Order) -> Result<(), StructError<RepositoryReason>> {
    // Context: where, for whom, and while doing what.
    let ctx = OperationContext::doing("insert_order")
        .with_field("order_id", order.id.to_string())
        .with_meta("component.name", "order_repository");
    // First entry: classify, add repository-layer detail, keep the raw error as source.
    db_insert(order)
        .source_err(RepositoryReason::ConnectionFailed, "insert order failed")
        .map_err(|err| err.with_context(ctx))?;
    Ok(())
}
```
Do not convert the lower-level error into a string. The lower-level error is the source, "insert order failed" is repository-layer detail, and order_id plus component.name are context.
Design Rule 3: Create a New Boundary When Crossing Semantic Domains
Within one semantic domain, classification convergence should only convert reasons and should not create a new error story. When crossing into a new business semantic domain, create a new semantic boundary and preserve the lower structured error as the source.
```rust
fn submit_order(order: &Order) -> Result<(), StructError<OrderReason>> {
    let ctx = OperationContext::doing("submit_order")
        .with_field("order_id", order.id.to_string())
        .with_field("tenant", order.tenant.to_string());
    // New semantic boundary: business reason on top, repository error kept as source.
    insert_order(order)
        .source_err(
            OrderReason::SubmitDependencyUnavailable,
            "submit order failed",
        )
        .map_err(|err| err.with_context(ctx))?;
    Ok(())
}
```
The service layer does not expose the repository’s connection failure directly to the handler. It expresses a business failure instead: dependency unavailable during order submission. The repository error still remains in the source chain.
Design Rule 4: Boundaries Only Output; They Do Not Reinterpret Errors
HTTP handlers, RPC endpoints, CLI entries, and worker boundaries should not rebuild error semantics. They should pass structured errors to centralized policy to generate responses, logs, metrics, and debug reports.
```rust
fn handle_submit(req: Request) -> HttpResponse {
    match submit_order(&req.order) {
        Ok(()) => HttpResponse::ok(),
        Err(err) => {
            // Redacted projection for the caller, decided by policy.
            let snapshot = err.exposure(&DefaultExposurePolicy);
            // Full diagnostic report for developers and SREs.
            log_error(err.report());
            HttpResponse::from(snapshot)
        }
    }
}
```
The boundary has multiple output views: redacted exposure for users, full report for developers and SREs, and stable identity plus category for monitoring.
Design Rule 5: Test Error Identity, Not Error Wording
After reaching L2, tests should enforce stable error identity and governance decisions, not exact message text. Error wording may improve, be translated, or be redacted. Identity and classification semantics are the long-term contract.
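```rust
#[test]
fn submit_order_exposes_stable_identity() {
    // `order` is a fixture whose repository insert fails (setup omitted).
    let err = submit_order(&order).unwrap_err();
    // Assert the stable identity, not the message text.
    assert_eq!(
        err.identity_snapshot().code,
        "order.submit_dependency_unavailable"
    );
    // Assert the boundary decision produced by policy.
    let exposed = err.exposure(&DefaultExposurePolicy);
    assert_eq!(exposed.decision.http_status, 503);
}
```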
These tests force the team to maintain a stable classification contract: new errors need identities, identity changes must consider compatibility, and boundary policy must have explicit expectations.
For a runnable end-to-end example, see orion-error/examples/order_case.rs. It defines reasons separately for parsing, user, storage, and order-service layers. Lower-level failures enter the structured system once, diagnostics are preserved through reason convergence, and boundary output is unified at the end. That example mainly demonstrates the conv_err() convergence path. If an upper layer needs a new business semantic boundary, it should follow this article and use source_err(...) to keep the lower structured error as the source.
Industrial Validation: WarpParse
orion-error is the Rust infrastructure implementation of the Wukong Error Governance Model. But infrastructure still needs validation in real industrial systems: high throughput, long call chains, many roles, many boundaries, and strong observability requirements. That is where you learn whether error governance is truly usable.
WarpParse is the core high-throughput log parsing and ETL engine in the Orion stack. According to the Linux single-machine benchmark in wp-examples/benchmark/report/report_linux.md, WarpParse 0.12.0 achieved an EPS multiplier range of 1.56x-20.30x for pure parsing and 1.34x-17.90x for parse-plus-transform against Vector-VRL 0.49.0, across five log categories (Nginx, AWS ELB, Firewall, APT Threat, Mixed Log) and three topologies (File -> BlackHole, TCP -> BlackHole, TCP -> File).
The benchmark establishes the industrial workload: high throughput, multiple formats, multiple topologies, and parsing plus transformation together. It does not by itself prove error governance quality. The value of error governance has to be judged by whether failure paths can be located, classified, projected, and automated.
So WarpParse validates this methodology not by throughput numbers alone, but by whether complex failure paths can be expressed stably:
- Can rule errors point to file, line, column, and field?
- Can configuration errors block rollout instead of triggering system-failure paging?
- Can data-quality errors be aggregated separately without polluting system-error metrics?
- Can runtime failures be distinguished into retryable, non-retryable, and manual-intervention-required?
- Do user view, operator view, and debug view all come from the same stable error identity?
In such a system, if a rule syntax failure returns only a string like this:
unexpected token at line 12
the rule developer still has to open the rule file, find the location, guess which field failed, and decide whether the problem is syntax or sample mismatch. The system also cannot reliably distinguish configuration errors, data-quality issues, and runtime system failures from text alone.
With Wukong-style governance, the same failure becomes structured information:
identity : rule.syntax
category : config
detail : unexpected token in extractor expression
context : {
rule_file : "rules/nginx.wpl",
line : 12,
column : 18,
field : "request_time",
expected_token : "identifier",
actual_token : ")"
}
policy : block rule activation, show repair hint, do not page SRE
This is the key validation point in WarpParse: rule developers get precise location and repair clues, the runtime gets stable error identity and governance policy, and operations can count and alert configuration, data, and system failures separately. The higher the throughput, the more you need structured failure paths. Otherwise, stronger processing capacity only amplifies error spread and troubleshooting cost.
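Here is a std-only Rust sketch of the same rule.syntax failure, using hypothetical type names. Context stays in typed fields, so the developer view can render an exact location. Governance is chosen from the category rather than parsed from the message. The Config mapping follows the policy line above. The Data and System mappings are assumptions, added only to show that the three categories can be routed and counted separately.

```rust
/// Failure categories from the WarpParse discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Config,
    Data,
    System,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    BlockActivation,
    SkipRecord,
    Retry,
}

/// Governance decided by category, never by parsing message text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Governance {
    pub action: Action,
    pub page_sre: bool,
    pub metric: &'static str,
}

pub fn govern(category: Category) -> Governance {
    match category {
        // Matches the policy above: block activation, do not page SRE.
        Category::Config => Governance { action: Action::BlockActivation, page_sre: false, metric: "errors_config" },
        // Assumed mappings, for illustration only.
        Category::Data => Governance { action: Action::SkipRecord, page_sre: false, metric: "errors_data_quality" },
        Category::System => Governance { action: Action::Retry, page_sre: true, metric: "errors_system" },
    }
}

/// Diagnostic context for a rule error, kept as fields rather than text.
pub struct RuleError {
    pub identity: &'static str,
    pub category: Category,
    pub rule_file: String,
    pub line: u32,
    pub column: u32,
    pub field: String,
}

impl RuleError {
    /// Location for the rule developer view, built from context fields.
    pub fn location(&self) -> String {
        format!("{}:{}:{} field={}", self.rule_file, self.line, self.column, self.field)
    }
}

/// The rule.syntax example from the text.
pub fn syntax_error_example() -> RuleError {
    RuleError {
        identity: "rule.syntax",
        category: Category::Config,
        rule_file: "rules/nginx.wpl".to_string(),
        line: 12,
        column: 18,
        field: "request_time".to_string(),
    }
}

fn main() {
    let err = syntax_error_example();
    let g = govern(err.category);
    println!(
        "{} at {} -> {:?}, page_sre={}, metric={}",
        err.identity, err.location(), g.action, g.page_sre, g.metric
    );
}
```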
WarpParse Error Governance Structure
WarpParse error handling covers the full path of rule development, rule validation, runtime parsing, pipeline execution, boundary output, and operational observation. Read the diagram in three layers: failure sources, contract/diagnostic carriers, and output views.
flowchart TB
sample["Sample logs<br/>Nginx / ELB / Firewall / APT / Mixed"]
rule["WPL rules<br/>field extraction / type conversion / enrichment"]
check["Rule validation<br/>syntax / sample / schema"]
engine["Parsing runtime<br/>high-throughput parse / transform"]
pipeline["ETL pipeline<br/>input -> parse -> transform -> output"]
boundary["System boundary<br/>CLI / API / worker / report"]
sample --> check
rule --> check
check -->|"rule accepted"| engine
engine --> pipeline
pipeline --> boundary
subgraph failure["Failure sources"]
syntax["Rule syntax error"]
mismatch["Sample mismatch"]
typeerr["Type conversion failure"]
dirty["Dirty data / anomalous field"]
runtime["Runtime I/O / backpressure / resource issue"]
end
syntax --> check
mismatch --> check
typeerr --> engine
dirty --> engine
runtime --> pipeline
subgraph governance_wp["Contract channel"]
wp_identity["Stable error identity<br/>rule.syntax / parse.mismatch / transform.type / runtime.io"]
wp_category["category<br/>config / data / system"]
wp_policy["policy<br/>abort? / skip? / alert? / retryable?"]
end
subgraph diagnostic_wp["Diagnostic channel"]
wp_rule_ctx["Rule context<br/>rule file / line / field / pattern"]
wp_sample_ctx["Sample context<br/>sample id / input slice / expected field"]
wp_runtime_ctx["Runtime context<br/>source / sink / batch / offset / component"]
wp_source["source chain<br/>parser -> engine -> pipeline"]
end
check -.produces.-> wp_identity
engine -.produces.-> wp_identity
pipeline -.produces.-> wp_identity
check -.preserves.-> wp_rule_ctx
check -.preserves.-> wp_sample_ctx
engine -.preserves.-> wp_rule_ctx
engine -.preserves.-> wp_source
pipeline -.preserves.-> wp_runtime_ctx
wp_identity --> wp_policy
wp_category --> wp_policy
wp_policy --> boundary
boundary --> user_view["Rule developer view<br/>error location + repair hint"]
boundary --> ops_view["Operations view<br/>metrics + alerting + failure classification"]
boundary --> debug_view["Debug view<br/>redacted context + source chain"]
The core point of this diagram is that WarpParse’s high-performance parsing and its error governance must coexist. orion-error provides the governance infrastructure, and WarpParse validates that the methodology works in an industrial high-throughput ETL system.
Engineering Reuse for AI
The Wukong Error Governance Model should not stop at documentation. A more effective approach is to organize the methodology, design principles, crate or library usage rules, example code, anti-patterns, and migration guidance into reusable engineering skills. In the Orion ecosystem, these skills are maintained in the orion-skills repository: https://github.com/galaxio-labs/orion-skills
AI can use those skills to produce project-level error design documents. One example is Warp Insight’s error-handling system design: https://github.com/wp-labs/warp-insight/blob/main/doc/design/foundation/error-handling-system.md . The value of such documents is not merely listing error types. It is letting AI and human engineers reason about classification, propagation, boundary output, observability, and migration using the same governance model.
That changes AI from “generate a few ad hoc error-handling snippets” into “work inside a defined governance model”. In a new project, AI first identifies error boundaries, semantic domains, stable identities, diagnostic chains, and boundary output, then proposes a governance plan. During implementation, it uses orion-error according to convention, and turns reason definitions, source preservation, context attachment, exposure policy, and test assertions into code.
Skills turn error governance from “prompt the AI with experience” into “give the AI a set of engineering constraints”:
- Planning stage: identify whether the current system is at L0, L1, or L2, then design the classification contract and migration path.
- Design stage: split semantic domains and define reason, identity, category, and governance attributes.
- Implementation stage: choose the correct API for first entry, cross-layer convergence, semantic-boundary wrapping, and boundary output.
- Review stage: check whether source was lost, whether code depends on error wording, whether handlers rebuild responses repeatedly, and whether tests for stable identities are missing.
- Migration stage: gradually converge string errors, temporary enums, and generic wrapping into stable contract and diagnostic structures.
Error handling spans architecture, protocols, observability, tests, and team conventions. A single prompt is rarely enough to make it consistent. Once the methodology and library constraints are distilled into skills, AI can reuse the same engineering judgment across projects and generate implementations that stay consistent and maintainable.
Appendix: Language Mechanisms and Ecosystem Adoption
The methodology itself is language-agnostic, but implementation cost differs by language. Two dimensions matter:
- Language expressiveness: how naturally the language can express stable classification, structured carriers, cause chains, and boundary output.
- Ecosystem adoption cost: how much organizational and migration effort it takes to adopt the governance model within the existing ecosystem.
High affinity does not mean low adoption cost. Rust’s type system fits this model very well, but the error ecosystem has multiple established paths. Go’s type expressiveness is weaker, but explicit error returns are highly uniform, so introducing a lightweight classification discipline may actually cost less organizationally.
In any language, implementation should answer the same questions:
- Where does stable error identity live, and can it stay compatible across versions?
- How is the diagnostic chain preserved, and can root cause be lost during cross-layer conversion?
- Where do context and detail live, and do they pollute governance classification?
- Where is boundary policy centralized, and are HTTP/RPC/CLI/log/metric outputs consistent?
- When entering logs, protocols, standard error interfaces, or third-party frameworks, what is preserved, redacted, or dropped?
Rust — Native Fit
Rust combines four properties that suit this model: algebraic types (enum) express classification, match provides exhaustive checking, generics parameterize carriers with type safety, and no exception mechanism dominates control flow. Errors are returned as values, which composes naturally with structured carriers.
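The exhaustive-checking point can be illustrated with a short sketch. It maps a hypothetical OrderReason enum, mirroring the reason used earlier in the Rust section, to an identity and an HTTP status with no _ arm. Adding a variant therefore fails to compile until both mappings cover it. The 409 mapping is an assumption for illustration.

```rust
/// Hypothetical reason enum for the order domain.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrderReason {
    SubmitDependencyUnavailable,
    InvalidState,
}

/// Stable identity per reason. No `_` arm: a new variant will not compile
/// until it is given an identity here.
pub fn identity(reason: OrderReason) -> &'static str {
    match reason {
        OrderReason::SubmitDependencyUnavailable => "order.submit_dependency_unavailable",
        OrderReason::InvalidState => "order.invalid_state",
    }
}

/// Boundary status per reason, also exhaustive (409 is an assumed mapping).
pub fn http_status(reason: OrderReason) -> u16 {
    match reason {
        OrderReason::SubmitDependencyUnavailable => 503,
        OrderReason::InvalidState => 409,
    }
}

fn main() {
    for reason in [OrderReason::SubmitDependencyUnavailable, OrderReason::InvalidState] {
        println!("{:?} -> {} / {}", reason, identity(reason), http_status(reason));
    }
}
```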
But real-world Rust adoption is not trivial. The ecosystem has long had multiple orientations such as failure, error-chain, anyhow, thiserror, and eyre: some optimized for fast propagation, some for diagnostics, some for domain error definition. Teams still need to decide which layers use structured governance errors, which boundaries allow fast aggregation, and which identities become long-term contracts.
TypeScript — High Affinity
type AppErrorClass =
| { kind: "not_found"; id: string }
| { kind: "system_error" };
Union types and discriminated unions are a natural fit for error classification. Libraries such as neverthrow and Either in fp-ts provide return-value-style error handling. The weakness is runtime type information. Across processes, packages, or JSON boundaries, you still need explicit runtime tags, schemas, or protocol fields to preserve stable identity and classification.
Swift — High Affinity
Algebraic types (enums with associated values) fit error classification well. Result<T, E> has been part of the standard library since Swift 5.0, and some teams use Result instead of throws in practice.
C# — Needs Mapping into the Exception Ecosystem
Generics are strong and preserve runtime type information, but the ecosystem is exception-centric. There is no native discriminated union, though libraries such as OneOf can simulate it. The more natural mapping is not to force everything into Result, but to use exception hierarchies for classification, inner exceptions for cause chains, and ASP.NET Core middleware for centralized policy.
Java — Needs Mapping into Framework Conventions
Java has generic erasure and an exception-dominant ecosystem. But it has mature cause-chain support, and Spring’s @ControllerAdvice, filters, and interceptors already provide common centralized-boundary patterns. Java 17+ sealed classes, records, and pattern matching also make finite classification much more natural than before.
The key mapping is this: each semantic domain defines its own sealed class, and domains do not inherit from one another. This mirrors Rust’s separate enums such as RepositoryReason and OrderReason. When crossing domains, construct a new domain exception and preserve the old one as the cause, rather than upcasting everything into one shared base type.
// data-access semantic domain (simplified)
public sealed abstract class RepositoryError extends RuntimeException
permits RepositoryError.ConnectionFailed, ... {
public abstract String identity(); // e.g. "repository.connection_failed"
public abstract String category(); // e.g. "system"
public abstract boolean retryable();
private DiagnoseContext ctx;
protected RepositoryError(String detail, Throwable cause, DiagnoseContext ctx) {
super(detail, cause);
this.ctx = ctx;
}
public static final class ConnectionFailed extends RepositoryError {
public ConnectionFailed(String detail, Throwable cause, DiagnoseContext ctx) { super(detail, cause, ctx); }
public String identity() { return "repository.connection_failed"; }
public String category() { return "system"; }
public boolean retryable() { return true; }
}
}
When crossing domains, the service layer constructs OrderError and keeps RepositoryError as the cause:
catch (RepositoryError e) {
throw new OrderError.DependencyUnavailable("submit order failed", e, ctx);
}
The resulting cause chain is OrderError.DependencyUnavailable -> RepositoryError.ConnectionFailed -> SQLException. The boundary layer then routes centrally with @ControllerAdvice: @ExceptionHandler(OrderError.class) chooses status codes and response bodies from identity().
| Concept | Rust | Java |
|---|---|---|
| Semantic-domain classification | enum RepositoryReason | sealed class RepositoryError |
| Contract channel | Reason variants + identity string | Overridden identity() / category() / retryable() on subclasses |
| Diagnostic channel | Detail / context / source inside StructError | getMessage() / context record / getCause() |
| Unified carrier | StructError<R> generic carrier | Not possible — the JLS forbids generic classes from extending Throwable |
| Cross-domain conversion | One source_err(...) call | Explicit try/catch constructing a new exception |
Java’s hard constraint is that the JLS forbids generic classes from extending Throwable, so one generic StructuredError<R> carrier cannot unify all domains. But the architectural idea remains the same: independent semantic domains, stable error identity as the primary key, diagnostics preserved via causes, and centralized routing at the boundary.
C++ — Technically Feasible, No Ecosystem Convention
Templates preserve type information. std::expected (C++23) offers a Result-like mechanism, and libraries such as Boost.Outcome provide richer modeling. But C++ has long supported several parallel error paths: exceptions, error codes, expected, Outcome, custom status types, and more. The model is technically feasible, but organizational unification cost is high.
Go — Requires Stronger Team Discipline
The error interface requires only an Error() string method. Structured information has to be added through custom error types, errors.Is/errors.As, and wrapping. Go is not incapable of error governance; its default ecosystem path is simply optimized for lightweight wrapping, so governance constraints have to be designed deliberately by the team.
Comparison Across Two Dimensions
| Language | Language expressiveness | Ecosystem adoption cost | Main reason |
|---|---|---|---|
| Rust | High | Medium | Type system fits well, but the error ecosystem has multiple established paths |
| Swift | High | Medium | enum and Result fit naturally, but throws remains an important ecosystem path |
| TypeScript | Medium-high | Medium | Discriminated unions are convenient, but runtime schemas or tags are still needed |
| C# | Medium | Medium | Generics and middleware are strong, but the ecosystem is exception-centric and DUs are simulated |
| Java | Medium | Medium | Cause chains and framework boundaries are mature; sealed classes improve finite classification |
| C++ | Medium-high | High | Type capability is strong, but error handling paths are fragmented and hard to standardize |
| Go | Low-medium | Medium-low | Type expression is weaker, but explicit error returns are uniform and lightweight conventions spread easily |
This table describes implementation friction, not language quality. What really determines error-governance quality is usually not the language itself, but whether the team established stable error identity, diagnostic preservation, boundary policy, and evolution rules.
Conclusion
Error handling is the boundary between a prototype and an industrial system. A prototype only proves that the happy path runs. An industrial system must remain operable, diagnosable, recoverable, and evolvable under input drift, dependency degradation, configuration drift, abnormal data, evolving rules, and unstable runtime conditions.
The WuKong Error Governance Model proposed here splits failure information into two channels:
- Contract channel: stable error identity, stable classification, category, retryable, and exposure level, used for caller decisions, protocol output, monitoring and alerting, SLA accounting, and long-term compatibility.
- Diagnostic channel: cause chain, context, detail, and lower-level errors, used for troubleshooting, debugging, rule repair, runtime observation, and system evolution.
These two channels resolve the core tension: errors must converge at the governance level, or automated decisions become impossible; errors must remain faithful at the diagnostic level, or root causes become invisible. A mature error system cannot stop at “pretty wrapping”, nor can it rely on language mechanisms alone. It must explicitly define how stable identities evolve, how diagnostic chains survive across layers, how boundary policy is centralized, and what is preserved or discarded when bridging into external ecosystems.
In Rust, orion-error turns this model into reusable infrastructure: StructError<R> carries contract and diagnostic information, domain reasons provide stable classification contracts, source chains and context preserve diagnostic paths, and exposure/report/interop provide boundary output and bridging. orion-error/examples/order_case.rs gives a small runnable example. WarpParse provides industrial validation: in a high-throughput ETL system, error governance directly affects rule-development experience, runtime observability, boundary output quality, and long-term operations cost.
Error governance is not a side effect of exception syntax, nor a local optimization of log formatting. It is one of the information architectures of an industrial system. Only when failure paths also have stable classification, complete diagnostics, centralized output, and evolvable contracts does a system truly move from “it runs” to “it can keep running”.
Error Governance and AI Programming: orion-error’s Structured Path
Why Error Handling Becomes a Bottleneck
In industrial software, fault localization, bug fixing, and avoidable rework consume a large amount of engineering effort. Studies and industry reports commonly place finding/fixing bugs and avoidable rework in the 40-60% range; this does not mean that “error-handling code itself is 40-60% of the code or effort.” Error governance is concerned with what happens when failure occurs: whether classification is stable, context is preserved, boundary output is consistent, and the diagnostic path is complete.
Useful data points:
- Hamill and Goseva-Popstojanova, in a NASA fault-fix effort study, cite a Cambridge University report that developers spend about 50% of their time finding and fixing bugs; the same passage cites Boehm/Basili’s 40-50% effort on avoidable rework.
- Capers Jones, in an ASQ / Software Quality Professional article, summarizes that finding and fixing bugs often exceeds 60% of total software effort.
- Cabral and Marques, in a field study of 32 Java/.NET applications, show that exception-handling code itself is much smaller: about 5% on average for Java, about 3% on average for .NET, and up to about 7%.
So this article is not about “writing more error-handling code.” It is about using structured error mechanisms to reduce the cost of fault localization, boundary governance, and cross-layer diagnostics. Rust has no exception mechanism and no try-catch; every ? is a propagation decision. Without structure, those decisions become harder to govern as the codebase grows.
The problem is not simply “more code.” The problem is that these decisions lack structure. A typical error-handling decision tree looks like this:
What error can this call return?
  -> Should I intercept it or propagate it?
     -> If I intercept it, what category should the new error use?
     -> Should I preserve the original error?
  -> Am I at a boundary?
     -> What is the correct format for the caller?
Every layer repeats these decisions, and different developers often answer them differently. The larger the codebase, the more fragmented error handling becomes.
The Core Tension
Every error-governance approach has to handle one tension:
- Convergence: concrete technical errors need to be abstracted into a small, stable set of upper-layer categories, otherwise callers cannot govern retry, fallback, alerting, or user-facing output.
- Diagnostics: the convergence process must not lose the information needed for troubleshooting.
In code, the tension often looks like this:
// Converges, but loses diagnostics.
Err(AppError::SystemError)
// Preserves some detail, but gives up governance.
Err(anyhow::format_err!("concrete error: {e}"))
orion-error handles this by separating the two dimensions: classification converges into a reason, while diagnostics stay in the source chain and context.
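The separation can be illustrated without the library. The following is a std-only sketch, not orion-error code: `ConfigReason`, `Carrier`, and `read_config` are hypothetical names. The `reason` field converges to one governable variant, while the original `io::Error` is stored as a typed source instead of being flattened into a string.

```rust
use std::error::Error;

// Hypothetical converged classification (the contract channel).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ConfigReason {
    ReadFailed,
}

// Hypothetical carrier: `reason` converges, `source` keeps the original error for diagnostics.
#[derive(Debug)]
struct Carrier {
    reason: ConfigReason,
    detail: String,
    source: Option<Box<dyn Error + Send + Sync + 'static>>,
}

fn read_config(path: &str) -> Result<String, Carrier> {
    std::fs::read_to_string(path).map_err(|e| Carrier {
        reason: ConfigReason::ReadFailed, // governable: callers match on this
        detail: format!("read {path}"),   // layer-specific explanation
        source: Some(Box::new(e)),        // diagnostic: the io::Error stays a typed value
    })
}

fn main() {
    match read_config("/definitely/missing/orion-demo/config.toml") {
        Ok(text) => println!("loaded {} bytes", text.len()),
        Err(err) => {
            println!("reason: {:?}, detail: {}", err.reason, err.detail);
            if let Some(src) = &err.source {
                println!("caused by: {src}");
            }
        }
    }
}
```

Callers branch on `err.reason` alone, and the root cause can still be downcast back to `std::io::Error` when someone needs to troubleshoot.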
What This Means for AI Programming
Structure as Prompt
AI models, especially LLMs, generate code by following patterns. When error handling is structured, the model has a clearer pattern to follow.
Unstructured pattern:
use std::error::Error;

// The model has to guess which error type matters here.
fn load_config() -> Result<Config, Box<dyn Error>> {
    let text = std::fs::read_to_string("config.toml")?;
    let cfg = toml::from_str(&text)?;
    Ok(cfg)
}
The model has to infer what is inside Box<dyn Error> and how callers should handle it.
Structured pattern:
fn load_config() -> Result<Config, StructError<ConfigReason>> {
let text = std::fs::read_to_string("config.toml")
.source_err(ConfigReason::ReadFailed, "read config file")
.doing("load config")?;
let cfg = toml::from_str(&text)
.source_err(ConfigReason::ParseFailed, "parse config")
.doing("parse config")?;
Ok(cfg)
}
The reason variant is explicit, so both the model and the developer choose from a constrained classification space. The source_err + doing pattern is easier to generate, inspect, and review than free-form string wrapping.
Constrained Classification Space
UnifiedReason provides built-in categories such as validation, system, network, timeout, and config. Common technical failures get default categories first; domain-specific failures can then add project-specific reasons. This means many error paths do not need a new classification scheme from scratch.
For project-specific reasons, a stable pattern is to give business variants explicit identities and to wrap UnifiedReason in a transparent variant:
#[derive(Debug, Clone, PartialEq, OrionError)]
enum AppReason {
    #[orion_error(identity = "biz.xxx")]
    SpecificError,
    #[orion_error(transparent)]
    General(UnifiedReason),
}
This gives code generation a stable template: business variants need a stable identity, and common failures reuse UnifiedReason through a transparent variant. The actual business semantics still need human review.
Boundary Projection Becomes Centralized
Protocol-boundary output is one of the easiest places for generated code to get things wrong:
- exposing internal detail directly to users
- choosing the wrong HTTP status
- producing inconsistent shapes across protocols
orion-error’s ExposurePolicy centralizes that decision:
impl ExposurePolicy for MyPolicy {
    fn http_status(&self, identity: &ErrorIdentity) -> u16 {
        match identity.code.as_str() {
            "biz.not_found" => 404,
            "biz.invalid" => 400,
            _ => 500,
        }
    }
    // visibility, retryable, and hints have defaults.
}
Boundary projection no longer has to be hand-written in every handler. Generated code can follow one policy invocation pattern; exact status codes, visibility, retryability, and hints remain team-defined and reviewable.
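To show why one policy object keeps handlers consistent, here is a std-only sketch of a policy trait with default methods. `Identity`, `Category`, `Visibility`, and `Policy` are hypothetical simplified stand-ins, not the orion-error types. The defaults follow the ExposurePolicy defaults this document lists for 0.8.0: status 500, internal visibility except for business errors, and not retryable.

```rust
// Hypothetical, simplified stand-ins for ErrorIdentity / ExposurePolicy.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Category {
    Biz,
    Sys,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Visibility {
    Public,
    Internal,
}

struct Identity {
    code: &'static str,
    category: Category,
}

trait Policy {
    // Default status for anything the policy does not recognize.
    fn http_status(&self, _id: &Identity) -> u16 {
        500
    }
    // Default visibility: business errors are public, everything else is internal.
    fn visibility(&self, id: &Identity) -> Visibility {
        if id.category == Category::Biz {
            Visibility::Public
        } else {
            Visibility::Internal
        }
    }
    fn retryable(&self, _id: &Identity) -> bool {
        false
    }
}

struct MyPolicy;

// The team overrides only what it needs; every handler then calls the same policy.
impl Policy for MyPolicy {
    fn http_status(&self, id: &Identity) -> u16 {
        match id.code {
            "biz.not_found" => 404,
            "biz.invalid" => 400,
            _ => 500,
        }
    }
}

fn main() {
    let not_found = Identity { code: "biz.not_found", category: Category::Biz };
    let io = Identity { code: "sys.io_error", category: Category::Sys };
    for id in [&not_found, &io] {
        println!(
            "{} -> status {}, {:?}, retryable: {}",
            id.code,
            MyPolicy.http_status(id),
            MyPolicy.visibility(id),
            MyPolicy.retryable(id)
        );
    }
}
```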
Test Paths Become Easier to Infer
Structured errors also make test assertions more direct:
// Easier for generated code to produce, and easier for reviewers to check.
let err = function_that_fails().unwrap_err();
assert_err_identity(&err, "biz.not_found", ErrorCategory::Biz);
assert_err_operation(&err, "load config");
Instead of:
// Requires guessing exact display text.
let err = function_that_fails().unwrap_err();
assert!(err.to_string().contains("not found"));
Deeper Impact on AI Programming
From Code Generation to Decision Structuring
Most AI programming tools operate at the level of generating code snippets. Structured error governance turns part of the decision into data: when an error path is represented by an enum variant rather than a free-form string, the model is no longer only writing prose; it is choosing from a constrained set. That choice can still be wrong, but it is easier to constrain with types, tests, and review.
Error-Path Coverage
Generated code often underinvests in error paths. Error paths appear less frequently than happy paths in training data and examples. A structured system turns error paths into repeatable patterns such as source_err + doing + conv_err. Once the model recognizes that a call can fail, there is a clearer API path to follow, and the result is easier to cover with tests.
Cross-Layer Consistency
In multi-person codebases, different developers often handle the same kind of failure differently. Generated code can make this worse because the model may produce different styles in different contexts. Structured governance moves consistency requirements into shared definitions: reason enums and exposure policies. Both human-written and generated code then work under the same constraints.
LLMs and Error Governance
| Traditional error handling | Constraint quality | Structured error governance |
|---|---|---|
| Free-form strings | Hard to constrain | Enum variants |
| Ad-hoc classification | Hard to review | Fixed classification space |
| Local error-message decisions | Inconsistent | Repeatable API patterns |
| Boundary output decided per handler | Fragmented | Centralized policy |
The reliability benefit is not that the model becomes “smart enough” to understand every failure. The benefit is that the task is shifted away from free-form generation and toward choosing from constrained options that can be checked by types, tests, and review.
Limits
- Up-front modeling cost remains. Reason categories and exposure policies still require human design.
- Small projects may not need this. A short script or prototype is often better served by thiserror or anyhow.
- Business semantics are still hard. Choosing between validation_error and business_error still requires domain judgment.
Summary
orion-error’s structured error model fits AI-assisted programming because both benefit from the same principle: turn implicit decisions into explicit structure. Implicit decisions rely on context interpretation, where both humans and models make mistakes. Explicit structure gives the system enums, source chains, context, policies, and tests.
This is one possible direction for Rust error governance: not a smarter error type by itself, but a system that makes error-handling decisions more predictable, enumerable, and reviewable for both humans and AI-assisted tools.
Design Constraints
Cross-StructError From Conversion: Orphan Rule Limitation
Problem
Cross-layer error conversion (StructError<ParseReason> → StructError<OrderReason>) requires an explicit .conv_err() call. A blanket From to make ? work automatically is blocked by Rust’s orphan rule.
// Desired but impossible:
fn place_order() -> Result<OrderDraft, StructError<OrderReason>> {
    let draft = parse_order()?; // expects an automatic From<StructError<ParseReason>> for StructError<OrderReason>
    Ok(draft)
}

// Actual:
fn place_order() -> Result<OrderDraft, StructError<OrderReason>> {
    let draft = parse_order().conv_err()?; // explicit conversion
    Ok(draft)
}
Root Cause
Rust’s orphan rule prohibits implementing From<Foreign<Local>> for Foreign<Local2> from a downstream crate:
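// Rejected by the compiler (orphan rule) when written in a downstream crate:
impl From<orion_error::StructError<UserLocalReason>>   // Foreign<Local>
    for orion_error::StructError<UserLocalReason2>     // Foreign<Local2>
{
    fn from(err: orion_error::StructError<UserLocalReason>) -> Self {
        todo!()
    }
}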
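- From: std trait (foreign)
- StructError: foreign type (from orion-error)
- UserLocalReason and UserLocalReason2 are local types, but wrapping them in StructError<_> does not make the resulting type local

The orphan rule requires either the trait to be local or at least one of the types in the impl (the Self type or a trait type parameter) to be a local type. A foreign generic type such as StructError<UserLocalReason> does not count as local just because its type argument is local, so an impl written in a downstream crate has no local anchor.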
Attempted Workarounds
| Approach | Result |
|---|---|
| Direct impl From<StructError<A>> for StructError<B> in downstream | ❌ orphan rule |
| Derive attribute upcast_from(SubReason) on target type | ❌ orphan rule |
| Derive attribute upcast_to(MainReason) on source type | ❌ orphan rule |
| Make ? auto-convert across reasons | ❌ can’t use From |
| Newtype struct AppError(StructError<T>) | ✅ works, but changes every return type |
Conclusion
.conv_err() is the recommended path. A newtype wrapper can technically bypass the orphan rule, but its cost (wrapping every function return type) far outweighs the benefit of saving one explicit call. Rust’s orphan rule is a core guarantee for ecosystem compatibility and is unlikely to change for this use case in the foreseeable future.
orion-error 0.8.0 Architecture
This document describes the ideal design architecture of orion-error 0.8.0: the design constraints behind the public API, the core error flow, and the governance goals. Struct snippets are conceptual models, not exact source snapshots; the implementation in src/ remains the source of truth for precise fields.
The Problem
In large Rust services, error handling faces five unmet needs:
- Convergence without loss. Lower-layer technical errors must be abstracted into upper-layer stable semantics — but the original cause (source chain, detail, context) must remain available for diagnostics.
- Cross-layer propagation. An error passes through multiple layers (handler → service → repository → database). Each layer needs to attach its own context without discarding what came before.
- Boundary projection. The same error must be presented differently to different audiences: end users (safe message), operators (component + retryability), protocol clients (stable code + structure), and developers (full chain).
- Governable identity. Errors need stable, machine-readable identities that survive refactoring and stay consistent across HTTP/RPC/log/CLI boundaries.
- Structured carrier. Errors carry detail, source chain, operation context, and metadata — all as structured fields, not string concatenation.
Existing libraries solve a subset:
| Library | Strengths | Leaves open |
|---|---|---|
| thiserror | Local error enum modeling, Display + From generation | Cross-layer propagation, context attachment, protocol projection |
| anyhow | Application-level error unification, context() | Stable identity, protocol output, fine-grained category routing |
| color-eyre | Rich diagnostic reports | Same as anyhow: no protocol or identity layer |
orion-error targets the gap: governance at scale — what happens when errors travel through 3–5 layers and must emerge at a protocol boundary with stable structure.
Core Insight: Reason/Carrier Separation
The central design decision: separate the error’s semantic classification (reason) from its propagation mechanism (carrier).
// Reason = what kind of error
enum AppReason {
    InvalidInput,
    OrderNotFound,
    General(UnifiedReason),
}

// Carrier = how it propagates
let err: StructError<AppReason> = AppReason::OrderNotFound
    .to_err()
    .with_detail("order #42 not found")
    .with_source(db_error)
    .with_context(ctx);
Why separate?
If reason and carrier are combined — as in typical thiserror enum usage — every piece of runtime machinery (context attachment, source tracking, protocol projection) must be reimplemented for each enum. The carrier (StructError<T>) implements it once.
The reason stays thin — a DomainReason marker trait requiring only PartialEq + Display + Debug + Send + Sync + 'static. The carrier does the rest.
use std::fmt::{Debug, Display};

pub trait DomainReason: PartialEq + Display + Debug + Send + Sync + 'static {}
| Constraint | Reason |
|---|---|
| Display + Debug | Errors must be printable for diagnostics and logging. |
| PartialEq | Enables assertions in tests. |
| Send + Sync | Required for StructError to cross async task boundaries. |
| 'static | Enables type erasure via dyn Error and storage in SourceFrame. |
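Here is a std-only sketch of what these bounds buy in practice. `OrderReason` and `Carrier` are hypothetical, and the trait is declared locally with the same bounds as above instead of being imported from orion-error. The reason prints for logs (`Display`), compares in assertions (`PartialEq`), and the carrier can move into another thread (`Send + 'static`).

```rust
use std::fmt::{self, Debug, Display};
use std::thread;

// Same bounds as the DomainReason marker trait shown above (declared locally for this sketch).
pub trait DomainReason: PartialEq + Display + Debug + Send + Sync + 'static {}

#[derive(Debug, PartialEq)]
enum OrderReason {
    NotFound,
}

impl Display for OrderReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OrderReason::NotFound => write!(f, "order not found"),
        }
    }
}

// The marker impl is empty: all required behaviour comes from the supertraits.
impl DomainReason for OrderReason {}

// Hypothetical carrier generic over any DomainReason.
#[derive(Debug)]
struct Carrier<R: DomainReason> {
    reason: R,
}

fn main() {
    let err = Carrier { reason: OrderReason::NotFound };
    // Send + 'static: the carrier can be moved into another thread (or async task).
    let err = thread::spawn(move || err).join().unwrap();
    // PartialEq: tests can compare reasons directly.
    assert_eq!(err.reason, OrderReason::NotFound);
    // Display: printable for logs.
    println!("{}", err.reason);
}
```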
Error Flow
raw std error ──→ .source_err(reason, detail) ──→ first entry into structured system
                                                    │
                                                conv_err()
                                              (reason remap)
                                                    │
                                     report / exposure / display_chain
1. Entry: source_err(reason, detail)
The unified entry point. Works for both raw std::error::Error and already-structured StructError sources:
let result = std::fs::read_to_string("config.toml")
    .source_err(AppReason::system_error(), "read config failed")?;
- The raw error is stored as a source frame, preserving its Display and Debug output.
- The reason becomes the error’s stable classification.
- The detail provides layer-specific explanation.
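The three roles above can be sketched with a small extension trait on `Result`. This is a std-only illustration under stated assumptions: `SourceErr`, `Carrier`, and `AppReason` are hypothetical, and the real `source_err` stores a richer source frame instead of two strings.

```rust
use std::error::Error;

// Hypothetical reason and carrier; not the orion-error definitions.
#[derive(Debug, Clone, PartialEq)]
enum AppReason {
    SystemError,
}

#[derive(Debug)]
struct Carrier {
    reason: AppReason,      // stable classification
    detail: String,         // layer-specific explanation
    source_display: String, // raw error's Display output
    source_debug: String,   // raw error's Debug output
}

// Hypothetical extension trait mirroring the `source_err(reason, detail)` call shape.
trait SourceErr<T> {
    fn source_err(self, reason: AppReason, detail: &str) -> Result<T, Carrier>;
}

impl<T, E: Error> SourceErr<T> for Result<T, E> {
    fn source_err(self, reason: AppReason, detail: &str) -> Result<T, Carrier> {
        self.map_err(|e| Carrier {
            reason,
            detail: detail.to_string(),
            source_display: e.to_string(),
            source_debug: format!("{e:?}"),
        })
    }
}

fn main() {
    let err = "abc"
        .parse::<u32>()
        .source_err(AppReason::SystemError, "parse port")
        .unwrap_err();
    println!("{:?}: {} <- {}", err.reason, err.detail, err.source_display);
}
```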
2. Cross-layer conversion: conv_err()
When the upstream error is already StructError<R1> and only the reason type needs to change:
fn upper_layer() -> Result<(), StructError<UpperReason>> {
    lower_layer().conv_err()?;
    Ok(())
}
Requires UpperReason: From<LowerReason>. All detail, context, source chain, and metadata survive the conversion.
A blanket From<StructError<R1>> for StructError<R2> is blocked by Rust’s orphan rule (neither From nor StructError is local to the user’s crate). An explicit trait method is the intended path.
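A trait method can do what a blanket `From` cannot. The following std-only sketch uses a hypothetical `Carrier<R>` in place of `StructError<R>` and a hypothetical `ConvErr` trait. It remaps only the reason through `From<R1> for R2` and moves detail and context across untouched.

```rust
// Hypothetical stand-in for StructError<R>: the non-reason fields are what conv_err must preserve.
#[derive(Debug)]
struct Carrier<R> {
    reason: R,
    detail: Option<String>,
    context: Vec<String>,
}

impl<R> Carrier<R> {
    // Remap only the reason; every other field moves across unchanged.
    fn map_reason<R2: From<R>>(self) -> Carrier<R2> {
        Carrier {
            reason: R2::from(self.reason),
            detail: self.detail,
            context: self.context,
        }
    }
}

// A trait method on Result instead of a blanket From impl between carrier types.
trait ConvErr<T, R2> {
    fn conv_err(self) -> Result<T, Carrier<R2>>;
}

impl<T, R1, R2: From<R1>> ConvErr<T, R2> for Result<T, Carrier<R1>> {
    fn conv_err(self) -> Result<T, Carrier<R2>> {
        self.map_err(|e| e.map_reason())
    }
}

#[derive(Debug, PartialEq)]
enum ParseReason {
    BadFormat,
}

#[derive(Debug, PartialEq)]
enum OrderReason {
    Parse(ParseReason),
}

// The only conversion the user writes: reason to reason.
impl From<ParseReason> for OrderReason {
    fn from(r: ParseReason) -> Self {
        OrderReason::Parse(r)
    }
}

fn parse_order() -> Result<u32, Carrier<ParseReason>> {
    Err(Carrier {
        reason: ParseReason::BadFormat,
        detail: Some("missing order id".to_string()),
        context: vec!["parse order".to_string()],
    })
}

fn place_order() -> Result<u32, Carrier<OrderReason>> {
    // In tail position the return type fixes R2 = OrderReason.
    parse_order().conv_err()
}

fn main() {
    let err = place_order().unwrap_err();
    println!("{:?} / {:?} / {:?}", err.reason, err.detail, err.context);
}
```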
3. First entry vs. cross-layer distinction
| Method | Semantics | Source preservation |
|---|---|---|
| source_err(reason, detail) | Creates a new semantic boundary | Wraps as unstructured or structured source |
| conv_err() | Only remaps reason type | Preserves all detail, context, source, metadata |
Core Types
StructError<T: DomainReason>
The universal runtime carrier. Conceptually, it stores the reason and the runtime propagation state behind a small carrier:
pub struct StructError<T: DomainReason> {
    imp: Box<StructErrorImpl<T>>,
}
Box is used to keep StructError small (pointer-sized), as it is expected to be returned through Result frequently.
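The size effect of the boxing choice can be checked with `std::mem::size_of`. `Impl` and `Boxed` in this std-only sketch are hypothetical types loosely shaped like the simplified model below, and exact byte counts depend on the target platform.

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical payload roughly shaped like the simplified StructErrorImpl<T> model.
#[allow(dead_code)]
struct Impl {
    reason: u32,
    detail: Option<String>,
    position: Option<String>,
    context: Option<Arc<Vec<String>>>,
}

// Boxed carrier: one pointer, however large Impl grows.
#[allow(dead_code)]
struct Boxed {
    imp: Box<Impl>,
}

fn main() {
    // Numbers are target-dependent; Boxed is always pointer-sized.
    println!("Impl:               {} bytes", size_of::<Impl>());
    println!("Boxed:              {} bytes", size_of::<Boxed>());
    println!("Result<u64, Impl>:  {} bytes", size_of::<Result<u64, Impl>>());
    println!("Result<u64, Boxed>: {} bytes", size_of::<Result<u64, Boxed>>());
}
```

The trade-off is one heap allocation when an error is created, in exchange for a small `Result` on every success path.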
StructErrorImpl<T>
Holds the data needed for error propagation. Simplified model:
struct StructErrorImpl<T> {
    reason: T,
    detail: Option<String>,
    position: Option<String>,
    context: Option<Arc<Vec<OperationContext>>>,
    source_payload: Option<InternalSourcePayload>,
}
Key decisions:
- context: Option<Arc<Vec<...>>>: lazy allocation, so an error without context allocates no context vector. Arc makes cloning the context chain cheap.
- Box<StructErrorImpl<T>>: StructError itself stays small (one pointer), minimizing Result size.
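Here is a std-only sketch of the lazy `Option<Arc<Vec<_>>>` pattern. `Carrier` and `with_context` are hypothetical names. No vector exists until the first frame is attached, clones share one vector, and `Arc::make_mut` copies it only when a shared chain is extended.

```rust
use std::sync::Arc;

#[derive(Debug, Clone)]
struct Carrier {
    // None until the first context frame is attached.
    context: Option<Arc<Vec<String>>>,
}

impl Carrier {
    fn new() -> Self {
        Carrier { context: None }
    }

    fn with_context(mut self, frame: &str) -> Self {
        let ctx = self.context.get_or_insert_with(|| Arc::new(Vec::new()));
        // Copy-on-write: clones the Vec only if another carrier still shares it.
        Arc::make_mut(ctx).push(frame.to_string());
        self
    }
}

fn main() {
    let bare = Carrier::new();
    println!("context allocated: {}", bare.context.is_some());

    let err = Carrier::new().with_context("load config");
    let copy = err.clone(); // shares the same Vec: an Arc refcount bump, not a deep copy
    println!(
        "shared after clone: {}",
        Arc::ptr_eq(err.context.as_ref().unwrap(), copy.context.as_ref().unwrap())
    );
}
```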
OperationContext
Carries runtime context. Conceptually it describes what the current layer was doing, what it was accessing, which diagnostic fields were attached, and whether operation logging should be emitted:
pub struct OperationContext {
    action: Option<String>,
    locator: Option<String>,
    fields: Vec<(String, String)>,
    path: Vec<String>,
    metadata: ErrorMetadata,
    result: OperationResult,
    exit_log: bool,
}
- doing(...): what operation was running (“load config”, “validate order”)
- at(...): what resource was being accessed (“config.toml”, “order #42”)
- with_field(...): human-readable diagnostic fields
- with_meta(...): machine-oriented metadata (serialization only)
- success() / fail() / cancel() and logging helpers: record the operation outcome with little call-site code
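The builder style of `doing` / `at` / `with_field` can be sketched with std only. `OpContext` and `render` are hypothetical: this is a three-field miniature, not the real `OperationContext`, and the rendered line format only imitates the report sample shown later in this document.

```rust
// Hypothetical, minimal context type; field names follow the conceptual model above.
#[derive(Debug, Default, Clone)]
struct OpContext {
    action: Option<String>,
    locator: Option<String>,
    fields: Vec<(String, String)>,
}

impl OpContext {
    fn doing(mut self, action: &str) -> Self {
        self.action = Some(action.into());
        self
    }

    fn at(mut self, locator: &str) -> Self {
        self.locator = Some(locator.into());
        self
    }

    fn with_field(mut self, key: &str, value: impl ToString) -> Self {
        self.fields.push((key.into(), value.to_string()));
        self
    }

    // Render one context line in a report-like style (format is illustrative only).
    fn render(&self, index: usize) -> String {
        let mut line = format!("[{index}] {}", self.action.as_deref().unwrap_or("?"));
        if let Some(loc) = &self.locator {
            line.push_str(&format!(" @ {loc}"));
        }
        for (k, v) in &self.fields {
            line.push_str(&format!(" [{k}: {v}]"));
        }
        line
    }
}

fn main() {
    let ctx = OpContext::default().doing("place_order").with_field("user_id", 42);
    println!("{}", ctx.render(0));
    let ctx = OpContext::default().doing("load config").at("config.toml");
    println!("{}", ctx.render(1));
}
```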
SourceFrame
Represents one element in the source chain. Simplified model:
pub struct SourceFrame {
    pub index: usize,
    pub message: SmolStr,
    pub display: Option<SmolStr>,
    pub debug: Option<SmolStr>,
    pub type_name: Option<SmolStr>,
    pub error_code: Option<i32>,
    pub reason: Option<SmolStr>,
    pub path: Option<SmolStr>,
    pub detail: Option<SmolStr>,
    pub metadata: ErrorMetadata,
    pub is_root_cause: bool,
    pub context_fields: Vec<(SmolStr, SmolStr)>,
}
String fields use SmolStr, which stores short strings inline without a heap allocation and clones in O(1), so copying and traversing the source chain stays cheap.
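How `index` and `is_root_cause` relate to a standard error chain can be shown by walking `Error::source()`. In this std-only sketch, `Frame`, `collect_frames`, and `Outer` are hypothetical, and `Frame` keeps only three of the fields above, with `String` standing in for `SmolStr`.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical, trimmed-down frame with three of the SourceFrame fields.
#[derive(Debug)]
struct Frame {
    index: usize,
    message: String,
    is_root_cause: bool,
}

// Walk Error::source() from the outermost error down to the root cause.
fn collect_frames(err: &(dyn Error + 'static)) -> Vec<Frame> {
    let mut frames = Vec::new();
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        let index = frames.len();
        frames.push(Frame {
            index,
            message: e.to_string(),
            is_root_cause: e.source().is_none(),
        });
        current = e.source();
    }
    frames
}

// A hypothetical wrapper error whose source is a std ParseIntError.
#[derive(Debug)]
struct Outer {
    inner: std::num::ParseIntError,
}

impl fmt::Display for Outer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "outer source")
    }
}

impl Error for Outer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.inner)
    }
}

fn main() {
    let outer = Outer { inner: "x".parse::<i32>().unwrap_err() };
    for frame in collect_frames(&outer) {
        println!("{} {} (root: {})", frame.index, frame.message, frame.is_root_cause);
    }
}
```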
Consumption Paths
Four independent consumption paths, each returning a different view of the same error:
report() → DiagnosticReport
Human-readable diagnostics. Only requires DomainReason.
let report: DiagnosticReport = err.report();
println!("{}", report.render());
Output:
reason: system error
detail: read config failed
context:
  [0] place_order [user_id: 42]
exposure(&policy) → ErrorProtocolSnapshot
Protocol-boundary projection. Requires ErrorIdentityProvider (provided by #[derive(OrionError)]).
let proto = err.exposure(&MyPolicy);
let http_json = proto.to_http_error_json()?; // {"status": 500, "code": "sys.io_error", ...}
let log_json = proto.to_log_error_json()?;   // full structured log output
let cli_json = proto.to_cli_error_json()?;   // operator-facing summary
let rpc_json = proto.to_rpc_error_json()?;   // upstream-facing protocol
The ExposurePolicy trait controls the decision:
| Method | Default | Override frequency |
|---|---|---|
| http_status() | 500 | Most common |
| visibility() | Internal (Biz → Public) | Common |
| retryable() | false | Occasional |
| default_hints() | [] | Rare |
Visibility controls which error information reaches the external caller:
| | Public | Internal |
|---|---|---|
| HTTP message | Uses detail | Uses reason (hides detail) |
| RPC detail | Exposed | null |
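The visibility table above amounts to a small projection function. This std-only sketch uses hypothetical `Visibility`, `Projected`, and `project` names; the real `ErrorProtocolSnapshot` carries more fields and its JSON schema is not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Visibility {
    Public,
    Internal,
}

// Hypothetical projected fields; names are illustrative, not the real output schema.
#[derive(Debug, PartialEq)]
struct Projected {
    http_message: String,
    rpc_detail: Option<String>,
}

fn project(reason: &str, detail: &str, vis: Visibility) -> Projected {
    match vis {
        // Public: the caller may see the layer-specific detail.
        Visibility::Public => Projected {
            http_message: detail.to_string(),
            rpc_detail: Some(detail.to_string()),
        },
        // Internal: only the reason text leaves the service; detail is withheld.
        Visibility::Internal => Projected {
            http_message: reason.to_string(),
            rpc_detail: None,
        },
    }
}

fn main() {
    println!("{:?}", project("system error", "db 10.0.0.5 unreachable", Visibility::Internal));
    println!("{:?}", project("not found", "order #42 not found", Visibility::Public));
}
```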
display_chain() → formatted string
Source chain expansion for debugging. No trait requirement beyond DomainReason.
system error
  -> Info: read config failed
  -> Caused by:
      1. outer source
      2. inner source
identity_snapshot() → ErrorIdentity
Stable identity inspection without protocol projection:
let id = err.identity_snapshot();
assert_eq!(id.code, "sys.io_error");
UnifiedReason
UnifiedReason is the built-in universal reason classification. It covers the common error categories found in most services:
| Category | Code range | Examples |
|---|---|---|
| Business | 100-105 | validation_error, not_found |
| Infrastructure | 200-204 | system_error, network_error, timeout |
| Configuration | 300-301 | core_conf, external_error |
Designed as a catch-all for errors that don’t need a domain-specific reason. Domain enums typically include it as a transparent variant:
#[derive(OrionError)]
enum AppReason {
    #[orion_error(identity = "biz.invalid")]
    Invalid,
    #[orion_error(transparent)]
    General(UnifiedReason),
}
The #[orion_error(transparent)] attribute delegates stable_code(), error_category(), and Display to the inner UnifiedReason.
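As a rough mental model of what transparent delegation and the generated constructors do, here is a hand-written std-only equivalent. This is not the derive's actual output. `UnifiedReason` is replaced by a hypothetical two-variant miniature, and its stable codes (`sys.system_error`, `biz.validation_error`) are invented for illustration.

```rust
use std::fmt;

// Hypothetical miniature of UnifiedReason; codes are invented for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum UnifiedReason {
    SystemError,
    ValidationError,
}

impl UnifiedReason {
    fn stable_code(&self) -> &'static str {
        match self {
            UnifiedReason::SystemError => "sys.system_error",
            UnifiedReason::ValidationError => "biz.validation_error",
        }
    }
}

impl fmt::Display for UnifiedReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UnifiedReason::SystemError => write!(f, "system error"),
            UnifiedReason::ValidationError => write!(f, "validation error"),
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
enum AppReason {
    Invalid,
    General(UnifiedReason),
}

impl AppReason {
    fn stable_code(&self) -> &'static str {
        match self {
            AppReason::Invalid => "biz.invalid",
            // Transparent variant: delegate to the inner reason.
            AppReason::General(inner) => inner.stable_code(),
        }
    }

    // Delegated constructors, so callers skip the General(UnifiedReason::...) wrapping.
    fn system_error() -> Self {
        AppReason::General(UnifiedReason::SystemError)
    }

    fn validation_error() -> Self {
        AppReason::General(UnifiedReason::ValidationError)
    }
}

impl fmt::Display for AppReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppReason::Invalid => write!(f, "invalid"),
            // Transparent Display: the inner reason's text, unchanged.
            AppReason::General(inner) => fmt::Display::fmt(inner, f),
        }
    }
}

fn main() {
    for r in [AppReason::Invalid, AppReason::system_error(), AppReason::validation_error()] {
        println!("{} -> {}", r.stable_code(), r);
    }
}
```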
Explicit StdError Bridge
StructError<T> does not implement std::error::Error. This is intentional:
- Prevents accidental type erasure. If StructError implemented StdError, calling code could unintentionally erase the reason type with .into() or Box<dyn Error>, losing structured identity.
- Keeps boundary crossing explicit. When interop with the StdError ecosystem is needed, the conversion is explicit:
// Borrow a std::error::Error view:
let std_ref: StdStructRef<'_, AppReason> = err.as_std();
// Or take ownership; into_std() and into_dyn_std() both consume `err`, so pick one:
let owned: OwnedStdStructError<AppReason> = err.into_std();
// let dyn_owned: OwnedDynStdStructError = err.into_dyn_std();
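Here is a std-only sketch of why the explicit bridge matters. `Carrier`, `OwnedStd`, and `third_party_api` are hypothetical and only mimic the shape of `into_std()`. Because `Carrier` does not implement `std::error::Error`, `?` cannot silently erase it into `Box<dyn Error>`; the erasure happens only through a visible call.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical carrier that deliberately does NOT implement std::error::Error.
#[derive(Debug)]
struct Carrier {
    reason: &'static str,
    detail: String,
}

// Hypothetical owned bridge type; only this wrapper implements std::error::Error.
#[derive(Debug)]
struct OwnedStd(Carrier);

impl fmt::Display for OwnedStd {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.0.reason, self.0.detail)
    }
}

impl Error for OwnedStd {}

impl Carrier {
    // The only way into the std ecosystem is this explicit, greppable call.
    fn into_std(self) -> OwnedStd {
        OwnedStd(self)
    }
}

fn third_party_api() -> Result<(), Box<dyn Error>> {
    let err = Carrier { reason: "system error", detail: "read config failed".into() };
    // `Err(err)?` would not compile here: Carrier is not a std Error, so Box<dyn Error> has no From<Carrier>.
    Err(err.into_std().into())
}

fn main() {
    if let Err(e) = third_party_api() {
        println!("{e}");
    }
}
```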
Derive Macro
#[derive(OrionError)] generates the core trait implementations:
| Trait | Purpose | Source |
|---|---|---|
| Display | Human-readable error message | From message attribute, or auto-generated from identity |
| DomainReason | Carrier compatibility | Empty marker impl |
| ErrorCode | Legacy numeric compatibility code | From code attribute, or default 500 |
| ErrorIdentityProvider | Stable code + category | From identity and category attributes |
Attributes
| Attribute | Required? | Generates |
|---|---|---|
| identity = "biz.foo" | Yes (unless transparent) | stable_code() returns "biz.foo" |
| category = Biz | No (inferred from identity prefix) | error_category() returns specified category |
| transparent | Alternative to identity | Delegates all methods to inner type |
| message = "..." | No (auto-generated from identity) | Custom Display output |
| code = ... | No (default 500) | Legacy numeric error_code() |
Protocol outputs, log aggregation, and monitoring should use ErrorIdentity.code / stable_code() as the stable identity. ErrorCode is a numeric compatibility layer, not the recommended primary key for new external contracts.
Transparent Variant Constructor Delegation
When an enum has a transparent variant wrapping UnifiedReason, all UnifiedReason constructors are generated as methods on the enum:
#[derive(OrionError)]
enum AppReason {
    #[orion_error(transparent)]
    General(UnifiedReason),
}

// Generated automatically:
AppReason::system_error() // instead of AppReason::General(UnifiedReason::system_error())
AppReason::validation_error()
AppReason::not_found_error()
Third-Party Error Integration
Third-party error types enter the structured system through source_err(). Supported types:
| Type | Feature | Mechanism |
|---|---|---|
| std::io::Error | Built-in (no feature) | Direct UnstructuredSource impl |
| serde_json::Error | serde_json | Direct UnstructuredSource impl |
| anyhow::Error | anyhow | Attempts structured recovery, falls back to unstructured |
| toml::de::Error | toml | Direct UnstructuredSource impl |
| Custom types | — | Opt-in via RawStdError + raw_source() |
The opt-in design (RawStdError) prevents silent structured-to-unstructured downgrade:
impl RawStdError for MyError {}
let result: Result<(), MyError> = Err(MyError);
let err = result
.map_err(raw_source)
.source_err(AppReason::system_error(), "my operation failed")?;
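The opt-in works because `raw_source` only accepts types that implement a marker trait. The std-only sketch below models that gate; representing `RawSource` as a boxed wrapper and the `describe` helper are assumptions for illustration, not the crate's definitions.

```rust
use std::error::Error;
use std::fmt;

// Marker trait: a std error type must opt in explicitly before it can be
// treated as an unstructured source.
pub trait RawStdError: Error + Send + Sync + 'static {}

// Hypothetical wrapper produced only for opted-in types.
#[derive(Debug)]
pub struct RawSource(Box<dyn Error + Send + Sync>);

impl RawSource {
    pub fn describe(&self) -> String {
        self.0.to_string()
    }
}

// The bound is the gate: types without `RawStdError` are rejected at compile time.
pub fn raw_source<E: RawStdError>(err: E) -> RawSource {
    RawSource(Box::new(err))
}

#[derive(Debug)]
pub struct MyError;

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("my error")
    }
}

impl Error for MyError {}

// The explicit opt-in; without it, `raw_source(MyError)` does not compile.
impl RawStdError for MyError {}
```

Because a structured error type would never carry this marker by accident, it cannot be silently flattened into an unstructured source.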
Design Evolution
Naming: UvsReason → CommonReason → UnifiedReason
The built-in reason type went through three names:
- `UvsReason`: the original name, whose meaning was unclear to new users
- `CommonReason`: an intermediate rename, but “common” sounded like “ordinary” rather than “unified”
- `UnifiedReason`: the final name, reflecting its role: concrete errors converge (are unified) into this classification
The deprecated pub type UvsReason = UnifiedReason; alias is retained for migration compatibility.
Variant name: Uvs → General
The transparent variant in domain enums was renamed to General:
#![allow(unused)]
fn main() {
// Before
Uvs(UnifiedReason),
// After
General(UnifiedReason),
}
General communicates “this is the catch-all for non-domain-specific errors” more clearly than the opaque Uvs.
Consumption path convergence: snapshot is not the main path
The orion-error 0.8.0 architecture centers on report(), exposure(), display_chain(), and identity_snapshot().
Stable machine identity is provided by identity_snapshot(). HTTP/RPC/CLI/log boundary output is handled by exposure() and ErrorProtocolSnapshot. Human diagnostics are handled by report(). This avoids making users learn a separate snapshot type hierarchy while preserving stable identity and protocol projection.
API naming: exposure
The name `exposure(...)` was chosen for consistency with `report()`. The short name states the intent directly: expose this error at a boundary according to a policy, without requiring users to first learn an internal snapshot model.
Feature Gating
| Feature | Enables | Default |
|---|---|---|
| `derive` | Proc-macro derive macros (`OrionError`, `ErrorCode`, `ErrorIdentityProvider`) | Yes |
| `log` | `OperationContext` log methods (`ctx.info()`, `.debug()`, `.warn()`, `.error()`) and `Drop` auto-logging | Yes |
| `tracing` | Tracing integration (preferred over `log` when both are enabled) | No |
| `serde` | `Serialize` / `Deserialize` on core types | No |
| `serde_json` | Protocol JSON projection methods (`to_http_error_json()`, etc.) | No |
| `anyhow` | `anyhow::Error` interop with structured source recovery | No |
| `toml` | `toml::de::Error` / `toml::ser::Error` interop | No |
Project Structure
src/
lib.rs — Crate root, re-exports, layered modules
core/
domain.rs — DomainReason trait
reason.rs — ErrorCode trait, ErrorCategory enum, ErrorIdentityProvider trait
universal.rs — UnifiedReason enum (built-in classification)
error/
carrier.rs — StructError<T>, StructErrorImpl<T>
builder.rs — StructErrorBuilder<T>
identity.rs — ErrorIdentity struct, identity_snapshot()
source_chain.rs — SourceFrame, source payload infrastructure
std_bridge.rs — StdStructRef, OwnedStdStructError, OwnedDynStdStructError
context/
types.rs — OperationContext, OperationScope
convert.rs — ContextAdd trait
metadata.rs — ErrorMetadata, MetadataValue
report/
diagnostic.rs — DiagnosticReport, redaction
protocol.rs — ErrorProtocolSnapshot, ExposurePolicy, Visibility
traits/
contextual.rs — ErrorWith trait
conversion.rs — ConvErr, ConvStructError, ToStructError
source_err.rs — SourceErr, RawStdError, RawSource
testing.rs — Test assertion helpers
docs/
en/book.toml — English mdBook config
en/src/ — English mdBook source
zh/book.toml — Chinese mdBook config
zh/src/ — Chinese mdBook source
index.html — Language selector copied to site root
site/
en/ — Generated English book
zh/ — Generated Chinese book
Constraints
Orphan Rule
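A blanket `From<StructError<R1>> for StructError<R2>` cannot be provided. In user code the orphan rule forbids it: neither `From` (std) nor `StructError` (this crate) is local to the user’s crate. Inside this crate, a generic impl would also overlap with the standard library’s reflexive `impl<T> From<T> for T` whenever `R1` and `R2` are the same type. The explicit `conv_err()` method is the intended path: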
#![allow(unused)]
fn main() {
// lower_result: Result<(), StructError<LowerReason>> returned by a lower layer
let result: Result<(), StructError<UpperReason>> = lower_result.conv_err();
}
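A std-only sketch of the explicit-method pattern, assuming the reason conversion is expressed as `From<LowerReason> for UpperReason` (an assumption about how `conv_err()` is wired, not a statement of the crate's bounds). The `StructError` here is a toy with `reason`, `detail`, and `context` fields; the point is that only the reason changes while the rest of the error story is carried over.

```rust
// Toy structured error (not orion-error's real `StructError`).
#[derive(Debug)]
pub struct StructError<R> {
    pub reason: R,
    pub detail: Option<String>,
    pub context: Vec<String>,
}

impl<R> StructError<R> {
    // Swap only the reason; detail and context are carried over unchanged.
    pub fn map_reason<R2: From<R>>(self) -> StructError<R2> {
        StructError {
            reason: R2::from(self.reason),
            detail: self.detail,
            context: self.context,
        }
    }
}

// Extension trait: a method on `Result` is allowed where a generic
// `impl From<StructError<R1>> for StructError<R2>` is not.
pub trait ConvErr<T, R> {
    fn conv_err<R2: From<R>>(self) -> Result<T, StructError<R2>>;
}

impl<T, R> ConvErr<T, R> for Result<T, StructError<R>> {
    fn conv_err<R2: From<R>>(self) -> Result<T, StructError<R2>> {
        self.map_err(|e| e.map_reason::<R2>())
    }
}

#[derive(Debug, PartialEq)]
pub enum LowerReason {
    DiskFull,
}

#[derive(Debug, PartialEq)]
pub enum UpperReason {
    Storage(LowerReason),
}

impl From<LowerReason> for UpperReason {
    fn from(r: LowerReason) -> Self {
        UpperReason::Storage(r)
    }
}
```

The target reason type is picked by the annotation at the call site, exactly as in the snippet above.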
Send + Sync
`DomainReason` requires `Send + Sync`. This is necessary for `StructError` to be used across async task boundaries and to be captured by `anyhow::Error` or `Box<dyn Error + Send + Sync>`. For single-threaded use, this adds a small but unavoidable constraint.
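The constraint can be checked at compile time with the usual empty generic-function trick. The types below are toys, not the crate's; the sketch shows why a `Send + Sync + 'static` error can be boxed as `Box<dyn Error + Send + Sync>` and moved into another thread.

```rust
use std::error::Error;
use std::fmt;
use std::thread;

// Compiles only if `T` is `Send + Sync`; calling it is a compile-time assertion.
pub fn assert_send_sync<T: Send + Sync>() {}

// Toy reason and error types for illustration.
#[derive(Debug)]
pub enum AppReason {
    Timeout,
}

#[derive(Debug)]
pub struct AppError {
    pub reason: AppReason,
    pub detail: String,
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.reason, self.detail)
    }
}

impl Error for AppError {}

pub fn static_checks() {
    assert_send_sync::<AppReason>();
    assert_send_sync::<AppError>();
    // A reason holding e.g. `std::rc::Rc<String>` would fail these checks.
}

// `Send + Sync + 'static` lets the error be boxed as a thread-safe trait
// object and moved into another thread.
pub fn render_on_other_thread(err: AppError) -> String {
    let boxed: Box<dyn Error + Send + Sync> = Box::new(err);
    thread::spawn(move || boxed.to_string()).join().unwrap()
}
```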
0.8 API Contract
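Last updated: 2026-05-01
This document pins down the public API contract of orion-error 0.8.x. It describes the currently committed main path, the layered modules, feature-gated APIs, stable snapshots, and the protocol JSON boundary.
If this document conflicts with src/, tests/, or examples/, the code and tests take precedence, and this document should be corrected to match.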
1. Root Exports
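The crate root only promises to keep the minimal main-path entry points:
- `StructError`
- `OperationContext`
- `UnifiedReason`
- with the `derive` feature enabled, the derive macros:
  - `OrionError`
  - `ErrorCode`
  - `ErrorIdentityProvider`
The root does not promise to re-export reason traits, protocol types, report types, interop types, or test helpers. Their official home is the layered modules.
`ErrorCode` exists as a derive macro name and as a compatibility numeric-code capability. The stable machine key for external protocols, logs, snapshots, and monitoring is `ErrorIdentity.code` / `stable_code()`.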
2. Prelude
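`orion_error::prelude::*` is the recommended import for new business code. It currently promises:
- `StructError`
- `ErrorWith`
- `ConvErr`
- `SourceErr`
- `OrionError` (with the `derive` feature enabled)
The prelude contains only the minimal set needed by the main propagation path. Protocol, report, interop, and test helpers should be imported from their respective layered modules.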
3. Layered Modules
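The layered modules are the official home of non-root types and traits.
- `runtime`: runtime propagation carrier and context: `StructError`, `StructErrorBuilder`, `OperationContext`, `OperationScope`, `WithContext`, `ErrorMetadata`.
- `runtime::source`: source observation model: `SourceFrame`, `SourcePayloadKind`, `SourcePayloadRef`.
- `conversion`: main-path conversion traits: `SourceErr`, `ErrorWith`, `ConvErr`, `ConvStructError`, `ToStructError`.
- `reason`: reason traits, categories, and built-in reasons: `DomainReason`, `ErrorCode`, `ErrorIdentityProvider`, `ErrorCategory`, `UnifiedReason`, `ConfErrReason`.
- `report`: human diagnostics and redaction: `DiagnosticReport`, `RedactPolicy`.
- `protocol`: protocol / exposure projection: `DefaultExposurePolicy`, `ExposurePolicy`, `ExposureDecision`, `ErrorProtocolSnapshot`, `Visibility`.
- `interop`: interop with the standard error ecosystem: `StdStructRef`, `OwnedStdStructError`, `OwnedDynStdStructError`, `raw_source`, `RawSource`, `RawStdError`.
- `cli`: CLI output helpers: `print_error(...)`.
- `dev::testing`: test assertion helpers; not part of the business main path.
- `dev::prelude`: broad imports for protocol/schema tests and migration validation; not part of the business main path.
`bridge::*` is not a public layered entry point in 0.8; the boundary with the standard error ecosystem is uniformly called `interop`.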
4. Source Attachment
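The recommended main path for attaching a source is:
- `StructError::with_source(...)`
- `StructErrorBuilder::source(...)`
Callers do not need to distinguish whether the source is a plain `StdError` or a lower-level `StructError<_>`; the crate handles the routing internally.
The following APIs are retained as low-level entry points for maintaining old code, testing source classification, or debugging auto-routing. They are not the default recommendation for tutorials or new business code:
- `with_std_source(...)`
- `with_struct_source(...)`
- `StructErrorBuilder::source_std(...)`
- `StructErrorBuilder::source_struct(...)`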
5. Error Flow
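Currently recommended error-flow decisions:
- The upstream error is a plain error entering the structured system for the first time: `source_err(reason, detail)`.
- The upstream error is `StructError<R1>` and the current layer only changes the reason type: `conv_err()`.
- The upstream error is `StructError<R1>` and the current layer establishes a new semantic boundary: also use `source_err(reason, detail)`.
- A cause must be attached to an existing `StructError`: `with_source(...)` or `builder.source(...)`.
- The error must enter the `std::error::Error` ecosystem: `as_std()`, `into_std()`, `into_boxed_std()`, `into_dyn_std()`.
The old `owe(...)` / `owe_*()` / `err_wrap(...)` / `want(...)` / `with(...)` APIs are not part of the current 0.8 main API.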
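A minimal std-only model of the first-entry step, assuming a toy carrier that stores the upstream error as its `source()`. `SourceErr` and `StructError` here are simplified, hypothetical stand-ins, not the crate's real types; the real `source_err` also records context and classification that this sketch omits.

```rust
use std::error::Error;
use std::fmt;
use std::io;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AppReason {
    SystemError,
}

// Toy carrier: a reason, a detail message, and the upstream error as `source()`.
#[derive(Debug)]
pub struct StructError<R> {
    pub reason: R,
    pub detail: String,
    source: Option<Box<dyn Error + Send + Sync + 'static>>,
}

impl<R: fmt::Debug> fmt::Display for StructError<R> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.reason, self.detail)
    }
}

impl<R: fmt::Debug> Error for StructError<R> {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}

// Hypothetical extension trait modeling `source_err(reason, detail)`.
pub trait SourceErr<T> {
    fn source_err<R>(self, reason: R, detail: &str) -> Result<T, StructError<R>>;
}

impl<T, E: Error + Send + Sync + 'static> SourceErr<T> for Result<T, E> {
    fn source_err<R>(self, reason: R, detail: &str) -> Result<T, StructError<R>> {
        // Keep the upstream error intact as the cause instead of stringifying it.
        self.map_err(|e| StructError {
            reason,
            detail: detail.to_string(),
            source: Some(Box::new(e)),
        })
    }
}

pub fn load_config() -> Result<String, StructError<AppReason>> {
    let raw: Result<String, io::Error> =
        Err(io::Error::new(io::ErrorKind::NotFound, "config.toml missing"));
    raw.source_err(AppReason::SystemError, "load config failed")
}
```

Because the toy `StructError<R>` itself implements `Error + Send + Sync + 'static` for suitable `R`, the same call also wraps a lower-layer structured error at a new semantic boundary, keeping it in the source chain.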
6. Feature-Gated API
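Default features:
- `log`
- `derive`
Optional features:
- `derive`: enables the root re-export of the derive macros, including `#[derive(OrionError)]`.
- `log`: enables `log` integration and the `OperationContext` drop-logging path.
- `tracing`: enables `tracing` integration; when both `log` and `tracing` are enabled, drop logging takes the `tracing` branch.
- `serde`: enables `Serialize` / `Deserialize` for the main structures.
- `serde_json`: enables the stable snapshot and protocol JSON projection methods: `to_stable_snapshot_json()`, `to_http_error_json()`, `to_cli_error_json()`, `to_log_error_json()`, `to_rpc_error_json()`.
- `anyhow`: enables `anyhow::Error` as input to `source_err(...)`, including structured source recovery through the official dyn interop wrapper.
- `toml`: enables `toml::de::Error` / `toml::ser::Error` as input to `source_err(...)`.
Documentation examples that depend on a feature should say so explicitly or be covered by feature-gated tests.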
7. Protocol JSON
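Main entry points for protocol projection:
- `identity_snapshot()`
- `exposure(...)`
- `into_exposure(...)`
- `ErrorProtocolSnapshot::to_http_error_json()`
- `ErrorProtocolSnapshot::to_cli_error_json()`
- `ErrorProtocolSnapshot::to_log_error_json()`
- `ErrorProtocolSnapshot::to_rpc_error_json()`
The stable input of `ErrorProtocolSnapshot` has three parts:
- `identity`
- `decision`
- the embedded `DiagnosticReport`
Stable commitments:
- `identity.code` is the stable machine key for protocols, logs, monitoring, and test assertions.
- `identity.category` is the stable category.
- The field names and meanings of `ExposureDecision` are stable: `http_status`, `visibility`, `default_hints`, `retryable`.
- The top-level purpose of the HTTP / CLI / log / RPC projections is stable.
Not promised:
- The text format of `render_user_debug()` is not a machine protocol.
- The `summary` / `rendered_detail` text in the JSON, intended for manual troubleshooting, is not an exact stable schema.
- Diagnostic fields of `source_frames`, such as `debug`, `display`, and `type_name`, may change with the implementation.
- Internal helper fields that are not locked down in `docs/protocol-contract.md` and the tests are not part of the public protocol.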
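To illustrate what the stable decision fields carry, here is a std-only sketch of a policy that maps a category to an `ExposureDecision`. Only the four field names come from the contract above; the field types, the `Category` and `Visibility` variants, the `decide` signature, `DemoPolicy`, and the status codes are hypothetical choices for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Category {
    Biz,
    Sys,
}

// Assumed variants; the real `Visibility` may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Visibility {
    Public,
    Internal,
}

// The four contract field names; their types here are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ExposureDecision {
    pub http_status: u16,
    pub visibility: Visibility,
    pub default_hints: Vec<String>,
    pub retryable: bool,
}

// Hypothetical policy trait: stable identity in, boundary decision out.
pub trait ExposurePolicy {
    fn decide(&self, code: &str, category: Category) -> ExposureDecision;
}

pub struct DemoPolicy;

impl ExposurePolicy for DemoPolicy {
    fn decide(&self, code: &str, category: Category) -> ExposureDecision {
        match category {
            // Business errors: visible to the caller, not worth retrying.
            Category::Biz => ExposureDecision {
                http_status: 400,
                visibility: Visibility::Public,
                default_hints: vec![format!("error code: {code}")],
                retryable: false,
            },
            // System errors: hide internals, let the client retry later.
            Category::Sys => ExposureDecision {
                http_status: 503,
                visibility: Visibility::Internal,
                default_hints: Vec::new(),
                retryable: true,
            },
        }
    }
}
```

The decision depends only on the stable code and category, never on rendered detail text, which is what keeps these fields safe to treat as a contract.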
8. Report And Redaction
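`DiagnosticReport` is aimed at human diagnostics and does not require the reason to implement `ErrorIdentityProvider`.
Main entry points:
- `report()`
- `into_report()`
- `render()`
- `render_redacted(...)`
- `report_redacted(...)`
Redaction applies to the report, protocol projections, and source-frame diagnostic views. The stable code/category in the machine protocol should not be processed as natural-language detail.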
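A minimal sketch of key-based redaction that masks sensitive field values while copying the stable code through untouched. `Report` and `KeyRedaction` are hypothetical types, not the crate's `DiagnosticReport` or `RedactPolicy`, whose rules may be richer.

```rust
// Hypothetical report shape: stable code, human detail, key/value fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub code: String,
    pub detail: String,
    pub fields: Vec<(String, String)>,
}

// Hypothetical key-based policy (not the crate's `RedactPolicy`).
pub struct KeyRedaction {
    pub sensitive_keys: Vec<String>,
}

impl KeyRedaction {
    pub fn apply(&self, report: &Report) -> Report {
        let fields = report
            .fields
            .iter()
            .map(|(key, value)| {
                let masked = self.sensitive_keys.iter().any(|s| s == key);
                (key.clone(), if masked { "***".to_string() } else { value.clone() })
            })
            .collect();
        Report {
            // The stable machine code is copied verbatim, never scanned as prose.
            code: report.code.clone(),
            detail: report.detail.clone(),
            fields,
        }
    }
}
```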
9. Compatibility Policy
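Current 0.8 policy:
- Keep the main path stable.
- Keep the observation surface available, but leave it out of the quick start.
- Keep `dev::*` aimed at testing and migration validation.
- Do not restore the 0.6 / 0.7 legacy APIs as root or prelude main paths.
- Archived documents preserve historical context and do not represent current recommended usage.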
Compatibility & Migration
API Renames
| Old Name | New Name | Description |
|---|---|---|
| `into_as(reason, detail)` | `source_err(reason, detail)` | Unified error entry point |
| `wrap_as(reason, detail)` | `source_err(reason, detail)` | Same, unified |
| `upcast()` | `conv_err()` | Cross-layer reason conversion |
| `err_conv()` | `conv_err()` | Same |
Old names are no longer available. If you see a compilation error, replace with the new name — parameters are unchanged.
0.7 → 0.8 Migration
0.8 removed the following 0.7 compatibility paths:
- `compat_prelude` / `compat_traits` modules
- the `ErrorOwe` family of traits (`owe()` / `owe_source()`, etc.)
- the `ErrorWith` methods `want()` / `attach_context()` / `with()`
- `OperationContext::with_want()`
Public Surface Grading
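Last updated: 2026-04-30
This document grades the public surface based on the current orion-error 0.8.x code.
The goal is not to keep removing APIs but to pin down the following four boundaries:
- main-path APIs
- observation APIs
- test / adapter entry points
- compatibility-retained APIs
If later work aims to raise this further to 9+, this grading table should serve as the reference baseline for public API review.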
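1. Main-Path APIs
These APIs make up the currently recommended main path and should remain stable long-term:
- `StructError<R>`
- `OperationContext::doing(...)`
- `OperationContext::at(...)`
- `with_context(...)`
- `with_source(...)`
- `StructErrorBuilder::source(...)`
- `report()`
- `render()`
- `identity_snapshot()`
- `exposure(...)`
- `source_err(...)`
- `conv_err()`
- `cli::print_error(...)`
Characteristics:
- The README, the tutorial, and the main docs describe them first.
- New business code uses them by default.
- No parallel “main path” should be introduced for the same task.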
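2. Observation APIs
These APIs have clear value but are better suited to diagnostics, testing, observability, and auxiliary assertions:
- `source_frames()`
- `root_cause_frame()`
- `source_payload()`
- `source_payload_kind()`
- `action_main()`
- `locator_main()`
- `target_path()`
- `render_redacted(...)`
- `render_user_debug()`
- `render_user_debug_redacted(...)`
Characteristics:
- They are not main propagation or main construction entry points.
- The docs should state explicitly that they belong to the observation / diagnostics surface.
- They should not displace the main path in the quick start narrative.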
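3. Test / Adapter Entry Points
These APIs mainly serve tests, schema validation, middle-layer adaptation, or protocol assembly:
- `ErrorProtocolSnapshot::from_report_skeleton(...)`
- `dev::prelude::*`
- `dev::testing::*`
- `interop::*`
- `runtime::source::*`
Characteristics:
- They may remain public.
- They are clearly not the normal business main path.
- The docs should describe them as a secondary path.
- Among them, `dev::prelude::*` should stay at the object-level inspection surface and should not be widened again into frame-level broad exports.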
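4. Compatibility-Retained APIs
These fields or projections still have real compatibility value, but their names carry historical baggage:
- the `target` field in context / snapshot frames
Current unified position:
- Runtime semantics should primarily be understood as `action` / `locator` / path segments.
- `target` continues to exist, mainly as a compat projection.
- `path` is the stably exported path projection.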
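5. Current Conclusion
The main structural issue in orion-error today is no longer “many compatibility APIs mixed into the main path”, but rather:
- a few compat projection fields are still public;
- a few observation / secondary paths still rely on documentation alone to mark them as lower-tier.
This means that if polishing continues in the next phase:
- internal model rewrites should no longer be the priority;
- public surface review and locking down the grading should come first.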
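6. Follow-up Recommendations
When moving to the next version line, evaluate in this order:
- whether to keep `target` in frames
- whether `dev::prelude::*` should be narrowed further
- whether observation / adapter APIs should get clearer module or naming hints
Until there is an explicit versioning policy, the more reasonable approach is:
- keep the main path stable
- keep the observation surface available
- use documentation and tests to lock in the positioning of secondary / compat APIs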
Release Checklist
Steps for publishing 0.8.x.
Pre-release
- Confirm `CHANGELOG.md`, `README.md`, and `docs/` are in sync with the current code.
- Confirm `orion-error` and `orion-error-derive` have matching versions.
- Run:
  - `cargo fmt --all`
  - `cargo clippy --all-targets --all-features -- -D warnings`
  - `cargo test --all-features -- --test-threads=1`
  - `cargo test --doc --no-default-features`
  - `bash scripts/check-feature-matrix.sh`
  - `bash scripts/check-doc-code.sh`
  - `bash scripts/check-v3-policy.sh`
- In a networked environment:
  - `cargo package --manifest-path orion-error-derive/Cargo.toml`
  - `cargo package`
  - `cargo publish --manifest-path orion-error-derive/Cargo.toml --dry-run`
  - `cargo publish --dry-run`
Pre-release Boundary Checks
- The `src/lib.rs` root-surface compile-fail doctests still pass.
- `tests/test_layered_exports.rs` and `tests/test_versioned_namespaces.rs` still cover the current layered export boundaries.
- README / tutorial / reason identity guide code blocks match the current source.
- For new or migrated public surface: add tests / compile guards first, then update the README / docs, then update the changelog.
Publishing Order
1. Publish `orion-error-derive` first.
2. Wait for the crates.io index to propagate.
3. Publish `orion-error`.
The GitHub Actions release workflow is already configured in this order.
Post-release
- Confirm both crates are visible on crates.io.
- Confirm the default `derive` feature correctly resolves `orion-error-derive`.
- Confirm the docs.rs pages are generated for:
  - `orion-error`
  - `orion-error-derive`
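StructError Heap Allocation Performance Baseline
Hardware: Apple M4 (Mac mini, 2024)
OS: macOS 15, aarch64
Rust: stable 2025-04-30
Run: cargo test --release --test perf_context_allocation -- --nocapture
Test Scenarios
Each scenario is repeated 500,000 times; the total elapsed time is measured, and the mean time per iteration and the throughput are computed from it.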
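The measurement described above can be sketched with `std::time::Instant` and `std::hint::black_box`. This is an illustrative harness, not the contents of `tests/perf_context_allocation.rs`; `bench` and `BenchResult` are hypothetical names.

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical result type: the three derived columns used in the tables.
#[derive(Debug)]
pub struct BenchResult {
    pub ns_per_iter: f64,
    pub iters_per_sec: f64,
    pub total_ms: f64,
}

// Run `f` `iters` times, then derive the mean cost and throughput
// from the total elapsed time.
pub fn bench<T, F: FnMut() -> T>(iters: u32, mut f: F) -> BenchResult {
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` keeps the optimizer from discarding the constructed value.
        black_box(f());
    }
    let total = start.elapsed();
    let ns_per_iter = total.as_nanos() as f64 / f64::from(iters);
    BenchResult {
        ns_per_iter,
        iters_per_sec: if ns_per_iter > 0.0 { 1e9 / ns_per_iter } else { f64::INFINITY },
        total_ms: total.as_secs_f64() * 1e3,
    }
}
```

A call such as `bench(500_000, || build_error())`, where `build_error` stands for any constructor under test, yields the ns/iter, throughput, and total-time figures; results should only be compared across runs on the same machine in `--release` mode.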
| Scenario | What is constructed |
|---|---|
| bare | `StructError::from(UnifiedReason::validation_error())` |
| with-detail | bare + `.with_detail("port number out of range")` |
| with-detail+pos | with-detail + `.with_position("src/config.rs:42")` |
| builder | builder API equivalent of with-detail+pos |
Results
Before: `context: Arc<Vec<OperationContext>>`
| Scenario | Throughput | ns/iter | Total time |
|---|---|---|---|
| bare | 28 M/s | 35.9 | 17 ms |
| with-detail | 19 M/s | 53.3 | 26 ms |
| with-detail+pos | 15 M/s | 64.6 | 32 ms |
| builder | 15 M/s | 65.1 | 32 ms |
After: `context: Option<Arc<Vec<OperationContext>>>`
| Scenario | Throughput | ns/iter | Total time | Improvement |
|---|---|---|---|---|
| bare | 55 M/s | 18.2 | 9 ms | +97% |
| with-detail | 27 M/s | 36.6 | 18 ms | +46% |
| with-detail+pos | 20 M/s | 48.9 | 24 ms | +32% |
| builder | 20 M/s | 48.8 | 24 ms | +33% |
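Optimization
In `StructErrorImpl`, the field `context: Arc<Vec<OperationContext>>` was changed to `context: Option<Arc<Vec<OperationContext>>>`.
An empty context no longer triggers a heap allocation; the context list is initialized lazily, only on the first call to `with_context()` or `ContextAdd::add_context()`.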
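The lazy-initialization idea can be sketched as follows. `ErrorImpl` and `OperationContext` are toy stand-ins, and appending through `Arc::make_mut` is an assumption about the approach, not a description of `carrier.rs`.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct OperationContext {
    pub action: String,
}

// Toy carrier mirroring the field change described above.
#[derive(Debug, Default)]
pub struct ErrorImpl {
    // `None` until the first context is attached, so a bare error
    // performs no allocation for its context list.
    context: Option<Arc<Vec<OperationContext>>>,
}

impl ErrorImpl {
    pub fn with_context(mut self, ctx: OperationContext) -> Self {
        // Allocate on first use; `Arc::make_mut` clones the Vec only if it is shared.
        let list = self.context.get_or_insert_with(|| Arc::new(Vec::new()));
        Arc::make_mut(list).push(ctx);
        self
    }

    pub fn contexts(&self) -> &[OperationContext] {
        self.context.as_deref().map(Vec::as_slice).unwrap_or(&[])
    }

    pub fn has_context_allocation(&self) -> bool {
        self.context.is_some()
    }
}
```

Readers see an empty slice either way, so callers do not need to care whether the list was ever allocated.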
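Analysis
- bare (18.2 ns) is now dominated by `Box::new` plus stack construction.
- with-detail adds one `String` heap allocation over bare (about 18 ns).
- with-detail+pos adds two `String` heap allocations over bare (about 30 ns).
- This matches expectations: removing one empty `Arc` heap allocation saves about 18 ns (bare: 35.9 → 18.2 ns).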
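The derived columns follow from ns/iter: throughput is `1e9 / ns_per_iter` iterations per second, and the improvement is the throughput ratio `before / after - 1`. Two small helpers (hypothetical names) reproduce the table values.

```rust
// Throughput in millions of iterations per second for a mean cost in ns.
pub fn throughput_m_per_s(ns_per_iter: f64) -> f64 {
    1_000.0 / ns_per_iter
}

// Throughput gain of `after` over `before`, in percent.
pub fn improvement_pct(before_ns: f64, after_ns: f64) -> f64 {
    (before_ns / after_ns - 1.0) * 100.0
}
```

For example, bare goes from 35.9 to 18.2 ns/iter, so throughput rises from about 28 M/s to 55 M/s, a gain of 35.9 / 18.2 - 1, or about 97%.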
Test file: `tests/perf_context_allocation.rs`
Changed files: `src/core/error/carrier.rs` + `src/core/report/diagnostic.rs`
Performance Impact of Source Debug Formatting
Measures the actual cost of the eager `format!("{source:?}")` in `collect_source_frames` and the effect of removing it.
Run: cargo test --release --test perf_context_allocation -- --nocapture
Results
Before: eager debug: `format!("{source:?}")`
| Scenario | Throughput | ns/iter | Notes |
|---|---|---|---|
| bare | 56 M/s | 18.0 | baseline |
| with-std-source | 2.5 M/s | 400.9 | + io::Error |
| with-std-verbose | 1.7 M/s | 581.0 | + 256-byte io::Error |
| with-struct-src | 458 K/s | 2184.8 | + StructError (2 contexts) |
| deep-struct-src | 420 K/s | 2381.9 | + 3-level StructError chain |
After: lazy debug: `None` (optimized)
| Scenario | Throughput | ns/iter | Improvement |
|---|---|---|---|
| bare | 58 M/s | 17.3 | +4% (noise) |
| with-std-source | 3.9 M/s | 259.3 | +55% |
| with-std-verbose | 4.0 M/s | 252.1 | +130% |
| with-struct-src | 849 K/s | 1177.8 | +86% |
| deep-struct-src | 1.2 M/s | 821.3 | +190% |
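Analysis
- `with-std-source`: 400.9 → 259.3 ns; `Debug` formatting accounted for about 140 ns.
- `with-std-verbose`: 581.0 → 252.1 ns; the `Debug` cost of the long message is eliminated entirely.
- `with-struct-src`: 2184.8 → 1177.8 ns (-46%); the cost of `Debug`-walking the context stack is gone.
- `deep-struct-src`: 2381.9 → 821.3 ns (-65%); the deepest frames are copied directly from already-collected frames, with no extra formatting.
Optimization
Change `SourceFrame.debug` from `String` to `Option<String>`: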
#![allow(unused)]
fn main() {
// Before
pub debug: String,
// In collect_source_frames: debug: format!("{source:?}"),
// After
pub debug: Option<String>,
// In collect_source_frames: debug: None,
}
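Redaction still supports the `debug` field: values explicitly set to `Some(...)` (as in the tests) are redacted normally, and frames whose `debug` is `None` are skipped during redaction.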
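A std-only sketch of the before/after behavior. `SourceFrame` here has only `display` and `debug`, and `collect_eager`, `collect_lazy`, and `redact_debug` are hypothetical helpers; the real `collect_source_frames` and redaction logic are more involved.

```rust
use std::fmt::{Debug, Display};

#[derive(Debug, Clone, PartialEq)]
pub struct SourceFrame {
    pub display: String,
    // `None` unless explicitly filled in: Debug text is no longer built eagerly.
    pub debug: Option<String>,
}

// Before: every collected frame pays for a full `Debug` rendering.
pub fn collect_eager<E: Debug + Display>(source: &E) -> SourceFrame {
    SourceFrame {
        display: source.to_string(),
        debug: Some(format!("{source:?}")),
    }
}

// After: skip the `Debug` rendering entirely.
pub fn collect_lazy<E: Display>(source: &E) -> SourceFrame {
    SourceFrame {
        display: source.to_string(),
        debug: None,
    }
}

// Redaction only rewrites debug text that is actually present.
pub fn redact_debug(frame: &mut SourceFrame, secret: &str) {
    if let Some(debug) = frame.debug.as_mut() {
        *debug = debug.replace(secret, "***");
    }
}
```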