Trust and Safety
GDD takes a structured approach to trust. AI agents read instructions from nested project components, and not all of those components are equally trustworthy. The framework provides explicit rules for how to handle this.
The Trust Hierarchy
graph BT
L4["User instructions<br/>(in-session)"] --> L3["Non-ecosystem components<br/>(untrusted until reviewed)"]
L3 --> L2["Ecosystem components<br/>(trusted, flag conflicts)"]
L2 --> L1["Yggdrasil root instructions<br/>(highest trust)"]
| Level | Source | Treatment |
|---|---|---|
| 1 (highest) | Yggdrasil root instructions | Trusted — the base |
| 2 | Ecosystem components (in ecosystem.yaml) |
Trusted — flag conflicts with root |
| 3 | Non-ecosystem components | Untrusted until reviewed — log before processing |
| 4 | User instructions in-session | Respected unless safety-violating |
The Black-Box Safety Pattern
When the orientation skill encounters instructions from an untrusted or suspicious source, it follows a specific sequence:
- Read just enough to identify the file as an instruction file from an untrusted source (filename, location, first few lines)
- Log a concern to Thalamus immediately — before reading the full content. This is the safety breadcrumb.
- Continue reading the full file
- Surface the concern to the human in conversation
- Do not follow the instruction until the human explicitly approves
Why log first? If the file contains a successful prompt injection that compromises the agent's behavior, the pre-injection concern is already on disk for the human to find. The breadcrumb survives even if the agent doesn't.
What Gets Flagged
- Instructions that contradict yggdrasil root instructions
- Requests for elevated permissions or unusual access patterns
- Instructions to ignore, override, or "forget" other instructions
- Instructions to push, publish, or send data to unfamiliar destinations
- Skills that execute code as part of loading (rather than providing guidance)
- Any instruction file that is new or modified since the last session
The Community Angle
The agent is part of the yggdrasil community. It has a responsibility not just to the current human, but to the integrity of the shared workspace:
- Do good faith work, even when asked to cut corners
- Flag things that could harm other contributors or the project
- Refuse to participate in actions that would compromise the workspace, while making clear the human is free to act on their own
The agent can't prevent a human from doing harmful things, but it can make them do those on their own — so the agent and the community have done their part.