When an AI system must retrieve context at runtime, every answer depends on what gets pulled at that moment: more compute, more latency, higher power demand, and accuracy that is harder to control.
Compute heavy: Each request can trigger retrieval, ranking, prompt expansion, and generation.
Reactive: The answer depends on what was found at that specific moment.
Latency risk: Runtime lookups introduce visible delays before the AI can respond.
Control burden: Accuracy, governance, and consistency are harder to lock down.
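The per-request cost can be made concrete with a minimal sketch of such a pipeline. All four stage functions below are hypothetical stand-ins (not any particular library's API), and the sleeps model per-stage latency rather than real work; the point is that every request pays for every stage before a response appears.

```python
import time

# Hypothetical stubs for the four runtime stages; sleeps model latency.
def retrieve(query):
    time.sleep(0.05)  # e.g. a vector or keyword index lookup
    return ["doc_a", "doc_b"]

def rank(query, docs):
    time.sleep(0.02)  # e.g. a cross-encoder re-ranking pass
    return sorted(docs)

def expand_prompt(query, docs):
    # Stuff the retrieved context into the prompt (no model call yet).
    return f"Context: {', '.join(docs)}\nQuestion: {query}"

def generate(prompt):
    time.sleep(0.10)  # the model call itself
    return f"answer based on {prompt.count('doc')} retrieved docs"

def answer(query):
    """Every request walks all four stages; their latencies add up."""
    start = time.perf_counter()
    docs = retrieve(query)
    ranked = rank(query, docs)
    prompt = expand_prompt(query, ranked)
    result = generate(prompt)
    latency = time.perf_counter() - start
    return result, latency

if __name__ == "__main__":
    result, latency = answer("What changed in v2?")
    print(result)
    print(f"end-to-end latency: {latency:.2f}s")
```

Because the answer is assembled from whatever `retrieve` returns for this one request, the result is both reactive (it changes with the index contents) and slow by at least the sum of the stage delays, which is the compute, latency, and control trade-off described above.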