LLM Serving Architecture: Latency and Cost Controls

Architecture patterns to reduce response time and token spend without sacrificing output quality.