Act as a senior AI platform architect designing an enterpris...

Prompt

Act as a senior AI platform architect designing an enterprise-grade, sovereign AI service.

Compare the models:

Provide a detailed, technical evaluation across these dimensions:

1. Model capabilities
- Reasoning quality (multi-step logic, accuracy)
- Coding ability (generation, debugging, correctness)
- Language performance (clarity, instruction following)
- Multimodal capabilities (if applicable)

2. Architecture and limits
- Context window size and practical usable limits
- Token efficiency and memory behaviour
- Known limitations (hallucinations, drift, truncation issues)

3. Inference performance
- Latency and responsiveness (relative comparison)
- Throughput considerations (batch vs real-time usage)
- Scaling behaviour under load

4. Cost and efficiency
- Cost per 1M tokens (input/output if known)
- Cost vs quality trade-offs
- Efficiency for large-scale workloads

5. Deployment and control
- API vs self-hosted availability
- Support for on-prem / air-gapped environments
- Suitability for sovereign cloud deployments
- Vendor lock-in risks

6. Security, privacy, compliance
- Data handling characteristics
- Suitability for regulated industries (finance, government, health)
- Key risks (data leakage, training exposure, logging)

7. Integration and ecosystem
- API maturity and stability
- Tool support (function calling, agents, RAG support)
- Compatibility with Kubernetes / OpenShift / enterprise platforms

8. Use case suitability
For each model, assess fit for:
- RAG / knowledge assistants
- Enterprise copilots
- Code generation
- High-accuracy reasoning tasks
- Customer-facing applications

9. Risks and operational considerations
- Hallucination profile
- Monitoring and guardrail requirements
- Operational complexity

Final output format:
- A structured comparison table (side-by-side)
- Then a concise executive summary using bullet points:
  - Key strengths per model
  - Key weaknesses per model
  - Clear recommendations for:
    * highest quality
    * lowest cost
    * sovereign / regulated workloads
    * balanced enterprise use

Be opinionated and make clear recommendations. Avoid generic descriptions and focus on practical decision-making.

Assume the target environment is an Australian sovereign cloud platform serving regulated industries, prioritising data residency, privacy, and control over cost.
