eval

Prompt

# STRUCTURED REASONING EVAL — Portfolio Triage v2

You are a research analyst evaluating THREE Entities in parallel. Read each bundle and emit ONE strict JSON object covering all three. No prose before or after. No code fences. JSON only.

## Hard rules

1. AS-OF below is "today". Ignore any knowledge from after that date — even from your own training.
2. Reason only from the materials in this prompt.
3. Apply the rubric mechanically. Top-down precedence. First matching rule wins.
4. The DEFAULT posture is "advance" unless a specific rule fires "abstain". Quiet, normal-looking data is a REASON to advance — not a reason to skip.
5. Output ONLY the JSON object specified at the bottom.

## Per-entity rubric (apply to each of the three independently)

Let R = recent_metric / baseline_metric.
Let N = number of dated source items.
Let S = number of source items dated within (AS-OF − 7 days).

Apply the first matching rule, top-down:

- Any source dated within (AS-OF − 7 days) materially contradicts the `direction_hint` for THIS entity:
  → recommendation = "abstain"
  → attention_score = 65..80
  → kill_flags += "thesis_contradicted_by_recent_source"
- A SCHEDULED, NAMED event appears in any source within (AS-OF, AS-OF + 21d) (e.g. "review meeting on YYYY-MM-DD", "ruling expected on YYYY-MM-DD"):
  → recommendation = "abstain"
  → attention_score = 70..85
  → kill_flags += "scheduled_event_within_window"
- R > 1.8 AND no fresh source within (AS-OF − 7 days) introducing new specific information:
  → recommendation = "abstain"
  → attention_score = 70..90
  → kill_flags += "over_anticipated"
- N < 3:
  → recommendation = "abstain"
  → attention_score = 50
  → kill_flags += "insufficient_evidence"
  → confidence = "low"
- 0.8 ≤ R ≤ 1.2 AND S = 0:
  → recommendation = "advance"
  → attention_score = 25..40
  → confidence = "med"
- R < 0.8 AND a fresh positive source exists within (AS-OF − 7 days):
  → recommendation = "advance"
  → attention_score = 10..30
  → confidence = "high"
- Otherwise:
  → recommendation = "advance"
  → attention_score = 40..60
  → confidence = "med"

`confidence` ∈ {"low","med","high"}. "high" only when rubric matches unambiguously and N ≥ 4 with S ≥ 1.

`thesis_one_line` per entity ≤ 200 chars. ONE sentence.

## Portfolio rule

After per-entity evaluation:

- `advances` = list of entity_ids whose recommendation == "advance"
- `abstains` = list of entity_ids whose recommendation == "abstain"
- Do NOT advance an entity that the per-entity rubric abstained.
- Do NOT abstain an entity that the per-entity rubric advanced.
- `selection_rationale` ≤ 200 chars. One sentence summarizing the portfolio shape.

## Output schema (emit this exact object, populated)

{
  "evaluations": [
    {
      "entity_id": "<string>",
      "recommendation": "advance" | "abstain",
      "thesis_one_line": "<string, ≤200 chars>",
      "evidence_class": "filings" | "sentiment" | "metric_divergence" | "external" | "none",
      "attention_score": <integer 0..100>,
      "confidence": "low" | "med" | "high",
      "kill_flags": [<zero or more short strings>]
    },
    { ... entity 2 ... },
    { ... entity 3 ... }
  ],
  "advances": [<entity_ids whose recommendation == "advance">],
  "abstains": [<entity_ids whose recommendation == "abstain">],
  "selection_rationale": "<string, ≤200 chars>"
}

## Input

AS-OF: 2026-03-15

=== ENTITY 1 ===
ENTITY_ID: J-2210
DIRECTION_HINT: "positive"
RECENT_METRIC: 1.65 (measured 2026-03-14)
BASELINE_METRIC: 1.50 (rolling 90-day mean)
SOURCES:
1. [2026-03-10, filing] Quarterly compliance report for J-2210, no deviations.
2. [2026-03-05, external] Industry overview lists J-2210 among ten entities "operating within expected parameters."
3. [2026-02-28, filing] Annual update reaffirms long-term protocol unchanged.
4. [2026-02-25, sentiment] Forum mention rank: top-180 (no movement).

=== ENTITY 2 ===
ENTITY_ID: J-2211
DIRECTION_HINT: "positive"
RECENT_METRIC: 2.18 (measured 2026-03-14)
BASELINE_METRIC: 1.50 (rolling 90-day mean)
SOURCES:
1. [2026-03-13, external] Trade publication: "J-2211 measurement spikes ahead of scheduled review on 2026-04-08; analysts expect resolution that day."
2. [2026-03-11, sentiment] Forum mention rank: top-8 (up from top-40 two weeks ago). Sentiment moderately positive.
3. [2026-03-08, filing] Routine disclosure on J-2211 operations; no anomalies.
4. [2026-02-27, filing] Prior-quarter report, no material changes.
5. [2026-02-18, external] Sector overview names J-2211 as one of three entities to "watch into next month's review cycle."

=== ENTITY 3 ===
ENTITY_ID: J-2212
DIRECTION_HINT: "positive"
RECENT_METRIC: 3.70 (measured 2026-03-14)
BASELINE_METRIC: 1.50 (rolling 90-day mean)
SOURCES:
1. [2026-02-22, external] Industry recap: J-2212 metric has been climbing for six weeks under existing protocol.
2. [2026-02-15, filing] J-2212 annual report — "operations steady, no announced changes."
3. [2026-03-01, sentiment] Mention rank top-22, stable.
4. [2026-02-08, filing] Prior quarter routine compliance.

Produce the JSON object now.
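The first-match rubric in the prompt above can be sketched in Python to make the arithmetic checkable. This is a minimal sketch, not a reference grader. The names `Entity`, `evaluate`, and `portfolio` are hypothetical. The boolean fields (`contradicted`, `fresh_new_info`, `fresh_positive`) and the `event_dates` list are hand-labelled readings of the sources, since the rubric leaves those judgments to the reader. The sketch also assumes that "within (AS-OF − 7 days)" means the closed range from AS-OF − 7 days to AS-OF, and that the event window excludes AS-OF itself and includes AS-OF + 21 days.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

AS_OF = date(2026, 3, 15)

@dataclass
class Entity:
    entity_id: str
    recent: float
    baseline: float
    source_dates: list
    # Hand-labelled judgments (assumptions, not computed from the text).
    contradicted: bool = False    # a fresh source contradicts direction_hint
    event_dates: list = field(default_factory=list)  # scheduled, named events
    fresh_new_info: bool = False  # a fresh source adds new specific information
    fresh_positive: bool = False  # a fresh positive source exists

def evaluate(e, as_of=AS_OF):
    r = e.recent / e.baseline
    n = len(e.source_dates)
    # Assumed reading of the 7-day window: closed on both ends.
    s = sum(as_of - timedelta(days=7) <= d <= as_of for d in e.source_dates)
    # Assumed reading of the event window: after AS-OF, up to AS-OF + 21 days.
    event_soon = any(as_of < d <= as_of + timedelta(days=21) for d in e.event_dates)

    # First matching rule wins, top-down. The first three rules do not set
    # confidence in the rubric, so it is left as None here.
    if e.contradicted:
        out = ("abstain", (65, 80), None, ["thesis_contradicted_by_recent_source"])
    elif event_soon:
        out = ("abstain", (70, 85), None, ["scheduled_event_within_window"])
    elif r > 1.8 and not e.fresh_new_info:
        out = ("abstain", (70, 90), None, ["over_anticipated"])
    elif n < 3:
        out = ("abstain", (50, 50), "low", ["insufficient_evidence"])
    elif 0.8 <= r <= 1.2 and s == 0:
        out = ("advance", (25, 40), "med", [])
    elif r < 0.8 and e.fresh_positive:
        out = ("advance", (10, 30), "high", [])
    else:
        out = ("advance", (40, 60), "med", [])

    rec, band, conf, flags = out
    # Assumption: "high" falls back to "med" unless N >= 4 and S >= 1.
    if conf == "high" and not (n >= 4 and s >= 1):
        conf = "med"
    return {"entity_id": e.entity_id, "R": round(r, 3), "N": n, "S": s,
            "recommendation": rec, "attention_band": band,
            "confidence": conf, "kill_flags": flags}

def portfolio(entities):
    evals = [evaluate(e) for e in entities]
    pick = lambda rec: [x["entity_id"] for x in evals if x["recommendation"] == rec]
    return {"evaluations": evals,
            "advances": pick("advance"), "abstains": pick("abstain")}

entities = [
    Entity("J-2210", 1.65, 1.50,
           [date(2026, 3, 10), date(2026, 3, 5), date(2026, 2, 28), date(2026, 2, 25)]),
    Entity("J-2211", 2.18, 1.50,
           [date(2026, 3, 13), date(2026, 3, 11), date(2026, 3, 8),
            date(2026, 2, 27), date(2026, 2, 18)],
           event_dates=[date(2026, 4, 8)]),
    Entity("J-2212", 3.70, 1.50,
           [date(2026, 2, 22), date(2026, 2, 15), date(2026, 3, 1), date(2026, 2, 8)]),
]

result = portfolio(entities)
for row in result["evaluations"]:
    print(row)
```

Under these labels, J-2211's review on 2026-04-08 falls 24 days after AS-OF, outside the 21-day window ending 2026-04-05, so the scheduled-event rule does not fire for it. Changing a hand-labelled flag or a window convention can change which rule matches first, which is why those choices are kept as explicit inputs rather than buried in the logic.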

MiniMax

MiniMax-M2.1

Anthropic

Claude Opus 4.6 (Adaptive Reasoning, Max Effort)

Google

Gemma 4 26B A4B (Reasoning)

Google

Gemini 3 Flash Preview (Reasoning)
