
Autonomous AI Engineer (Agentic + RAG + Eval Harness)
Upwork
Remote
About
We're commissioning a fixed-price pilot that can evolve into a retainer if successful. The budget for the 2–3 week pilot is $4,500. The goal is an MVP agent that reads a task brief, plans subtasks, calls tools and APIs, retrieves from our knowledge base, and produces a validated deliverable with its own self-checks.

Scope:
- A planner/executor that decomposes tasks and runs them with tool use such as web search, code execution, and database reads and writes.
- A retrieval-augmented generation (RAG) pipeline spanning roughly 20,000 internal documents, with guardrails to reduce hallucinations.
- An evaluation harness that measures quality, latency, and cost, with automated regression checks.
- Basic observability: traces, token usage, costs, and pass/fail metrics in a simple dashboard.
(Minimal sketches of the planner/executor, the retrieval step, and the regression gate follow at the end of this posting.)

Stack: we're flexible, but expect Python or TypeScript; LangGraph, LangChain, or CrewAI; OpenAI, Anthropic, or Groq models; pgvector or Weaviate for vector storage; LiteLLM for routing; sandboxed code runners such as E2B; OpenTelemetry for traces; and Docker for packaging. We will provide API keys, a small redacted dataset, sample tasks, and acceptance rubrics.

Deliverables:
- A repo with a clear README and one-click Docker run.
- Configurable agent graphs with a tool registry.
- A RAG service with evaluation scripts wired into CI.
- A minimal UI in Next.js or Streamlit to submit tasks and view traces.
- A short deployment guide covering development through staging.

Success criteria:
- At least a 25% reduction in manual task time versus our baseline.
- Average cost per standard task of no more than $0.30.
- A factuality score of at least 0.85 on our rubric.

To apply, please include two or three concrete agent or RAG examples with links or repos, and a short paragraph describing how you design evaluations to catch silent regressions.
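For the agent side, here is a minimal, framework-agnostic sketch of the planner/executor loop and tool registry we have in mind. Every name in it is illustrative, and the hard-coded plan() stands in for an LLM planner; in the pilot we would expect a LangGraph or CrewAI graph rather than a hand-rolled loop:

```python
from dataclasses import dataclass
from typing import Callable

# Tool registry: tools self-register by name so agent graphs stay configurable.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("web_search")
def web_search(query: str) -> str:
    return f"stub results for {query!r}"  # real version would call a search API

@tool("db_read")
def db_read(sql: str) -> str:
    return "stub rows"  # real version would run a read-only query

@dataclass
class Subtask:
    tool: str
    args: dict

def plan(brief: str) -> list[Subtask]:
    """Planner: an LLM call in practice; a fixed decomposition in this sketch."""
    return [
        Subtask("web_search", {"query": brief}),
        Subtask("db_read", {"sql": "SELECT 1"}),
    ]

def execute(brief: str) -> list[str]:
    """Executor: dispatch each subtask through the registry, collect outputs."""
    return [TOOLS[s.tool](**s.args) for s in plan(brief)]

if __name__ == "__main__":
    print(execute("summarize Q3 churn drivers"))
```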
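For retrieval, a sketch of the pgvector path, assuming a hypothetical docs table with an embedding column and psycopg2 for access (connection string, table, and column names are all assumptions); the prompt wrapper shows the kind of grounding guardrail we mean:

```python
import psycopg2  # assumes the pgvector extension is installed in Postgres

# Assumed schema (illustrative only):
#   CREATE TABLE docs (id serial PRIMARY KEY, content text, embedding vector(1536));

def retrieve(query_embedding: list[float], k: int = 5) -> list[str]:
    """Return the k nearest chunks by cosine distance (pgvector's <=> operator)."""
    conn = psycopg2.connect("dbname=kb")  # placeholder connection string
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(query_embedding), k),
        )
        return [row[0] for row in cur.fetchall()]

def grounded_prompt(question: str, chunks: list[str]) -> str:
    """Guardrail: instruct the model to answer only from retrieved context."""
    context = "\n---\n".join(chunks)
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\nQ: {question}"
    )
```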
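For the eval harness, a pytest-style regression gate wired to the thresholds above; run_task and factuality are stand-ins for the real agent invocation and rubric scorer, and the golden task is made up for illustration:

```python
# Regression gate: replay golden tasks on every commit and fail CI on drift.

GOLDEN_TASKS = [
    {"brief": "summarize doc 12", "expected_facts": ["launched 2021", "EU only"]},
]

FACTUALITY_FLOOR = 0.85   # acceptance threshold from this brief
COST_CEILING_USD = 0.30   # average cost per standard task

def run_task(brief: str) -> dict:
    """Stand-in for the agent; returns a canned answer and its cost."""
    return {"answer": "Launched 2021; EU only.", "cost_usd": 0.12}

def factuality(answer: str, expected_facts: list[str]) -> float:
    """Naive substring scorer; the real rubric scorer would replace this."""
    hits = sum(f.lower() in answer.lower() for f in expected_facts)
    return hits / len(expected_facts)

def test_no_silent_regressions():
    scores, costs = [], []
    for task in GOLDEN_TASKS:
        result = run_task(task["brief"])
        scores.append(factuality(result["answer"], task["expected_facts"]))
        costs.append(result["cost_usd"])
    assert sum(scores) / len(scores) >= FACTUALITY_FLOOR, "factuality regressed"
    assert sum(costs) / len(costs) <= COST_CEILING_USD, "cost per task regressed"
```

Run on every commit, a gate like this turns the acceptance rubric into a CI failure rather than a silent drift, which is exactly what we mean by catching silent regressions.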