Responsibilities:
- Co‑create AI vision, KPI tree, and prioritized use‑case portfolio with business leaders.
- Translate strategy to a delivery roadmap and budget, with explicit risk and dependency plans.
- Delivery leadership
  - Lead cross‑functional pods (data, platform, app, safety, SRE) from discovery through production.
  - Design A/B and canary rollout strategies; enforce guardrails and incident playbooks.
- AI system engineering
  - Architect and guide implementation of LLM/RAG/agent solutions, including retrieval quality, prompt/policy engineering, guardrails, and evaluation harnesses.
  - Drive observability (tracing, safety counters, cost telemetry) and SLO compliance.
- Governance and security
  - Stand up policy‑as‑code, model/prompt versioning, access controls, data residency, and vendor risk assessments.
  - Chair or collaborate with the governance board; run review gates.
- Stakeholder engagement
  - Communicate simply and often; convert ambiguity into decisions; manage expectations.
  - Run demos and post‑incident reviews; drive evidence‑based decisions.
- Talent enablement
  - Mentor teams on AIOps/SRE practices; cultivate champions; reduce burden through automation.
Must‑have skills:
- Leadership and ownership
  - Operates with high autonomy, bias to action, and accountability for outcomes.
  - Proven ability to align executives and guide cross‑functional teams without formal authority.
- Communication and influence
  - Excellent written and verbal communication and meeting facilitation skills.
  - Translates technical topics (LLMs, safety, SLOs) into business terms and tradeoffs.
- Product and delivery thinking
  - Evidence‑driven decisions; comfort with A/B testing, canary rollouts, and ROI models.
- Governance and security mindset
  - Practical understanding of data governance, privacy, and AI safety guardrails; policy‑as‑code.
- Hands‑on AI systems integration
  - Experience integrating GenAI (LLMs/RAG/KG/agents) into real products with telemetry, guardrails, and rollback.
- AIOps and reliability fundamentals
  - SLI/SLO design, error budgets, incident management, observability, CI/CD for prompts/policies/indexes.
- Manufacturing/OT/IoT/Edge AI experience; familiarity with device data and shop‑floor constraints.
- Microsoft Azure: Azure OpenAI, Cognitive Search, API Management, App Service/AKS, Functions, Event Hubs, Key Vault, Monitor; Azure ML or equivalent.
Nice‑to‑have skills:
- SAP ecosystem awareness (SAP/S4H processes and integration points for AI governance).
- Data and knowledge systems
  - Knowledge graphs/ontologies, hybrid retrieval (vector + keyword), embeddings, data contracts.
- Safety and compliance
  - Red‑teaming methods, PII/PHI handling, content moderation pipelines, audit trails.
- Cost and performance engineering
  - Token/call cost controls.
Experience profile:
- 7–12+ years in software/AI product delivery with 3+ years leading cross‑functional initiatives.
- Track record of shipping AI or data‑intensive systems to production in enterprise settings.
- Demonstrated practice of Site Reliability Engineering (SRE)/AIOps concepts (SLOs, incident response, observability).