AI Evaluation Engineer, Apple

Job Description

Apple is seeking an AI Evaluation Engineer to own the end-to-end evaluation of its AI products. The role involves implementing evaluation frameworks, instrumentation, and operational processes that assess LLM outputs, agent behavior, and AI system performance. This work supports product launches and the development of new AI solutions that improve decision-making and analytics for Apple Sales.

Posted On: February 18, 2026

Qualification: B.S. Degree in Computer Science/Engineering or equivalent

Experience: 4+ years in AI engineering, ML, data science, or related fields

Main Duties:

  • Evaluate AI workflows, assessing LLM performance across chat, summarization, recommendation, and agent-based features.
  • Apply both rubric-based assessments and LLM-as-a-Judge evaluations to measure the correctness, relevance, grounding, and consistency of AI output.
  • Instrument AI systems to capture user interactions by tracing system operations and recording prompts, responses, system metadata, and user feedback.
  • Define evaluation standards, contribute to system design, and improve AI decision-making systems.

Essential Qualifications:

  • Advanced Python programming skills.
  • Practical experience with GenAI technology, RAG pipelines, recommendation systems, and context engineering methods.
  • Experience with LLM ecosystems such as OpenAI, Anthropic, and Gemini, and with vector databases such as Pinecone, FAISS, Milvus, and PostgreSQL.
  • Proficiency in SQL and data analytics tools such as Hadoop, Spark, and Snowflake.
  • Experience with CI/CD and AI validation procedures.
  • Expertise in monitoring systems and evaluation frameworks for AI agents.
  • Excellent time management and the ability to work effectively across teams.

Preferred Skills:

  • Experience with LLM observability tools
  • Understanding of embeddings, retrieval algorithms, agents, and vector development graphs.
  • Expertise in distributed systems and asynchronous messaging with RabbitMQ, Redis, and Valkey.
  • An advanced degree (M.S. or Ph.D.) in Economics, Electrical Engineering, Statistics, or Data Science.
  • Strong communication and stakeholder management skills.