Description
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
Key Responsibilities
Customer Engagement & Technical Debug Support
- Serve as the primary technical interface for customers on GPU server bring‑up, stability, and debug issues.
- Support customers during system integration, validation, and production ramp, acting as the first line of escalation.
- Support POC, EVT/DVT/PVT, and early customer deployments from a system debug perspective.
- Review customer system architecture and provide debug readiness, risk assessment, and best‑practice guidance.
- Diagnose and resolve server‑level issues including boot failures, OS bring‑up, GPU/NIC detection, PCIe issues, and system hangs.
- Perform HW/SW co‑debug across BIOS/UEFI, BMC, firmware, drivers, OS, and GPU stacks.
- Analyze logs, dumps, and traces (BIOS, BMC, OS, GPU, NIC) to isolate root causes.
- Work closely with ODMs, component vendors, and internal engineering teams to drive issue closure.
- Debug GPU server issues related to power, thermals, PCIe, interconnects, and multi‑GPU configurations.
- Validate GPU functionality under stress, burn‑in, and long‑run stability conditions.
- Support RMA analysis and failure reproduction when required.
- Assist with system‑level performance validation and identify platform bottlenecks.
- Support customer concerns related to system stability, reliability, and scalability in multi‑GPU servers.
- Create debug guides, checklists, and best‑practice documents for server bring‑up and issue triage.
- Provide technical training to customers and internal teams on server debug methodology and tools.
Qualifications
- Bachelor's or Master's degree in a related field.
- 5+ years of experience in server platform debug, GPU systems, or data center hardware support.
- Strong understanding of x86 server architecture, GPU platforms, PCIe, memory, power, and thermals.
- Hands‑on experience with Linux OS, system logs, firmware, and driver‑level debugging.
- Experience working with ODMs/OEMs and cross‑functional engineering teams.
- Strong communication skills for customer‑facing debug and escalation management.
- Experience debugging GPU servers or AI/HPC platforms in customer environments.
- Familiarity with BIOS/UEFI, BMC (OpenBMC), firmware update flows, and server validation stages.
- Understanding of networking (NICs, RDMA, Ethernet/InfiniBand) in GPU servers.
- Ability to work independently, manage multiple customer issues, and drive problems to closure.
#LI-SC1 #LI-HYBRID
Benefits offered are described in AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's “Responsible AI Policy” is available here.
This posting is for an existing vacancy.