Company: AMD
Location: Hyderabad, Telangana, India
Career Level: Entry Level
Industries: Technology, Software, IT, Electronics

Description



WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. 

AMD together we advance_



THE TEAM:

AMD's Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, Artificial Intelligence (AI), HPC, and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.

THE ROLE:

We are seeking an experienced HPC Systems Engineer with 7+ years of expertise in high-performance computing (HPC) environments. This role requires hands-on experience with Python, Kubernetes (K8s), Slurm, OpenStack, and Ansible, along with the ability to support external clients in live troubleshooting sessions.

THE PERSON:

The ideal candidate will have deep technical knowledge of drivers, troubleshooting methods, and system-level debugging, and will play a key role in managing, optimizing, and troubleshooting HPC clusters and cloud-based HPC environments.

KEY RESPONSIBILITIES:

 

HPC System Administration & Troubleshooting
  • Manage and optimize HPC clusters, ensuring high availability and performance.
  • Troubleshoot GPU, CPU, network drivers, firmware, and OS-level issues.
  • Debug storage, networking, and job scheduling bottlenecks in Slurm-based environments.
Kubernetes & Cloud HPC Environments
  • Deploy and manage HPC workloads in Kubernetes for AI/ML and parallel computing.
  • Optimize OpenStack-based HPC clusters with Ceph, Cinder, and Neutron for cloud scalability.
  • Implement containerized HPC workflows using Kubernetes and OpenShift.
Automation & Infrastructure as Code (IaC)
  • Develop Ansible and Terraform scripts for provisioning and managing HPC resources.
  • Automate job scheduling, cluster monitoring, and log analysis using Python.
  • Optimize CI/CD pipelines for HPC and AI/ML applications.
Performance Tuning & Benchmarking
  • Benchmark and optimize multi-node HPC workloads (MPI, NCCL, ROCm, CUDA).
  • Tune OS parameters, networking (InfiniBand, RoCE), and Slurm configurations for peak performance.
  • Enhance HPC storage performance (Ceph, Lustre, NFS) and distributed computing efficiency.
Client Support & Collaboration
  • Provide real-time technical support and troubleshooting for HPC users.
  • Engage with developers, DevOps, and system administrators to optimize cluster performance.
  • Document solutions and best practices, and contribute to internal knowledge bases.
PREFERRED QUALIFICATIONS:
  • Experience with AMD MI300, MI2X0 GPUs, ROCm, MPI, UCX, or XPMEM.
  • Exposure to containerized workloads using Singularity or Docker in HPC.
  • Familiarity with OpenStack deployment automation (e.g., TripleO, Kolla, or OpenStack-Ansible).
  • Experience in customer-facing technical roles, with a strong ability to troubleshoot live issues.

This role is critical in ensuring seamless HPC operations, troubleshooting complex system issues, and supporting high-profile clients with real-time problem resolution in both bare-metal and cloud-based HPC environments.

 

ACADEMIC CREDENTIALS:

  • Bachelor's or Master's degree in Computer Engineering or Electrical/Electronics Engineering

 




Benefits offered are described in: AMD benefits at a glance.

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

