HPC Engineer L3
Talent Hunter has been a trusted recruitment partner to the IT and Telecom industry since 2008. We ensure that our candidates meet our clients at the right time and place for their career development. We will be happy to accompany you on this professional quest!
"Choose a job you love, and you will never have to work a day in your life."
Our client is a global technology company, powered by a broad portfolio of technology services and products. They work with clients across all major verticals, providing industry solutions for Financial Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. On their behalf, we are looking for an experienced, full-time High Performance Computing (HPC) Engineer L3 to join their team. The successful candidate will iterate quickly and deliver in a fast-paced environment with close attention to detail.
Role: High Performance Computing (HPC) Engineer L3
Responsibilities:
- Evaluate new hardware and software, and assess their potential benefits and impact on the environment, as part of enhancements and necessary updates/patches
- Assess overall system performance on a regular basis to maintain consistent performance, and plan upgrades as needed
- LSF and Slurm administration, including setting up scheduling policies and integrating applications with the scheduler
- Management, maintenance, and upgrade of BCM, xCAT, and IBM RTM
- IBM GPFS storage administration
- Monitor, maintain, and upgrade the high-availability (HA) license management setup for FlexLM- and RLM-based applications
- Set up separate environments for proofs of concept (POCs) and for testing major upgrades of applications and infrastructure
- Set up and support HPC workloads in the AWS and Azure clouds
- Install, profile, and run open-source and commercial applications
- Automate tasks through scripting
- Work with the operations and application teams to provide long-term solutions to user and business problems and requirements
Minimum Qualifications:
- 8+ years of experience in High Performance Computing design, implementation, and support
- Strong distribution-agnostic Linux system administration experience
- Strong scripting experience for automation using Python, Bash, and Perl
- Good experience with xCAT and BCM (Bright Cluster Manager)
- Good working experience with parallel file system setup and high-speed interconnects
- Strong knowledge of HPC cluster benchmarking tools such as HPL, IOR, and the OSU Micro-Benchmarks
- Experience installing HPC job schedulers, setting up scheduling policies, and integrating applications with the scheduler
- Experience with configuration management tools such as Chef, Ansible, and Puppet
- Knowledge of setting up HPC workloads in the cloud (AWS/Azure)
- Experience with license managers such as FlexLM and RLM
- Experience installing, profiling, and running open-source applications
- Good communication and project management skills
- Knowledge of TensorFlow, CUDA, RStudio, R, and Docker
Additional Qualifications:
- Experience with AI, ML, and DL is a plus
- Kubernetes and DevOps experience is a plus
- Prometheus and ELK experience is a plus
- EasyBuild experience is a plus
Social Benefits:
- Fully remote
- Competitive salary and performance bonuses.
- Opportunity for career progression.
- Hybrid work model
- Additional health insurance