Information Technology Services
At Information Technology Services, our goal is to be the university's trusted business partner by creating a culture of exceptional customer service. Bringing together a team of diverse and talented professionals, we provide the central IT services that support USC's schools, hospitals, research centers, and administrative units. Through our recently launched digital transformation initiatives, we aim to develop an environment of continuous service improvement, founded on cross-functional teamwork, industry best practices, innovation, and commitment to the customer experience.
Senior HPC Engineer
Information Technology Services (ITS), Los Angeles, California
The University of Southern California’s (USC’s) Information Technology Services is seeking a talented Senior High-Performance Computing (HPC) Engineer with an exceptional commitment to service excellence to join its team. As the Sr. HPC Engineer, you will be an integral member of the Center for Advanced Research Computing (CARC), collaborating with diverse and talented team members to support the USC research community, improve customer experience, and generate value for our campus stakeholders across a broad base of departments and constituencies.
The ITS vision aligns strategy, business, and services; affirms ITS cultural values; empowers cross-functional teamwork; embraces world-class best practices; and promotes innovation, excellence, agility, and efficiency. To achieve this vision, ITS is committed to providing a modern technology infrastructure that is resilient and delivers the performance necessary to meet the demands of a growing customer base, training in the latest technologies for its highly productive and motivated workforce, outstanding customer experience, and technology services that are aligned with the university’s mission to provide exceptional learning opportunities for students. ITS is creating a workplace where employees can develop cutting-edge skills, take pride in the services they provide, and have access to the roles and career paths that align to their abilities and potential.
We are looking for top talent to join us on our journey.
USC’s ITS organization represents a diverse and talented team, committed to supporting a collaborative culture and delivering secure and innovative IT services, core to the mission of USC. ITS values accountability, excellence, and commitment to exceptional customer experience. ITS strives for a supportive and inclusive culture that encourages employees to do their best work every day and where individuals are recognized and celebrated for their contributions.
USC is the leading private research university in Los Angeles—a global center for arts, technology, and international business. With more than 47,500 students, we are located primarily in Los Angeles but also in various US and global satellite locations. As the largest private employer in Los Angeles, responsible for $8 billion annually in economic activity in the region, we offer the opportunity to work in a dynamic and diverse environment, in careers that span a broad spectrum of talents and skills across a variety of academic and professional schools and administrative units. As a USC employee and member of the Trojan Family—the faculty, staff, students, and alumni who make USC a great place to work—you will enjoy excellent benefits, including a variety of well-being programs designed to help individuals achieve work-life balance.
Come join the ITS team and work as a trusted partner in shaping an environment of innovation and excellence for the university.
The candidate for the position of Senior HPC Engineer must meet the following qualifications:
- Bachelor’s degree in a relevant field such as computer science, computational science and/or engineering, etc., or equivalent combined education, training, and experience.
- Three years of experience in one of the following fields: information technology, system administration, or high-performance computing.
- Demonstrated experience leading IT project planning and implementation.
- Expertise with multi-vendor management, security, and network/Internet protocols.
- Expertise in administering, monitoring, and maintaining secure Linux/Unix-based HPC environments.
- Proficiency in shared and distributed memory parallelism (OpenMP, MPI) and accelerators (GPUs).
- Proficiency in fundamental programming skills (Bash, Perl, Python, or similar languages).
- Expertise with HPC system software, cluster management tools, job schedulers, and other HPC tools (such as Slurm, Warewulf, Ansible, etc.).
- Proficiency with low-latency/high-bandwidth interconnected infrastructure (including InfiniBand, 10/100GigE, etc.).
- Knowledge of HPC storage principles, file systems (NFS, BeeGFS, ZFS, Lustre, etc.), and compute node storage.
- Ability to identify and resolve problems and improve system performance.
- Demonstrated expertise in HPC system design, configuration, and planning.
- Ability to drive technical leadership and management of complex large-scale computing system projects.
- Experience establishing processes for maintaining system performance and managing best-in-class standards.
- Excellent organization and communication skills.
- Ability to develop positive working relationships and a strong rapport with team members.
The ideal candidate for the position of Senior HPC Engineer has the following qualifications:
- Bachelor’s degree in a relevant field, such as computer science, computational science and/or engineering, etc.
- More than five years of experience administering large-scale high-performance computing cluster systems, high-speed networks, and various parallel file systems.
- In-depth knowledge of and extensive experience in the design and integration of research computing systems (HPC clusters, InfiniBand networks, advanced parallel file systems, hybrid cloud systems, etc.) and related components and applications.
- Experience with Linux Container solutions (Docker, Singularity, Mesos) and orchestration tools (Kubernetes).
- Experience with cloud computing services (AWS, GCP, Azure).
THE WORK YOU WILL DO
The Sr. HPC Engineer works with other HPC Systems Engineering Team members in the Center for Advanced Research Computing and collaborates with technical leadership in the design, development, and installation of High-Performance Computing (HPC) systems and the maintenance of their software stack. The Senior HPC Engineer is responsible for managing the planning, implementation, availability, performance, security, maintenance, and repair of high-performance computing infrastructure. The Senior HPC Engineer oversees multi-vendor management, security, and network/Internet protocols for the ITS organization. As a member of ITS, the Senior HPC Engineer demonstrates ITS values in action.
The Senior HPC Engineer:
- Drives the day-to-day operations for the CARC’s High-Performance Computing team by monitoring computing resource performance, managing configurations, and addressing security administration. Applies revisions to system firmware and software. Engages and collaborates with vendors to assist support activities as required.
- Leads the development of new HPC software deployment plans, custom scripts, and testing procedures to ensure operational reliability for university researchers. Trains technical ITS staff in the use of new software and hardware, either developed or acquired.
- Oversees the maintenance and management of HPC researcher accounts for USC research groups. Leads the installation, modification, and maintenance of various research software applications for access on HPC clusters. Acts as a trusted technical advisor for researcher support and documentation on software applications and programs.
- Designs, installs, configures, and performs document management for cluster infrastructure, including operating systems, job schedulers, resource managers, provisioning managers, configuration managers, SAN devices, network devices, and other components.
- Investigates, debugs, and addresses researcher inquiries and requests efficiently through a customer issue ticketing system. Implements customer-focused resolutions efficiently. Communicates complex technical concepts in a simple, straightforward manner to address a broad range of stakeholders.
- Creates opportunities to explore emerging technologies and technical developments to address expanding analytical requirements. Identifies new services and develops corresponding implementation plans. Advocates for best practices in the HPC field. Champions collaborative relationships with peer HPC research organizations.
- Contributes to an inclusive environment that values differences by building and maintaining collaborative relationships with team members, peers, and ITS leaders. Actively embodies ITS values and behaviors including accountability, ethics, and best-in-class customer service. Contributes to a culture of trust and transparency by sharing information broadly, openly, and deliberately.
- Supports the vision for the Center for Advanced Research Computing. Works closely with team members and management to implement and support effective solutions for HPC. Maintains currency with technology, standards, and best practices. Supports process improvement efforts within the team and across ITS.
- Performs other related duties as assigned or requested. The university reserves the right to add or change duties at any time.
MINIMUM QUALIFICATIONS
The candidate for the position of Senior HPC Engineer must meet the following qualifications:
- Bachelor’s degree in a relevant field such as computer science, computer information systems, etc., or equivalent combined education, training, and experience.
- Three years of experience in one of the following fields: information technology, system administration, or high-performance computing.
- Demonstrated experience leading project planning.
- Expertise with multi-vendor management, security, and network/Internet protocols.
- Expertise in administering, monitoring, and maintaining secure Linux/UNIX operating systems (CentOS, Solaris).
- Proficiency in shared and distributed memory parallelism (OpenMP, MPI) and accelerators (GPUs).
- Proficiency in fundamental programming skills (Bash, Perl, Python, or similar languages).
- Expertise with HPC system software, cluster management tools, job schedulers, and other HPC tools (such as Slurm, Salt, xCAT, etc.).
- Proficiency with low-latency/high-bandwidth interconnected infrastructure (including InfiniBand, Myrinet, 10GigE, etc.).
- Knowledge of HPC storage (FC, SAS) principles, file systems (SAM-FS/-QFS, BeeGFS, ZFS, etc.), and compute node storage (NFS).
- Ability to identify and resolve problems and manage system performance.
- Demonstrated expertise in design, configuration, and planning.
- Ability to drive technical leadership and management of complex large-scale computing system projects.
- Experience establishing processes for maintaining system performance and managing best-in-class standards.
- Excellent organization and communication skills.
- Ability to develop positive working relationships and a strong rapport with team members.
REQ20095427 Posted Date: 01/11/2021