The Linac Coherent Light Source (LCLS), located at SLAC, began operation in 2009 as the world's first x-ray free-electron laser, producing ultra-fast pulses of coherent x-rays with unprecedented brightness. A suite of x-ray instruments exploits the unique scientific capabilities of LCLS, and all of these instruments share a common control and data acquisition system. Our division is responsible for developing the control and data systems for the different LCLS experiments.
The IT & Networking group within our division designs and develops tools for system deployment; diagnoses and resolves complex system configuration issues; performs software and hardware installs, patches, and updates; creates and maintains policies that improve internal services; manages existing and new network solutions; monitors network stability and up-time; and participates in a 24x7 pager rotation.
The successful candidate will play a key role in the IT & Networking group, advising management on how to advance our capabilities by exploiting new technologies and implementing those that best suit the needs of the facility. This position will also oversee the experimental data flows in LCLS to ensure the reliability and availability of the data, participate in the general design efforts of the LCLS computing system to represent its data management aspects, and develop the software tools needed to organize the data flow effectively.
Your specific responsibilities include:
- Research and provide conceptual ideas that enhance the capabilities of the IT infrastructure for LCLS.
- Design and develop systems that implement cutting-edge technologies and advance the state of the art in support of high-throughput, high-availability data collection and networking systems.
- Create roadmaps for sustaining and upgrading equipment in support of the ever-changing and advancing needs of the experimental facilities.
- Collaborate across directorates, other laboratories, industry, and the university to find existing solutions that can be leveraged to support the experimental facilities.
- Ensure the reliability and availability of computing infrastructure.
- Maintain and upgrade computer and network infrastructure.
- Troubleshoot complex problems as they arise, using a deep understanding of the diverse technologies distributed across the experimental facilities.
- Design, develop, install, and maintain operating systems, utilities, and applications software on computing systems.
- Resolve system emergencies with significant impact on the integrity of user data and systems.
- Design, configure, implement, and maintain system security strategies, policies, and procedures.
- Establish software/hardware standards and systems policies and procedures.
- Engage in long-term strategic planning with regard to systems development and integration.
- Conceptualize and implement systems within broad technical or structural frameworks.
- Perform capacity planning for system configuration, software services, network services, load distribution, and service interrelationships among computer systems.
- Act as a technical expert or lead for local computer system administration.
- Act as project leader on large-scale computing projects in which strong technical, directional, and personal leadership is necessary; assign and oversee the work of other system administrators as needed.
- Manage vendor relationships and cost-effective hardware and software maintenance agreements.
To be successful in this position you will bring:
- Bachelor's degree in computer science, mathematics, physics, engineering, or a related field and eight years of relevant experience, or a combination of education and relevant experience.
- Solid understanding of networking and distributed computing.
- Extensive programming skills with scripting languages.
- Significant experience with PC hardware, knowledge of the Linux kernel, and the ability to tune kernel parameters for special applications.
- Hands-on experience installing hardware and software, and experience with DNS, DHCP, HTTP, SMTP, LDAP, NFS, IPMI, kickstart, and Puppet.
- Demonstrated experience in technical team leadership and/or project leadership.
- Strong problem-solving skills and the ability to identify and recommend solutions that improve efficiency and to take corrective actions independently.
- Excellent communication skills and the ability to work well in a research and development team.
In addition, preferred requirements include:
- A Master's degree in computer science, mathematics, physics, engineering, or a related field and six years of relevant experience, or a PhD and four years of relevant experience.
- Experience with high-availability architectures, file system technologies, RAID technologies, virtualization technologies, and containers.
SLAC employee competencies:
- Effective Decisions: Uses job knowledge and solid judgment to make quality decisions in a timely manner.
- Self-Development: Pursues a variety of venues and opportunities to continue learning and developing.
- Dependability: Can be counted on to deliver results with a sense of personal responsibility for expected outcomes.
- Initiative: Pursues work and interactions proactively with optimism, positive energy, and motivation to move things forward.
- Adaptability: Flexes as needed when change occurs, maintains an open outlook while adjusting and accommodating changes.
- Communication: Ensures effective information flow to various audiences and creates and delivers clear, appropriate written, spoken, and presented messages.
- Relationships: Builds relationships to foster trust, collaboration, and a positive climate to achieve common goals.
Physical requirements and Working conditions:
- Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.
Work standards:
- Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
- Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for environment, safety and security; communicates related concerns; uses and promotes safe behaviors based on training and lessons learned. Meets the applicable roles and responsibilities as described in the ESH Manual, Chapter 1, General Policy and Responsibilities: http://www-group.slac.stanford.edu/esh/eshmanual/pdfs/ESHch01.pdf
- Subject to and expected to comply with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in the University's Administrative Guide, http://adminguide.stanford.edu.