CAREERS

Sr. Lead Infrastructure SRE

Information Technology
Requisition # 22059

Introduction

This is a remote position.

The IT support world is evolving and it’s clear that SREs will play a pivotal role in the future.  The resulting demand means rapid career progression and fabulous benefit packages.  We are offering you the opportunity to grow your SRE skills and experience.  You will:

  • Receive training in areas such as black-box troubleshooting and Golden Signal Monitoring
  • Work alongside experienced SREs who support and mentor new team members
  • Undertake a mix of project and BAU work to increase your level of experience

Being typical SREs, we want to automate everything and so we will help you develop those skills too.

You

You will come from one of two backgrounds.

  • Software Developer – you have been a developer for several years; you can do the job standing on your head and you are now ready for a new challenge
  • Application Analyst – you have a working knowledge of distributed systems, and you are hungry to develop your expertise

Crucial characteristics are that you love technology and always want to learn more.

Team Overview

The SRE Team provides services to our application and infrastructure support colleagues, namely:

  • Recurring Problem Diagnosis – where we determine the root cause of recurring problems when the technology at fault is not obvious
  • Golden Signal Monitoring – where we monitor the availability of application and infrastructure services, notifying and helping support teams deal with the issues we identify
  • Automation – where we provide automation consultancy and build help to infrastructure teams as well as frameworks, common code and automation hosting services.
  • Community – where we promote and assist in the adoption of SRE concepts and techniques throughout the bank.

The remit for the SRE joining us will be to help deliver the Recurring Problem Diagnosis and Golden Signal Monitoring services.

We use advanced techniques.  This is an exciting opportunity for the right candidate to learn those techniques and hone their problem diagnosis and Golden Signal Monitoring skills.

This Role

Recurring Problem Diagnosis (RPD)

You will help our team diagnose ongoing recurring problems, basing the investigation on a structured problem diagnosis method called RPR.  We will teach you the RPR method and then support you through the investigations.

We rarely get involved in ongoing incidents; we specialise in diagnosing recurring problems.  Occasionally (only three or four times each year), we are asked to attend a SWAT call for an ongoing significant incident.  Sometimes, this involves work outside of normal business hours.

Golden Signal Monitoring (GSM)

Where the RPD work can be thought of as reactive (reacting to problems that arise), the Golden Signal Monitoring service provides a proactive approach to problem detection and diagnosis.  Our GSM objectives are clear:

  1. Identify and resolve issues before they become service impacting
  2. Reduce errors and transient response time issues to drive up service levels

We monitor Golden Signals using a system called Site Reliability Core (SRC).  You will use SRC to identify app and infra service issues, perform a preliminary investigation and then raise the matter with the service owner.

We will teach you how to use Site Reliability Core and how to deal with matters arising.

Personal Qualities

To undertake this role, you will need:

  • Keenness to learn new technologies, concepts and techniques.
  • Demonstrable critical thinking skills – we deal with complex issues, and this requires clear thought processes.
  • The ability to synthesise an approach to an issue from existing knowledge and the new techniques we teach you.
  • Drive and determination – the issues we deal with often take twists and turns demanding real stamina from our SREs.
  • Confidence to stand your ground using data to explain your conclusions and recommendations.

Qualifications

The following qualifications are essential for this role:

  • At least 5 years’ experience in either:
    • Java EE / Jakarta EE application software development, or
    • Java EE / Jakarta EE application support
  • A demonstrable understanding of distributed systems.
  • A working knowledge of containerised applications.
  • A demonstrable basic understanding of TCP/IP.
  • A demonstrable understanding of an application layer protocol such as HTTP.

Nice to Have

The following qualifications would be beneficial to this role:

  • Experience developing or supporting applications based on Tomcat application servers.
  • Experience developing or supporting applications based on WebLogic application servers.
  • Experience developing or providing support in a microservice environment.
  • Knowledge of a messaging technology such as MQ (Message Queue), Solace or Kafka.
  • Experience in full stack support (application, data and infrastructure).
  • Knowledge of Oracle or Microsoft SQL Server relational database technologies.
  • Experience in analysing data logs using Elastic Kibana.
  • Experience in analysing data logs using Azure Log Analytics.
  • Experience in the use of Wireshark for the capture and analysis of network packet traces.
  • Experience (past or present) in the use of an automation platform such as Ansible, Puppet, Chef, Salt or vRA.
  • Experience developing or supporting applications based on Pivotal Cloud Foundry (Tanzu Application Service).
  • Knowledge of SRE concepts and techniques.
  • Experience with DevOps-related tasks; in particular, BAU support.
  • Experience in using ServiceNow.
  • An understanding of the regulatory landscape for financial services.

Tasks & Responsibilities

General

  • Attend weekly team meetings.
  • Submit time records at the end of each week.
  • Undertake general tasks that may be allocated from time to time.

Recurring Problem Diagnosis (RPD)

The investigations will be based on our RPR problem diagnosis method which we will teach you.  The tasks and responsibilities are:

  • Conduct Discovery Calls to obtain:
    • a problem statement,
    • a high-level understanding of the moving parts of the system to investigate,
    • how the data flows around the system, and
    • the diagnostic data sources available.
  • Produce a Diagnostic Capture Plan that describes how the data needed will be captured.
  • Help app and infra people to execute the Diagnostic Capture Plan.
  • Analyse the data that results to determine the root cause of the problem, or the next steps.
  • Issue periodic email-based status reports.
  • Attend investigation progress meetings with stakeholders.
  • Notify the team leader of blockers or other issues that may arise.
  • Assist other SREs in investigations.
  • Handle multiple RPDs at any time – this is possible as there can be long pauses in the investigations.
  • Undertake projects to improve our ability to solve problems.

Golden Signal Monitoring

  • Use Site Reliability Core (SRC) to identify app and infrastructure services that are missing their Service Availability target or in danger of doing so.
  • For the services identified as having a problem, investigate using SRC and other data sources.
  • Assess the underlying issue against criteria that we have established and, where appropriate, create a ServiceNow problem record with details of the problem and assign it to the service owner.
  • Work with the team that owns the service to help them understand our findings and explain them to vendors.
  • Assist the service support team in determining the cause of the problem.
  • Assist in the operation of our SRC system including onboarding of services, setting availability metrics and fine tuning SLOs.
  • Undertake projects to improve our ability to monitor systems and deliver service availability information.

Reasonable Accommodation
Northern Trust is committed to working with and providing reasonable accommodations to individuals with disabilities. If, because of a medical condition or disability, you need a reasonable accommodation for any part of the employment process, please email our HR Service Center or call 1-800-807-0302 (North America), +630-276-5353 (Asia Pacific), 1800-425-0333 (India), +44(0)207 982 4357 (Europe, Middle East and Africa) and let us know the nature of your request and your contact information.

Equal Employment Opportunity Statements
  • APAC/India EEO Statement

      It is the policy and practice of Northern Trust to provide equal employment opportunities to all employees and applicants. Northern Trust does not discriminate on the basis of race, colour, religion or belief, nationality, ethnic or national origin, sex, marital status, sexual orientation, disability or age. All employment decisions will be made in a non-discriminatory manner in accordance with our obligations under the law and codes of practice. This includes human resources’ decisions relating to recruitment, terms and conditions of employment, transfers, promotions and access to learning and development.

  • Canada EEO Statement

      Northern Trust is an Equal Opportunity Employer. Hiring and other employment decisions at Northern Trust are made without regard to race, colour, religion, sex, ancestry, national origin, ethnic origin, age, disability, citizenship, veteran status, sexual orientation, record of offences, marital status, family status, or any other characteristic protected by federal, provincial, or local law, regulation, or ordinance.

  • EMEA EEO Statement

      It is the policy and practice of Northern Trust to provide equal employment opportunities to all employees and applicants. Northern Trust does not discriminate on the basis of race, colour, religion or belief, nationality, ethnic or national origin, sex, marital status, sexual orientation, disability or age. All employment decisions will be made in a non-discriminatory manner in accordance with our obligations under the law and codes of practice. This includes human resources’ decisions relating to recruitment, terms and conditions of employment, transfers, promotions and access to learning and development.

  • USA EEO Statement

      It is the policy of The Northern Trust Company to afford equal opportunity in all phases of employment without regard to an individual's age, race, color, religion, creed, gender, national origin, citizenship status, marital status, pregnancy, sexual orientation, gender identity, gender expression, genetic tests and information, physical or mental disability, protected veteran status or any other legally protected status.

      EEO poster (U.S.); EEO is Law Poster Supplement


Pay Transparency
Pay Transparency Nondiscrimination Provision (U.S)