Sr Monitoring Performance Engineer
Role: Senior Observability Engineer
What Is the Opportunity?
Northern Trust Enterprise Monitoring & Tools Services is hiring a Senior Observability Engineer to design, plan, implement, and maintain complex log aggregation and monitoring solutions using both third-party and in-house tools. The engineer will lead enterprise-wide event, tracing, and log aggregation projects in emerging technologies such as public cloud, hybrid cloud, and containers, with a focus on foundational engineering constructs for monitoring, logging, and observability, and will participate in and support the move toward an SRE work ethic across all observability work. The incumbent is a leader and self-starter who demonstrates a thorough understanding of the activities involved in the engineering, support, installation, and/or operation of infrastructure technologies. The incumbent plans at an operational level, designing and developing technology solutions while interfacing with the appropriate customers, management, and technical resources; facilitates and/or participates in the design, development, and implementation of large, complex technology solutions supporting one or more business and/or technology areas; and develops and implements appropriate solutions that may involve multiple platforms, databases, software/hardware technologies, and tools. The role calls for a strong ability to multi-task and reprioritize in an environment of changing priorities and digital transformation.
Primary Job Duties & Responsibilities
· Meet with internal customers to review, evaluate and provide direction for our monitoring and logging solutions.
· Maintain and troubleshoot the current monitoring infrastructure.
· Participate in a regular on-call support rotation.
· Research and thoroughly study documentation to implement new tools or new features of existing tools.
· Evaluates systems specifications regarding customer requirements, transforming specifications into cost-effective, technically correct solutions.
· Prioritizes work and manages projects within established budget objectives and customer priorities.
· Establishes quality control and security protocols and manages work to them.
· Provides the division, department and business area management with timely and accurate information regarding the status and performance of the assigned project(s).
· Leverages technology to develop, redesign and/or implement optimal technology solutions.
· Builds, leverages, and maintains effective alliances across the technical and business communities.
· Interacts with customers to achieve efficient, effective results.
· Multi-tasks and prioritizes according to business priorities and production availability requirements.
· Promote automated remediation principles by targeting optimal observability strategies.
· Maintain a high-level understanding of the application stacks and use it to trace, profile, and optimize application performance within and across multiple technology stacks.
· Help build highly available and scalable infrastructure with robust monitoring and alerting mechanisms.
· Engineer solutions and establish standards for infrastructure, agent deployments, and decommissioning by automating manual tasks where possible.
· Onboard applications to the platforms and train users on the tools and on best practices for proactive monitoring.
· Integrate with enterprise monitoring tools to meet data visualization needs.
· Analyze monitoring metrics (e.g. signal-to-noise ratio) and objectives and key results (e.g. reduction of monitoring gaps) to continuously improve the team's level of service and customer experience.
· Other duties as assigned.
Minimum Qualifications
· A bachelor's degree in Computer Science or a related field, or its equivalent in work experience, required.
· Must have 5+ years of experience in Application Performance Monitoring using enterprise-standard tools.
· Prior experience must include agile, scalable software engineering.
· Prior experience must include CI/CD, automation, and DevOps practices.
· Must have knowledge of application architecture, OSI layers, and software design and development methodologies.
· Minimum 2 years with Elasticsearch (Observability experience preferred) or a similar tool.
· Minimum 2 years with APM tools, preferably Dynatrace.
Education, Work Experience, & Knowledge
· Experience with Kafka (Confluent Kafka experience preferred).
· Experience with AppDynamics, New Relic, Datadog, Catchpoint.
· Seven years of experience in technology infrastructure, application development, or DevOps preferred.
· Experience providing technical direction to project teams preferred.
· Application and infrastructure support experience, preferably within a large matrixed enterprise.
· Experience working with vendors and managed service providers.
· Working knowledge of infrastructure technologies such as network, database, server, and storage preferred.
· Firm understanding of Event Management and ITIL practices.
Job Specific Technical Skills & Competencies
· Advanced knowledge of one or more of the following technical skills:
o Foundational IT Infrastructure
o Unix-based operating systems
· Ability to use APIs and write scripts for automation
· Ability to collect requirements and architect complex solutions under pressure
· Strong analytical and critical thinking skills
The following skills are preferred:
o Experience implementing, maintaining, and troubleshooting observability and monitoring tool suites such as Elastic Stack, Splunk, Prometheus, Dynatrace, AppDynamics, Datadog, Catchpoint, and Azure Monitor
o Knowledge and experience writing scripts with Python or Ansible preferred.
o Knowledge of and experience with DevOps tools: Jenkins, Selenium, Git, GitHub, and Azure DevOps (ADO)
o Experience with infrastructure-as-code tools such as Ansible or Terraform
o Experience providing technical direction to project teams preferred
Preferred Licenses and Certifications
Any certifications related to the following are preferred:
· Elasticsearch
· Dynatrace
· Kafka
· AWS/Azure/GCP
· Kubernetes
· ITIL/ITOM
· Six Sigma
· PMP