[Remote] Sr. Site Reliability Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. PayNearMe is on a mission to simplify payments through innovative technology. As a Site Reliability Engineer, you will design, build, and maintain systems and infrastructure, ensuring their reliability, scalability, and performance while automating processes to support business needs.

Responsibilities

  • Infrastructure Management: Design, implement, and maintain scalable and resilient infrastructure using Terraform for infrastructure as code, ensuring high availability and performance
  • Kubernetes and Containers: Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker. Implement best practices for container orchestration and management
  • Systems and Application Monitoring/Observability: Develop and maintain comprehensive monitoring and observability solutions using Datadog. Ensure detailed visibility into system performance and application health
  • SLO and SLA Management: Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure reliable and consistent service delivery
  • Incident Response and Troubleshooting: Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence. Participate in post-incident reviews and contribute to blameless postmortems
  • Reliability and Production Environment Management: Ensure the reliability and stability of our production environments. Continuously assess and improve system reliability, identifying and addressing potential points of failure
  • Automation and Scripting: Develop automation scripts and tools to reduce manual intervention and improve system reliability using Python, Bash, or Go. Implement and improve CI/CD pipelines
  • CI/CD Pipeline Management: Enhance and maintain continuous integration and continuous deployment pipelines using GitLab CI. Ensure seamless and reliable deployment processes
  • Capacity Planning and Scaling: Assist in capacity planning and ensure that systems are scalable to meet future demands. Implement auto-scaling strategies where applicable
  • Security and Compliance: Implement security best practices and ensure compliance with industry standards. Regularly review and update security policies and procedures
  • Collaboration and Support: Work closely with development teams to ensure reliability and scalability of new features and services. Provide technical support and guidance on infrastructure-related issues
  • Software Engineering for Operations: Develop and maintain internal tools and services that enhance the efficiency and reliability of our operations
  • On-Call Rotation: Participate in an on-call rotation to address production issues and collaborate in incident response efforts

Skills

  • 3+ years of experience in SRE, DevOps, or a related role
  • Proficient with cloud platforms such as AWS, GCP, or Azure. Experience with EC2, RDS, VPCs, and security groups is essential
  • Strong experience with Kubernetes and Docker, including deployment, scaling, and management of containerized applications
  • Expert in using Terraform for infrastructure as code. Proficient with configuration management tools such as Ansible, Puppet, or Chef
  • Extensive experience with monitoring and observability tools like Datadog, Prometheus, Grafana, ELK stack, or Splunk. Skilled in setting up detailed monitoring and logging systems
  • Proven ability to define, monitor, and maintain SLOs and SLAs to ensure reliable service delivery
  • Strong skills in scripting languages like Python, Bash, or Go. Experience automating repetitive tasks and processes
  • Familiarity with GitLab CI or a similar tool for continuous integration and deployment. Experience in setting up and managing pipelines
  • Experience supporting production environments running Go or Ruby/Rails applications
  • Ability to write and update tools to support infrastructure and application management, demonstrating the principle that 'SRE is what happens when you ask a software engineer to design an operations team'
  • Deep understanding of DevOps principles, practices, and tools to drive continuous improvement in the software development lifecycle
  • Strong organizational skills, attention to detail, and the ability to work collaboratively in a team environment. Excellent documentation skills to ensure accurate and detailed records
  • Excellent analytical and problem-solving skills to diagnose and resolve complex system issues quickly and effectively

Benefits

  • Competitive salary and benefits with growth-company options grant
  • Fast-paced and professional work culture
  • Stock options with standard startup vesting: 1-year cliff, 4 years total
  • $50 monthly communication expense stipend to go towards your phone/internet bill
  • $250 stipend to enhance your WFH setup
  • Reimbursement for peripheral equipment: monitor (up to $400), keyboard and mouse (up to $200)
  • Premium medical benefits including vision and dental (100% coverage for employees)
  • Company-sponsored life and disability insurance
  • Paid parental bonding leave
  • Paid sick leave, jury duty leave, and bereavement leave
  • 401k plan
  • Flexible Time Off (our team members typically take off ~3-4 weeks per year)
  • Volunteer Time Off
  • 13 scheduled holidays

Company Overview

  • PayNearMe provides a web and mobile-based cash payments platform designed to facilitate online purchases and bill payments. It was founded in 2009, and is headquartered in Santa Clara, California, USA, with a workforce of 201-500 employees. Its website is https://home.paynearme.com.