[Remote] Sr. Site Reliability Engineer

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. PayNearMe is on a mission to simplify payments through innovative technology. As a Site Reliability Engineer, you will design, build, and maintain systems and infrastructure, ensuring their reliability, scalability, and performance while automating processes to support business needs.

Responsibilities

Infrastructure Management: Design, implement, and maintain scalable and resilient infrastructure using Terraform for infrastructure as code, ensuring high availability and performance
Kubernetes and Containers: Deploy, manage, and optimize Kubernetes clusters and containerized applications using Docker. Implement best practices for container orchestration and management
Systems and Application Monitoring/Observability: Develop and maintain comprehensive monitoring and observability solutions using Datadog. Ensure detailed visibility into system performance and application health
SLO and SLA Management: Define, monitor, and maintain Service Level Objectives (SLOs) and Service Level Agreements (SLAs) to ensure reliable and consistent service delivery
Incident Response and Troubleshooting: Respond to incidents, perform root cause analysis, and implement solutions to prevent recurrence. Participate in post-incident reviews and contribute to blameless postmortems
Reliability and Production Environment Management: Ensure the reliability and stability of our production environments. Continuously assess and improve system reliability, identifying and addressing potential points of failure
Automation and Scripting: Develop automation scripts and tools to reduce manual intervention and improve system reliability using Python, Bash, or Go. Implement and improve CI/CD pipelines
CI/CD Pipeline Management: Enhance and maintain continuous integration and continuous deployment pipelines using GitLab CI. Ensure seamless and reliable deployment processes
Capacity Planning and Scaling: Assist in capacity planning and ensure that systems are scalable to meet future demands. Implement auto-scaling strategies where applicable
Security and Compliance: Implement security best practices and ensure compliance with industry standards. Regularly review and update security policies and procedures
Collaboration and Support: Work closely with development teams to ensure reliability and scalability of new features and services. Provide technical support and guidance on infrastructure-related issues
Software Engineering for Operations: Develop and maintain internal tools and services that enhance the efficiency and reliability of our operations
On-Call Rotation: Participate in an on-call rotation to address production issues and collaborate in incident response efforts

Skills

3+ years of experience in SRE, DevOps, or a related role
Proficient with cloud platforms such as AWS, GCP, or Azure. Experience with EC2, RDS, VPCs, and security groups is essential
Strong experience with Kubernetes and Docker, including deployment, scaling, and management of containerized applications
Expert in using Terraform for infrastructure as code. Proficient with configuration management tools such as Ansible, Puppet, or Chef
Extensive experience with monitoring and observability tools like Datadog, Prometheus, Grafana, ELK stack, or Splunk. Skilled in setting up detailed monitoring and logging systems
Proven ability to define, monitor, and maintain SLOs and SLAs to ensure reliable service delivery
Strong skills in scripting languages like Python, Bash, or Go. Experience automating repetitive tasks and processes
Familiarity with GitLab CI or a similar tool for continuous integration and deployment. Experience in setting up and managing pipelines
Experience supporting production environments running Go or Ruby/Rails applications
Ability to write and update tools to support infrastructure and application management, demonstrating the principle that 'SRE is what happens when you ask a software engineer to design an operations team'
Deep understanding of DevOps principles, practices, and tools to drive continuous improvement in the software development lifecycle
Strong organizational skills, attention to detail, and the ability to work collaboratively in a team environment. Excellent documentation skills to ensure accurate and detailed records
Excellent analytical and problem-solving skills to diagnose and resolve complex system issues quickly and effectively

Benefits

Competitive salary and benefits, plus a growth-company stock options grant
Fast-paced and professional work culture
Stock options with standard startup vesting: 1-year cliff, 4 years total
$50 monthly communication expense stipend to go towards your phone/internet bill
$250 stipend to enhance your WFH setup
Reimbursement for peripheral equipment: monitor (up to $400), keyboard and mouse (up to $200)
Premium medical benefits including vision and dental (100% coverage for employees)
Company-sponsored life and disability insurance
Paid parental bonding leave
Paid sick leave, jury duty leave, and bereavement leave
401(k) plan
Flexible Time Off (our team members typically take off ~3-4 weeks per year)
Volunteer Time Off
13 scheduled holidays

Company Overview

PayNearMe provides a web and mobile-based cash payments platform designed to facilitate online purchases and bill payments. It was founded in 2009 and is headquartered in Santa Clara, California, USA, with a workforce of 201-500 employees. Its website is https://home.paynearme.com.
