[Remote] Senior Site Reliability Engineer, GeForce NOW

Remote · USA · Full-time

Note: This is a remote position open to candidates in the USA. NVIDIA is looking for a Senior Site Reliability Engineer (SRE) to join its GeForce NOW (GFN) team. The SRE ensures that GPU cloud gaming services maintain reliability and uptime, while enabling developers to make changes to the system through careful planning. Responsibilities include improving service observability, automating tasks, and supporting production systems.

Responsibilities

  • Build tools to improve SRE observability
  • Take part in the Kubernetes migration, including VMI setup and problem solving
  • Rapidly debug and triage incidents and user-reported issues
  • Take ownership of automation, scripting, and tooling for new and existing scripts to help the team achieve 100% automation of daily tasks
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity management and launch reviews
  • Be part of an on-call rotation to support production systems

Skills

  • MS or BS in Computer Science/Engineering or a related field or equivalent experience
  • 8+ years of site reliability engineering experience working on large-scale distributed microservices in a production environment, with a real passion for automation and tooling
  • Very strong Kubernetes background, including the ability to work with complex, highly available VMI setups on K8s
  • Experience leading significant production improvements, including change management, post-mortem reviews, and workflow processes, and designing and delivering software automation in various languages
  • Proven strengths in problem solving and root-cause analysis, while continuously seeking ways to drive optimization, efficiency, and the bottom line
  • Previous experience with Datadog, Prometheus, Alertmanager, or similar monitoring systems
  • Experience managing multi-region cloud deployments on hyperscalers like AWS, GCP, or Azure
  • Experience designing and managing deployment pipelines using tools such as GitHub Actions, GitLab CI, or ArgoCD
  • Excellent communication, presentation, social, and analytical skills; the ability to communicate complex interaction concepts clearly and persuasively across different audiences and varying levels of the organization
  • Production-grade coding proficiency in languages like Go, Python, or robust Bash scripting
  • Production on-call experience is a must. Candidates should have served in a primary production on-call rotation, responding to and mitigating high-severity infrastructure alerts and service degradations
  • Experience working with automated anomaly detection, log clustering tools, or LLM-assisted debugging platforms
  • Comfortable using AI on a day-to-day basis as an SRE
  • Prior experience as an SRE or Service Engineer is a huge plus

Benefits

  • Equity
  • Benefits

Company Overview

  • NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. It was founded in 1993 and is headquartered in Santa Clara, California, USA, with a workforce of more than 10,000 employees. Its website is https://www.nvidia.com.

Company H1B Sponsorship

  • NVIDIA has a track record of offering H1B sponsorships, with 448 in 2026, 1872 in 2025, 1354 in 2024, 976 in 2023, 835 in 2022, 601 in 2021, 529 in 2020. Please note that this does not guarantee sponsorship for this specific role.