SaaS Cloud Engineer
MEXICO
Job description
Job Description Summary
GE Vernova's GridOS Platform Engineering team is building the next generation of SaaS reliability for critical energy infrastructure. The SaaS Cloud Engineer sits at the heart of our System Reliability Engineering (SRE) team, owning the end-to-end cloud provisioning lifecycle for every customer environment — from Day 0 bootstrap through Day 2 continuous operations. You will work alongside Platform SRE, Observability, Production DevOps, and SecOps engineers to ensure that GridOS SaaS products meet the highest standards of availability, security, and cost efficiency across our US and international customer base.
Roles and Responsibilities
Day 0 — Provision & Bootstrap
· Own per-customer AWS account provisioning.
· Automate account bootstrap workflows using Infrastructure as Code (Terraform / AWS CloudFormation) and CI/CD pipelines (GitHub Actions / ArgoCD).
· Implement and maintain Cyber Guardrails aligned to GESOS standards, including jumphost configuration, IAM policies, and VPC networking.
· Deploy standardized cloud infrastructure baselines: AWS CloudTrail, CloudWatch, GuardDuty, Security Hub, and Config Rules.
· Configure DNS, network connectivity, and cross-account trust relationships for each customer environment.
Day 1 — Deploy, Scale & Validate
· Collaborate with Platform SRE to define sizing, scaling, and SLO baselines for each customer workload.
· Support progressive delivery pipelines (blue/green, canary) to ensure zero-downtime deployments.
· Integrate cloud-native observability hooks (CloudWatch, synthetic monitors) for new customer environments.
· Assist with acceptance testing validation gates before production cutover.
Day 2 — Secure, Operate & Optimize
· Drive FinOps practices: right-size resources, implement savings plans, and produce monthly cost reports per customer using AWS Cost Explorer.
· Maintain cloud security posture: apply CVE patches and respond to compliance and audit requirements in coordination with SecOps.
· Participate in on-call rotations for incident response (Level 1/2), root cause analysis (RCA), and BC/DR exercises.
· Continuously improve account automation, reducing toil through scripting (Python, Bash) and runbook codification.
· Monitor FinOps KPIs and proactively flag anomalies to the SRE Lead.
Required Experience
· 3-5 years of hands-on experience in cloud infrastructure, SRE, or DevOps engineering roles.
· Deep AWS expertise: EC2, EKS, S3, VPC, IAM, CloudTrail, CloudWatch, GuardDuty, Organizations, Control Tower.
· Proven proficiency with Infrastructure as Code: Terraform or AWS CloudFormation.
· Experience with container orchestration (Kubernetes/EKS) and related tooling (Helm, Rancher).
· Working knowledge of CI/CD pipelines: GitHub Actions (GHA) and/or ArgoCD.
· Scripting fluency in Python and/or Bash for automation and operational tooling.
· Demonstrated experience with cloud security best practices: IAM least privilege, security group design, encryption at rest and in transit.
· Exposure to FinOps concepts: cost allocation tagging, savings plans, Reserved Instances analysis.
Nice to Have
· Experience with multi-tenant SaaS account vending machines (AWS Control Tower, Landing Zone Accelerator).
· Familiarity with cybersecurity standards and policies in regulated environments.
· Knowledge of GovCloud or regulated-industry compliance (FedRAMP, NERC CIP, SOC 2).
· Exposure to Backstage IDP or similar developer portals.
· AWS certifications: Solutions Architect (Associate or Professional), DevOps Engineer Professional.
Key Skills & Technologies
· AWS (EKS, IAM, CloudTrail, CloudWatch, RDS, MSK, SQS, S3, etc.)
· Infrastructure as Code - Terraform
· Kubernetes - EKS, Rancher
· CI/CD - Jenkins, GitHub Actions
· CD - ArgoCD, Flux
· Scripting - Python / Bash
· FinOps / Cost Explorer
· Observability - Grafana/Prometheus, Splunk, Datadog or Dynatrace
· BC/DR Planning
· Incident Response
· Compliance & SecOps
What Success Looks Like
In your first 30 days:
· Shadow existing account provisioning workflows and document gaps in automation.
· Complete onboarding to the SRE on-call rotation as a backup responder.
· Stand up a personal sandbox environment using the team's IaC templates.
In your first 90 days:
· Deliver the first iteration of automated account provisioning, reducing the provisioning SLA to under 8 hours.
· Instrument at least one customer environment with full CloudTrail, CloudWatch, and GuardDuty coverage.
· Present a FinOps baseline report for existing customer accounts.
In your first year:
· Achieve the 4-hour account provisioning SLA target across all new customer onboardings.
· Own the Cyber Guardrails automation library and keep it current against GESOS standards.
· Contribute improvements to at least two SRE runbooks and one DR playbook.
Working Environment & Team Culture
The GridOS SRE team operates within GE Vernova's Electrification Software division, supporting mission-critical energy management systems for utilities across North America. You will work in a highly regulated environment with active Cyber Security and Architecture Board oversight, making rigor, documentation, and compliance second nature.
Our team values:
· Automation over toil: if you do something twice, automate it the third time.
· Blameless incident culture with structured RCA and continuous improvement.
· Clear escalation paths and transparent on-call expectations with Primary/Backup coverage.
· Cross-functional partnership with AutoDev, SIRE, CRE, and Product Engineering teams.
Education Qualification
Bachelor's degree in Computer Science or another STEM major (Science, Technology, Engineering, and Math) with basic experience.
Business Acumen:
· Strong oral and written communication skills.
· Strong business analysis and problem-solving skills.
· Proactively engages with cross-functional teams to resolve issues and design solutions, applying critical thinking, analytical skills, and best practices.
· Ability to interact at all levels of the organization and with other GE businesses.
Leadership:
· Excellent communicator who works well in a team environment and welcomes challenges.
· Self-starter with the ability to manage multiple priorities in a fast-paced work environment.
· Strong problem-solving and analytical skills, with a demonstrated ability to assimilate new information and understand complex topics.
Additional Information
Relocation Assistance Provided: Yes
- This is a remote position