Site Reliability Engineer (x/f/m)

Full-time (Remote optional; EU time zones preferred)


Emphasis: AWS


Salary: €60K-80K



About Meshcapade

Meshcapade is a startup creating realistic human avatars for use in research, apparel, biomechanics, virtual reality and film. Using machine learning and computer vision, we model the nuances of human shape and movement. We can automatically convert photos, 3D and 4D scans, RGB-D sequences, mocap and IMU data into realistic 3D humans. Our methods derive from state-of-the-art, patented research. Our core product, digidoppel, is a consumer-facing platform for the creation, modification, and delivery of our automated 3D human avatars and related assets. Our clients include global names across a broad mix of tech, media, health and fitness, apparel, and education.



What we offer:

We are a diverse team of passionate creators from a variety of backgrounds, seeking to change how people generate, think about, and make use of digital human avatars. Compensation and benefits are competitive. Our offices are based in Tübingen, Germany. Remote work is available within EU time zones or nearby. Work hours are flexible without advance notice, and no meetings are scheduled on Fridays. Our team is usually keen to help out with anything and everything they can, but we each have our specialisation, so collaboration is always fun and educational.



Description:

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. While we are still small, we aim to do things well from the start so that we can scale the team and our products without causing issues for our customers. A successful applicant will be an enthusiastic problem solver with 2-5 years of experience in software, DevOps, or infrastructure engineering roles. You will work closely with our senior backend and infrastructure engineer and our CTO to develop our infrastructure, visibility, deployment automation, and security solutions. You are expected to learn on the job and take ownership of your projects and solutions.

You will be helping to:

  • Maintain and improve our AWS infrastructure, which uses EKS and ECS for compute and is deployed using Terraform.

  • Maintain and improve applications and cluster services running in EKS clusters using Kustomize and Helm, and implement an automated deployment strategy for them.

  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.

  • Scale systems sustainably through mechanisms like automation, and evolve them by pushing for changes that improve reliability and velocity.

  • Maintain and improve logging, monitoring, authentication, and authorization solutions within our products.

  • Maintain and improve job scheduling and autoscaling solutions.

  • Engage with the team on incident response and postmortems.



Requirements:

  • BS or higher in a technical field, or equivalent practical experience

  • 2-5 years of experience in a software-engineering-related field

  • Good understanding of programming

  • Basic experience with containerised environments and cloud infrastructure

  • Ability to communicate your ideas effectively and willingness to ask for help

  • Ability to adapt to different contexts

  • Willingness to learn lots of different tools as you go

  • Results-oriented and highly motivated

  • Team player to the core



Bonus Skills:

    • Terraform

    • Kustomize

    • Kubernetes

    • AWS (or other cloud provider)

    • EKS

    • ECS

    • GitLab CI

    • Jenkins

    • GPG/blackbox

    • Python

    • Golang

    • TypeScript

    • Bash

    • Grafana

    • Loki

    • Prometheus

    • Docker

    • BuildKit

    • Flux

    • Linux

    • Software testing

    • Networking basics

    • Stateless programming

    • Concurrent programming

    • Any GPU API (e.g. Vulkan, CUDA, DirectX, OpenGL)