Site Reliability Engineer (x/f/m)
Full-time (Remote optional; EU time zones preferred)
Emphasis: AWS
Salary: €60K-80K
About Meshcapade
Meshcapade is a startup creating realistic human avatars for use in research, apparel, biomechanics, virtual reality, and film. Using machine learning and computer vision, we model the nuances of human shape and movement. We can automatically convert photos, 3D and 4D scans, RGB-D sequences, mocap, and IMU data into realistic 3D humans. Our methods derive from state-of-the-art, patented research. Our core product, digidoppel, is a consumer-facing platform for the creation, modification, and delivery of our automated 3D human avatars and related assets. Our clients include global names across a broad mix of industries: tech, media, health and fitness, apparel, and education.
What we offer:
We are a diverse team of passionate creators from a variety of backgrounds, seeking to change how people generate, think about, and make use of digital human avatars. Compensation and benefits are competitive. Our offices are based in Tübingen, Germany. Remote work is available within EU time zones or nearby. Working hours are flexible, with no advance notice required and no scheduled meetings on Fridays. Our team members are keen to help out wherever they can, and since we each have our own specialisation, collaboration is always fun and educational.
Description:
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. While we are still small, we aim to do things well from the start so that we can scale the team and our products without causing issues for our customers. A successful applicant will be an enthusiastic problem solver with 2-5 years of experience in software, DevOps, or infrastructure engineering roles. You will work closely with our senior backend and infrastructure engineer and our CTO to develop our infrastructure, visibility, deployment automation, and security solutions. You are expected to learn on the job and take ownership of your projects and solutions.
You will be helping to maintain and improve:
AWS infrastructure, using EKS and ECS for compute and deployed with Terraform
Applications and cluster services running in EKS clusters with Kustomize and Helm, for which an automated deployment strategy still needs to be implemented
Logging, monitoring, authentication, and authorization solutions within our products
Job scheduling and autoscaling solutions
You will also:
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems sustainably through mechanisms like automation, and evolve them by pushing for changes that improve reliability and velocity
Engage with the team for incident response and postmortems
Requirements:
BS or higher in a technical field, or equivalent meaningful experience
2-5 years of experience in a field related to software engineering
Good understanding of programming
Basic experience with containerised environments and cloud infrastructure
Ability to communicate your ideas effectively and willingness to ask for help
Ability to adapt to different contexts
Willingness to learn lots of different tools as you go
Results-oriented and highly motivated
Team player to the core
Bonus Skills:
Terraform
Kustomize
Kubernetes
AWS (or other cloud provider)
EKS
ECS
GitLab CI
Jenkins
GPG/blackbox
Python
Golang
TypeScript
Bash
Grafana
Loki
Prometheus
Docker
BuildKit
Flux
Linux
Software testing
Networking basics
Stateless programming
Concurrent programming
Any GPU API (e.g. Vulkan, CUDA, DirectX, OpenGL)