Site Reliability Engineer (SRE)

Job at Blockdaemon

Remote (US)

Full time

Blockdaemon is looking for a Site Reliability Engineer (SRE) to join our rapidly growing team and support our mission to connect institutions to blockchains through a single integration. The Site Reliability Engineer will work with all facets of the business to help streamline and scale our infrastructure. In this role, you will serve as a subject matter expert in network architecture and design implementation, working closely with engineers from all parts of the business to grow Blockdaemon to meet the requirements of the Web3 ecosystem.

Position Overview:

  • Become an internal support system and leader for operational health and incident response
  • Partner with and support the overall engineering organization and elevate incident management
  • Review and operationalize SLOs, SLIs, and SLAs for maximum efficiency
  • Design, implement, and troubleshoot services that support the cloud infrastructure we use to manage and support our nodes
  • Improve our infrastructure capabilities, optimizing for cost, simplicity, and maintainability
  • Implement continuous integration/continuous delivery (CI/CD) using the latest DevOps tools and innovative methods
  • Build strong and highly functional partnerships with product and other technology teams
  • Support senior engineers through outages and incidents for a business requiring 24x7 coverage
  • Build automations and self-service tooling with a security-conscious mindset
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating processes to continually improve
  • Troubleshoot issues around reliability, resiliency, scalability, and availability
  • Assist with the on-call and triage rotation
  • Remove barriers to building and shipping products across bare metal and cloud service providers

Required Experience:

  • 5+ years background in DevOps, Site Reliability Engineering, or Production Engineering
  • Experience running a mission-critical service at scale
  • Prior experience running critical production systems in a Linux environment
  • Passion for ensuring that everything is observed and monitored end to end
  • Deep knowledge of distributed system design and operation
  • Solid understanding of web and network protocols and standards (HTTP, TLS, DNS, etc.)
  • Experience writing automation tools and an eagerness to "automate all the things"
  • Experience building large applications from scratch, complete with CI/CD infrastructure
  • Experience with at least one of the major cloud providers (Amazon Web Services, Google Cloud, Microsoft Azure)
  • Experience managing Kubernetes clusters or some other container orchestration infrastructure
  • Experience working with common infrastructure tools such as Kubernetes, Docker, Terraform, Ansible, Consul, Packer, Puppet, and Helm
  • Strong sense of ownership, entrepreneurial spirit, and/or startup-like experience, capable of driving towards solutions independently while seeking feedback when appropriate
  • Knowledge of at least one (1) scripting language

Company: Blockdaemon

Skills: devops, design, operations
