Aragon is a small team of people that spans a broad range of backgrounds, interests, and geographies. We're entrepreneurs and Ph.D.s, technologists and skeptics, troublemakers and problem solvers, who are striving to realize the Aragon Manifesto by designing and building unstoppable tooling for the creation and management of Decentralized Autonomous Organizations (DAOs).
Aragon has overseen the production of the leading smart contract framework for DAOs, with toolkits for developers to seamlessly integrate their own apps with DAO functionality.
As a Site Reliability Engineer at Aragon, you'll join the Engineering Team and report directly to the CTO, where you'll play a pivotal role in ensuring our platform is efficient, scalable, reliable, and secure. Our new Aragon OSx and Aragon App have been designed to enable everyone to craft the next wave of digital organizations.
- Lead the management of our CI/CD pipelines to ensure they support rapid software deployments without compromising system stability.
- Maintain and continually optimize our backend services and infrastructure (mostly Kubernetes) for performance and reliability, so that organizations running on our stack know they can depend on us.
- Set up and oversee monitoring tools to proactively identify and address system health issues that would otherwise go unnoticed.
- Craft and continually refine an incident response plan to guarantee swift recovery and minimal user impact in worst-case scenarios, and use incidents as opportunities for learning and continuous improvement.
- Engage proactively across internal team boundaries, discerning when your expertise is required and when it isn’t.
- 3+ years in an SRE, DevOps, or similar role
- Proficiency in setting up and managing CI/CD tools (e.g., GitHub Actions, CircleCI)
- Experience with monitoring tools (e.g. Grafana, Prometheus, and Kibana)
- Advanced administration skills in Linux and on cloud providers (e.g., Google Cloud, GKE, AWS, Azure)
- Experience with a scripting language for custom tooling (e.g., TypeScript, Python)
- Demonstrated ability in designing and implementing incident response plans for digital platforms.
- Practical experience orchestrating applications with Kubernetes in a production environment, and with GitOps workflows
- Understanding of how to work in an agile environment, using Git workflows and tools such as Jira and GitHub
- Passionate about the possibilities of decentralized autonomous organizations and the impacts they may have on how humans work together to solve problems
- Excellent English (C1 or C2) and technical communication skills, both written and oral
- Proven contributions to open-source projects within blockchain, cryptography, decentralized systems, or the broader web3 ecosystem
- Knowledge of the inner workings of blockchains and related decentralized systems (IPFS, ENS, subgraphs, node operation, etc.)
- Experience managing Elasticsearch or log ingestion
- General understanding of Solidity
- Proficiency with Infrastructure-as-Code tools, such as Terraform and Ansible
- Understanding of cloud networking infrastructure.
- Broad and strong web3 sensibilities, including experience with various wallets, interacting with dApps, deploying smart contracts, and safely managing your own onchain assets
We value freedom and responsibility among our contributors. In practice, this means we're a remote, distributed organization that is flexible about where you work and how you set your schedule, as long as you're within +/- 3 hours of Central European Time. We trust you to arrange your time to best support your team.