Have you built and managed large cloud server deployments that have seen real production usage? Are you an expert at automation tools like Chef, Consul, Terraform, and Vault? Do the concepts of immutable infrastructure or DevSecOps call out to you?
If so, we’d like you to learn about Files.com!
Files.com operates dozens of services over 150+ cloud servers in 10 AWS regions. We rely on our ever-growing infrastructure team to keep those services running smoothly and securely.
About the Role
You’ll be working with our existing deployments of Chef, Vault, Consul, Docker, Ansible, ELK, Grafana, StatsD, Asterisk, MySQL, Redis, Memcached, ZeroMQ, Puma, Jenkins, and many other exciting open source systems.
Of course, you’ll also have the freedom to deploy something else if it gets the job done.
As a member of our infrastructure team, your work will be mostly project-based, but you will also be part of an on-call rotation for the systems you maintain. (After-hours incidents are generally rare.)
Examples of Projects our Infrastructure Team Tackles:
- Building zero-downtime failover from one AWS region to another for complex web applications.
- Securing our network using tools like Terraform and Vault.
- Deploying and managing internal services for things like LDAP, VPN, and telephony.
- Managing our AWS infrastructure, which includes dozens of VPCs, S3 buckets, etc.
- Designing and building our sophisticated monitoring stack and app uptime alerting.
- Contributing to the codebase of our home-built FTP and SFTP server software that runs the FTP/SFTP interfaces of Files.com.
- Maintaining our complex system of allocating dedicated IPs to Files.com customers and keeping those IPs highly available even across server/AZ migrations.
Qualifications:
- 5+ years of directly applicable experience.
- Experience managing large cloud server deployments that have seen real production usage.
- Experience building distributed, failure-resistant architecture, including disaster recovery, backups, failover, etc.
- Significant experience working with GNU/Linux servers, including a complete understanding of the command line, /proc, services, processes, virtual memory, etc.
- Experience diagnosing and resolving problems in mission-critical environments.
- Comprehensive understanding of networking concepts (layers, firewalls, DNS, VPN, etc.), knowledge of how to build secure infrastructure, and an awareness of common server security vulnerabilities.
- Proficiency with configuration management tools, such as Chef or Puppet, and fluency with at least one major scripting language.
- Experienced programmer capable of writing code in at least 2-3 major programming languages.
- Contributions to major open source projects.
- Familiarity with large scale monitoring and analysis systems, such as ELK or Splunk (we use ELK).
- History managing a large database at scale (we use MySQL).
- Experience with the advanced features of public cloud platforms such as AWS or Azure (we use AWS).
- Experience working on a remote team.