Software Engineer (Data Infrastructure, Remote)

Copenhagen, Capital Region
Posted 2 weeks, 4 days ago
Software Development

About the role

Job summary

This role sits on the data side of an AI team and is responsible for collecting the data that supports model training. The position involves building high-quality datasets at large scale through a combination of infrastructure, engineering, and research.

Qualifications

  • BS/MS/PhD in Computer Science or a related field.
  • Over 5 years of experience in software development.
  • Proficient in bash/Python scripting within Linux environments.
  • Familiar with Docker and Infrastructure-as-Code, with experience on a major cloud provider, preferably GCP.
  • Experience with web crawlers and large-scale data processing is advantageous.
  • Strong multitasking abilities and adaptability to changing priorities.
  • Excellent written and verbal communication skills.

Responsibilities

  • Identify and source new audio data for the ingestion pipeline.
  • Manage and enhance the cloud infrastructure for data ingestion, currently utilizing GCP and Terraform.
  • Collaborate with scientists to optimize cost, throughput, and quality of data for model training.
  • Work with the AI Team and leadership to develop a dataset roadmap for future products.

Skills

  • Proficiency in scripting and cloud infrastructure management.
  • Experience in data processing workflows and web crawling.

Education

  • Advanced degree in Computer Science or a related field preferred.

Tools

  • GCP, Terraform, Docker, bash, Python.