Junior Data Engineer at woolies01

From Yury Pitsishin (Head of Machine Learning at Woolworths Rewards)

On top of your plan, we also usually assess the following skills:

  • Data Structures & Algorithms (Big O notation)
  • Linux
  • Solution design of complex high-throughput systems (with a primary focus on non-functional requirements like security, cost, scalability)
  • DevOps
  • Big Data frameworks: Spark, Scala
  • Solid experience with one of the major clouds

I'll be happy to share more details during the catch-up.

Please find below what we usually require for a Junior role.

Minimal Requirements:

Bachelor's degree or above in Computer Science

3+ years of commercial software development experience

Good communication and stakeholder management skills

Solid Python and SQL

Hands-on Linux

Strong problem-solving skills

Practical experience with Big Data or ML tools, packages and techniques

Ideal candidate:

Experience building production-grade ML pipelines in GCP or AWS

GCP Cloud Engineer certification or above

Solid experience with streaming technologies such as Kafka, Pub/Sub, or Kinesis (a minimal consumer sketch follows below)
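
Not from Yury's message: a minimal sketch of the kind of hands-on consumer loop the streaming point refers to, using the kafka-python client. The broker address and the "orders" topic are invented for illustration.

```python
# Minimal Kafka consumer sketch (hypothetical topic and broker address).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                            # invented topic name
    bootstrap_servers="localhost:9092",  # assumes a local broker
    group_id="demo-group",
    auto_offset_reset="earliest",        # start from the oldest retained message
    value_deserializer=lambda b: b.decode("utf-8"),
)

# Each iteration blocks until a record arrives, then yields it.
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```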

Questions

  • Data structures and algorithms :: is this programming-language specific?
  • How does understanding Big O notation help solve data engineering problems? (a Python sketch follows after this list)
  • What kinds of tasks should I be able to complete using Linux?
  • What is a high-throughput system?
  • DevOps is quite broad. Any key skills within this? CI/CD, Docker?
  • Spark, Scala - how to get experience with these? Are personal projects enough? (see the PySpark sketch at the end of these notes)
  • Any recommended resources?
  • Who are the typical / common stakeholders?
  • What are typical deliverables? One table? Multiple tables? Refresh frequency? Within BigQuery? Documentation with field details?
  • Are there any assurances or SLAs about data / pipelines that the team delivers?
  • What do people think is data engineering but actually isn't handled by your team?
  • There was no mention of Airflow. This seems to be a common data engineering tool. Is it used here?
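
Not part of the message above: a minimal Python sketch for the Big O question. The task, field names, and data are hypothetical; the point is that the same filter step drops from O(n*m) to O(n+m) once list membership tests are replaced with a hash set, which is the kind of difference that matters at pipeline scale.

```python
# Hypothetical task: keep only events whose user_id appears in a blocklist.

def filter_slow(events, blocklist):
    # O(n * m): every 'in' test scans the whole blocklist list.
    return [e for e in events if e["user_id"] in blocklist]

def filter_fast(events, blocklist):
    # O(n + m): build a hash set once, then each lookup is O(1) on average.
    blocked = set(blocklist)
    return [e for e in events if e["user_id"] in blocked]
```

With a million events and a million blocked ids, the first version does on the order of 10^12 comparisons; the second does roughly 2 * 10^6 operations.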
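
Also not from the message: a sketch of what a personal-project-sized Spark job could look like, answering my own Spark/Scala question. It uses PySpark in local mode, so no cluster is needed, which is why personal projects are feasible; the file path and column names (order_ts, store_id, amount) are made up.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .master("local[*]")      # run on all local cores, no cluster required
    .appName("sales-rollup")
    .getOrCreate()
)

# Hypothetical input file and schema.
sales = spark.read.csv("data/sales.csv", header=True, inferSchema=True)

# Roll revenue up per store per day.
daily = (
    sales
    .withColumn("day", F.to_date("order_ts"))
    .groupBy("day", "store_id")
    .agg(F.sum("amount").alias("revenue"))
)

daily.write.mode("overwrite").parquet("out/daily_revenue")
spark.stop()
```

The same job in Scala (Spark's JVM-native API) would be nearly line-for-line identical, so a PySpark project still teaches the transferable concepts: DataFrames, partitioning, lazy execution.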