Junior Data Engineer at woolies01

From Yury Pitsishin (Head of Machine Learning at Woolworths Rewards)

On top of your plan, we also usually assess the following skills:

  • Data Structures & Algorithms (Big O notation)
  • Linux
  • Solution design of complex high-throughput systems (with a primary focus on non-functional requirements like security, cost, scalability)
  • DevOps
  • Big Data frameworks: Spark, Scala
  • Solid experience with one of the major clouds

I'll be happy to share more details during the catch-up.

Please find below what we usually require for a Junior role.

Minimal Requirements:

Bachelor's degree or above in Computer Science

3+ years of commercial software development experience

Good communication and stakeholder management skills

Solid Python and SQL

Hands-on Linux

Strong problem-solving skills

Practical experience with Big Data or ML tools, packages and techniques

Ideal candidate:

Experience building production-grade ML pipelines in GCP or AWS

GCP Cloud Engineer certification or above

Solid experience with streaming technologies such as Kafka, Pub/Sub, or Kinesis (a minimal consumer sketch follows below)
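
Not from Yury's message: a minimal sketch of the kind of hands-on consumer loop the streaming point refers to, using the kafka-python client. The broker address and the "orders" topic are invented for illustration.

```python
# Minimal Kafka consumer sketch (hypothetical topic and broker address).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                            # invented topic name
    bootstrap_servers="localhost:9092",  # assumes a local broker
    group_id="demo-group",
    auto_offset_reset="earliest",        # start from the oldest retained message
    value_deserializer=lambda b: b.decode("utf-8"),
)

# Each iteration blocks until a record arrives, then yields it.
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value)
```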

Questions

  • Data structures and algorithms :: is this programming-language specific?
  • How does understanding Big O notation help solve data engineering problems? (a Python sketch follows after this list)
  • What kinds of tasks should I be able to complete using Linux?
  • What is a high-throughput system?
  • DevOps is quite broad. Any key skills within this? CI/CD, Docker?
  • Spark, Scala - how to get experience with these? Are personal projects enough? (see the PySpark sketch at the end of these notes)
  • Any recommended resources?
  • Who are the typical / common stakeholders?
  • What are typical deliverables? One table? Multiple tables? Refresh frequency? Within BigQuery? Documentation with field details?
  • Are there any assurances or SLAs about data / pipelines that the team delivers?
  • What do people think is data engineering but actually isn't handled by your team?
  • There was no mention of Airflow. This seems to be a common data engineering tool. Is it used here?
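
Not part of the message above: a minimal Python sketch for the Big O question. The task, field names, and data are hypothetical; the point is that the same filter step drops from O(n*m) to O(n+m) once list membership tests are replaced with a hash set, which is the kind of difference that matters at pipeline scale.

```python
# Hypothetical task: keep only events whose user_id appears in a blocklist.

def filter_slow(events, blocklist):
    # O(n * m): every 'in' test scans the whole blocklist list.
    return [e for e in events if e["user_id"] in blocklist]

def filter_fast(events, blocklist):
    # O(n + m): build a hash set once, then each lookup is O(1) on average.
    blocked = set(blocklist)
    return [e for e in events if e["user_id"] in blocked]
```

With a million events and a million blocked ids, the first version does on the order of 10^12 comparisons; the second does roughly 2 * 10^6 operations.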
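
Also not from the message: a sketch of what a personal-project-sized Spark job could look like, answering my own Spark/Scala question. It uses PySpark in local mode, so no cluster is needed, which is why personal projects are feasible; the file path and column names (order_ts, store_id, amount) are made up.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .master("local[*]")      # run on all local cores, no cluster required
    .appName("sales-rollup")
    .getOrCreate()
)

# Hypothetical input file and schema.
sales = spark.read.csv("data/sales.csv", header=True, inferSchema=True)

# Roll revenue up per store per day.
daily = (
    sales
    .withColumn("day", F.to_date("order_ts"))
    .groupBy("day", "store_id")
    .agg(F.sum("amount").alias("revenue"))
)

daily.write.mode("overwrite").parquet("out/daily_revenue")
spark.stop()
```

The same job in Scala (Spark's JVM-native API) would be nearly line-for-line identical, so a PySpark project still teaches the transferable concepts: DataFrames, partitioning, lazy execution.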