Junior Data Engineer at woolies01
From Yury Pitsishin (Head of Machine Learning at Woolworths Rewards)
On top of your plan, we also usually assess the following skills:
- Data structures and algorithms (Big O notation)
- Linux
- Solution design of complex high-throughput systems, with a primary focus on non-functional requirements like security, cost, and scalability
- DevOps
- Big Data frameworks: Spark, Scala
- Solid experience with one of the major clouds
I'll be happy to share more details during the catch-up.
Please find below what we usually require for a Junior role.
Minimal Requirements:
Bachelor's degree or above in Computer Science
3+ years of commercial software development experience
Good communication and stakeholder management skills
Solid Python and SQL
Hands-on Linux experience
Strong problem-solving skills
Practical experience with Big Data or ML tools, packages and techniques
Ideal candidate:
Experience building production-grade ML pipelines on GCP or AWS
GCP Cloud Engineer certification or above
Solid experience with streaming technologies like Kafka, Pub/Sub, or Kinesis
Questions
- Data structures and algorithms: are these programming-language specific?
- How does understanding Big O notation help solve data engineering problems? (see the sketch at the end of these notes)
- What kinds of tasks should I be able to complete using Linux?
- What is a high-throughput system?
- DevOps is quite broad; are there key skills within it, e.g. CI/CD, Docker?
- Spark, Scala: how do I get experience with these? Are personal projects enough? (see the PySpark sketch at the end)
- Any recommended resources?
- Who are the typical / common stakeholders?
- What are typical deliverables? One table? Multiple tables? Refresh frequency? Within BigQuery? Documentation with field details?
- Are there any assurances or SLAs about data / pipelines that the team delivers?
- What do people think is data engineering but actually isn't handled by your team?
- There was no mention of Airflow. This seems to be a common data engineering tool; is it used by the team?
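
For the Big O question above, a concrete case helps. The sketch below (plain Python, hypothetical data) performs the same event-enrichment join two ways: an O(n·m) nested scan and an O(n+m) hash index. Even at modest row counts the hashed version is orders of magnitude faster, which is the kind of trade-off that shows up constantly in pipeline code.

```python
# Sketch only: hypothetical data, illustrating why Big O matters
# in pipeline code. Joins n events to m user records two ways.
import random
import time

n = 5_000
users = [{"user_id": f"u{i}", "segment": i % 5} for i in range(n)]
events = [{"user_id": f"u{random.randrange(n)}"} for _ in range(n)]

def enrich_nested(events, users):
    # O(n * m): for each event, scan the whole user list.
    out = []
    for e in events:
        for u in users:
            if u["user_id"] == e["user_id"]:
                out.append({**e, "segment": u["segment"]})
                break
    return out

def enrich_hashed(events, users):
    # O(n + m): build a hash index once, then each lookup is O(1).
    index = {u["user_id"]: u for u in users}
    return [{**e, "segment": index[e["user_id"]]["segment"]} for e in events]

for fn in (enrich_nested, enrich_hashed):
    start = time.perf_counter()
    fn(events, users)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```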
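
For the Spark/Scala question: Spark's local mode runs on a laptop, so a personal project can at least exercise the core DataFrame API. A minimal PySpark sketch (Python rather than Scala, matching the role's Python requirement; assumes `pip install pyspark`, and the data and app name are invented):

```python
# Minimal local-mode PySpark sketch; all data here is invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("practice").getOrCreate()

orders = spark.createDataFrame(
    [("u1", 12.50), ("u1", 3.00), ("u2", 7.25)],
    ["user_id", "amount"],
)

# Aggregate spend per user: the kind of transform a portfolio
# project can demonstrate without any cloud account.
spend = orders.groupBy("user_id").agg(F.sum("amount").alias("total_spend"))
spend.show()

spark.stop()
```

Whether personal projects are "enough" is still worth raising in the catch-up, since cluster-scale tuning is hard to self-teach locally.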