Junior data engineer at woolies01
From Yury Pitsishin (Head of Machine Learning at Woolworths Rewards)
On top of your plan we also usually assess the following skills:
- Data Structures & Algorithms (Big O notation)
- Linux
- Solution design of complex high-throughput systems (with a primary focus on non-functional requirements like security, cost, scalability)
- DevOps
- Big Data frameworks: Spark, Scala
- Solid experience with one of the clouds
I'll be happy to share more details during the catch up.
Please find below what we usually require for a Junior role.
- Bachelor's degree or above in Computer Science
- 3+ years of commercial software development experience
- Good communication and stakeholder management skills
- Solid Python and SQL
- Hands-on Linux
- Strong problem-solving skills
- Practical experience with Big Data or ML tools, packages and techniques
- Experience building production-grade ML pipelines in GCP or AWS
- GCP Cloud Engineer or higher level of certification
- Solid experience with streaming technologies like Kafka, Pub/Sub or Kinesis
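To make the streaming requirement concrete for myself: below is a toy, pure-Python sketch (no real Kafka/Pub/Sub/Kinesis client, all names my own) of the kind of stateful, windowed aggregation these technologies manage for you at scale.

```python
from collections import defaultdict

# Toy sketch: events arrive as (timestamp_seconds, user_id) pairs and we
# count events per user in tumbling 60-second windows -- the kind of
# stateful aggregation a streaming framework handles continuously,
# with fault tolerance, over an unbounded stream.

def tumbling_window_counts(events, window_size=60):
    """Group events into fixed-size windows and count per (window, user)."""
    counts = defaultdict(int)
    for ts, user in events:
        window_start = (ts // window_size) * window_size
        counts[(window_start, user)] += 1
    return dict(counts)

events = [(3, "a"), (45, "a"), (61, "b"), (70, "a"), (130, "b")]
result = tumbling_window_counts(events)
# t=3 and t=45 fall in window 0; t=61 and t=70 in window 60; t=130 in window 120
```

The real systems add the hard parts this sketch ignores: late/out-of-order events, checkpointed state, and exactly-once delivery.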
- Data structures and algorithms: is this programming-language specific?
- How does understanding Big O notation help solve data engineering problems?
- What kinds of tasks should I be able to complete using Linux?
- What is a high-throughput system?
- DevOps is quite broad. Are there any key skills within this? CI/CD, Docker?
- Spark, Scala: how do I get experience with these? Are personal projects enough?
- Any recommended resources?
- Who are the typical / common stakeholders?
- What are typical deliverables? 1 table? Multiple tables? Refresh frequency? within BigQuery? Documentation with field details?
- Are there any assurances or SLAs about data / pipelines that the team delivers?
- What do people think is data engineering but actually isn't handled by your team?
- There was no mention of Airflow. This seems to be a common data engineering tool; is it part of the team's stack?
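On the Big O question above, a small self-contained illustration I put together (my own toy example, not from the conversation): deduplicating records with a list is O(n^2), while a set makes it O(n), and at pipeline data volumes that difference decides whether a job finishes.

```python
# Toy illustration: deduplicating n records while preserving first-seen order.
# Membership tests against a list are O(n) each (O(n^2) total); a set gives
# O(1) average-case lookups (O(n) total).

def dedup_quadratic(records):
    """O(n^2): each `in` check scans the list of already-seen records."""
    seen = []
    for r in records:
        if r not in seen:
            seen.append(r)
    return seen

def dedup_linear(records):
    """O(n): each `in` check is an O(1) average-case hash lookup."""
    seen = set()
    out = []
    for r in records:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out

records = ["u1", "u2", "u1", "u3", "u2"]
# both return ["u1", "u2", "u3"], but only the set version scales
```

Same answer either way; the point is that the choice of data structure, not the hardware, is what makes the linear version usable on millions of rows.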