Senior Data Developer, Analytics Preparation

Masovia, Warsaw, Poland
04 May 2021
11 May 2021
Full Time
Contract Type
Experience Level
Experienced (non-manager)

Within Data Office, you will be a key member of the Analytics Data Preparation team. This technical team is dedicated to preparing bespoke datasets to order to accelerate data science activities across AstraZeneca's R&D portfolio.

In this senior technical position, you will lead the creation of bespoke datasets and data assemblies for our scientists.

We ask you to draw upon various blends of analytical, data-scientific and data development skills to assess data-scientist project objectives, understand the data and system landscape, then use this information to design, generate and verify your outputs.

You will be a specialist in data formatting, wrangling and transformation, and in working 'smart' with data toolkits to achieve results rapidly.

Furthermore, as the work will be for R&D customers, you will need to develop a good working understanding of the scientific applications of data, allowing you to partner with our data scientists and accelerate their work.

As no two projects will be the same, you will use technical, creative problem-solving and all-round good development practice skills to achieve successful outcomes, including supporting and mentoring other team members. Furthermore, you will be encouraged to use your experience to help the team improve development practices, support an agile, value-focused culture and provide input into technical projects aimed at improving data-science tools and infrastructure.

Your work within the Analytics Data Preparation team will enable data evidence and scientific innovation to proceed at pace and at scale within AstraZeneca, ultimately accelerating development of new, life-changing medicines for patients.

Typical Accountabilities

  • Lead technical and data analysis work to establish requirements for new data preparation projects.
  • Design data preparation workflows and define outputs (e.g., proposed data models and modes of access) and success criteria for projects.
  • Request and set up appropriate analytical environments to achieve data preparation and data-scientific outcomes.
  • Perform data wrangling, cleaning, combining, transformation, cross-linkage etc. to achieve new datasets and data assemblies that enable specific data-scientific research workflows to be conducted to a high quality and in an accelerated manner.
  • Generate reusable scripts and code notebooks for sharing, fostering reuse and supporting smart working practices.
  • Set out data verification and testing plans, including UAT and user handover (data, code, information 'readmes').
  • As a senior member of the Analytics Data Preparation team, draw on your passion for cutting-edge solutions to support the team lead in driving better business results, help enable entirely new data science paradigms and support operational efficiency.
  • Actively encourage an agile, outcomes-focused team culture.
  • Partner with Standards Group and Early Study management colleagues to support portfolio- and TA-level data standards to enable the right balance of efficiency, flexibility and value in how data is coordinated and used.


  • Must have Pharma or Healthcare experience.
  • MSc in life sciences/economics/computer science
  • Excellent skills in one or more scripting and programming languages (Python, R, SQL, etc.).
  • Experience of data analysis, profiling, investigating, interpreting and detailing data requirements, e.g., data modelling techniques and hands-on modelling experience.
  • Experience of agile working practices, e.g., Scrum.
  • Solid grasp of applied mathematical, machine learning and AI techniques and workflows.


  • Experience with knowledge graphs
  • Good awareness of a range of data & analytics patterns including distributed computing (Hadoop/Spark), NoSQL, virtualization, data streaming, container technology (e.g., Docker), traditional warehousing.
  • Knowledge and preferably experience of the big data ecosystem. This might include: HDFS, Hadoop, Parquet, Spark, Spark Streaming, HBase, Hive, Pig, Presto, Sqoop, Mesos, etc.
  • PhD in life sciences/economics/computer science is a plus
  • Any domain knowledge in Oncology is valued