Mateusz Krupski

Warsaw, Poland

Summary

Results-oriented Senior Data Engineer with more than 4 years of experience in Information Technology, covering the development and implementation of Big Data applications and DevOps practices (CI/CD), mainly in cloud environments. Strong team player with excellent communication skills who also works effectively independently.

Overview

4 years of professional experience
1 Certification

Work History

Senior Data Engineer

GetInData | Part Of Xebia
09.2023 - Current
  • Collaborated on a new data platform project for the retail group Ahold Delhaize - Albert Heijn, focusing on advanced data solutions and system optimisations.
  • Developed and optimized Databricks notebooks for data transformation, delivering structured data to consumers using Python, SQL, and Spark, ensuring high data quality and accessibility.
  • Designed and maintained a Continuous Integration/Continuous Deployment (CI/CD) environment within the Databricks workspace, streamlining code deployment and enhancing collaboration.
  • Developed the ETL backend for the Store Support application, ensuring efficient data extraction, transformation, and loading processes.

Senior Big Data Developer

Santander Corporate & Investment Banking
06.2023 - Current
  • Designed and implemented a robust financial data flow process architecture, optimizing for accuracy and efficiency in data handling.
  • Defined and standardized data architecture, formats, and partitions, significantly improving the efficiency of Big Data processing.
  • Developed and maintained Scala/Spark engines, ensuring technical excellence and functional validation of executions for consistent data outcomes.
  • Lead the development of a project focused on distributed client participation across countries in investment transactions, enhancing global data integration.

Data Engineer

BI4ALL
02.2023 - Current
  • Successfully designed and implemented a complex ETL pipeline using Databricks (Python, PySpark, Spark SQL) and Azure Data Factory to efficiently ingest and transform data from multiple sources into a Data Lake and Azure SQL Database.
  • Automated data processing using Databricks Jobs and Data Pipelines (DLT), improving overall efficiency and accuracy of the data processing with Spark.
  • Created CI/CD pipelines in Azure DevOps to automate deployments for Databricks and Azure Data Factory, enabling faster and more reliable delivery of data pipelines.

DevOps / Data Engineer - IT Manager

Procter & Gamble
03.2022 - 02.2023
  • Developed and maintained a highly scalable Big Data application using Python, Spark SQL, and PySpark, deployed on Azure Kubernetes Service with Spark running in pods (100M records per processing run).
  • Created a Docker-based app development environment (Python, Spark, YAML) and integrated it with Redis Cache for efficient app development.
  • Created custom log queries using Azure Log Analytics Query Language (KQL) to search, filter, and analyze app performance.
  • Ingested data with Python, PySpark, and Spark SQL, and optimised query processing with Spark.
  • Optimized PostgreSQL database to improve query performance.
  • Built and managed Data Pipelines and Jobs with Databricks (Delta Live Tables, PySpark, Python, SQL) and Azure Data Factory to streamline data processing.
  • Built and maintained CI/CD pipelines using Azure DevOps and managed infrastructure environments with Terraform.
  • Led a team of 2 Data Engineers in the refactoring of the application architecture and implementation of the Big Data App solution, utilizing Agile methodology.

Junior Data Scientist

Atos
08.2021 - 02.2022
  • Designed and implemented an ETL pipeline with Databricks (Python, PySpark, Spark SQL) and Azure Data Factory to load data into a MySQL database.
  • Created reports and dashboards with Tableau and Power BI for data-driven decision-making.
  • Assisted in creating Machine Learning models for predictive analytics.
  • Utilized Azure Synapse Analytics and Azure Log Analytics to troubleshoot and resolve data processing issues.

Data Engineer

Nissan Sales CEE
09.2019 - 08.2021
  • Built scripts using Python, PySpark and Spark SQL to schedule Databricks Jobs for financial calculations and task automation.
  • Developed and maintained data models in an MSSQL database and migrated them to Azure SQL Database.
  • Built data pipelines with Azure Data Factory for data processing.
  • Implemented Azure Functions with Python and Spark for data extractions.
  • Built the Azure DevOps project structure and CI/CD pipelines for consistent deployment of code and infrastructure, and integrated the environments.

Education

Master of Engineering - Computer Science - Specialisation in Data Engineering

Warsaw University of Technology
Warsaw, Poland
06.2023

Bachelor of Engineering - Electrical Engineering

Warsaw University of Technology
Warsaw, Poland
01.2022

Bootcamp - Cloud Data Engineering With AWS

Onwelo Academy
Warsaw, Poland
04.2022

Bootcamp - Data Engineering With Databricks

Accenture Academy
Warsaw, Poland
03.2022

Skills

  • Programming Languages - Python, Scala, Spark, PySpark, SQL, Java and Shell Scripting
  • Databases - MSSQL, PostgreSQL, MySQL, MongoDB, Redis, Neo4j
  • Databricks - Delta Lake, Delta Live Tables, Jobs, Data Pipelines
  • Azure - Azure Data Factory, Azure Synapse Analytics, Azure Log Analytics, Azure Data Lake, Azure DevOps, Functions, WebApps, AKS, Cosmos DB, SQL Databases, VMs
  • AWS - Redshift, Athena, Glue, Lambda, S3, EC2
  • Tools - Docker, Kubernetes, CI/CD pipelines, Terraform, Jira, GitHub
  • Data Visualisation - Power BI, Tableau, Databricks SQL Dashboards
  • Methodologies - Agile, Scrum, Waterfall, DevOps
  • Machine Learning algorithms - Linear and Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, K-means, Random Forest

Certification

Databricks Certified: Data Engineer Associate

Databricks Certified: Associate Developer for Apache Spark

Databricks Certified: Lakehouse Fundamentals

Microsoft Certified: Azure Data Engineer Associate DP-203

Microsoft Certified: Azure Developer Associate AZ-204

Microsoft Certified: Azure Data Scientist Associate DP-100

Microsoft Certified: Azure Data Fundamentals DP-900

Microsoft Certified: Power BI Data Analyst Associate PL-300

AWS Certified: AWS Cloud Practitioner CLF-C01

DevOps Institute Certified: DevOps Foundation

Languages

Polish
Native language
English
Proficient
C2
German
Intermediate
B1
