Yauheni Subota

Gdansk

Summary

Adaptable, results-oriented data engineer with 4 years of experience developing batch and streaming data pipelines, with hands-on experience in data migration and transformation. Background includes building analytics solutions and building and maintaining data processing systems on top of GCP. Strong programming skills in Python and intermediate proficiency in Scala. Career spans multiple industries, including finance, retail, and distribution.

Overview

3 years of professional experience
1 Certification

Work History

Data Engineer

EPAM Systems PL
08.2022 - Current
  • Designed and developed ETL pipelines using Airflow DAGs, covering data extraction from client APIs, transformation of XML into JSONL with metadata attributes, data validation, batch loading into BigQuery, and SQL for data unnesting, filtering, and grouping, ensuring efficient storage and delivery of processed data to support business needs.
  • Managed and orchestrated the maintenance and scheduling of Airflow DAGs.
  • Developed a Scala-based Batch Dataflow template for processing table-structured data, incorporating custom business logic on fields, and loading the transformed data into BigQuery.
  • Created a Cloud Function to trigger Dataflow jobs based on Cloud Storage bucket events, enhancing automation and efficiency.
  • Designed and implemented a Stream Dataflow template in Scala, enabling real-time transformation of XML/JSON messages and their seamless loading into BigQuery.
  • Crafted Terraform scripts to manage and maintain cloud infrastructure components, including BigQuery tables and views, Pub/Sub topics and subscriptions, Airflow DAGs, Cloud Functions, Cloud Bucket, Dataflow jobs, and NiFi templates.
  • Deployed GCP infrastructure components and web applications, and triggered NiFi and Dataflow job execution via Jenkins.
  • Developed an XML parser in Python to process files in a streaming fashion, ensuring efficient denormalization of XML data.
  • Implemented custom SQL code within an existing module to provide new attributes for enhanced business reports, based on data aggregation and analysis.

Data Engineer

EPAM Systems BY
04.2021 - 08.2022
  • Developed Spark jobs in Scala on the Databricks platform to transform data in alignment with client preferences and load it into Azure Blob Storage.
  • Deployed the Confluent Platform on a clustered infrastructure through Azure Kubernetes, configuring custom Kafka settings for enhanced data streaming.
  • Established a robust Spark environment within Databricks, integrated with Azure's CI/CD, ensuring efficient development and deployment.
  • Developed NiFi batch templates to extract batch data from Teradata and transfer it to Cloud Storage, and created NiFi stream templates to extract real-time messages from MQ and push the data to Pub/Sub topics.
  • Crafted a custom Python script to generate synthetic data following customer data modeling hierarchies, enabling automated and scheduled data flows into BigQuery.
  • Introduced a custom Qlik extension using JavaScript, enhancing functionality with a writeback feature.

Data & Workflow Engineer

Syberry
05.2020 - 04.2021
  • Provided crucial support in stabilizing and enhancing existing data processes within Airflow for the efficient handling of substantial data flows.
  • Embedded specialized features for extracting data from unique data storage systems, ensuring seamless integration with the broader data ecosystem.
  • Developed and fine-tuned custom normalizers to transform diverse data formats, including XML, JSON, TXT, and HTML, preparing them for the subsequent stages of the ETL system.
  • Established a Docker QA container within Google Cloud Platform (GCP) to facilitate the execution of quality assurance checkers for data validation.
  • Created and implemented comprehensive QA scripts to validate data deliveries, ensuring accuracy and alignment with client preferences.

Education

Bachelor of Information Technology - Software Engineer

Belarus National Technical University
Minsk, Belarus
06.2022

Skills

  • Apache Airflow
  • BigQuery
  • Jenkins
  • Apache NiFi
  • Apache Beam
  • Cloud Composer
  • Cloud Dataflow
  • Cloud Scheduler
  • Cloud Function
  • Cloud Storage
  • Apache Cassandra
  • Bash
  • Pub/Sub
  • Terraform
  • Qlik Sense
  • Apache Spark
  • Apache Kafka
  • Docker

Certification

  • Professional Data Engineer, Google

The exam assesses the ability to design and build data processing systems, ensure solution quality, and operationalize ML models.

  • Associate SQL Analyst, Databricks

The earner has demonstrated the ability to use Apache Spark SQL to query, transform, and present data.

Languages

  • Russian, Belarusian: Native
  • English: Upper intermediate (B2)
