Yauheni Subota

Gdansk

Summary

Adaptable, results-oriented data engineer with 4 years of experience developing batch and streaming data pipelines, with hands-on experience in data migration and transformation. Background includes building analytics solutions and building and maintaining data processing systems on top of GCP. Strong programming skills in Python and intermediate proficiency in Scala. Career spans multiple industries, including finance, retail, and distribution.

Overview

3 years of professional experience
1 Certification

Work History

Data Engineer

EPAM Systems PL
08.2022 - Current
  • Designed and developed ETL pipelines using Airflow DAGs, covering data extraction from client APIs, transformation of XML into JSONL with metadata attributes, data validation, batch loading into BigQuery, and SQL for data unnesting, filtering, and grouping, ensuring efficient storage and delivery of processed data to support business needs.
  • Managed and orchestrated the maintenance and scheduling of Airflow DAGs.
  • Developed a Scala-based Batch Dataflow template for processing table-structured data, incorporating custom business logic on fields, and loading the transformed data into BigQuery.
  • Created a Cloud Function to trigger Dataflow jobs based on Cloud Storage bucket events, enhancing automation and efficiency.
  • Designed and implemented a Stream Dataflow template in Scala, enabling real-time transformation of XML/JSON messages and their seamless loading into BigQuery.
  • Crafted Terraform scripts to manage and maintain cloud infrastructure components, including BigQuery tables and views, Pub/Sub topics and subscriptions, Airflow DAGs, Cloud Functions, Cloud Bucket, Dataflow jobs, and NiFi templates.
  • Deployed GCP infrastructure components and web applications, and triggered NiFi and Dataflow job execution via Jenkins.
  • Developed an XML parser in Python to process files in a streaming fashion, ensuring efficient denormalization of XML data.
  • Implemented custom SQL code within an existing module to provide new attributes for enhanced business reports, based on data aggregation and analysis.

Data Engineer

EPAM Systems BY
04.2021 - 08.2022
  • Developed Spark jobs in Scala on the Databricks platform to transform data in alignment with client preferences and load it into Azure Blob Storage.
  • Deployed the Confluent Platform on a clustered infrastructure through Azure Kubernetes, configuring custom Kafka settings for enhanced data streaming.
  • Established a robust Spark environment within Databricks, integrated with Azure's CI/CD, ensuring efficient development and deployment.
  • Developed NiFi batch templates to extract batch data from Teradata and transfer it to Cloud Storage, and created NiFi stream templates to extract real-time messages from MQ and push the data to Pub/Sub topics.
  • Crafted a custom Python script to generate synthetic data following customer data modeling hierarchies, enabling automated and scheduled data flows into BigQuery.
  • Introduced a custom Qlik extension using JavaScript, enhancing functionality with a writeback feature.

Data & Workflow Engineer

Syberry
05.2020 - 04.2021
  • Provided crucial support in stabilizing and enhancing existing data processes within Airflow for the efficient handling of substantial data flows.
  • Embedded specialized features for extracting data from unique data storage systems, ensuring seamless integration with the broader data ecosystem.
  • Developed and fine-tuned custom normalizers to transform diverse data formats, including XML, JSON, TXT, and HTML, preparing them for the subsequent stages of the ETL system.
  • Established a Docker QA container within Google Cloud Platform (GCP) to facilitate the execution of quality assurance checkers for data validation.
  • Created and implemented comprehensive QA scripts to validate data deliveries, ensuring accuracy and alignment with client preferences.

Education

Bachelor of Information Technology - Software Engineer

Belarus National Technical University
Minsk, Belarus
06.2022

Skills

  • Apache Airflow
  • BigQuery
  • Jenkins
  • Apache NiFi
  • Apache Beam
  • Cloud Composer
  • Cloud Dataflow
  • Cloud Scheduler
  • Cloud Function
  • Cloud Storage
  • Apache Cassandra
  • Bash
  • Pub/Sub
  • Terraform
  • Qlik Sense
  • Apache Spark
  • Apache Kafka
  • Docker

Certification

  • Professional Data Engineer, Google

The exam assesses the ability to design and build data processing systems, ensure solution quality, and operationalize ML models.

  • Associate SQL Analyst, Databricks

The earner has demonstrated the ability to use Apache Spark SQL to query, transform, and present data.

Languages

  • Russian, Belarusian: Native
  • English: Upper intermediate (B2)
