Key Responsibilities
- Design, develop, and maintain scalable ETL/ELT pipelines using Databricks, Apache Spark, and cloud-native services.
- Leverage Azure data services (Data Lake, SQL DB, Service Bus, Cosmos DB, OLTP) for ingestion, transformation, and pipeline orchestration.
- Implement and manage Change Data Capture (CDC) processes, including change tracking and change data feed, to support incremental updates (see the first sketch after this section).
- Plug into and extend a custom metadata-driven Databricks accelerator framework used to manage Lakehouse operations (a generic sketch of the pattern follows this section).
- Optimize the downstream data environment by proactively identifying and resolving upstream data issues.
- Utilize Delta Lake, Unity Catalog, Delta Live Tables, and Databricks Workflows to enforce governance and streamline operations (see the Delta Live Tables sketch after this section).
- Participate in planning and preparing for a future migration from Azure to AWS, including cross-training, architecture evaluation, and knowledge transfer.

Required Experience and Skills
- Strong expertise in Databricks and Apache Spark, especially within large-scale data environments.
- Proficiency in SQL and Python for data transformation and pipeline automation.
- Hands-on experience with key Azure services, including Storage, SQL DB, Cosmos DB, Service Bus, and OLTP systems.
- Experience implementing CDC mechanisms and managing real-time or batch data syncs.
- Familiarity with Delta Lake architecture, Unity Catalog, Delta Live Tables, and job orchestration in Databricks.
- Ability to diagnose and resolve upstream data quality issues to improve downstream consumption and performance.
- Willingness and ability to contribute to an upcoming cloud migration to AWS, with foundational knowledge of or strong interest in AWS-native data tooling.

Preferred Qualifications
- Experience supporting data modernization or cloud migration initiatives (especially Azure to AWS).
- Familiarity with AWS data tools such as S3, Glue, Redshift, Kinesis, or DynamoDB.
- Background in metadata-driven architectures or accelerator-style data platforms.
- Strong collaboration skills and experience working with data architects, product owners, and analytics stakeholders.
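To illustrate the CDC responsibility above, here is a minimal PySpark sketch of incremental reads via Delta Lake's change data feed, which the posting names directly. The table name `orders_bronze` and the starting version are hypothetical placeholders; it assumes a Databricks runtime with the change data feed enabled on the source table.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# One-time setup: enable the change data feed on a Delta table.
# The table name is a hypothetical stand-in.
spark.sql("""
    ALTER TABLE orders_bronze
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read only the rows that changed since a known version,
# instead of rescanning the whole table.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 12)  # hypothetical checkpoint; persist per pipeline run
    .table("orders_bronze")
)

# _change_type distinguishes inserts, update pre/post images, and deletes;
# inserts and post-images are what a downstream upsert typically needs.
upserts = changes.where(col("_change_type").isin("insert", "update_postimage"))
```

A real pipeline would persist the last-processed version (or timestamp) between runs so each execution picks up exactly where the previous one stopped.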
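For the Delta Live Tables item, a minimal sketch of the declarative style the role calls for: a managed table with a per-row data-quality expectation. The source table `raw_events` and the expectation rule are hypothetical, and the `dlt` module is only importable inside a Databricks DLT pipeline, not in a plain notebook.

```python
import dlt
from pyspark.sql.functions import col, to_timestamp

@dlt.table(comment="Cleaned events, incrementally maintained by the DLT runtime.")
@dlt.expect_or_drop("valid_event_time", "event_time IS NOT NULL")  # rows failing the rule are dropped
def events_clean():
    # dlt.read_stream declares an incremental (streaming) dependency on raw_events,
    # so DLT handles orchestration and checkpointing rather than hand-written jobs.
    return (
        dlt.read_stream("raw_events")
        .withColumn("event_time", to_timestamp(col("event_time")))
    )
```

Expectations like this are one way DLT surfaces the governance enforcement the posting mentions: quality rules live next to the transformation and are tracked by the pipeline itself.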
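The metadata-driven accelerator framework named above is proprietary to the team, so the following is only a generic sketch of the pattern it implies: pipeline behavior driven by a configuration table rather than hard-coded jobs. Every name here (the config table, its columns, the formats) is a hypothetical stand-in for whatever the actual framework defines.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical metadata table: one row per managed source, describing
# where to read from, how to parse it, and where to land it.
configs = spark.table("pipeline_config").collect()

for cfg in configs:
    df = (
        spark.read.format(cfg["source_format"])  # e.g. "parquet", "json"
        .load(cfg["source_path"])                # landing-zone path from metadata
    )
    (
        df.write.format("delta")
        .mode(cfg["write_mode"])                 # "append" or "overwrite", per table
        .saveAsTable(cfg["target_table"])
    )
```

The appeal of the pattern is that onboarding a new source becomes a metadata insert rather than new code, which is presumably what "plug into and extend" the accelerator means in practice.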
Job Type
Hybrid role
Skills required
Azure, Python
Location
Houston, TX
Salary
Not specified.
Date Posted
July 10, 2025
PDI Technologies is seeking a skilled Data Engineer with expertise in Databricks and Apache Spark to build scalable data pipelines and support cloud migration initiatives. This hybrid role, based in Houston, TX, spans both Azure and AWS environments.