JOB DESCRIPTION: Spark, Scala/Python, Hive, Hadoop, Big Data developer with exposure to cloud (Azure preferred).
• 4-5 years of experience building and implementing data ingestion and curation processes using big data tools such as Spark (Scala/Python), Hive, HDFS, Sqoop, HBase, Kerberos, Sentry, Impala, etc.
• Ingesting large volumes of data from various platforms for analytics needs and writing high-performance, reliable, and maintainable ETL code. Strong SQL knowledge and data analysis skills for data anomaly detection and data quality assurance.
• Hands-on experience writing shell scripts, complex SQL queries, and Hadoop commands, and working with Git.
• Good hands-on experience creating databases, schemas, and Hive tables (external and managed) with various file formats (ORC, Parquet, Avro, Text, etc.), complex transformations, partitioning, bucketing, and performance optimizations.
• Recent exposure to cloud is good to have; Azure is preferred.
• Spark: complex transformations, DataFrames, semi-structured data, utilities using Spark, Spark SQL, and Spark configurations.
• Proficiency and extensive experience with Spark and Scala/Python, including performance tuning, is a must.
• Monitoring performance of production jobs and advising on any necessary infrastructure changes.
• Ability to write abstracted, reusable code components.
• Code versioning experience using Bitbucket and CI/CD pipelines.
Job Type
Full-time role
Skills required
Spark (Scala/Python), Hive, Hadoop, HDFS, Sqoop, HBase, Impala, SQL, shell scripting, Git, Azure.
Location
Cary, North Carolina
Salary
No salary information was found.
Date Posted
May 28, 2025
CapB InfoteK is seeking a Hadoop Big Data Developer with expertise in Spark, Scala/Python, and cloud technologies. The role involves building data ingestion processes and optimizing ETL code for analytics.