Job Overview:
We are seeking a skilled Data Engineer with a minimum of 5 years of experience to design, develop, and maintain data solutions focused on data generation, collection, and processing. The ideal candidate will have expertise in PySpark, Python, and Notebooks, and will be responsible for building data pipelines, ensuring data quality, and implementing ETL (Extract, Transform, Load) processes to migrate and deploy data across systems.
Key Responsibilities:
• Data Solutions Development:
o Design, develop, and maintain data pipelines to extract, transform, and load data across various systems using PySpark.
o Create and manage data solutions for efficient data generation, collection, and processing.
• Data Quality Management:
o Ensure high data quality and integrity by implementing data validation and cleansing processes.
o Monitor and troubleshoot data pipeline performance, ensuring reliability and efficiency.
• Performance Optimization:
o Optimize data infrastructure to enhance performance and scalability.
o Implement strategies for efficient data storage, retrieval, and processing.
• Data Processing:
o Process large volumes of structured and unstructured data, integrating data from multiple sources into usable datasets.
o Utilize Notebooks for interactive data analysis and visualization.
• Technical Expertise:
o Demonstrate advanced proficiency in Python for developing data processing logic and automation scripts.
o Apply a strong understanding of data engineering concepts and best practices.
• Collaboration:
o Work collaboratively with cross-functional teams, including data scientists and business stakeholders, to understand data requirements and provide effective solutions.
o Participate actively in team discussions and contribute to continuous improvement initiatives.
• Documentation and Deployment:
o Maintain comprehensive documentation of data processes, data models, and pipeline configurations.
o Participate in code release and production deployment processes.
Required Skills and Qualifications:
• Must-Have Skills:
o Minimum of 5 years of experience in data engineering or related roles.
o Proficiency in Python and PySpark for data processing and manipulation.
o Experience with Notebooks (e.g., Jupyter, Databricks) for data analysis and visualization.
• Additional Skills:
o Strong understanding of data integration and ETL processes.
o Familiarity with data modeling, database design, and SQL.
o Knowledge of distributed computing and big data technologies.
• Soft Skills:
o Strong analytical and problem-solving skills.
o Excellent communication skills for effective collaboration with team members and stakeholders.
o Ability to work independently and proactively contribute to team goals.
Education and Experience:
• Bachelor’s degree in Computer Science, Information Technology, or a related field.
• Minimum of 5 years of relevant experience in data engineering or related fields, with a focus on PySpark, Python, and Notebooks.
Work Environment:
• Work Mode: Hybrid or Remote, depending on company policy.
• Notice Period: Immediate to 15 days preferred.
Interview Process:
• The selection process will include technical interviews to assess expertise in PySpark, Python, and data engineering concepts.