Chandigarh
Full-Time
Junior: 1 to 3 years
7L - 8L (Per Year)
Posted on Jul 19 2023

About the Job

Skills

Cloudera
Hadoop
Oozie
Data Engineer
Data Mining
Data Ingestion
Data Pipelines

1. Data Acquisition

- Manage the existing data pipelines built for data ingestion.

- Create and manage new data pipelines for newly ingested data, following best practices.

- Continuously monitor data ingestion through Change Data Capture (CDC) for incremental loads.

- Analyze and fix any failed scheduled batch job so that its data is captured.

- Maintain and continuously update the technical documentation of the ingested data, and maintain the centralized data dictionary with the necessary data classifications.


2. Data Extraction and Cleaning

- Extract data from the data sources so that it can be cleaned and ingested into a big data platform.

- Define automated data cleaning before ingestion.

- Clean data by handling missing values, removing outliers, and resolving inconsistencies.

- Perform data quality checks for accuracy, completeness, consistency, timeliness, believability, and interpretability.


3. Data Integration, Aggregation and Representation

- Expose data views or data models to reporting and source systems using Hive, Impala, or similar tools.

- Expose cleansed data to the Artificial Intelligence team for building data science models.


4. Informatica Data Catalog

- Implement and configure the Informatica Enterprise Data Catalog (EDC) solution to discover and catalog data assets across the organization.

- Develop and maintain custom metadata scanners, resource configurations, and lineage extraction processes.

- Integrate EDC with other Informatica tools, such as Data Quality (IDQ), Master Data Management (MDM), and Axon Data Governance.

- Define and implement data classification, data profiling, and data quality rules to improve data visibility, accuracy, and trustworthiness.

- Collaborate with data stewards, data owners, and data governance teams to identify, document, and maintain business glossaries, data dictionaries, and data lineage information.

- Establish and maintain data governance policies, standards, and procedures within the EDC environment.

- Monitor and troubleshoot EDC performance issues, ensuring optimal performance and data availability.

- Train and support end-users in effectively utilizing the data catalog for data discovery and analysis.

- Keep up to date with industry best practices and trends, continuously improving the organization's data catalog implementation.

- Collaborate with cross-functional teams to drive data catalog adoption and ensure data governance compliance across the organization.


Skill Set:

- Certified Big Data Engineer from Cloudera/AWS/Azure

- Expertise with big data products, specifically the Cloudera stack.

- Expertise in big data querying tools, such as Hive, HBase, and Impala.

- Expertise in SQL, writing complex queries/views, partitions, and bucketing.

- Strong experience with Spark using Python/Scala.

- Expertise in messaging systems, such as Kafka or RabbitMQ.

- Hands-on experience managing a Hadoop cluster and all of its included services.

- Experience implementing ETL processes using Sqoop/Spark.

- Implementation experience, including loading from disparate data sets and pre-processing using Hive.

- Ability to design solutions independently based on high-level architecture.

- Ability to collaborate with other development teams.

- Expertise in building stream-processing systems, using solutions such as Spark Streaming, Apache NiFi, and Kafka.

- Expertise with NoSQL databases such as HBase.

- Experience with Informatica Enterprise Data Catalog (EDC) implementation and administration.

- Strong knowledge of data management, data governance, and metadata management concepts.

- Proficiency in SQL and experience with various databases (e.g., Oracle, SQL Server, PostgreSQL) and data formats (e.g., XML, JSON, CSV).

- Experience with data integration, ETL/ELT processes, and Informatica Data Integration.


Location: Chandigarh

Salary: No bar for the right candidate.

Working: 5 days (WFO)

About the company

Spark Brains is the best website design, development, and digital marketing company in India that can meet all of your business needs. We focus on providing high value to clients by providing reliable, integrated, responsive, innovative, and cost-effective solutions that drive business growth. SparkBrains is a firm specializing in technological solutions for businesses - the ones that exist, an ...

Industry

IT Service Provider

Company Size

11-50 Employees

Headquarters

Chandigarh