Data Engineer
MS degree should be in EEE, ECEC, CSI, or IT.
Data Engineer strong in Hadoop, Python, JS, Kafka, Scala, Data Lake, etc.
- Design and develop Artificial Intelligence and Machine Learning applications in Big Data ecosystems using Spark.
- Develop and maintain the backend databases and ETL flows for an enterprise portal of Verizon.
- Handle development (coding the application), code review and maintenance, unit and integration testing, and application build and deployment.
- Develop data pipelines to integrate UI frameworks (Node.js) with messaging systems (Kafka); see the Kafka producer sketch after this list.
- Analyze existing databases (PostgreSQL and MySQL) to develop effective systems.
- Develop stored procedures in Oracle and MySQL and optimize queries to improve frontend performance.
- Participate in client discussions to analyze user requirements for new projects and assess their feasibility given the existing technologies.
- Develop Spark applications for processing batch and streaming data sets in Python, Scala, and Java; a streaming sketch follows this list.
- Create a synchronization mechanism from a local server to HDFS using LFTP and Python (sketched after this list).
- Set up and maintain a multi-node Kafka cluster for the production environment.
- Analyze information to recommend and plan the installation of new systems or modifications to existing systems.
- Set up Hadoop and Spark clusters from scratch for the preproduction environment.
- Write SQL queries and perform back-end testing for data validation, checking data integrity during migration from back end to front end.
- Use the Hive data warehouse tool to analyze data in HDFS and develop Hive queries.
- Implement custom Spark UDFs in both Scala and Python for comprehensive data analysis; a Python UDF sketch appears after this list.
- Provide POCs on core Spark and Spark SQL that replicate the existing ETL logic.
- Develop many successful ETL flows for data movement using Python as well as PySpark.
- Take up tasks across the various phases of software development, including preparing software specifications and business requirement documents.
- Create dimensional models for the backend tables in MySQL.
- Work extensively in data analysis for wireless systems, developing predictive and forecasting algorithms.
- Create and populate Hive external tables with structured data for the data scientists to train their models (DDL sketch after this list).
- Accumulate real-time networking data from the provided REST APIs and push it to the DB using Python; see the REST-to-database sketch after this list.
- Wrap the anomaly detection function developed by the data scientists as a Spark UDF and create an RDBMS table for the UI dashboards (final sketch below).
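
The sketches below illustrate several of the responsibilities above in hedged form; every host, topic, table, schema, and credential in them is a placeholder invented for illustration, not a detail from this posting. First, a minimal sketch of publishing a UI event to Kafka with the kafka-python client, the kind of hand-off a Node.js-to-Kafka pipeline relies on:

```python
import json

from kafka import KafkaProducer  # kafka-python client

# Connect to the Kafka broker; address and topic name are placeholders.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A UI event as it might arrive from the Node.js layer.
event = {"user": "demo", "action": "click", "page": "/dashboard"}

# Publish the event so downstream consumers (Spark jobs, ETL) can read it.
producer.send("ui-events", value=event)
producer.flush()
```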
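
A sketch of a Spark Structured Streaming job consuming that Kafka topic, assuming the spark-sql-kafka connector package is on the classpath; the broker and topic names are the same placeholders as above:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# Subscribe to the Kafka topic as an unbounded streaming source.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "ui-events")
    .load()
)

# Kafka payloads arrive as binary; cast the value to a string column.
events = raw.select(col("value").cast("string").alias("payload"))

# Print micro-batches to the console for inspection.
events.writeStream.format("console").start().awaitTermination()
```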
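
One way the LFTP-plus-Python synchronization could look: shell out to lftp to mirror the remote directory into a local staging area, then push it into HDFS with the hdfs CLI. The host and paths are assumptions:

```python
import subprocess

# Mirror the remote directory to a local staging area with lftp.
subprocess.run(
    ["lftp", "-e", "mirror /remote/data /tmp/staging; quit",
     "sftp://user@files.example.com"],
    check=True,
)

# Push the staged files into HDFS, overwriting files that already exist.
subprocess.run(
    ["hdfs", "dfs", "-put", "-f", "/tmp/staging", "/data/landing"],
    check=True,
)
```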
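
A minimal Python Spark UDF of the kind used in the data analysis work; the signal-band rule here is a toy stand-in for the real logic:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Toy business rule standing in for the real analysis logic.
def signal_band(dbm: float) -> str:
    return "good" if dbm > -85.0 else "poor"

# Register the plain Python function as a Spark UDF.
band_udf = udf(signal_band, StringType())

df = spark.createDataFrame([(-70.0,), (-95.0,)], ["rssi"])
df.withColumn("band", band_udf("rssi")).show()
```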
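
A sketch of the Hive external-table DDL for handing structured HDFS data to the data scientists; table name, columns, and location are illustrative:

```python
from pyspark.sql import SparkSession

# Hive support must be enabled on the session for Hive DDL to take effect.
spark = (
    SparkSession.builder.appName("hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# External table over files already sitting in HDFS; dropping the table
# later would leave the underlying data untouched.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS network_metrics (
        cell_id STRING,
        rssi    DOUBLE,
        ts      TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION '/data/landing/network_metrics'
""")
```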
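
A sketch of pulling real-time networking data from a REST API and pushing it into MySQL with Python; the endpoint, schema, and credentials are invented for illustration:

```python
import requests
import mysql.connector  # MySQL Connector/Python

# Fetch the latest records; the endpoint is a placeholder.
rows = requests.get(
    "https://api.example.com/network/stats", timeout=30
).json()

conn = mysql.connector.connect(
    host="localhost", user="etl", password="secret", database="portal"
)
cur = conn.cursor()

# Bulk-insert the records; the two-column schema is an assumption.
cur.executemany(
    "INSERT INTO network_stats (cell_id, rssi) VALUES (%s, %s)",
    [(r["cell_id"], r["rssi"]) for r in rows],
)
conn.commit()
conn.close()
```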
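
Finally, a sketch of wrapping an externally supplied anomaly detector as a Spark UDF and writing the scored rows to an RDBMS table over JDBC for the dashboards; the threshold function stands in for the data scientists' model, and the JDBC details are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

spark = SparkSession.builder.appName("anomaly-sketch").getOrCreate()

# Stand-in for the data scientists' anomaly detection function.
def is_anomaly(rssi: float) -> bool:
    return rssi < -100.0

anomaly_udf = udf(is_anomaly, BooleanType())

df = spark.createDataFrame([(-70.0,), (-110.0,)], ["rssi"])
scored = df.withColumn("is_anomaly", anomaly_udf("rssi"))

# Persist the scored rows for the UI dashboards via JDBC (requires the
# MySQL JDBC driver on the Spark classpath).
scored.write.jdbc(
    url="jdbc:mysql://db:3306/portal",
    table="anomaly_scores",
    mode="append",
    properties={"user": "etl", "password": "secret"},
)
```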