Job Code : BSA-22-0081

 Positions : 3

 Job Type : Full Time

 Location : Columbus, Ohio, USA

 Key Skills :

Job Description:

• Configuring, installing, and managing Apache Hadoop clusters across MapR, Hortonworks, and Cloudera Hadoop distributions.
• Installing, configuring, monitoring, and using Hadoop components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, Oozie, and Apache Spark.
• Responsible for cluster maintenance, monitoring, commissioning and decommissioning of data nodes, troubleshooting, cluster planning, and managing and reviewing data backups and log files.
• Analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive, and Sqoop; importing and exporting data into HDFS using Sqoop; developing Spark jobs and Hive jobs to summarize and transform data.
• Built on-premises data pipelines using Kafka and Spark for real-time data analysis; performing streaming data ingestion from Kafka into the Spark environment.
• Maintaining cluster health and HDFS space for better performance.
• Set up security using Kerberos and AD on Hortonworks and Cloudera CDH clusters.
• Responsible for installing, configuring, supporting, and managing Cloudera Hadoop clusters.
• Built a prototype for real-time analysis using Spark Streaming and Kafka (a minimal sketch follows this list).
• Expertise in administering, installing, upgrading, and managing MapR Hadoop distributions on multi-node clusters across Development, Test, and Production (Operational & Analytics) environments.
• Creating end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization on user behavioral data (see the Scala sketch after this list).
• Converting MapReduce programs into Spark transformations using Spark RDDs and Scala.
• Implemented Kerberos security in all environments; defined file system layout and dataset permissions.
• Regular maintenance, commissioning, and decommissioning of nodes as disk failures occur, using the MapR File System.
• Setting up high availability for the major production cluster and designing automatic failover control using ZooKeeper and quorum journal nodes.
• Installing clusters, commissioning and decommissioning DataNodes, NameNode recovery, and capacity planning.
• Experience with Cloudera Navigator and Unravel Data for auditing Hadoop access.
• Using Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
• Hadoop security setup using MIT Kerberos, AD integration (LDAP) and Sentry authorization.
• Configuring ZooKeeper to implement node coordination in support of clustering.
• Loading log data into HDFS using Flume and Kafka, and performing ETL integrations.
• Configuring Kafka to efficiently collect, aggregate, and move large amounts of clickstream data from many different sources to MapR-FS.
• Developing a data pipeline using Kafka and Storm to store data in HDFS.
• Experience in understanding Hadoop security requirements and integrating with Kerberos authentication infrastructure: KDC server setup, creating realms/domains, managing principals, generating a keytab file for each service, and managing keytabs using keytab tools.
• Writing shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
• Configuring property files such as core-site.xml, hdfs-site.xml, and mapred-site.xml based on job requirements.
• Working on NameNode high availability and customizing ZooKeeper services.
• Experience with manual and automatic failover.
• Expertise in Hadoop job schedulers such as the Fair Scheduler and the Capacity Scheduler.
• Installation of various Hadoop ecosystem components and Hadoop daemons in Cloudera and EMR.
• Cluster coordination services through ZooKeeper.
• Importing data using Sqoop to load data from MySQL into HDFS on a regular basis.
• Creating Hive tables and working with HiveQL (a short HiveQL sketch follows this list).
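For the real-time Spark Streaming and Kafka prototype referenced above, a minimal Scala sketch, assuming Spark Structured Streaming with the spark-sql-kafka connector on the classpath; the broker address, topic name, and the ClickstreamPrototype object are hypothetical placeholders, not the actual production pipeline.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ClickstreamPrototype {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ClickstreamPrototype")
      .getOrCreate()

    // Read a stream of records from Kafka; broker and topic names are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clickstream")
      .load()

    // Kafka delivers key/value as binary; cast the value to a string for processing.
    val events = raw.selectExpr("CAST(value AS STRING) AS event", "timestamp")

    // Count events per one-minute window as a simple real-time summary.
    val counts = events
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    // Write the running counts to the console; a real pipeline would write to HDFS/MapR-FS.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```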
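For converting a MapReduce-style job into Spark RDD transformations in Scala, a small self-contained sketch under assumed inputs; the HDFS paths, the userId,action,durationSeconds field layout, and the BehaviorSummary object name are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object BehaviorSummary {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BehaviorSummary").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: comma-separated lines of userId,action,durationSeconds.
    val lines = sc.textFile("hdfs:///data/user_behavior/*.csv")

    // Cleansing/validation: drop malformed rows, keep only well-formed triples.
    val records = lines
      .map(_.split(","))
      .filter(f => f.length == 3 && f(2).nonEmpty && f(2).forall(_.isDigit))

    // The classic map + reduce pair expressed as RDD transformations:
    // total time spent per (user, action) instead of a MapReduce job.
    val totals = records
      .map(f => ((f(0), f(1)), f(2).toLong))
      .reduceByKey(_ + _)

    totals.saveAsTextFile("hdfs:///output/user_behavior_summary")
    spark.stop()
  }
}
```

Using reduceByKey here mirrors the MapReduce combiner/reducer pattern while aggregating partially on each node before the shuffle.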
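For the Hive table and HiveQL work, a brief sketch issuing HiveQL through Spark's Hive support; the web_logs table, its schema, and the summary query are assumptions for illustration, not the role's actual data model.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableExample {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark issue HiveQL against the Hive metastore.
    val spark = SparkSession.builder()
      .appName("HiveTableExample")
      .enableHiveSupport()
      .getOrCreate()

    // Create a partitioned Hive table (illustrative schema).
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_logs (
        |  user_id STRING,
        |  url     STRING,
        |  status  INT
        |)
        |PARTITIONED BY (log_date STRING)
        |STORED AS ORC""".stripMargin)

    // A simple HiveQL summary query over the table.
    spark.sql(
      """SELECT log_date, status, COUNT(*) AS hits
        |FROM web_logs
        |GROUP BY log_date, status""".stripMargin).show()

    spark.stop()
  }
}
```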