● Configured data ingestion into the target HDFS using Apache Sqoop and Apache Flume.
● Hands-on exposure to Hadoop 2.0 and Cloudera Manager (CDH3).
● Implemented AWS cloud solutions using EC2, EMR, S3, and other services.
● Spark SQL with a MySQL (JDBC) data source.
● PySpark RDD actions and transformations, and the DataFrame API.
● Spark architecture and components: Spark Core and PySpark SQL with DataFrames.
● Big Data technologies: Hadoop (CDH distribution), the MapReduce framework, and the Spark ecosystem.
● Hadoop high-level languages: Hive 0.8.0.
● Data analysis using Spark Core, Spark SQL, and Spark Streaming.
● Delivered data analysis projects using Hadoop-based tools and the Python data science stack.
● Interacted with clients to handle queries and resolve issues.
● Knowledge of the end-to-end project workflow.