Command-line proficiency on Linux and Windows, with an understanding of Unix/Linux system administration and shell scripting
Proficiency with Hadoop v2, MapReduce, HDFS, and Spark
Management of a Hadoop cluster and all of its included services
Good knowledge of Big Data querying tools such as Pig, Hive, Impala, and Spark
Data concepts (ETL, near-real-time and real-time streaming, data structures, metadata, and workflow management)
The ability to function within a multidisciplinary, global team; a self-starter with a strong curiosity for extracting knowledge from data and the ability to elicit technical requirements from a non-technical audience
Collaboration with team members, business stakeholders, and data SMEs to elicit, translate, and prescribe requirements; cultivating sustained innovation to deliver exceptional products to customers
Experience integrating data from multiple data sources
Strong communication skills and the ability to present deep technical findings to a business audience