General Electric

Data Engineer (m/f)

  • City of London (Greater London)
  • IT development

Job description

Additional Cities

Cramlington, Stutensee
Career Level

Relocation Assistance


GE Oil & Gas
Business Segment

Oil & Gas Headquarters

Digital Technology

Germany, United Kingdom
Postal Code

NE23 1WW
Role Summary/Purpose

You will be a member of an integrated team of data engineers, software engineers, data scientists and a product owner, working together to deliver successful outcomes that drive efficiency and create new revenue streams.
Essential Responsibilities

·  Collaborating with system engineers, data scientists, frontend developers and software developers to implement solutions that are aligned with our stakeholders' and our own strategic directions

·  Implementing, documenting and supporting data engineering solutions in an agile environment

·  Creating data visualizations using the latest development methods and infrastructure

·  Creating custom software components (e.g. specialized UDFs) and analytics applications

·  Using Spark 2.x APIs such as DataFrames, Datasets, SQL APIs and Spark sessions, and utilizing the latest Spark features to scale algorithm development

·  Installing Python and other libraries required for data engineering in the Hadoop environment

·  The following tasks are carried out independently:

·  AWS EMR/Spark configuration; monitoring the Spark UI for Java heap usage, cluster usage and job execution

·  Using AWS S3 buckets for executing Spark jobs; knowledge of Spark parameters for remote code and application JAR execution

·  ETL operations in Spark and optimal storage of intermediate data in AWS S3 buckets – knowledge of Parquet formats, and querying and reading these files in Spark jobs

·  Creating bootstrap scripts to install libraries on all clusters

·  Generic monitoring of Spark clusters using Ganglia – reporting memory usage and CPU stats for Spark jobs so that the data engineering and data science teams can optimize the scaled Spark code

·  Experience with relational databases (preferably Oracle, Postgres, Greenplum)

·  Significant experience writing complex SQL queries, strong PL/SQL skills

·  Experience with at least one programming language (preferably Scala, Java or Python)

·  Experience in Unix/Linux environments

·  Good English skills (written and spoken)

·  Positive attitude and team player

Applications from job seekers who require sponsorship to work in the UK are welcome and will be considered alongside all other applications. However, non-EU/EEA candidates may not be appointed to a post if a suitably qualified, experienced and skilled EU/EEA candidate is available to take up the post, as the employing body is unlikely, in these circumstances, to satisfy the Resident Labour Market Test. For further information please visit the UK Border Agency website.
Desired Characteristics

·  Experience with a variety of languages and tools (e.g. scripting languages) to integrate systems

·  Integration of these libraries into current algorithm/Spark jobs – integration of JUnit and Spark unit-testing frameworks into the Spark code

·  Good knowledge of Big Data querying tools such as Pig, Hive, Impala and Phoenix

·  Programming skills in Map/Reduce and Spark jobs – heavy-lifting programming skills will be required for custom or specialized implementations and use cases based on algorithms and machine learning models built or configured by the data science and algorithm teams

·  Development of Map/Reduce and Spark jobs/pipelines in a Hadoop distributed environment in Python and Java (PySpark and Java Spark jobs)


About Us

GE is the world's Digital Industrial Company, transforming industry with software-defined machines and solutions that are connected, responsive and predictive. Through our people, leadership development, services, technology and scale, GE delivers better outcomes for global customers by speaking the language of industry.
Primary Country

United Kingdom