Title : Sr. Big Data Engineer – Midstream Oil & Gas

Location : Houston, TX

Duration : 12+ months

Visa : USC/GC/L2/H4/TN

Candidates will likely be local, since the client requires midstream Oil & Gas experience.

The Data Engineer will primarily develop in Spark and Python and use a variety of Big Data platforms to provide solutions to core challenges facing the business. S/he will apply machine learning, data science, and artificial intelligence. The Data Engineer will also sit with business users to discuss work challenges and requirements and may interact with senior leadership.

Specifically, the individual will provide subject matter expertise for use cases pertaining to the following:

1. Reliability and Safety – goal to reduce equipment failure

2. Optimization – goal to increase throughput to increase fracking

3. Product Loss – goal to reduce loss of oil and gas product and balance accounting of product

Responsibilities include, but are not limited to:

Work independently on data projects for multiple business functions

Implement data flows connecting operational systems, BI systems, and the big data platform

Automate manual data flows for repeated use and scalability

Develop data-intensive applications with APIs and streaming data pipelines

Prepare and transform data into a usable state for analytics

Document and maintain source-to-target mappings and data lineage

Install, configure, and administer Hadoop ecosystem tools and technologies

Operationalize machine learning models

Assist data analysts and data scientists with query optimization, performance tuning, and data processing

Identify opportunities for data improvements and present recommendations to management


The successful candidate will meet the following qualifications:

5+ years of functional programming experience in Python and/or Scala

5+ years of distributed computing experience with Spark and/or MapReduce

3+ years of experience developing SQL-based applications

3+ years of experience developing ETL data pipelines

3+ years of system administration experience on Linux systems

2+ years of experience with multiple Big Data storage tools such as MapR-DB, HBase, MapR-FS, HDF, Parquet, Avro, and ORC

Experience with the software development life cycle, including change management and testing

Experience creating data pipelines with StreamSets

