HSBC Global Banking & Markets – Big Data Program - INT-005

Preferred Disciplines: Computer Science, Computer Engineering or related area; MSc, PhD, PDF
Project length: 16-24 months (4 units)
Desired start date: As soon as possible
Location: London, UK and/or New York City, NY, USA and/or Toronto, ON
No. of Positions: 2
Preferences: N/A
Company: HSBC


About Company:

HSBC – World’s Best Bank 2017 (Euromoney)

Global Banking and Markets (GBM) is the investment banking arm of HSBC and is undergoing a major transformation to set itself up for the future. As part of this transformation, under the Chief Operating Officer’s sponsorship, the Big Data Program has come to life.

Globalising the data from our business lines (Global Banking, Global Markets, HSBC Securities Services, and Global Liquidity & Cash Management) is imperative to identifying new commercial and operational risk/regulatory opportunities.

HSBC has access to over 90% of the world’s trade and capital flows, giving it a truly unique data footprint. That, combined with 47 jurisdictions and over 25,000 employees, means an unrivalled number of use cases for development.

Project Description:

We are looking for a Big Data Engineer who will work on collecting, storing, processing, and analyzing huge sets of data. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company and for helping to build out some of the core services that power our Machine Learning and advanced analytics systems.

Background and required skills

Research Objectives/Sub-Objectives:

  • Developing tools and systems to help analyse and manage large (sometimes unstructured) datasets
  • Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
  • Implementing ETL processes (within HDP environment)
  • Monitoring performance and advising any necessary infrastructure changes
  • Defining data retention policies
  • Acting as a link between the data scientists and the software development team
  • Helping develop prototype machine learning models into robust production systems


  • To be discussed by the company and applicants

Expertise and Skills Needed:

  • Proficient understanding of distributed computing principles and of the fundamental design principles behind a scalable application
  • Management of a Hadoop cluster and the ability to resolve any ongoing issues with its operation
  • Proficiency with Hadoop v2, MapReduce, HDFS
  • Experience with building stream-processing systems
  • Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
  • Experience with NoSQL databases, such as HBase, MongoDB
  • Knowledge of various ETL techniques and frameworks
  • Experience with various messaging systems, such as Kafka
  • Experience with Big Data ML toolkits, such as Dask, SparkML, or H2O
  • Experience with Google Cloud Platform/Hortonworks
  • Experience in Python, particularly the scientific stack and Anaconda
  • Ability to integrate multiple large data sources and databases into one system
  • Knowledge of user authentication and authorization between multiple systems, servers, and environments

Nice to have skills:

  • Ability to process and rationalise message data and semi/unstructured data
  • Understanding of multi-process architecture/parallel processing and of the threading limitations of Python
  • Git(hub)
  • Knowledge of at least one Python web framework (preferably Flask, Tornado, and/or Twisted)
  • Familiarity with event-driven programming in Python
  • Experience with Continuous Integration
  • Basic understanding of front-end technologies, such as JavaScript, HTML5, and CSS3
  • Strong unit-testing and debugging skills


For more info or to apply to this applied research position, please:

  1. Check your eligibility and find more information about open projects.
  2. Interested students need to get approval from their supervisor and send their CV, along with a link to their supervisor’s university webpage, directly to Jillian Hatnean, jhatnean(a) or Sherry Zhao, szhao(a)