A Feature Discovery System for Data Science Across the Enterprise
Existing data lake systems lack support for storing or discovering features that could be reused across different ML projects.
These limitations negatively affect the decision-making process. Data scientists spend most of their time finding, preparing,
and integrating relevant data sets to finish analytics tasks. Feature discovery systems are needed to ease the process of building
data science pipelines to derive significant insights efficiently, effectively, and fairly. To meet these needs, we should overcome
challenges such as: (a) discovering links and similarities among data items at different granularities, such as table and column;
(b) developing mechanisms that make the features themselves searchable, rather than only the relevant data; (c) processing complex
queries that extract the relevant features efficiently; (d) tracking the vast number of features used in different ML projects and
the accuracy of the models that used these features; and (e) detecting bias in the features used throughout a data science pipeline.