Instacart Introduces Griffin: An Extensible, Self-Serve ML Platform


Instacart provides grocery delivery and pickup services in the US and Canada.

Customers can use the service to order goods from participating stores, and a personal shopper fulfills the order in-store on their behalf.

As one can imagine, the Instacart experience relies heavily on machine learning (ML). Nearly every product and business innovation at the company involves ML, from helping customers find the right items among more than 1 billion products to enabling 5,000+ brand partners to connect with those customers.

To serve a variety of use cases, Instacart combines third-party solutions (including Snowflake, AWS, Databricks, and Ray) with internal abstraction layers that offer consistent access. This strategy lets the team deploy customized, varied solutions while keeping pace with the fast-moving MLOps landscape.

Griffin was created to help machine learning engineers (MLEs) manage product releases, iterate on ML models quickly, and monitor their applications in production.

Based on these objectives, the Instacart team identified a few key system concerns:

  1. Scalability: The platform should be able to host machine learning applications at Instacart's scale.
  2. Extensibility: The platform should be easy to extend with new features and to integrate with other machine learning and data management backends.
  3. Generality: Despite the platform's extensive integration with third-party solutions, it should offer a unified workflow and a consistent user experience.

As stated by the team, the platform has four basic components:

  1. MLCLI: a command-line interface for creating machine learning applications and managing the model lifecycle. MLCLI lets MLEs customize the training, evaluation, and inference tasks in their applications and run them inside containers (including, but not limited to, Docker). This provides a consistent interface and eliminates problems caused by differences in execution environments.
  2. Workflow Manager & ML Launcher: a pipeline orchestrator that uses Airflow to schedule jobs and ML Launcher, a proprietary abstraction, to containerize job execution.
  3. Feature Marketplace: a feature management platform that supports batch and real-time feature engineering on backends such as Snowflake, Spark, and Flink. It manages feature computation, provides feature storage, supports feature discovery, enables feature versioning, prevents offline/online feature drift, and permits feature sharing. A hybrid storage solution gives them the scalability to balance latency against storage costs.
  4. Training & Inference Platform: a training and serving platform with built-in support for frameworks such as TensorFlow, PyTorch, Scikit-learn, XGBoost, FastText, and Faiss. They standardize package management, metadata management, and code management to accommodate a variety of frameworks and ensure trustworthy model deployment in production. The team states that the platform's ability to adapt to MLEs' model architectures and inference procedures made it possible to triple the number of machine learning applications in a single year.
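The core idea behind MLCLI and ML Launcher — wrapping every training, evaluation, or inference task in a uniform container invocation so the execution environment stops mattering — can be sketched in a few lines. This is a hypothetical illustration, not Instacart's actual code; the `MLJob` class, `build_container_invocation` function, and the image/command names are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class MLJob:
    """A single ML task (train / evaluate / infer) to run in a container."""
    name: str
    image: str
    command: list
    env: dict = field(default_factory=dict)

def build_container_invocation(job: MLJob) -> list:
    """Translate an MLJob into a `docker run` command line.

    A real launcher would also handle resource requests, secrets, and
    log collection; this only shows the idea of giving every task the
    same containerized entry point.
    """
    cmd = ["docker", "run", "--rm", "--name", job.name]
    for key, value in sorted(job.env.items()):
        cmd += ["--env", f"{key}={value}"]
    cmd.append(job.image)
    cmd += job.command
    return cmd

# A training task is expressed the same way as any other task.
train_job = MLJob(
    name="item-ranker-train",
    image="registry.example.com/ml/item-ranker:latest",
    command=["mlcli", "train", "--config", "train.yaml"],
    env={"STAGE": "production"},
)
print(" ".join(build_container_invocation(train_job)))
```

Because every task reduces to the same invocation shape, an orchestrator such as Airflow only needs one generic operator to schedule any of them.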
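The offline/online drift prevention mentioned for the Feature Marketplace usually comes down to keeping a historical (offline) log and a latest-value (online) view in sync, so models see the same features at serving time that they were trained on. The toy `FeatureStore` below is a minimal sketch of that write-through pattern under assumed names and APIs; it is not Instacart's implementation.

```python
from collections import defaultdict

class FeatureStore:
    """Toy feature store with a write-through from the offline log to
    the online view, keeping the two representations consistent."""

    def __init__(self):
        self._offline = defaultdict(list)  # entity -> [(ts, features)]
        self._online = {}                  # entity -> latest features

    def write(self, entity_id, features, ts):
        self._offline[entity_id].append((ts, dict(features)))
        self._online[entity_id] = dict(features)  # write-through

    def get_online(self, entity_id):
        """Low-latency lookup used at inference time."""
        return self._online.get(entity_id)

    def get_offline(self, entity_id, as_of):
        """Point-in-time lookup used to build training sets."""
        rows = [f for t, f in self._offline[entity_id] if t <= as_of]
        return rows[-1] if rows else None

store = FeatureStore()
store.write("user:42", {"orders_30d": 7}, ts=1.0)
store.write("user:42", {"orders_30d": 9}, ts=2.0)
print(store.get_online("user:42"))               # latest view, for serving
print(store.get_offline("user:42", as_of=1.5))   # historical view, for training
```

The point-in-time offline lookup is what prevents training data from leaking features that would not have been available at prediction time.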
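Supporting many frameworks behind one serving interface, as the Training & Inference Platform does, is typically achieved with an adapter registry: each framework registers a wrapper exposing a common `predict` method, and the platform loads models only through that interface. The sketch below uses a made-up "toy-linear" framework to stand in for TensorFlow, PyTorch, XGBoost, etc.; all names here are hypothetical.

```python
class ModelAdapter:
    """Uniform interface the serving layer sees, regardless of the
    underlying framework."""
    def predict(self, features):
        raise NotImplementedError

ADAPTERS = {}

def register_adapter(framework):
    """Class decorator mapping a framework name to its adapter."""
    def wrap(cls):
        ADAPTERS[framework] = cls
        return cls
    return wrap

@register_adapter("toy-linear")
class ToyLinearAdapter(ModelAdapter):
    """Stand-in for a real framework adapter: y = w . x + b."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

def load_model(framework, **params) -> ModelAdapter:
    """The serving layer only ever calls this, never a framework API."""
    return ADAPTERS[framework](**params)

model = load_model("toy-linear", weights=[0.5, 2.0], bias=1.0)
print(model.predict([2.0, 1.0]))  # 0.5*2.0 + 2.0*1.0 + 1.0 = 4.0
```

Adding support for a new framework then means writing one adapter class rather than touching every deployment path.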

The team plans to support additional use cases, such as real-time recommendations, by building on these aspects of their extensible platform.