Source: MongoDB

3w

read

100

img
dot

Image Credit: Mongodb

ORiGAMi: A Machine Learning Architecture for the Document Model

  • ORiGAMi is a Transformer-based architecture designed for supervised learning on semi-structured data, such as JSON documents stored in a document-model database.
  • It addresses a gap in ML practice: most methods are built for traditional tabular data, and semi-structured formats are much harder for the ML community to work with.
  • The architecture tokenizes documents into key-value pairs and structural tokens, so predictions can be made directly from semi-structured documents.
  • ORiGAMi can be trained on datasets with as few as 200 labeled samples, combining data efficiency with the flexibility of Transformer models.
  • The model is trained to predict the next token in a document's token sequence, and its generation is designed to produce only valid documents.
  • ORiGAMi reformulates classification as predicting any field within a document, so one model can serve different targets without separate models or pipelines.
  • One example use case is user segmentation based on user profiles that contain nested structures such as device history and subscription details.
  • With ORiGAMi, users can make predictions on raw documents, preserving nested structures and updating predictions as user behavior changes.
  • The architecture is open-sourced on GitHub, with command-line tools for training models and making predictions.
  • ORiGAMi offers an approach to document-native machine learning, and the authors invite users to explore it, contribute, and apply it to real-world problems.
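The summary above describes ORiGAMi only at a high level. As a rough illustration, here is a minimal Python sketch under our own assumptions: the `<OBJ>`/`<ARR>` markers, the `key:`/`val:` prefixes and every function name are hypothetical, not ORiGAMi's actual vocabulary or API, and no model is trained here. It flattens a nested user profile into key, value and structural tokens, frames classification as predicting the value that follows a target field's key, and uses a small stack-based check (a simplified stand-in for constraining generation to valid documents) to list which kinds of token may come next.

```python
from typing import Any

# Hypothetical special tokens for this sketch; ORiGAMi's real vocabulary may differ.
START, END = "<START>", "<END>"
OBJ_OPEN, OBJ_CLOSE = "<OBJ>", "</OBJ>"
ARR_OPEN, ARR_CLOSE = "<ARR>", "</ARR>"


def field_tokens(doc: dict) -> list[str]:
    """Emit a key token followed by that field's value tokens, field by field."""
    tokens: list[str] = []
    for key, child in doc.items():
        tokens.append(f"key:{key}")
        tokens.extend(tokenize(child))
    return tokens


def tokenize(value: Any) -> list[str]:
    """Recursively flatten a JSON-like value into key, value and structural tokens."""
    if isinstance(value, dict):
        return [OBJ_OPEN] + field_tokens(value) + [OBJ_CLOSE]
    if isinstance(value, list):
        tokens = [ARR_OPEN]
        for item in value:
            tokens.extend(tokenize(item))
        return tokens + [ARR_CLOSE]
    return [f"val:{value!r}"]  # each scalar becomes one value token


def document_tokens(doc: dict) -> list[str]:
    """Complete token sequence for one document, e.g. for next-token training."""
    return [START] + tokenize(doc) + [END]


def prediction_prompt(doc: dict, target: str) -> tuple[list[str], list[str]]:
    """Frame classification as field prediction.

    All other fields come first and the sequence stops right after the
    target's key token; a next-token model would be asked to produce `label`.
    """
    others = {k: v for k, v in doc.items() if k != target}
    prompt = [START, OBJ_OPEN] + field_tokens(others) + [f"key:{target}"]
    label = tokenize(doc[target])
    return prompt, label


def allowed_next(prefix: list[str]) -> set[str]:
    """Kinds of token that keep an already well-formed partial sequence valid.

    Simplified stand-in for constrained decoding: a sampler could mask out
    every candidate token whose kind is not in the returned set.
    """
    stack: list[str] = []  # currently open containers
    awaiting_value = False
    for tok in prefix:
        if tok == START or tok.startswith("key:"):
            awaiting_value = True  # a value must follow START or a key
        elif tok in (OBJ_OPEN, ARR_OPEN):
            stack.append(tok)
            awaiting_value = False
        elif tok in (OBJ_CLOSE, ARR_CLOSE):
            stack.pop()
            awaiting_value = False
        else:  # scalar value token
            awaiting_value = False
    if awaiting_value:
        return {"value", "open"}
    if not stack:
        return {"end"}
    if stack[-1] == OBJ_OPEN:
        return {"key", "close"}
    return {"value", "open", "close"}  # inside an array


# Hypothetical user profile with nested device and subscription data.
user = {
    "user_id": 42,
    "devices": [{"type": "mobile", "os": "ios"}],
    "subscription": {"plan": "pro", "months": 3},
    "segment": "power_user",
}
prompt, label = prediction_prompt(user, "segment")
print(prompt[-3:], label)  # ['val:3', '</OBJ>', 'key:segment'] ["val:'power_user'"]
print(sorted(allowed_next(prompt)))  # ['open', 'value']
```

Placing the target field last is one simple way to let a single next-token model answer questions about any field: passing a different `target` (say, `"subscription"`) reuses the same tokenization instead of building a new feature pipeline.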
