Answering questions on how to ingest data into SageMaker Offline Feature Store by directly writing into S3

Photo by Emily Morter on Unsplash

In a previous blog post I showed how to ingest data into SageMaker Offline Feature Store by writing directly into S3. I have received feedback and suggestions for advanced scenarios which I will discuss in this Q&A.

Q: How can I ingest historical data where feature records have different timestamps?

I simplified my previous example by assigning each feature record the same timestamp. However, in a real-world scenario it is much more likely that historical feature records have different timestamps. …
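When records carry their own timestamps, they no longer all land in one S3 partition. The sketch below groups records by the hour-level partition derived from each record's event time. It assumes the offline store's `year=/month=/day=/hour=` partition layout described in the AWS documentation (verify the exact format against your own bucket). The field names `customer_id` and `event_time` and the helper names are hypothetical, and writing the Parquet files to S3 is left out.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(event_time: datetime) -> str:
    """Return the hour-level partition suffix for one event time.

    Assumption: the offline store partitions data as
    year=YYYY/month=MM/day=DD/hour=HH based on event time in UTC.
    """
    t = event_time.astimezone(timezone.utc)
    return f"year={t.year:04d}/month={t.month:02d}/day={t.day:02d}/hour={t.hour:02d}"


def group_by_partition(records, time_key="event_time"):
    """Group records (dicts) by the partition derived from their own timestamp."""
    groups = defaultdict(list)
    for record in records:
        ts = datetime.fromisoformat(record[time_key])
        if ts.tzinfo is None:
            # Treat naive timestamps as UTC rather than local time.
            ts = ts.replace(tzinfo=timezone.utc)
        groups[partition_prefix(ts)].append(record)
    return dict(groups)


records = [
    {"customer_id": "c1", "event_time": "2021-03-01T10:15:00+00:00"},
    {"customer_id": "c2", "event_time": "2021-03-01T10:45:00+00:00"},
    {"customer_id": "c3", "event_time": "2021-03-02T08:05:00+00:00"},
]
for prefix, rows in group_by_partition(records).items():
    print(prefix, [r["customer_id"] for r in rows])
```

Each group would then be written as its own file under the matching partition prefix, instead of putting every record under a single timestamp.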

How to backfill the SageMaker Offline Feature Store by writing directly into S3

Photo by Jan Antonin Kolar on Unsplash

What is this about?

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, retrieve, and share machine learning (ML) features. It was introduced at AWS re:Invent in December 2020 and has quickly become one of the most popular services within the SageMaker family. Many AWS customers I have spoken to since re:Invent have expressed interest in SageMaker Feature Store (SMFS).

Some of these customers have historical feature data they would like to migrate to the SMFS offline store, which can hold large volumes of feature data used to keep track of historical feature values and to create train/test…

Overcoming the infamous 512 token limit

Photo by Raphael Schaller on Unsplash

What is this about?

In a previous blog post we looked at how to get started with the newly released CUAD dataset that helps automate contract reviews. We loaded the model and ran a first prediction on a short extract (the first 100 words) of a contract. As mentioned in that article, the kind of NLP models we use for this task usually have a limit of 512 tokens. Because a token is often only part of a word, this means the model we have set up cannot scan the entire contract for information. Instead, it only sees an extract of the contract that fits within 512 tokens, which is typically fewer than 512 words.
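One common way around a fixed input length (not necessarily the approach the article settles on) is a sliding window: split the tokenized contract into overlapping chunks, run the model on each chunk, and combine the predictions. The minimal sketch below shows only the chunking step on an already-tokenized list. The function name and the `max_len`/`stride` defaults are illustrative assumptions, and in practice the question and special tokens also consume part of the 512-token budget.

```python
def sliding_windows(tokens, max_len=512, stride=128):
    """Split a token list into overlapping windows of at most max_len tokens.

    Consecutive windows share `stride` tokens, so an answer span that is cut
    at one window's boundary still appears whole in the next window.
    """
    if stride >= max_len:
        raise ValueError("stride must be smaller than max_len")
    step = max_len - stride
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows


# Stand-in for a tokenized contract of 1,000 tokens.
tokens = list(range(1000))
for w in sliding_windows(tokens):
    print(w[0], w[-1], len(w))
```

Hugging Face fast tokenizers offer similar behaviour through their `stride` and `return_overflowing_tokens` arguments, which also handle the question and special tokens for you.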

Notes from Industry

A deep dive into a newly released Natural Language Processing dataset for Contract Understanding

Contract review is the process of thoroughly reading a contract to understand the rights and obligations of an individual or company signing it and assessing the associated impact. It is widely viewed as one of the most repetitive and most tedious jobs that junior law firm associates must perform. It is also expensive and an inefficient use of a legal professional’s skills. In this blog post I show how to set up a newly released dataset and associated machine learning models to automate contract reviews.

Photo by Scott Graham on Unsplash

What does contract review entail?

When it comes to contract review, a lawyer’s job is to manually review hundreds of…

Heiko Hotz

Senior Solutions Architect for AI/ML at AWS — Focusing on Natural Language Processing (NLP)
