Workshop

Preparing Data for LLM Applications Using Data Prep Kit

Data

When building data-intensive applications, a significant portion of your time goes to data wrangling (cleaning, de-duping, removing markup, etc.). Data Prep Kit (https://github.com/IBM/data-prep-kit) is a new open source project that helps with this work.

Data Prep Kit (DPK) is an open source Python library that scales from your laptop to a large cluster in the cloud. It has been used to prepare terabytes of data for training the IBM Granite large language models (LLMs).

Noteworthy features of DPK include de-duping documents (exact and fuzzy dedupe), handling both documents and code, language detection (spoken languages and programming languages), PII removal, malware detection, and creating embeddings for a vector database.
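
To make the difference between exact and fuzzy dedupe concrete, here is a minimal sketch in plain Python. It does not use DPK's API; the function names (normalize, exact_dedupe, shingles, jaccard, fuzzy_dedupe), the 3-word shingle size, and the 0.7 threshold are all illustrative assumptions. Exact dedupe is shown as hashing normalized text, and fuzzy dedupe as Jaccard similarity over word shingles.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_dedupe(docs: list[str]) -> list[str]:
    """Keep the first document seen for each distinct SHA-256 digest."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (hypothetical default k=3)."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fuzzy_dedupe(docs: list[str], threshold: float = 0.7) -> list[str]:
    """Drop a document if it is at least `threshold` similar to a kept one.

    This is a pairwise O(n^2) comparison, fine for a sketch but not for
    terabytes of data.
    """
    kept: list[str] = []
    kept_shingles: list[set] = []
    for doc in docs:
        s = shingles(doc)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


if __name__ == "__main__":
    docs = [
        "The quick brown fox jumps over the lazy dog",
        "the quick  brown fox jumps over the lazy dog",
        "The quick brown fox jumps over the lazy dog today",
        "Completely different text about data preparation",
    ]
    unique = exact_dedupe(docs)
    print(len(unique))
    print(fuzzy_dedupe(unique))
```

At larger scale, pairwise comparison becomes impractical; techniques such as MinHash with locality-sensitive hashing are commonly used to find near-duplicates without comparing every pair.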

In this talk, I will go over some interesting features of the Data Prep Kit. If time permits, I will show a demo.

Time & Duration:

11:45 am

Location

Lovelace Room

AI Tracks

Data

Featured Speaker

Sujee Maniyam

AI Engineer | IBM AI Alliance

Sujee Maniyam is a seasoned practitioner focusing on Generative AI, Machine Learning, Deep Learning, Big Data, Distributed Systems, and Cloud technologies. He also loves teaching and has taught and mentored thousands of professionals.

Event Details

Time & Duration:

11:45 am

90 – 120 min

In-Person Location:

Lovelace Room

Watch online:

Watch live broadcast on Zoom

Ai Summit
Fall 2024
