New Features & Enhancements

  • Based on your feedback, DPK 1.0.0 alpha has been released with simplified APIs for language transforms. Check out this page for example implementations of some of these transforms.
  • HAP and PII recipe notebooks contributed by our partners in the GSI team have been merged.

Data Prep Kit Resources

📄 Papers

  1. Data-Prep-Kit: getting your data ready for LLM application development
  2. Granite Code Models: A Family of Open Foundation Models for Code Intelligence
  3. Scaling Granite Code Models to 128K Context

🎤 External Events and Showcase

  1. Workshop at the AI for Connectivity Hackathon: “Preparing Data for LLM Applications with Docling & Data Prep Kit” - Jan 25, 2025

  2. Talk on DPK at IBM TechXchange Agents day - Jan 23, 2025 - Slides

  3. DPK tutorial at CODS-COMAD 2024 - Dec 18, 2024

  4. “Generative AI Model Data Pre-Training on Kubernetes: A Use Case Study” was accepted for KubeCon EU 2025 - Dec 2024

  5. DPK has been added to AI Alliance's “Living Guide to Applying AI” - Dec 2024

  6. Workshop on Preparing Data for LLM Applications Using Data Prep Kit - Dec 2024 - Video

  7. DPK tutorial and hands-on session at IIIT Delhi - Nov 22, 2024

  8. Talk and hands-on session at MIT Bangalore - Nov 8, 2024

  9. PyData NYC 2024 - 90-minute tutorial - Nov 6, 2024

  10. "Data Prep Kit: A Comprehensive Cloud-Native Toolkit for Scalable Data Preparation in GenAI App" - Oct 28-29, 2024 - Video | Slides

  11. Tech Educator summit IBM CSR Event - Oct 16, 2024

  12. Data Science Dojo Meetup - Oct 9, 2024 - Video

  13. Open Source RAG Pipeline workshop with Data Prep Kit at TechEquity's AI Summit in Silicon Valley - Oct 2024

  14. "RAG with Data Prep Kit" Workshop @ Mountain View, CA, USA - info - Sep 21, 2024

  15. IBM TechXchange Las Vegas

  16. Unstructured Data Meetup - SF, NYC, Silicon Valley

  17. Data Exchange Podcast with Ben Lorica - Sep 2024

  18. Open Source AI Demo Night - Aug 8, 2024

  19. "Building Successful LLM Apps: The Power of high quality data" - Video | Slides - Aug 2024

  20. "Hands on session for fine tuning LLMs" - Video - Aug 2024

  21. "Build your own data preparation module using data-prep-kit" - Video - Aug 2024

Example Code

Find example code in the README of each transform, and some sample Jupyter notebooks for getting started here.

Blogs / Tutorials

Relevant online communities

We Want Your Feedback!

Feel free to contribute to discussions or create a new one to share your feedback.