Managing datasets can be a hassle, especially at large scale. For unstructured data in particular, storing, accessing, and version-controlling the data is hard. Davit Buniatyan, Activeloop CEO, battled with managing petabyte-scale data during his time at the Princeton Neuroscience lab. In this meetup, he will cover the critical stack needed to resolve the biggest pains of dataset management. He will also present the open-source Hub package, which he is building to be the SQL for images.
After a brief brainstorming session, we will get hacking - improving our free dataset visualization tool (https://app.activeloop.ai/datasets/explore) and the open-source Hub package (the fastest way to access and manage datasets for PyTorch and TensorFlow).
Can't make it to this online meetup? Hacktoberfest is virtual and open to participants from around the globe. Sign up to participate today.
First, sign up on the Hacktoberfest site at https://hacktoberfest.digitalocean.com.
1+ merged PR: stickers
3+ merged PRs: T-shirt plus a choice of stickers or face mask
5+ merged PRs: T-shirt, stickers, and a face mask
Best Contributor Award: SONY WH-CH710 noise-canceling headphones
All contributors get a contributor badge!
To qualify for the official limited-edition Hacktoberfest shirt from the Hacktoberfest team, you must register and make four pull requests between October 1 and October 31.
Hacktoberfest meetups are welcoming, open, and inclusive. Please read our Events Code of Conduct before attending. Happy hacking!
Locations have different requirements for who can attend. This location is open to the following:
Welcome (5:00 - 5:05 PDT) - Intro to the Activeloop team and Hacktoberfest
Network (5:05 - 5:10) - Icebreaking
Dataset management pains intro (5:10 - 5:17) - Explain what issues we are solving
Intro to Hub (5:17 - 5:25) - The fastest way to access and stream datasets for TensorFlow and PyTorch
Demo of the visualization tool (5:25 - 5:30)
Brainstorming (5:30 - 5:45)
Get hacking (5:45 - 6:45) - Assign people to teams based on brainstorming
Show and tell (6:45 - 7:00)
Activeloop (www.activeloop.ai) is a company backed by Y Combinator and a member of NVIDIA’s prestigious Inception program. Activeloop is a dataset management system that streamlines data aggregation and preparation for data scientists and optimizes the training of machine learning models. The company’s open-source Hub package is the fastest way to access and manage datasets for PyTorch and TensorFlow. Thanks to the package, you can build scalable data pipelines in no time.