Welcome to the Mdataset Documentation!

Mdataset (/em-dataset/) is a Python library that helps researchers and students gather and manage datasets. It provides tools for downloading datasets from popular archives (e.g., Kaggle, Hugging Face, and AcademicTorrents) and for collecting data through scraping. It also provides tools for processing datasets: optical character recognition (OCR), transcription, file transfer, translation, and a text-to-speech engine.

Its streamlined interface lets users download a dataset or run a processing operation in fewer than three lines of code. It is the first dataset library of its kind to offer open-source AI tools for tasks such as generating synthetic datasets. Our goal is to make it easier for researchers and students in physics, biology, and artificial intelligence to create robust datasets and share them with the world.

Check out the Usage section for further information, including how to install the project.

Note

This project is under active development.

Contents