kaggledatasets

A collection of Kaggle datasets, ready for everyone to use.

System Python 3.5 Python 3.6 Python 3.7
Linux Build Status Build Status Build Status
macOS Build Status Build Status
Windows Build Status Build Status Build Status

More About Kaggle Datasets

import kaggledatasets as kd

heart_disease = kd.structured.HeartDiseaseUCI(download=True)

# Returns a pandas DataFrame for use with scikit-learn or any other framework
df = heart_disease.data_frame()

# Returns a TensorFlow dataset compatible with TF 2.0
dataset = heart_disease.load()
for batch, label in dataset.take(1):
    for key, value in batch.items():
        ...

# PyTorch data loader support is a work in progress
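
The TensorFlow loop above unpacks each element into a dictionary of feature columns and a batch of labels. As a rough illustration of that structure only, the sketch below builds the same kind of (features, labels) pairs from plain Python lists. It does not use kaggledatasets or TensorFlow; the helper `batched_features` and the column names `age`, `chol`, and `target` are hypothetical stand-ins, and it assumes `load()` yields pairs shaped the way the loop implies.

```python
from itertools import islice


def batched_features(rows, label_key, batch_size):
    """Yield (features, labels) pairs, mimicking the structure the loop unpacks.

    Hypothetical helper for illustration; not part of kaggledatasets.
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Column-oriented features: one list of values per column name.
        features = {
            key: [row[key] for row in chunk]
            for key in chunk[0]
            if key != label_key
        }
        labels = [row[label_key] for row in chunk]
        yield features, labels


# Hypothetical rows; the real column names come from the dataset itself.
rows = [
    {"age": 63, "chol": 233, "target": 1},
    {"age": 37, "chol": 250, "target": 1},
    {"age": 57, "chol": 354, "target": 0},
]

# islice(..., 1) plays the role of dataset.take(1): look at the first batch only.
for batch, label in islice(batched_features(rows, "target", batch_size=2), 1):
    for key, value in batch.items():
        print(f"{key}: {value}")
    print(f"label: {label}")
```

Keeping features keyed by column name is what makes the inner `batch.items()` loop possible: each key is a column and each value holds that column's entries for the whole batch.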

Installation

Binaries

Commands to install from binaries via Conda or pip wheels are on our website: https://kaggledatasets.github.io

From Source

Get the kaggledatasets Source

git clone --recursive https://github.com/kaggledatasets/kaggledatasets
cd kaggledatasets

Install Dependencies

pip install -r requirements.txt

Install kaggledatasets

python setup.py install
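
After installing, one quick way to confirm that the package is visible to your interpreter is to look it up on the import path. This is a minimal sketch using only the standard library; the helper name `is_installed` is our own and not part of kaggledatasets.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located on the import path."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("kaggledatasets"):
    print("kaggledatasets is importable")
else:
    print("kaggledatasets was not found; check that the install step succeeded")
```

Run it with the same Python interpreter (or virtual environment) that you used for the install step, since each environment has its own import path.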

Getting Started

Communication

  • GitHub Issues: bug reports, feature requests, dataset requests, install issues, help wanted, thoughts, etc.
  • Slack: The Kaggle Datasets Slack hosts an audience of intermediate to experienced Kaggle Datasets users and developers for general chat, online discussions, collaboration, etc. If you need a Slack invite, please visit: http://bit.ly/kdslack

Releases and Contributing

kaggledatasets expects to follow a 30-day release cycle for major releases. If you encounter a bug, please let us know by filing an issue.

We appreciate all contributions. Please read our Contributing Guide before you start. If you plan to contribute bug fixes, please do so without any further discussion.

If you plan to contribute new features, new datasets, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. A PR sent without prior discussion may be rejected, because we might be taking kaggledatasets in a different direction than you are aware of.

License

kaggledatasets is Apache-2.0 licensed, as found in the LICENSE file.
