How to load Data in Python with Scikit-Learn

Before you can build machine learning models, you need to load your data into memory.

In this article, you will learn how to load data for machine learning in Python using scikit-learn.

 

Packaged Datasets 

The scikit-learn library ships with several small datasets. They are useful for getting a feel for a machine learning algorithm or library feature before you apply it to your own work.

The recipe below loads the well-known Iris flowers dataset.
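Iris is only one of the bundled datasets. As a rough sketch, the loop below calls a few other loaders from sklearn.datasets and prints the shape of each feature matrix. The particular selection of loaders is an illustrative choice, not a list taken from this article.

```python
# Sketch: inspect several datasets that ship with scikit-learn.
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine

# Illustrative selection; each loader reads a file bundled with the library,
# so no network access is needed.
loaders = {
    "iris": load_iris,
    "wine": load_wine,
    "digits": load_digits,
    "breast_cancer": load_breast_cancer,
}

shapes = {}
for name, loader in loaders.items():
    bunch = loader()
    # bunch.data is the feature matrix: (number of samples, number of features)
    shapes[name] = bunch.data.shape
    n_classes = len(set(bunch.target.tolist()))
    print(name, bunch.data.shape, "classes:", n_classes)
```

Each loader returns a dictionary-like object, so the same attribute names (data, target) work across all of them.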

# Load the packaged iris flowers dataset
from sklearn.datasets import load_iris
# Iris flower dataset (150 samples x 4 features, reals, multi-class classification)
iris = load_iris()
print(iris)
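Printing the whole object produces a lot of output. As a minimal sketch, the snippet below pulls out the pieces most often needed for modeling. It also shows the return_X_y=True option of load_iris, which the recipe above does not use.

```python
from sklearn.datasets import load_iris

iris = load_iris()

X = iris.data      # feature matrix, one row per flower
y = iris.target    # integer class label for each row

print(X.shape, y.shape)
print(iris.feature_names)
print(list(iris.target_names))

# Alternative: ask the loader for the two arrays directly
X2, y2 = load_iris(return_X_y=True)
```

Both forms give the same arrays; the second is convenient when the metadata is not needed.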

 

Load from CSV 

It is very common to have a dataset as a CSV file, either on your local workstation or on a remote server.

The recipe below shows how to load a CSV file from a URL, in this case the Pima Indians diabetes classification dataset.

You can learn more about the dataset here:
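For the local-file case, NumPy's loadtxt accepts a file path directly. The sketch below is only an illustration: the file name example.csv and the values in it are made up, and the file is written to a temporary directory first so that the snippet runs anywhere.

```python
import os
import tempfile

import numpy as np

# Hypothetical file contents with made-up values:
# two feature columns followed by a label column.
csv_text = "1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "example.csv")
    with open(csv_path, "w") as f:
        f.write(csv_text)

    # Load the local CSV file as a NumPy array
    local_data = np.loadtxt(csv_path, delimiter=",")

print(local_data.shape)
```

With a real file, only the np.loadtxt call is needed, pointed at the file's actual path.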

Dataset file 

Dataset details 

From the prepared X and y variables, you can train a machine learning model.

# Load the Pima Indians diabetes dataset from CSV URL
import numpy as np
from urllib.request import urlopen
# URL for the Pima Indians Diabetes dataset (UCI Machine Learning Repository)
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
# download the file (on Python 3, urlopen lives in urllib.request)
raw_data = urlopen(url)
# load the CSV file as a numpy matrix
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
# separate the data from the target attributes:
# columns 0-7 hold the eight input features, column 8 holds the class label
X = dataset[:,0:8]
y = dataset[:,8]
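Once the features and target are separated, they can be passed to a scikit-learn estimator. The sketch below is a hedged illustration, not part of the original recipe: it builds a synthetic stand-in array with the same nine-column layout (eight features, one target) so that it runs without a network connection, and LogisticRegression is an arbitrary choice of model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the loaded dataset (NOT the real Pima data):
# 8 feature columns followed by 1 binary target column.
rng = np.random.default_rng(0)
n_rows = 40
features = rng.normal(size=(n_rows, 8))
features[:, 0] = np.linspace(-1.0, 1.0, n_rows)
target = (features[:, 0] > 0).astype(float)
dataset = np.column_stack([features, target])

# Same split as for the real data: columns 0-7 are inputs, column 8 is the label
X = dataset[:, 0:8]
y = dataset[:, 8].astype(int)

# Fit a simple classifier and predict on the training rows
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
predictions = model.predict(X)

print(X.shape, y.shape, predictions.shape)
```

With the real dataset, the synthetic block would be replaced by the array returned from np.loadtxt; the split and fit steps stay the same.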

Conclusion

In this article, you learned that scikit-learn comes with packaged datasets, including the Iris flowers dataset. These datasets are easy to load and are useful for exploring and experimenting with different machine learning models.

You also saw how to load CSV data for use with scikit-learn. You learned how to open a CSV file from the web with the urllib library and how to read it into a NumPy matrix that scikit-learn can work with.
