Loading
In the vast majority of cases we will not be generating data, but loading it from somewhere else.
We use the np.load function to put data stored in a .npy or .npz file into a variable.
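As a minimal sketch of how the two file types differ, the snippet below writes a small toy array to a temporary folder and reads it back. The array, the file names (toy.npy, toy.npz), and the keys (first, second) are made up for illustration and are not part of the course data.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy array standing in for real data.
arr = np.arange(6, dtype=float).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # A .npy file holds exactly one array.
    npy_path = os.path.join(tmp_dir, "toy.npy")
    np.save(npy_path, arr)
    loaded = np.load(npy_path)

    # A .npz file bundles several named arrays; np.load returns a
    # dictionary-like object that we index by name.
    npz_path = os.path.join(tmp_dir, "toy.npz")
    np.savez(npz_path, first=arr, second=arr * 2)
    with np.load(npz_path) as bundle:
        names = sorted(bundle.files)
        second = bundle["second"]

print(loaded.shape)  # (2, 3)
print(names)         # ['first', 'second']
```

For a .npy file, np.load hands back the array directly; for a .npz file, you pick out each array by the name it was saved under.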
Usually you have the NumPy array stored on your computer; however, to make your life easier, I host our data sets online so you can download them whenever you need them.
In this example, I have protein-contact-maps.npy, which stores 500 protein contact maps.
Note
A protein contact map is a graphical representation that illustrates spatial proximity between amino acid residues in a protein structure. It is commonly used in structural bioinformatics to visualize and analyze the interactions and relationships between different parts of a protein.
If you have this file on your computer, you can simply pass the local file path to np.load.
np.load("protein-contact-maps.npy")
To get this array automatically in Python, we use the urllib.request module to download the file's raw bytes, then wrap those bytes in io.BytesIO so that np.load can read them as if they were a file.
You do not need to know how this works for this course, but I just want you to know what is going on when you see code from me like this in the future.
import io
from urllib import request

import numpy as np
npy_file_url = "https://github.com/oasci-bc/python/raw/refs/heads/main/docs/files/npy/steamboat-willie.npy"
# Download the .npy file
response = request.urlopen(npy_file_url)
content = response.read()
# Load the .npy file
contact_maps = np.load(io.BytesIO(content))
# Print information from the array.
print(contact_maps.ndim)
print(contact_maps.shape)
print(contact_maps[0][0])
3
(500, 16, 16)
[ 0. 3.83347011 6.93431997 9.98147011 12.15553474 10.16739845 9.63760185 13.3811655 14.22265434 12.16785812 11.25229836 13.22015572 14.96301937 17.38097 19.78103447 22.77887917]
The line from urllib import request imports the request module from Python's built-in urllib package. This module provides functions for opening and reading URLs.
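If you find yourself repeating the download-then-load steps, they can be wrapped in a small helper. This is only a sketch: the function name load_npy_from_url and the file toy-maps.npy are hypothetical, and the example points urlopen at a file:// URL for a temporary local file so that it runs without an internet connection. An https:// URL would be passed in the same way.

```python
import io
import tempfile
from pathlib import Path
from urllib import request

import numpy as np


def load_npy_from_url(url):
    """Fetch the bytes behind url and pass them to np.load (hypothetical helper)."""
    with request.urlopen(url) as response:
        content = response.read()
    # io.BytesIO makes the in-memory bytes look like an open file to NumPy.
    return np.load(io.BytesIO(content))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in data: 5 blank 16 x 16 "maps" saved to a temporary file.
    local_path = Path(tmp_dir) / "toy-maps.npy"
    np.save(local_path, np.zeros((5, 16, 16)))
    # as_uri() turns the absolute path into a file:// URL.
    toy_maps = load_npy_from_url(local_path.resolve().as_uri())

print(toy_maps.ndim)   # 3
print(toy_maps.shape)  # (5, 16, 16)
```

Opening the URL inside a with block closes the connection as soon as the bytes have been read.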