Class for RIR measurement databases

Classes to get the acoustic time series and other metadata of room impulse response (RIR) measurements

A. Helper functions

We will define many class properties with @property. To make sure that every attribute is initialized before it is used, we define the following function:

checked_property

 checked_property (attr_name:str, attr_type:type=<class 'object'>,
                   doc:Optional[str]=None)

Ensures that the attribute is initialized before accessing it.

	Type	Default	Details
attr_name	str		Name of the protected attribute to access, for example '_fs'
attr_type	type	object	Type of the attribute, for example float for _fs
doc	Optional	None	String containing a description of the class attribute

Example of use:

from abc import ABC
from typing import Optional

class Mics(ABC):
    _fs: Optional[float] = None  # not initialized yet
    fs = checked_property('_fs', float)

If we use the property to access _fs before it has been initialized, it should raise an error:

mic = Mics()
print(mic._fs)  # ``_fs`` is still None
try:
    print(mic.fs)  # ❌ the property ``fs`` requires _fs to be initialized to a float value
except ValueError as e:
    print(f"Caught ValueError: {e}")

None
Caught ValueError: Attribute '_fs' is not initialized.
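The source of checked_property is not reproduced on this page, so the following is only a minimal sketch of how such a helper could be written, assuming it returns a read-only property. The error message is copied from the output above; the type check and its TypeError are assumptions of this sketch.

```python
from typing import Any, Optional


def checked_property(attr_name: str, attr_type: type = object,
                     doc: Optional[str] = None) -> property:
    """Build a read-only property that refuses to return an uninitialized attribute."""

    def getter(self) -> Any:
        value = getattr(self, attr_name, None)
        if value is None:
            # Same message as in the output shown above
            raise ValueError(f"Attribute '{attr_name}' is not initialized.")
        if not isinstance(value, attr_type):
            # Assumption of this sketch: a wrong type is reported as a TypeError
            raise TypeError(f"Attribute '{attr_name}' should be of type {attr_type.__name__}.")
        return value

    return property(getter, doc=doc)


class Mics:
    _fs: Optional[float] = None
    fs = checked_property("_fs", float, doc="Sampling frequency in Hz")


mic = Mics()
try:
    mic.fs
except ValueError as e:
    message = str(e)

mic._fs = 11250.0  # once initialized, the property returns the value
print(message)
print(mic.fs)
```

Because the getter is created inside a function, the same helper can be reused for every protected attribute without writing one @property per attribute.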

B. Database for microphones

The base class to handle RIR measurements.

This class defines the properties and methods shared by the different RIR databases that inherit from it. DB_microphones is an abstract class, built with from abc import ABC, abstractmethod:

ABC: base class used to declare an Abstract Base Class
abstractmethod: decorator that marks the methods that subclasses have to implement

This is useful because the base class cannot be instantiated, and every subclass is forced to implement the methods marked with abstractmethod.
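A small illustration of that behaviour. BaseDB, IncompleteDB and CompleteDB are hypothetical names used only for this sketch; they are not classes of the library.

```python
from abc import ABC, abstractmethod


class BaseDB(ABC):  # hypothetical stand-in for DB_microphones
    @abstractmethod
    def load(self) -> str:
        """Each database has to define how its data is read."""


class IncompleteDB(BaseDB):
    pass  # forgets to implement load()


class CompleteDB(BaseDB):
    def load(self) -> str:
        return "data loaded"


errors = []
for cls in (BaseDB, IncompleteDB):
    try:
        cls()  # raises TypeError: abstract methods are not implemented
    except TypeError as e:
        errors.append(f"{cls.__name__}: {e}")
        print(errors[-1])

result = CompleteDB().load()  # only the complete subclass can be instantiated
print(result)
```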

Inspired by the MNIST dataset class, we will download the data into a folder structure like ./root/class_name/raw.

root: a parameter passed to the class
class_name: the name of the class used to download the database
raw: the subfolder where the raw data is downloaded

We will also include a list mirrors with the URLs where the data can be found, and a list resources that contains tuples with the name of each file to download and its MD5 checksum.
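As a sketch of that layout: ExampleDB, its URL and its checksum are placeholders invented for this example, and the way raw_folder is built here is an assumption that happens to reproduce paths like ./data/ZeaRIR/raw.

```python
import os
from typing import List, Tuple


class ExampleDB:  # hypothetical class, not part of the library
    # Placeholder URL and checksum, for illustration only
    mirrors: List[str] = ["https://example.com/data/"]
    resources: List[Tuple[str, str]] = [("ExampleRIR.mat", "0" * 32)]

    def __init__(self, root: str = "./data"):
        self.root = root

    @property
    def raw_folder(self) -> str:
        # ./root/class_name/raw
        return os.path.join(self.root, self.__class__.__name__, "raw")


example_db = ExampleDB(root="./data")
print(example_db.raw_folder)
```

Using self.__class__.__name__ means that every subclass gets its own folder without overriding the property.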

Base class

DB_microphones

 DB_microphones (root:str='./data', dataname:str='RIR',
                 signal_start:int=0, signal_size:Optional[int]=None)

Base class for microphone databases. I define the @property methods here, so I don’t have to redefine them in the subclasses.

	Type	Default	Details
root	str	./data	Path to the root directory of the database, where the data will be downloaded
dataname	str	RIR	String matching the name of the resources to download and load. If several resources match, all of them will be downloaded but only the first one will be loaded.
signal_start	int	0	Start index of the signal in the data
signal_size	Optional	None	int or None. Size of the signal to be extracted from the data, if None, the whole signal will be loaded.

Zea database

Database from Elias Zea. It will inherit from DB_microphones

This is one of the RIR databases. It will have to implement its own attributes:
+ mirrors
+ resources
+ microphone spacing

And the methods:
+ To check what resource to load
+ To download the resources
+ To unpack the downloaded resources
+ To load the selected resource (database/dataname)
+ To get the different attributes in the database: dx, dt, fs, num_mics, num_sources
+ And also the data related with the microphone recordings: imic, position, time_samples, signal

ZeaRIR

 ZeaRIR (root:str='./data', dataname:str='Balder', signal_start:int=0,
         signal_size:Optional[int]=None)

ZeaRIR database.

	Type	Default	Details
root	str	./data	Path to the root directory of the database, where the data will be downloaded
dataname	str	Balder	String matching the name of the resources to download and load. If several resources match, all of them will be downloaded but only the first one will be loaded.
signal_start	int	0	Start index of the signal to load.
signal_size	Optional	None	int or None. Size of the signal to be extracted from the data. If None, the whole signal will be loaded.

Check that the Zea database works

db = ZeaRIR(root="./data", dataname="RIR", signal_start=0, signal_size=128)

Matched resources to download:
- BalderRIR.mat
- FrejaRIR.mat
- MuninRIR.mat
Loading the resource ./data/ZeaRIR/raw/BalderRIR.mat ...

It has checked which resources match the dataname “RIR” and found three of them. It downloads all the matching resources, but it only loads the data of the first one, “Balder”, because each object of this class should only return signals from the same room. To load another room, I pass a dataname that matches only the name of that resource.
If the resources are already in the folder, it will skip the download:

db._download_resource(resource_name="Balder") # Just return (no error message) because "BalderRIR.mat" is in the raw folder
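The matching and skipping logic can be sketched as follows. This is not the code of the class: match_resources and needs_download are hypothetical helpers, and plain substring matching is an assumption that is consistent with the behaviour described above. The file names and checksums are the ones listed on this page.

```python
import os
import tempfile
from typing import List, Tuple

resources: List[Tuple[str, str]] = [
    ("BalderRIR.mat", "bc904010041dc18e54a1a61b23ee3f99"),
    ("FrejaRIR.mat", "1dedf2ab190ad48fbfa9403409418a1d"),
    ("MuninRIR.mat", "5c90de0cbbc61128de332fffc64261c9"),
]


def match_resources(dataname: str) -> List[str]:
    """Names of all resources whose file name contains ``dataname``."""
    matched = [name for name, _ in resources if dataname in name]
    assert matched, f"No resource matches '{dataname}'"
    return matched


def needs_download(raw_folder: str, filename: str) -> bool:
    """True only when the file is not yet in the raw folder."""
    return not os.path.isfile(os.path.join(raw_folder, filename))


matched = match_resources("RIR")   # matches the three rooms
to_load = matched[0]               # only the first match is loaded
single = match_resources("Freja")  # matches a single room

with tempfile.TemporaryDirectory() as raw_folder:
    missing_before = needs_download(raw_folder, to_load)
    open(os.path.join(raw_folder, to_load), "wb").close()  # pretend it was downloaded
    missing_after = needs_download(raw_folder, to_load)

print(matched, to_load, single, missing_before, missing_after)
```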

And we can check that the correct room and its parameters are properly loaded

print(db)

Database: ZeaRIR
Download: ['BalderRIR.mat', 'FrejaRIR.mat', 'MuninRIR.mat']
Load room: BalderRIR.mat
Path to raw resource: ./data/ZeaRIR/raw/BalderRIR.mat
Path to unpacked data folder: ./data/ZeaRIR/raw
Sampling frequency: 11250 Hz
Number of microphones: 100
Number of total time samples: 3623
Number of time samples selected: 128
Number of sources: 1
Signal start: 0
Signal size: 128
Source ID: 0

We can check the data that it has loaded into memory and that the main get methods work:

print(f"Loaded chunk of data of size {db._RIR.shape}")
print(f"Output of get_mic  (4 time samples): {db.get_mic(imic=0, start=0, size=4)}")
print(f"Output of get_time (4 time samples): {db.get_time(start=0, size=4)}")
print(f"Test of get_pos: {db.get_pos(imic=1)}")

Loaded chunk of data of size (128, 100)
Output of get_mic  (4 time samples): [ 0.00041836  0.0001148  -0.00129174  0.00162724]
Output of get_time (4 time samples): [0.00000000e+00 8.88888889e-05 1.77777778e-04 2.66666667e-04]
Test of get_pos: [0.03 0.   0.  ]
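The values returned by get_time are consistent with a time axis built from the sampling frequency, with a time step dt = 1/fs. The sketch below assumes that relation; time_axis is a hypothetical helper, not the method of the class.

```python
fs = 11250   # Hz, the sampling frequency printed by print(db) above
dt = 1 / fs  # time step between two consecutive samples, in seconds


def time_axis(start: int, size: int) -> list:
    """Time stamps (in seconds) of ``size`` samples starting at index ``start``."""
    return [(start + n) * dt for n in range(size)]


t = time_axis(start=0, size=4)
print(t)
```

Up to rounding, this should give the same four values as the get_time output above.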

Before implementing the download method, I used this code to test how to download the resources and to find out which MD5 I should write for each resource (since it is not provided by the given mirror).

import os

from torchvision.datasets.utils import calculate_md5, check_md5, download_url

# db = ZeaRIR(root="./data")
for file, md5_class in db.resources:
    # URLs always use "/", so join them by hand instead of with os.path.join
    url = f"{db.mirrors[0].rstrip('/')}/{file}"
    fpath = os.path.join(db.raw_folder, file)
    download_url(url, root=db.raw_folder, filename=file)
    md5 = calculate_md5(fpath)
    print(f"File: {file}, MD5: {md5}")
    # Compare against the checksum stored in the class attribute ``resources``
    assert check_md5(fpath, md5_class), (
        f"Check the MD5 of the resource '{file}' for the class '{db.__class__.__name__}'"
    )

File: BalderRIR.mat, MD5: bc904010041dc18e54a1a61b23ee3f99
File: FrejaRIR.mat, MD5: 1dedf2ab190ad48fbfa9403409418a1d
File: MuninRIR.mat, MD5: 5c90de0cbbc61128de332fffc64261c9
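For reference, the same kind of checksum can be computed with the standard library alone. This is a minimal sketch using hashlib, not the torchvision implementation; file_md5 is a hypothetical helper and the demo file is created only for the example.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """MD5 hex digest of a file, read in chunks so large files fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    digest = file_md5(path)

print(digest)
```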

It may be useful to check the names of the resources before instantiating an object (which starts the download).
I can implement a class method that prints the resources that can be downloaded.

ZeaRIR.print_resources()

Resources for class ZeaRIR:
- BalderRIR.mat 
- FrejaRIR.mat 
- MuninRIR.mat

Note

I am developing with nbdev, which includes the patch decorator from the fastcore library. It allows me to implement a method outside of the class definition by declaring which class the method has to be “patched” into.
In the autogenerated .py file a patched method appears in a form that I am not that familiar with, so I use patch here only for didactic purposes; in the exported code the method is already part of the class definition.

from fastcore.basics import patch

@patch(cls_method=True)
def print_resources(cls: DB_microphones):
    print("!!Method overwritten by a patch!!")  # extra line to show that the patch is active
    print(f"Resources for class {cls.__name__}:")
    for name, _md5 in cls.resources:
        print(f"- {name} ")

Note

Pylance does not like patch and will underline it as a possible error.
I have added the method directly to the class (the following code is just for testing purposes). (This is a callout from Quarto)

I can overwrite the method with patch (note the extra line in the output):

ZeaRIR.print_resources()

!!Method overwritten by a patch!!
Resources for class ZeaRIR:
- BalderRIR.mat 
- FrejaRIR.mat 
- MuninRIR.mat

MeshRIR database

Database from Shoichi Koyama, National Institute of Informatics, Tokyo, Japan. It will inherit from DB_microphones

MeshRIR

 MeshRIR (root:str='./data', dataname:str='S1', signal_start:int=0,
          signal_size:Optional[int]=None, source_id:int=0)

Base class for microphone databases. I define the @property methods here, so I don’t have to redefine them in the subclasses.

	Type	Default	Details
root	str	./data	Path to the root directory of the database, where the data will be downloaded
dataname	str	S1	String matching the name of the resources to download and load. If several resources match, all of them will be downloaded but only the first one will be loaded.
signal_start	int	0	Start index of the signal to load.
signal_size	Optional	None	Size of the signal to load. If None, the whole signal will be loaded.
source_id	int	0

Now let’s check the MeshRIR database implementation:

db2 = MeshRIR(root="./data", dataname="S32", signal_start=0, signal_size=128, source_id=31)

Matched resources to download:
- S32-M441_npy.zip
Unpacked folder ./data/MeshRIR/raw/S32-M441_npy already exists. Skipping unpacking.

Since this is a heavier database, I had already checked beforehand that the downloading method works.
This database requires unzipping the resource. The class has found that the unpacked folder already exists, so it neither downloads nor unpacks the resource.
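A sketch of that "skip if already unpacked" check, using only the standard library. unpack_if_needed is a hypothetical helper, and it assumes that the unpacked folder is named after the archive without its .zip extension, as in the message above. The demo archive is created on the fly for the example.

```python
import os
import tempfile
import zipfile


def unpack_if_needed(zip_path: str) -> str:
    """Unzip next to the archive unless the target folder already exists."""
    target = os.path.splitext(zip_path)[0]  # ".../name.zip" -> ".../name"
    if os.path.isdir(target):
        print(f"Unpacked folder {target} already exists. Skipping unpacking.")
        return target
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target


with tempfile.TemporaryDirectory() as raw_folder:
    zip_path = os.path.join(raw_folder, "demo_npy.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("readme.txt", "placeholder content")

    first = unpack_if_needed(zip_path)   # extracts the archive
    extracted = os.path.isfile(os.path.join(first, "readme.txt"))
    second = unpack_if_needed(zip_path)  # folder exists, so it is skipped
```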

print(db2)

Database: MeshRIR
Download: ['S32-M441_npy.zip']
Load room: S32-M441_npy.zip
Path to raw resource: ./data/MeshRIR/raw/S32-M441_npy.zip
Path to unpacked data folder: ./data/MeshRIR/raw/S32-M441_npy
Sampling frequency: 48000 Hz
Number of microphones: 441
Number of total time samples: 32768
Number of time samples selected: 128
Number of sources: 32
Signal start: 0
Signal size: 128
Source ID: 31

Test of main get methods:

print(f"In this database we do not preload all the database.")
print(f"Output of get_mic  (4 time samples): {db2.get_mic(imic=0, start=0, size=4)}")
print(f"Output of get_time (4 time samples): {db2.get_time(start=0, size=4)}")
print(f"Test of get_pos: {db2.get_pos(imic=1)}")

In this database we do not preload all the database.
Output of get_mic  (4 time samples): [0.00599654 0.00572385 0.00485317 0.00515282]
Output of get_time (4 time samples): [0.00000000e+00 2.08333333e-05 4.16666667e-05 6.25000000e-05]
Test of get_pos: [-0.4 -0.5  0. ]

Good Practices (after coding)

Things that I have learnt, or found interesting, while coding this notebook

Use of Inheritance
- There are different experimental databases, but it is useful to create a base class with the methods that I want to use in my applications.
- In the base class I try to define the common attributes. By convention, “protected” attributes start with a single underscore (e.g. _fs) and “private” attributes start with a double underscore (e.g. __fs).
- The class can have getter methods that return the protected and private attributes. In particular, I will use @property to define which attributes I want to expose: I can access obj._fs through the obj.fs property.
- Without further precautions it would be possible to instantiate objects of the base class, although they would not hold the information we require, since the class is meant to be abstract. To avoid such misuse, the abc (abstract base class) module provides the tools to define the behaviour of classes like this.
- Inheriting from ABC (Abstract Base Class) and declaring with @abstractmethod the methods that each subclass has to implement prevents the instantiation of the abstract class, and of any subclass in which the abstract methods are not overridden.
- This is a useful reminder that all the abstract methods have to be implemented before a class can be used.
- The base class contains the commonalities between databases, so I do not have to repeat code.
- In the base class I can initialize only the strictly necessary attributes. If a set of operations may be used by several subclasses, I can define it as a method of the base class, like _prepare_data(self), and call it from the __init__() of each subclass. This way, a new subclass with a different init logic does not force me to review the init logic of the base class.
Logic and options to download the databases
- Inspired by MNIST, I can write in the subclasses the class attributes mirrors and resources, with the URLs from which the files (resources) can be downloaded.
- From MNIST I also borrow some of the download logic and the libraries used to download, unpack and check the data.
- When downloading files from GitHub, do not use the URL shown in the browser; use the URL where GitHub serves the raw data: "https://raw.githubusercontent.com/{USER}/{REPO}/{BRANCH}/{PATH_DATA_FOLDER}/", substituting the USER, REPO, BRANCH and PATH_DATA_FOLDER of the file that you want to download, as they appear in the normal GitHub URL of the data.
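The raw URL pattern from the last point can be wrapped in a small helper. github_raw_url is a hypothetical function, and the user, repository and folder names below are placeholders.

```python
def github_raw_url(user: str, repo: str, branch: str, path: str = "") -> str:
    """Build the raw.githubusercontent.com URL of a folder inside a repository."""
    base = f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/"
    path = path.strip("/")
    return f"{base}{path}/" if path else base


# Placeholder names, for illustration only
mirror = github_raw_url("some-user", "some-repo", "main", "data/rirs")
file_url = mirror + "BalderRIR.mat"
print(file_url)
```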
Use assert a lot
- It is very useful to catch errors and to check that your parameters are of the expected kind or within the expected bounds.
- Sometimes Pylance or other linters show errors although the code is perfectly functional, because they cannot infer the type of your data. An assert before the line where Pylance shows the error tells it that the data is going to be of the expected type, and therefore that operations such as + are compatible with those variables.
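A short example of both uses of assert. Recording is a hypothetical class written for this sketch; the sampling frequency and number of samples are the MeshRIR values shown earlier.

```python
from typing import Optional


class Recording:  # hypothetical class used only for this example
    def __init__(self, fs: Optional[float] = None):
        self._fs = fs

    def duration(self, num_samples: int) -> float:
        # Without this assert, a type checker sees Optional[float] and flags the division
        assert self._fs is not None, "fs must be set before computing a duration"
        # Bounds check on the parameter
        assert self._fs > 0, "fs must be positive"
        return num_samples / self._fs


seconds = Recording(fs=48000.0).duration(num_samples=32768)
print(seconds)

try:
    Recording().duration(num_samples=10)
except AssertionError as e:
    error = str(e)
    print(f"Caught AssertionError: {error}")
```

Keep in mind that Python removes assert statements when it runs with the -O flag, so checks that must always run are better written as explicit exceptions.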
Use the __str__() method to print useful information about the object, such as its attributes, statistics, etc. Then call it with print(obj).
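A minimal sketch of such a method. TinyDB is a hypothetical class, and the layout only imitates the print(db) output shown earlier.

```python
class TinyDB:  # hypothetical class used only for this example
    def __init__(self, fs: int, num_mics: int):
        self.fs = fs
        self.num_mics = num_mics

    def __str__(self) -> str:
        # One line per piece of information, joined into a single string
        return "\n".join([
            f"Database: {self.__class__.__name__}",
            f"Sampling frequency: {self.fs} Hz",
            f"Number of microphones: {self.num_mics}",
        ])


summary = str(TinyDB(fs=11250, num_mics=100))
print(summary)
```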