Class for RIR measurement databases
A. Helper functions
We will define many class properties with @property. To make sure all the attributes are initialized before they are used, we define the following method:
checked_property
checked_property (attr_name:str, attr_type:type=<class 'object'>, doc:Optional[str]=None)
Ensures that the attribute is initialized before accessing it.
Parameter | Type | Default | Details
---|---|---|---
attr_name | str | | String with the name of the protected attribute to access, for example: '_fs'
attr_type | type | object | Type of the attribute; for _fs, for example, it is float
doc | Optional | None | String containing a description of the class attribute
Example of use:
from abc import ABC
from typing import Optional
class Mics(ABC):
    _fs: Optional[float] = None
    fs = checked_property('_fs', float)
If we use the property to access _fs without it being initialized, it should raise an error:
mic = Mics()
print(mic._fs)  # ``_fs`` is None
try:
    print(mic.fs)  # ❌ But the property ``fs`` requires _fs to be initialized to a float value
except ValueError as e:
    print(f"Caught ValueError: {e}")
None
Caught ValueError: Attribute '_fs' is not initialized.
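Only the signature and behaviour of checked_property are shown above, not its body. The following is a minimal sketch of how such a helper could be written. It assumes the helper builds a read-only property whose getter raises ValueError while the attribute is still None; the message format is copied from the output above. The TypeError branch for a value of the wrong type is an extra assumption of this sketch and is not taken from the notebook.

```python
from abc import ABC
from typing import Optional


def checked_property(attr_name: str, attr_type: type = object, doc: Optional[str] = None) -> property:
    """Sketch: a read-only property that refuses to return an uninitialized attribute."""

    def getter(self):
        value = getattr(self, attr_name, None)
        if value is None:
            # Same message format as the output shown above
            raise ValueError(f"Attribute '{attr_name}' is not initialized.")
        if not isinstance(value, attr_type):
            # Assumption of this sketch: a wrong type is reported as a TypeError
            raise TypeError(
                f"Attribute '{attr_name}' should be {attr_type.__name__}, got {type(value).__name__}."
            )
        return value

    # property(fget, doc=...) exposes doc as the property's docstring
    return property(getter, doc=doc)


class Mics(ABC):
    _fs: Optional[float] = None
    fs = checked_property("_fs", float, doc="Sampling frequency in Hz")


mic = Mics()
try:
    mic.fs
except ValueError as e:
    print(f"Caught ValueError: {e}")  # Caught ValueError: Attribute '_fs' is not initialized.

mic._fs = 11250.0
print(mic.fs)  # 11250.0
```

The isinstance check in this sketch is strict. With attr_type=float, an int sampling rate such as 11250 would be rejected. A real implementation might prefer to convert the value or to accept several numeric types.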
B. Database for microphones
The base class to handle RIR measurements.
This class defines common properties and methods for the different RIR databases that will inherit from it. The class DB_microphones will be an abstract class (from abc import ABC, abstractmethod):
- ABC: base class to declare an Abstract Base Class
- abstractmethod: a decorator that indicates which methods have to be implemented by the subclasses
This is useful because the base class cannot be instantiated, and it forces the subclasses to implement every method decorated with abstractmethod.
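The following minimal sketch shows this behaviour. It uses hypothetical class names (BaseDB, IncompleteDB, CompleteDB) rather than the real DB_microphones. Any class that still has an unimplemented abstract method raises TypeError when you try to instantiate it.

```python
from abc import ABC, abstractmethod


class BaseDB(ABC):
    """Hypothetical stand-in for DB_microphones with a single abstract method."""

    @abstractmethod
    def _load_data(self) -> None:
        """Each concrete database must define how its data is loaded."""


class IncompleteDB(BaseDB):
    """Forgets to implement _load_data, so it is still abstract."""


class CompleteDB(BaseDB):
    def _load_data(self) -> None:
        self.loaded = True


def can_instantiate(cls: type) -> bool:
    """Return True if cls() succeeds, False if ABC blocks it with a TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False


for cls in (BaseDB, IncompleteDB, CompleteDB):
    print(f"{cls.__name__}: {can_instantiate(cls)}")
# BaseDB: False
# IncompleteDB: False
# CompleteDB: True
```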
Inspired by the MNIST dataset class, we will download the data into a folder structure like ./root/class_name/raw:
- root: a parameter passed to the class
- class_name: the name of the class used to download the database
- raw: the subfolder where the raw data is downloaded
We will also include a mirrors list with the URLs where the data can be found, and a resources list that contains tuples with the name of each file to download and its MD5 checksum.
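The following sketch shows how the mirrors and resources class attributes could fit together with the ./root/class_name/raw folder. ExampleRIR and its mirror URL are hypothetical placeholders. The file names and MD5 checksums are the ones computed for ZeaRIR further down in this notebook.

```python
import os
from typing import List, Tuple


class ExampleRIR:
    # Hypothetical mirror URL (placeholder, not a real data source)
    mirrors: List[str] = ["https://example.com/rir-data/"]
    # (file name, MD5 checksum) pairs
    resources: List[Tuple[str, str]] = [
        ("BalderRIR.mat", "bc904010041dc18e54a1a61b23ee3f99"),
        ("FrejaRIR.mat", "1dedf2ab190ad48fbfa9403409418a1d"),
        ("MuninRIR.mat", "5c90de0cbbc61128de332fffc64261c9"),
    ]

    def __init__(self, root: str = "./data"):
        self.root = root

    @property
    def raw_folder(self) -> str:
        # ./root/class_name/raw: the class name keeps each database in its own folder
        return os.path.join(self.root, self.__class__.__name__, "raw")

    def resource_urls(self) -> List[str]:
        # One download URL per resource, built from the first mirror
        return [f"{self.mirrors[0]}{name}" for name, _md5 in self.resources]


example_db = ExampleRIR(root="./data")
print(example_db.raw_folder)          # ./data/ExampleRIR/raw (on POSIX systems)
print(example_db.resource_urls()[0])  # https://example.com/rir-data/BalderRIR.mat
```

Concatenating the mirror and the file name assumes that the mirror URL ends with "/". Using os.path.join for URLs would produce backslashes on Windows.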
Base class
DB_microphones
DB_microphones (root:str='./data', dataname:str='RIR', signal_start:int=0, signal_size:Optional[int]=None)
Base class for microphone databases. I define the @property methods here, so I don’t have to redefine them in the subclasses.
Parameter | Type | Default | Details
---|---|---|---
root | str | ./data | Path to the root directory of the database, where the data will be downloaded
dataname | str | RIR | String matching the name of the resources to download and load (if several resources match, all of them are downloaded but only the first one is loaded).
signal_start | int | 0 | Start index of the signal in the data
signal_size | Optional | None | int or None. Size of the signal to extract from the data; if None, the whole signal is loaded.
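signal_start and signal_size select a window of time samples. The following sketch shows that selection. It assumes the data is stored as a (time_samples, num_mics) array, which is what the (128, 100) chunk printed for ZeaRIR suggests. select_window is a hypothetical helper. Treating signal_size=None as "until the end of the recording" is also an assumption of this sketch.

```python
from typing import Optional

import numpy as np


def select_window(data: np.ndarray, signal_start: int = 0, signal_size: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper: keep rows signal_start, ..., signal_start + signal_size - 1.

    data is assumed to have shape (time_samples, num_mics).
    """
    total = data.shape[0]
    assert 0 <= signal_start < total, "signal_start out of bounds"
    if signal_size is None:
        # Assumption: None means from signal_start until the end of the recording
        return data[signal_start:]
    assert signal_size > 0 and signal_start + signal_size <= total, "window exceeds the recording"
    return data[signal_start : signal_start + signal_size]


rir = np.zeros((3623, 100))  # same shape as the Balder room: 3623 samples, 100 microphones
print(select_window(rir, 0, 128).shape)  # (128, 100)
print(select_window(rir).shape)          # (3623, 100)
```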
Zea database
Database from Elias Zea. It will inherit from DB_microphones.
This is one of the RIR databases. It has to implement its own attributes:
+ mirrors
+ resources
+ microphone spacing
And the methods:
+ To check which resource to load
+ To download the resources
+ To unpack the downloaded resources
+ To load the selected resource (database/dataname)
+ To get the different attributes of the database: dx, dt, fs, num_mics, num_sources
+ And also the data related to the microphone recordings: imic, position, time_samples, signal
ZeaRIR
ZeaRIR (root:str='./data', dataname:str='Balder', signal_start:int=0, signal_size:Optional[int]=None)
ZeaRIR database.
Parameter | Type | Default | Details
---|---|---|---
root | str | ./data | Path to the root directory of the database, where the data will be downloaded
dataname | str | Balder | String matching the name of the resources to download and load (if several resources match, all of them are downloaded but only the first one is loaded).
signal_start | int | 0 | Start index of the signal to load.
signal_size | Optional | None | int or None. Size of the signal to extract from the data; if None, the whole signal is loaded.
Check that the Zea database works:
db = ZeaRIR(root="./data", dataname="RIR", signal_start=0, signal_size=128)
Matched resources to download:
- BalderRIR.mat
- FrejaRIR.mat
- MuninRIR.mat
Loading the resource ./data/ZeaRIR/raw/BalderRIR.mat ...
The class has checked which resources match the dataname “RIR” and found three of them. It downloads all the matching resources, but only loads the data of the first one, “Balder”, because each object of this class should only return signals from a single room. To load another room, I pass a dataname that corresponds only to the name of that resource.
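The matching step can be sketched as a substring search over the resource names. This is an assumption about how ZeaRIR matches dataname, since only the result is shown above. match_resources is a hypothetical name.

```python
from typing import List, Tuple

RESOURCES: List[Tuple[str, str]] = [
    ("BalderRIR.mat", "bc904010041dc18e54a1a61b23ee3f99"),
    ("FrejaRIR.mat", "1dedf2ab190ad48fbfa9403409418a1d"),
    ("MuninRIR.mat", "5c90de0cbbc61128de332fffc64261c9"),
]


def match_resources(dataname: str, resources: List[Tuple[str, str]]) -> List[str]:
    """Hypothetical: return every resource file name that contains dataname."""
    matches = [name for name, _md5 in resources if dataname in name]
    if not matches:
        raise ValueError(f"No resource matches dataname '{dataname}'")
    return matches


to_download = match_resources("RIR", RESOURCES)
to_load = to_download[0]  # only the first match is loaded: one room per object
print(to_download)  # ['BalderRIR.mat', 'FrejaRIR.mat', 'MuninRIR.mat']
print(to_load)      # BalderRIR.mat
print(match_resources("Freja", RESOURCES))  # ['FrejaRIR.mat']
```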
If the resources are already in the folder, it skips the download:
db._download_resource(resource_name="Balder")  # Just returns (no error message) because "BalderRIR.mat" is in the raw folder
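The following sketch shows a skip-if-present check. It assumes the class compares the file on disk with the MD5 stored in resources; torchvision's download_url performs a similar integrity check before downloading. md5_of and needs_download are hypothetical helper names. hashlib reads the file in chunks, so large recordings are never loaded into memory at once.

```python
import hashlib
import os
import tempfile


def md5_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def needs_download(raw_folder: str, filename: str, expected_md5: str) -> bool:
    """Hypothetical: download only if the file is missing or its checksum differs."""
    path = os.path.join(raw_folder, filename)
    if not os.path.isfile(path):
        return True
    return md5_of(path) != expected_md5


# Demo with a throwaway file instead of a real resource
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "demo.bin"), "wb") as f:
        f.write(b"hello")
    good = hashlib.md5(b"hello").hexdigest()
    print(needs_download(folder, "demo.bin", good))      # False: present and verified
    print(needs_download(folder, "demo.bin", "0" * 32))  # True: checksum mismatch
    print(needs_download(folder, "missing.bin", good))   # True: not downloaded yet
```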
And we can check that the correct room and its parameters are properly loaded
print(db)
Database: ZeaRIR
Download: ['BalderRIR.mat', 'FrejaRIR.mat', 'MuninRIR.mat']
Load room: BalderRIR.mat
Path to raw resource: ./data/ZeaRIR/raw/BalderRIR.mat
Path to unpacked data folder: ./data/ZeaRIR/raw
Sampling frequency: 11250 Hz
Number of microphones: 100
Number of total time samples: 3623
Number of time samples selected: 128
Number of sources: 1
Signal start: 0
Signal size: 128
Source ID: 0
We can check the chunk of data that has been loaded into memory, and that the main get methods work:
print(f"Loaded chunk of data of size {db._RIR.shape}")
print(f"Output of get_mic (4 time samples): {db.get_mic(imic=0, start=0, size=4)}")
print(f"Output of get_time (4 time samples): {db.get_time(start=0, size=4)}")
print(f"Test of get_pos: {db.get_pos(imic=1)}")
Loaded chunk of data of size (128, 100)
Output of get_mic (4 time samples): [ 0.00041836 0.0001148 -0.00129174 0.00162724]
Output of get_time (4 time samples): [0.00000000e+00 8.88888889e-05 1.77777778e-04 2.66666667e-04]
Test of get_pos: [0.03 0. 0. ]
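The time axis printed by get_time is consistent with t[n] = n / fs. With fs = 11250 Hz, the step is 1/11250 ≈ 8.8889e-05 s. For MeshRIR at 48000 Hz, the step is 1/48000 ≈ 2.0833e-05 s. The following minimal sketch implements both getters under that assumption. It uses hypothetical standalone functions instead of the real methods, and random placeholder data instead of the loaded RIRs.

```python
import numpy as np


def get_time(fs: float, start: int = 0, size: int = 4) -> np.ndarray:
    """Time stamps (in seconds) of samples start, ..., start + size - 1."""
    return np.arange(start, start + size) / fs


def get_mic(rir: np.ndarray, imic: int, start: int = 0, size: int = 4) -> np.ndarray:
    """Samples of microphone imic, assuming rir has shape (time_samples, num_mics)."""
    assert 0 <= imic < rir.shape[1], "microphone index out of range"
    return rir[start : start + size, imic]


# Compare with the get_time output printed above for fs = 11250 Hz
print(np.allclose(get_time(11250, start=0, size=4),
                  [0.0, 8.88888889e-05, 1.77777778e-04, 2.66666667e-04]))  # True

rir = np.random.default_rng(0).standard_normal((128, 100))  # placeholder data
print(get_mic(rir, imic=0, size=4).shape)  # (4,)
```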
Before implementing the downloading method, I used this code to test how to download the resources and which MD5 checksum to write for each resource (the checksums are not provided by the mirror).
import os
from torchvision.datasets.utils import calculate_md5, check_md5, download_url
# db = ZeaRIR(root="./data")
for file, md5_class in db.resources:
    url = os.path.join(db.mirrors[0], file)
    download_url(url, root=db.raw_folder, filename=file)
    md5 = calculate_md5(os.path.join(db.raw_folder, file))
    print(f"File: {file}, MD5: {md5}")
    # Fail loudly if the MD5 stored in the class does not match the downloaded file
    assert check_md5(os.path.join(db.raw_folder, file), md5_class), (
        f"Check the MD5 of the resource '{file}' for the class '{db.__class__.__name__}' "
    )
File: BalderRIR.mat, MD5: bc904010041dc18e54a1a61b23ee3f99
File: FrejaRIR.mat, MD5: 1dedf2ab190ad48fbfa9403409418a1d
File: MuninRIR.mat, MD5: 5c90de0cbbc61128de332fffc64261c9
It may be useful to check the names of the resources before instantiating an object, because instantiation starts the download.
I can implement a class method to print the resources that can be downloaded:
ZeaRIR.print_resources()
Resources for class ZeaRIR:
- BalderRIR.mat
- FrejaRIR.mat
- MuninRIR.mat
I am developing with nbdev, which includes the decorator patch from the library fastcore. It allows implementing a method of a class outside of the class definition, by declaring which class the method should be “patched” onto.
In the autogenerated .py file the patched method appears in a form I am not very familiar with. For that reason I only use patch here for didactic purposes; in the exported code, the method is already part of the class definition.
@patch(cls_method=True)
def print_resources(cls: DB_microphones):
print(f"!!Method overwritten by a patch!!")
print(f"Resources for class {cls.__name__}:")
for name, md5 in cls.resources:
print(f"- {name} ")
Pylance linting does not like patch and will underline it as a possible error, so I have added the method directly to the class; the following code is just for testing purposes. (This is a callout from Quarto.)
I can overwrite the method with patch (note the extra line in the output):
ZeaRIR.print_resources()
!!Method overwritten by a patch!!
Resources for class ZeaRIR:
- BalderRIR.mat
- FrejaRIR.mat
- MuninRIR.mat
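Outside nbdev, you can get a similar effect in plain Python by attaching a classmethod to the class after its definition, which is roughly what patch(cls_method=True) automates. The following sketch uses hypothetical minimal classes instead of DB_microphones and ZeaRIR. As with ZeaRIR above, the subclass inherits the method that was attached to the base class.

```python
from typing import List, Tuple


class TinyDB:
    """Hypothetical minimal stand-in for DB_microphones."""

    resources: List[Tuple[str, str]] = [("BalderRIR.mat", "bc904010041dc18e54a1a61b23ee3f99")]


class TinySubDB(TinyDB):
    """Hypothetical subclass, like ZeaRIR inheriting from DB_microphones."""


def print_resources(cls) -> None:
    print(f"Resources for class {cls.__name__}:")
    for name, _md5 in cls.resources:
        print(f"- {name}")


# Attach the function to the base class as a classmethod, after both classes exist
setattr(TinyDB, "print_resources", classmethod(print_resources))

TinySubDB.print_resources()  # the subclass inherits the attached method
# Resources for class TinySubDB:
# - BalderRIR.mat
```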
MeshRIR database
Database from Shoichi Koyama, National Institute of Informatics, Tokyo, Japan. It will inherit from DB_microphones.
MeshRIR
MeshRIR (root:str='./data', dataname:str='S1', signal_start:int=0, signal_size:Optional[int]=None, source_id:int=0)
Base class for microphone databases. I define the @property methods here, so I don’t have to redefine them in the subclasses.
Parameter | Type | Default | Details
---|---|---|---
root | str | ./data | Path to the root directory of the database, where the data will be downloaded
dataname | str | S1 | String matching the name of the resources to download and load (if several resources match, all of them are downloaded but only the first one is loaded).
signal_start | int | 0 | Start index of the signal to load.
signal_size | Optional | None | Size of the signal to load. If None, the whole signal is loaded.
source_id | int | 0 | Index of the source whose signals are loaded.
Now let’s check the MeshRIR database implementation:
db2 = MeshRIR(root="./data", dataname="S32", signal_start=0, signal_size=128, source_id=31)
Matched resources to download:
- S32-M441_npy.zip
Unpacked folder ./data/MeshRIR/raw/S32-M441_npy already exists. Skipping unpacking.
Since this is a heavier database, I had already checked that the downloading method works.
This database requires unzipping the resource. The class found that the unpacked folder already exists, so it neither downloads nor unpacks the resource again.
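The following sketch shows the unpack step with the skip. It assumes the archive is extracted into a folder named after the zip file, as the paths ./data/MeshRIR/raw/S32-M441_npy.zip and ./data/MeshRIR/raw/S32-M441_npy suggest. unpack_if_needed is a hypothetical helper built on the standard-library zipfile module. Whether the real archive contains its own top-level folder is not shown here. If it does, extracting it this way would create a nested folder.

```python
import os
import tempfile
import zipfile


def unpack_if_needed(zip_path: str) -> str:
    """Extract zip_path into a sibling folder with the same stem, unless that folder exists."""
    folder = os.path.splitext(zip_path)[0]  # .../S32-M441_npy.zip -> .../S32-M441_npy
    if os.path.isdir(folder):
        print(f"Unpacked folder {folder} already exists. Skipping unpacking.")
        return folder
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(folder)
    return folder


# Demo with a small throwaway archive
with tempfile.TemporaryDirectory() as raw:
    zip_path = os.path.join(raw, "demo_npy.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("example.txt", "placeholder content")
    folder = unpack_if_needed(zip_path)  # extracts the archive
    print(os.listdir(folder))            # ['example.txt']
    unpack_if_needed(zip_path)           # prints the skip message
```

Checking only that the folder exists is cheap, but it does not detect a partially extracted archive. An MD5 check on the zip, as for the .mat files, only protects the download itself.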
print(db2)
Database: MeshRIR
Download: ['S32-M441_npy.zip']
Load room: S32-M441_npy.zip
Path to raw resource: ./data/MeshRIR/raw/S32-M441_npy.zip
Path to unpacked data folder: ./data/MeshRIR/raw/S32-M441_npy
Sampling frequency: 48000 Hz
Number of microphones: 441
Number of total time samples: 32768
Number of time samples selected: 128
Number of sources: 32
Signal start: 0
Signal size: 128
Source ID: 31
Test of the main get methods:
print(f"In this database we do not preload all the database.")
print(f"Output of get_mic (4 time samples): {db2.get_mic(imic=0, start=0, size=4)}")
print(f"Output of get_time (4 time samples): {db2.get_time(start=0, size=4)}")
print(f"Test of get_pos: {db2.get_pos(imic=1)}")
In this database we do not preload all the database.
Output of get_mic (4 time samples): [0.00599654 0.00572385 0.00485317 0.00515282]
Output of get_time (4 time samples): [0.00000000e+00 2.08333333e-05 4.16666667e-05 6.25000000e-05]
Test of get_pos: [-0.4 -0.5 0. ]
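The MeshRIR class does not preload the whole database. With .npy files, one common way to get this behaviour is np.load(..., mmap_mode="r"), which maps the file and only reads the parts that are accessed. This is an assumption: the source does not show how MeshRIR reads its files. read_chunk is a hypothetical helper, and the file name and array layout in the demo are placeholders.

```python
import os
import tempfile

import numpy as np


def read_chunk(path: str, row: int, start: int, size: int) -> np.ndarray:
    """Read data[row, start:start+size] from a .npy file without loading the whole array."""
    data = np.load(path, mmap_mode="r")  # np.memmap: data is read from disk on access
    return np.array(data[row, start : start + size])  # copy, so the result outlives the map


# Demo with a placeholder file: 441 rows x 1024 samples
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "ir_example.npy")
    np.save(path, np.arange(441 * 1024, dtype=np.float32).reshape(441, 1024))
    print(read_chunk(path, row=0, start=0, size=4))  # [0. 1. 2. 3.]
```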
Good Practices (after coding)
Things that I have learnt, or that I found interesting, while coding this notebook:
- Use of inheritance
  - There are different experimental databases, but it is useful to create a base class with the methods that I want to use in my applications.
  - In the base class I try to define the common attributes. "Protected" attributes start with an underscore, e.g. _fs, and "private" attributes start with a double underscore, e.g. __fs.
  - The class can have getter methods that return the protected and private attributes. In particular, I use @property to define which attributes I want to expose: I can access obj._fs through the obj.fs property.
  - Without further precautions, it is possible to instantiate objects of the base class, although they will not have the information we require, since it is meant to be an abstract class. To avoid wrong uses, the abc (Abstract Base Classes) module provides tools to define the behaviour of classes like this.
  - Inheriting from ABC and declaring the @abstractmethod methods that each subclass has to implement prevents the instantiation of the abstract class, and of any subclass that does not override all the abstract methods.
  - This is a useful reminder that you have to implement all the abstract methods before you can use a class.
  - The base class contains the commonalities between databases, so I do not have to repeat code.
  - In the base class I can initialize only the strictly necessary attributes. If there is a set of operations that may be used in different subclasses, I can define a method such as _prepare_data(self) in the base class and call it from the subclasses' __init__(). This avoids the case where a new subclass has a different init logic and I have to review the init logic of the base class.
- Logic and options to download the databases
  - Inspired by MNIST, I can write the class attributes mirrors and resources in the subclasses, with the URLs where the files (resources) can be downloaded.
  - From MNIST I also reuse some downloading logic, and the choice of libraries to download, unpack and check the data.
  - When downloading files from GitHub, do not use the URL shown in the web explorer. Use the URL where GitHub serves the raw data:
    "https://raw.githubusercontent.com/{USER}/{REPO}/{BRANCH}/{PATH_DATA_FOLDER}/"
    substituting the USER, REPO, BRANCH and PATH_DATA_FOLDER of the file that you want to download, as seen in the normal GitHub URL of the data.
- Use assert a lot
  - It is very useful to catch errors and to check that your parameters are of the expected kind or within certain bounds.
  - Sometimes Pylance or other linters report errors even though the code works, because they cannot infer the type of your data. An assert before the flagged line (for example assert x is not None, or an isinstance check) tells Pylance that the variable has the expected type, so operations such as + are accepted for those variables.
- Use the method __str__() to print useful information about the object, such as different attributes, statistics, etc. Then use it as print(obj).
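The following compact sketch combines three of these practices:

- a shared _prepare_data() that each subclass calls from its __init__();
- assert statements that both validate inputs and narrow Optional types for the linter;
- a __str__() for print(obj).

BaseDB and ToyDB are hypothetical classes with toy data, not the notebook's DB_microphones.

```python
from abc import ABC, abstractmethod
from typing import Optional

import numpy as np


class BaseDB(ABC):
    """Hypothetical base class illustrating the practices listed above."""

    def __init__(self, signal_start: int = 0, signal_size: Optional[int] = None):
        assert signal_start >= 0, "signal_start must be non-negative"
        self.signal_start = signal_start
        self.signal_size = signal_size
        self._fs: Optional[float] = None
        self._data: Optional[np.ndarray] = None

    @abstractmethod
    def _load(self) -> None:
        """Subclasses fill self._fs and self._data."""

    def _prepare_data(self) -> None:
        """Shared logic that every subclass calls from its __init__."""
        self._load()
        assert self._data is not None  # also narrows Optional[np.ndarray] for Pylance
        end = None if self.signal_size is None else self.signal_start + self.signal_size
        self._data = self._data[self.signal_start : end]

    @property
    def duration(self) -> float:
        assert self._fs is not None and self._data is not None  # narrows types for the linter
        return self._data.shape[0] / self._fs

    def __str__(self) -> str:
        n_selected = 0 if self._data is None else self._data.shape[0]
        return (
            f"Database: {self.__class__.__name__}\n"
            f"Sampling frequency: {self._fs} Hz\n"
            f"Number of time samples selected: {n_selected}"
        )


class ToyDB(BaseDB):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._prepare_data()  # reuse the base-class logic instead of rewriting it

    def _load(self) -> None:
        self._fs = 1000.0
        self._data = np.zeros((500, 4))  # placeholder: 500 samples, 4 microphones


toy = ToyDB(signal_start=0, signal_size=128)
print(toy)
# Database: ToyDB
# Sampling frequency: 1000.0 Hz
# Number of time samples selected: 128
print(toy.duration)  # 0.128
```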