2022-06-18

Why are there so many zeros in my wav reading?

I'm trying to build a speech recognition algorithm. I have a WAV file containing roughly 20 minutes of speech. I've read it into a NumPy array in which each chunk of 1024 values is a row. For some reason, not all chunks returned by the readframes method of the wave module's file object have the same length, so some rows are padded with zeros using the numpy.pad function to give the array a homogeneous shape. This padding is only appended at the end of the array, so it cannot be the cause of the following.
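For reference, a minimal sketch of the kind of reading loop described above. It is not the exact code used here: the function name `read_wav_rows`, the `frames_per_chunk` parameter, and the choice to view the raw bytes as `uint8` are assumptions made for illustration (the 0 to 255 range of the sample values suggests a byte view). Note that `readframes(n)` returns bytes, and each frame occupies `nchannels * sampwidth` bytes.

```python
import wave

import numpy as np


def read_wav_rows(path, frames_per_chunk=1024):
    """Read a WAV file chunk by chunk into a 2-D uint8 array of raw bytes.

    Hypothetical helper: each row holds the bytes of one readframes() call,
    and shorter rows (typically only the last one) are zero-padded on the right.
    """
    rows = []
    with wave.open(path, "rb") as wav:
        while True:
            raw = wav.readframes(frames_per_chunk)  # bytes, not decoded samples
            if not raw:
                break
            rows.append(np.frombuffer(raw, dtype=np.uint8))

    if not rows:
        return np.empty((0, 0), dtype=np.uint8)

    width = max(len(row) for row in rows)
    # np.pad defaults to constant mode with zeros; pad only at the end of a row.
    padded = [np.pad(row, (0, width - len(row))) for row in rows]
    return np.vstack(padded)
```

With this layout the row length in bytes is `frames_per_chunk * nchannels * sampwidth`, so a single audio sample can span several columns.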

I've noticed that some columns in the array contain only zeros. These columns always occur in pairs and are always separated by two columns containing normal values. The sixth column seems to be an exception to this pattern: it sometimes also contains ones. Are these columns really recorded speech, or are they inserted by my computer for some reason? This is important to know, as I don't want my algorithm to train on computer-generated values. Should I delete those values, or is it better to keep them? The array does not have to be playable as audio anymore.
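The column pattern can be checked programmatically. The sketch below uses a hypothetical helper, `all_zero_columns`, on a small array built from the first eight values of the first two sample rows; on the full array the same call would list every all-zero column.

```python
import numpy as np


def all_zero_columns(rows):
    """Return the indices of columns that contain nothing but zeros."""
    rows = np.asarray(rows)
    return np.flatnonzero(~rows.any(axis=0))


# First eight values of the first two sample rows.
demo = np.array(
    [
        [78, 1, 0, 0, 79, 1, 0, 0],
        [240, 0, 0, 0, 235, 254, 0, 0],
    ],
    dtype=np.uint8,
)

zero_cols = all_zero_columns(demo)
print(zero_cols)      # [2 3 6 7]
print(zero_cols % 4)  # [2 3 2 3]
```

If the zero columns always fall on positions 2 and 3 of each group of four, the pattern repeats with a period of four bytes.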

Here's a sample from the array. I can't attach the complete array, as it is way too big for that. A more complete version can be found here.

[78, 1, 0, 0, 79, 1, 0, 0, 12, 1, 0, 0, 185, 0, 0, 0, 177, 0, 0, 0, 28, 1, 0, 0, 245, 1, 0, 0, 38, 3, 0, 0, 106, 4, 0, 0, 81, 5, 0, 0, 148, 5, 0, 0, 74, 5, 0, 0, 168, 4, 0, 0, 229, 3, 0, 0, 83, 3, 0, 0, 31, 3, 0, 0, 26, 3, 0, 0, 33, 3, 0, 0, 40, 3, 0, 0, 22, 3, 0, 0, 246, 2, 0, 0, 211, 2, 0, 0, 136, 2, 0, 0, 240, 1, 0, 0, 247, 0, 0, 0, 176, 255, 0, 0, 97, 254, 0, 0, 69, 253, 0, 0, 131, 252, 0, 0, 54, 252, 0, 0, 81, 252, 0, 0, 188, 252, 0, 0, 79, 253, 0, 0, 207, 253, 0, 0, 48, 254, 0, 0, 97, 254, 0, 0, 73, 254, 0, 0, 6, 254, 0, 0, 175, 253, 0, 0, 90, 253, 0, 0, 58, 253, 0, 0, 73, 253, 0, 0, 101, 253, 0, 0, 132, 253, 0, 0, 147, 253, 0, 0, 164, 253, 0, 0, 199, 253, 0, 0, 224, 253, 0, 0, 8, 254, 0, 0, 97, 254, 0, 0, 228, 254, 0, 0, 163, 255, 0, 0, 138, 0, 0, 0, 89, 1, 0, 0, 251, 1, 0, 0, 45, 2, 0, 0, 157, 1, 0, 0, 95, 0, 0, 0, 161, 254, 0, 0, 185, 252, 0, 0, 56, 251, 0, 0, 134, 250, 0, 0, 226, 250, 0, 0, 51, 252, 0, 0, 246, 253, 0, 0, 175, 255, 0, 0, 208, 0, 0, 0, 240, 0, 0, 0, 96, 0, 0, 0, 124, 255, 0, 0, 119, 254, 0, 0, 223, 253, 0, 0, 243, 253, 0, 0, 120, 254, 0, 0, 77, 255, 0, 0, 254, 255, 0, 0, 253, 255, 0, 0, 63, 255, 0, 0, 224, 253, 0, 0, 61, 252, 0, 0, 27, 251, 0, 0, 35, 251, 0, 0, 180, 252, 0, 0, 154, 255, 0, 0, 1, 3, 0, 0, 6, 6, 0, 0, 189, 7, 0, 0, 116, 7, 0, 0, 111, 5, 0, 0, 118, 2, 0, 0, 116, 255, 0, 0, 133, 253, 0, 0, 55, 253, 0, 0, 124, 254, 0, 0, 17, 1, 0, 0, 19, 4, 0, 0, 87, 6, 0, 0, 42, 7, 0, 0, 53, 6, 0, 0, 195, 3, 0, 0]
[240, 0, 0, 0, 235, 254, 0, 0, 155, 254, 0, 0, 87, 0, 0, 0, 80, 3, 0, 0, 25, 6, 0, 0, 137, 7, 0, 0, 253, 6, 0, 0, 146, 4, 0, 0, 57, 1, 0, 0, 35, 254, 0, 0, 73, 252, 0, 0, 56, 252, 0, 0, 185, 253, 0, 0, 251, 255, 0, 0, 60, 2, 0, 0, 204, 3, 0, 0, 18, 4, 0, 0, 15, 3, 0, 0, 21, 1, 0, 0, 137, 254, 0, 0, 95, 252, 0, 0, 99, 251, 0, 0, 137, 251, 0, 0, 210, 252, 0, 0, 46, 255, 0, 0, 161, 1, 0, 0, 40, 3, 0, 0, 102, 3, 0, 0, 95, 2, 0, 0, 127, 0, 0, 0, 92, 254, 0, 0, 160, 252, 0, 0, 18, 252, 0, 0, 216, 252, 0, 0, 133, 254, 0, 0, 181, 0, 0, 0, 136, 2, 0, 0, 248, 2, 0, 0, 213, 1, 0, 0, 130, 255, 0, 0, 202, 252, 0, 0, 199, 250, 0, 0, 241, 249, 0, 0, 56, 250, 0, 0, 121, 251, 0, 0, 15, 253, 0, 0, 72, 254, 0, 0, 235, 254, 0, 0, 188, 254, 0, 0, 180, 253, 0, 0, 61, 252, 0, 0, 191, 250, 0, 0, 167, 249, 0, 0, 108, 249, 0, 0, 33, 250, 0, 0, 144, 251, 0, 0, 106, 253, 0, 0, 20, 255, 0, 0, 227, 255, 0, 0, 169, 255, 0, 0, 173, 254, 0, 0, 82, 253, 0, 0, 3, 252, 0, 0, 69, 251, 0, 0, 115, 251, 0, 0, 146, 252, 0, 0, 109, 254, 0, 0, 137, 0, 0, 0, 64, 2, 0, 0, 28, 3, 0, 0, 241, 2, 0, 0, 233, 1, 0, 0, 148, 0, 0, 0, 143, 255, 0, 0, 60, 255, 0, 0, 183, 255, 0, 0, 185, 0, 0, 0, 203, 1, 0, 0, 149, 2, 0, 0, 208, 2, 0, 0, 97, 2, 0, 0, 127, 1, 0, 0, 103, 0, 0, 0, 66, 255, 0, 0, 74, 254, 0, 0, 172, 253, 0, 0, 121, 253, 0, 0, 158, 253, 0, 0, 218, 253, 0, 0, 242, 253, 0, 0, 207, 253, 0, 0, 97, 253, 0, 0, 191, 252, 0, 0, 91, 252, 0, 0, 144, 252, 0, 0, 82, 253, 0, 0, 100, 254, 0, 0, 122, 255, 0, 0, 34, 0, 0, 0]
[1, 0, 0, 0, 65, 255, 0, 0, 89, 254, 0, 0, 151, 253, 0, 0, 47, 253, 0, 0, 69, 253, 0, 0, 221, 253, 0, 0, 237, 254, 0, 0, 76, 0, 0, 0, 166, 1, 0, 0, 187, 2, 0, 0, 133, 3, 0, 0, 3, 4, 0, 0, 86, 4, 0, 0, 179, 4, 0, 0, 47, 5, 0, 0, 198, 5, 0, 0, 96, 6, 0, 0, 197, 6, 0, 0, 210, 6, 0, 0, 145, 6, 0, 0, 37, 6, 0, 0, 217, 5, 0, 0, 226, 5, 0, 0, 73, 6, 0, 0, 35, 7, 0, 0, 86, 8, 0, 0, 105, 9, 0, 0, 11, 10, 0, 0, 252, 9, 0, 0, 222, 8, 0, 0, 224, 6, 0, 0, 151, 4, 0, 0, 73, 2, 0, 0, 111, 0, 0, 0, 152, 255, 0, 0, 125, 255, 0, 0, 130, 255, 0, 0, 105, 255, 0, 0, 214, 254, 0, 0, 135, 253, 0, 0, 223, 251, 0, 0, 108, 250, 0, 0, 144, 249, 0, 0, 117, 249, 0, 0, 255, 249, 0, 0, 244, 250, 0, 0, 239, 251, 0, 0, 101, 252, 0, 0, 38, 252, 0, 0, 99, 251, 0, 0, 105, 250, 0, 0, 195, 249, 0, 0, 239, 249, 0, 0, 4, 251, 0, 0, 216, 252, 0, 0, 218, 254, 0, 0, 61, 0, 0, 0, 161, 0, 0, 0, 19, 0, 0, 0, 227, 254, 0, 0, 190, 253, 0, 0, 69, 253, 0, 0, 179, 253, 0, 0, 209, 254, 0, 0, 21, 0, 0, 0, 250, 0, 0, 0, 35, 1, 0, 0, 76, 0, 0, 0, 162, 254, 0, 0, 204, 252, 0, 0, 104, 251, 0, 0, 239, 250, 0, 0, 177, 251, 0, 0, 127, 253, 0, 0, 162, 255, 0, 0, 42, 1, 0, 0, 95, 1, 0, 0, 246, 255, 0, 0, 23, 253, 0, 0, 154, 249, 0, 0, 220, 246, 0, 0, 235, 245, 0, 0, 32, 247, 0, 0, 33, 250, 0, 0, 214, 253, 0, 0, 203, 0, 0, 0, 227, 1, 0, 0, 234, 0, 0, 0, 137, 254, 0, 0, 202, 251, 0, 0, 251, 249, 0, 0, 42, 250, 0, 0, 77, 252, 0, 0, 154, 255, 0, 0, 24, 3, 0, 0, 162, 5, 0, 0, 107, 6, 0, 0, 139, 5, 0, 0, 154, 3, 0, 0]
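One possible reading of the sample, offered as an assumption rather than a confirmed diagnosis: if the file holds 16-bit little-endian PCM with two channels, each frame is four bytes, and a silent second channel would show up as exactly this kind of zero-column pair. The sketch below decodes a few byte groups from the first sample row under that assumption. Calling `getnchannels()` and `getsampwidth()` on the opened wave object would show whether it actually holds for the file.

```python
import numpy as np

# First two 4-byte groups of the first sample row, plus the group
# "176, 255, 0, 0" that appears later in the same row.
raw = np.array([78, 1, 0, 0, 79, 1, 0, 0, 176, 255, 0, 0], dtype=np.uint8)

# Assumption (not confirmed by the file header): 16-bit signed
# little-endian samples, two channels interleaved per frame.
frames = raw.view("<i2").reshape(-1, 2)
print(frames.tolist())  # [[334, 0], [335, 0], [-80, 0]]
```

Under this reading, the first column of `frames` is a smoothly varying waveform that crosses zero (high bytes near 255 are small negative values), and the second column is a channel containing only silence.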

