Covid-19 Listening Project

Credit: Enzo De Sena/dreamstime.com

The Covid-19 Listening Project is a collaboration with audio researcher and programmer Dr Enzo De Sena (University of Surrey), in consultation with Gemma Bruno (Telethon Institute of Genetics and Medicine, Italy) and Niki Loverdu (KU Leuven, Belgium). It aims to reveal and communicate characteristics of the Covid-19 genome sequence, including its mutations and strains, through data sonification. The project has been featured on Italian TV’s prime-time current affairs programme DiMartedi hosted by Barbara Gallavotti (broadcast on May 19 2020) to ≈3 million viewers, and in Metro London on November 13 2020 (≈1.3 million circulation). The sonification process, which makes the spike protein, the mutations and the several strains clearly audible, is detailed below.

The initial process mapped the type of each mutation to a chromatic note and its position in the genome to a metric placement. This is detailed in the report below. This system was employed in the creation of a choral work, Chorus of Changes. Here, over 500 genome sequences are translated into two octaves of a B minor scale. The translations are selected by mapping the most common mutation types (‘note deltas’) onto the most common diatonic scale degrees in a sample of Western Art Music (see Huron 2008) [6] using the DataLoop Crypto device. This results in familiar melodic motifs for the most commonly retained mutations and pandiatonic blurring for the more novel mutations. At the tempo selected, this results in a surprisingly engaging piece of music lasting over 42 minutes, in which the language of mutation is translated into the language of motivic transformation: a deeper sonification that goes beyond arbitrary chromatic or ‘safe’ scale choices. The piece is performable by choir and organ but is here rendered with MIDI instrumentation in Ableton Live with UAD and Native Instruments plugins.
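The frequency-matching idea behind Chorus of Changes can be illustrated with a minimal Python sketch. It is not the DataLoop Crypto device and makes several assumptions: the function name `rank_map`, the `"old>new"` labels for mutation types, and above all the ordering of scale degrees passed in, which is a placeholder and not the ranking reported by Huron [6]. Only the note names of the B natural minor scale are standard.

```python
from collections import Counter
from typing import Dict, List

# Degrees 1..7 of B natural minor (standard music theory, not project data).
B_MINOR = {1: "B", 2: "C#", 3: "D", 4: "E", 5: "F#", 6: "G", 7: "A"}


def rank_map(mutation_types: List[str],
             degrees_by_frequency: List[int]) -> Dict[str, str]:
    """Pair the n-th most common mutation type with the n-th most common degree.

    `degrees_by_frequency` is supplied by the caller, most common first.
    Mutation types beyond the end of that list are left unmapped.
    """
    ranked = [m for m, _ in Counter(mutation_types).most_common()]
    return {m: B_MINOR[d] for m, d in zip(ranked, degrees_by_frequency)}


if __name__ == "__main__":
    observed = ["C>U", "G>A", "C>U", "C>U", "G>A", "A>G"]
    placeholder_order = [5, 1, 3]  # hypothetical ranking, for illustration only
    print(rank_map(observed, placeholder_order))
```

With a real ranking of scale degrees, the most frequently retained mutations would land on the most familiar degrees, which is what gives the recurring melodic motifs described above.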

Data gathering

The project uses the database of the National Center for Biotechnology Information (NCBI), which is being updated every day with new COVID-19 sequences coming from research centres across the world. The website is available here: NCBI Covid-19 page

The data is parsed and downloaded using the covid-genome-matlab-parser.

How the mutations are obtained from the NCBI dataset

The first step is to obtain the mutations from the NCBI dataset:

  • The genomic comparison is run on the basis of the nucleotide sequences (in GACU symbols).
  • In order to reduce the amount of data (currently over 700 genomes), only the first measurement of a day is considered; this currently reduces the number of genomes to 74.
  • The comparison of genome pairs is carried out using the Needleman-Wunsch global alignment algorithm [1], which makes it possible to identify the genetic mutations.
  • The mutations are considered starting from the beginning of the first protein (NSP1), i.e. at nucleotide n. 266 of the NCBI accession MT019529. Mismatches in the last 100 bases are also ignored (for reference, the RNA of COVID-19 is about 30k bases long).
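The steps above can be sketched in Python on toy sequences. This is an illustration only, not the project's MATLAB parser: the function names, the scoring values (match 1, mismatch -1, gap -2), the `-` gap symbol and the choice to count positions along the older sequence are all assumptions. The quadratic score table also makes it unsuitable for full 30k-base genomes as written.

```python
from typing import Dict, List, Tuple

GAP = "-"  # gap symbol assumed for insertions and deletions


def first_per_day(records: List[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only the first (date, sequence) record seen for each date."""
    kept: Dict[str, str] = {}
    for date, seq in records:
        kept.setdefault(date, seq)
    return kept


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Globally align `a` and `b`; return the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def mutations(old: str, new: str, start: int = 266,
              tail: int = 100) -> List[Tuple[int, str, str]]:
    """List (position, old_base, new_base) differences between two genomes.

    Positions are 1-based along `old`; an insertion is reported at the
    position of the preceding base. Differences before `start` or within
    the last `tail` bases of `old` are skipped.
    """
    aligned_old, aligned_new = needleman_wunsch(old, new)
    last = len(old) - tail
    pos = 0
    found = []
    for x, y in zip(aligned_old, aligned_new):
        if x != GAP:
            pos += 1
        if x != y and start <= pos <= last:
            found.append((pos, x, y))
    return found


if __name__ == "__main__":
    # Toy example: the window is disabled so that short strings can be used.
    print(mutations("GAUUACA", "GACUACA", start=1, tail=0))
```

For real genomes, a library aligner with a memory-efficient implementation would be the more practical choice; the sketch is only meant to show how an alignment turns into a list of substitutions, insertions and deletions.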

How the mutations are translated to music

The second step is to translate the mutations into music. Below are two examples of how to do this.

COVID 19 Listening Project sonifying mutations over time up to April 2020
COVID 19 Listening Project sonifying the new variants (up to December 2020)

Below are the details of the procedure to generate the sound from the mutations:

  • The nucleotide mutations are translated into notes using the table below; a constant value is added to all of them; the - sign indicates an insertion or a deletion; all other types of mutations are ignored.
Old basis | New basis | MIDI note | Note
- | C | 47 | B2
- | U | 48 | C3
- | A | 49 | C♯3/D♭3
- | G | 50 | D3
G | A | 51 | D♯3/E♭3
G | U | 52 | E3
G | C | 53 | F3
A | G | 54 | F♯3/G♭3
A | U | 55 | G3
A | C | 56 | G♯3/A♭3
U | G | 57 | A3
U | A | 58 | A♯3/B♭3
U | C | 59 | B3
C | G | 60 | C4
C | A | 61 | C♯4/D♭4
C | U | 62 | D4
G | - | 63 | D♯4/E♭4
A | - | 64 | E4
U | - | 65 | F4
C | - | 66 | F♯4/G♭4
  • The mutations are then organised in 8 groups according to the table below.
Protein name | Group | Instrument | Angle
NSP1 | 1 | Cello | -45°
NSP2 | 1 | Cello | -45°
NSP3 | 2 | Cello | -30°
NSP4 | 3 | Cello | -15°
NSP5 | 3 | Cello | -15°
NSP6 | 3 | Cello | -15°
NSP7 | 3 | Cello | -15°
NSP8 | 3 | Cello | -15°
NSP9 | 4 | Cello | 0°
NSP10 | 4 | Cello | 0°
NSP12 | 4 | Cello | 0°
NSP13 | 5 | Cello | +15°
NSP14 | 5 | Cello | +15°
NSP15 | 5 | Cello | +15°
NSP16 | 5 | Cello | +15°
S | 6 | Double bass | +30°
ORF3a | 7 | Violin | +45°
E | 7 | Violin | +45°
M | 7 | Violin | +45°
ORF6 | 7 | Violin | +45°
ORF7a | 7 | Violin | +45°
ORF8 | 7 | Violin | +45°
N | 7 | Violin | +45°
ORF10 | 7 | Violin | +45°
Non-coding DNA | 8 | Violin | +45°
  • In order to facilitate spatial discrimination, each group is rendered in a different direction, as indicated in the table above. This specific separation corresponds to the actual length of each protein. The result is similar to the experience you would have if the RNA were rolled out around you and each mutation produced a sound from the corresponding direction. The position of a mutation can therefore be identified not only from its time delay but also from the perceived position of the sound in space, with mutations closer to the beginning of the sequence appearing to your left and later ones to your right. The specific spatialisation technique used here is Perceptual Soundfield Reconstruction (PSR) [2,3,4].
  • In order to increase presence, each source is rendered with reverberation using Scattering Delay Networks (SDN) [2,5]; the simulated room is rectangular with size 10 m x 10 m x 3 m. The listener is approximately in the center of the space, and the sound source is 2 meters away from the listener.
  • The (informal) protein labels are taken from the NYT article in [7].
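
The procedure in the list above can be sketched as a lookup from a mutation to a note event. This is a hedged Python illustration, not the project's code: the names `note_event`, `NoteEvent` and `source_position` are hypothetical, the sketch covers only the twelve substitutions between two bases, the constant offset defaults to 0 because its value is not stated, group 4 is taken as 0° (straight ahead), and the coordinate convention (0° along +y, negative angles towards -x, listener at the exact room centre) is an assumption. The actual PSR and SDN rendering is not reproduced.

```python
import math
from typing import NamedTuple, Optional, Tuple

# MIDI notes for the twelve base-to-base substitutions (old basis, new basis).
SUBSTITUTION_NOTE = {
    ("G", "A"): 51, ("G", "U"): 52, ("G", "C"): 53,
    ("A", "G"): 54, ("A", "U"): 55, ("A", "C"): 56,
    ("U", "G"): 57, ("U", "A"): 58, ("U", "C"): 59,
    ("C", "G"): 60, ("C", "A"): 61, ("C", "U"): 62,
}

# Group number -> (instrument, angle in degrees); 0.0 for group 4 is assumed.
GROUPS = {
    1: ("Cello", -45.0), 2: ("Cello", -30.0), 3: ("Cello", -15.0),
    4: ("Cello", 0.0), 5: ("Cello", 15.0), 6: ("Double bass", 30.0),
    7: ("Violin", 45.0), 8: ("Violin", 45.0),
}

PROTEIN_GROUP = {
    "NSP1": 1, "NSP2": 1, "NSP3": 2,
    "NSP4": 3, "NSP5": 3, "NSP6": 3, "NSP7": 3, "NSP8": 3,
    "NSP9": 4, "NSP10": 4, "NSP12": 4,
    "NSP13": 5, "NSP14": 5, "NSP15": 5, "NSP16": 5,
    "S": 6,
    "ORF3a": 7, "E": 7, "M": 7, "ORF6": 7, "ORF7a": 7, "ORF8": 7,
    "N": 7, "ORF10": 7,
    "Non-coding DNA": 8,
}

ROOM = (10.0, 10.0, 3.0)  # simulated room size in metres
DISTANCE = 2.0            # source-to-listener distance in metres


class NoteEvent(NamedTuple):
    midi_note: int
    instrument: str
    angle_deg: float
    source_xyz: Tuple[float, float, float]


def source_position(angle_deg: float) -> Tuple[float, float, float]:
    """Place the source DISTANCE metres from a listener at the room centre."""
    cx, cy, cz = (d / 2 for d in ROOM)
    a = math.radians(angle_deg)
    return (cx + DISTANCE * math.sin(a), cy + DISTANCE * math.cos(a), cz)


def note_event(old: str, new: str, protein: str,
               offset: int = 0) -> Optional[NoteEvent]:
    """Return the note event for a substitution, or None if it is not mapped."""
    note = SUBSTITUTION_NOTE.get((old, new))
    group = PROTEIN_GROUP.get(protein)
    if note is None or group is None:
        return None
    instrument, angle = GROUPS[group]
    return NoteEvent(note + offset, instrument, angle, source_position(angle))


if __name__ == "__main__":
    print(note_event("C", "U", "S"))
```

A C to U substitution in the spike protein, for example, becomes MIDI note 62 on the double bass, placed 30° to the right of the listener.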

Authors

  • Enzo De Sena 
  • Milton Mermikides 

Department of Music and Media, University of Surrey, Guildford, UK

Acknowledgements

We would like to thank Gemma Bruno (Telethon Institute of Genetics and Medicine, Italy) and Niki Loverdu (KU Leuven, Belgium) for the useful discussions on the topic. We would also like to thank our friends and colleagues for their useful feedback on the presentation.

The project also uses internally:

  • Yauhen Yakimovich’s yamlmatlab 
  • Ken Schutte’s MIDI toolbox 

References

[1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press).

[2] H. Hacıhabiboğlu, E. De Sena, Z. Cvetković, J.D. Johnston and J. O. Smith III, “Perceptual Spatial Audio Recording, Simulation, and Rendering,” IEEE Signal Processing Magazine vol. 34, no. 3, pp. 36-54, May 2017.

[3] E. De Sena, Z. Cvetković, H. Hacıhabiboğlu, M. Moonen, and T. van Waterschoot, “Localization Uncertainty in Time-Amplitude Stereophonic Reproduction,” IEEE/ACM Trans. Audio, Speech and Language Process. (in press).

[4] E. De Sena, H. Hacıhabiboğlu, and Z. Cvetković, “Analysis and Design of Multichannel Systems for Perceptual Sound Field Reconstruction,” IEEE Trans. on Audio, Speech and Language Process., vol. 21, no. 8, pp. 1653-1665, Aug. 2013.

[5] E. De Sena, H. Hacıhabiboğlu, Z. Cvetković, and J. O. Smith III, “Efficient Synthesis of Room Acoustics via Scattering Delay Networks,” IEEE/ACM Trans. Audio, Speech and Language Process., vol. 23, no. 9, pp. 1478-1492, Sept. 2015.

[6] Huron, D. (2008) Sweet Anticipation: Music and the Psychology of Expectation. MIT Press

[7] https://www.nytimes.com/interactive/2020/04/03/science/coronavirus-genome-bad-news-wrapped-in-protein.html (Accessed on: 8/3/2020)
