Aug 31 - Thursday
11:30 – 11:50
Theoretical biophysics
E42
Regularizing the combinatorial fitness landscape
Oz Kilim1,2, Alex Olar1, Tamás Zsiga1 and István Csabai1
1 Eötvös Loránd University, Department of Physics of Complex Systems, Institute of Physics
2 Semmelweis University, Data-Driven Health Division of National Laboratory
High-throughput sequencing and large-scale biophysical measurements such as deep mutational scanning (DMS) [1], coupled with the ability of neural networks to learn complex high-dimensional functions, open an exciting window of opportunity for sequence-to-function mapping. However, real-world measurements cover only a sparse subset of the full combinatorial sequence space, so we must develop heuristics for regularizing such deep networks to avoid overfitting or “shrinking” over the measured subspaces. Such regularization would allow models to generalize and would make predicting the function of unseen sequences tractable. We explore two potential avenues for such regularization, with the aim of enabling distant exploration of the fitness landscape in silico.
We explore the heuristic of “factorization of the combinatorial space” with a geometric proof. We apply this concept to data generated by the NK model [2], saturating the combinatorial space of sequences of length 5. We present theoretical bounds for fitness landscape reconstruction based on the theory of compressed sensing and pose open questions about DMS library design for optimal generalizability. We explore the Walsh-Hadamard transform for regularizing the fitness landscape and propose that a key step for progress on this topic is to define optimal strategies for collecting enough data on ensembles of genotypes to discover the biologically relevant epistatic structure of systems [3].
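As an illustration of why the Walsh-Hadamard transform is a natural basis for regularization, the following minimal sketch generates an NK landscape over all sequences of length 5 and inspects its spectrum. It is not the authors' implementation. The binary alphabet, the choice K = 1, the cyclic neighbourhood, and the function names `nk_landscape` and `walsh_hadamard` are all assumptions made for this example.

```python
import itertools
import numpy as np


def nk_landscape(n=5, k=1, seed=0):
    """Return all 2**n binary genotypes and their NK-model fitness.

    Assumption: site i interacts with the next k sites (cyclic), and each
    site's contribution is drawn uniformly at random from [0, 1).
    """
    rng = np.random.default_rng(seed)
    # One lookup table per site, with 2**(k+1) random contributions.
    tables = rng.random((n, 2 ** (k + 1)))
    genotypes = np.array(list(itertools.product([0, 1], repeat=n)))
    fitness = np.zeros(len(genotypes))
    for i in range(n):
        neighbourhood = [(i + j) % n for j in range(k + 1)]
        # Encode the local states as an integer index into the site's table.
        idx = genotypes[:, neighbourhood] @ (2 ** np.arange(k + 1))
        fitness += tables[i, idx]
    return genotypes, fitness / n


def walsh_hadamard(f):
    """Fast Walsh-Hadamard transform in natural order, scaled by 1/len(f)."""
    a = np.asarray(f, dtype=float).copy()
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            x = a[start:start + h].copy()
            y = a[start + h:start + 2 * h].copy()
            a[start:start + h] = x + y
            a[start + h:start + 2 * h] = x - y
        h *= 2
    return a / len(a)


genotypes, fitness = nk_landscape(n=5, k=1)
coeffs = walsh_hadamard(fitness)

# The epistatic order of coefficient s is the number of set bits in s.
orders = np.array([bin(s).count("1") for s in range(len(coeffs))])
high_order_max = np.abs(coeffs[orders > 2]).max()
n_nonzero = int(np.sum(np.abs(coeffs) > 1e-12))
print(f"non-zero coefficients: {n_nonzero} of {len(coeffs)}")
print(f"largest coefficient above order 2: {high_order_max:.2e}")
```

Under these assumptions every site contribution depends on only two positions, so all coefficients of order three and above vanish up to floating-point error, and at most 11 of the 32 coefficients (one constant, five single-site, five adjacent-pair) can be non-zero. This sparsity in the Walsh-Hadamard basis is the kind of structure that compressed-sensing arguments rely on.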
We then consider a real-world manifestation of this problem: SARS-CoV-2 variant-of-concern prediction. Such models would allow vaccines to be developed ahead of time and could drastically reduce the severity of epidemics. We explore how deep mutational scanning measurements [4] of “early” receptor binding domain (RBD) sequences can act as a training set for models that make phenotypic predictions about Omicron variants, such as antibody escape and ACE2 binding.
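One way to picture the train-on-early, predict-on-distant setup is a split by Hamming distance from a reference sequence. The sketch below is only an illustration under stated assumptions. The sequences are short hypothetical strings rather than real RBD data, and `hamming`, `split_by_distance` and the threshold `max_train_distance` are names introduced here.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def split_by_distance(reference, variants, max_train_distance):
    """Put variants close to the reference in train, distant ones in test."""
    train, test = [], []
    for seq in variants:
        if hamming(reference, seq) <= max_train_distance:
            train.append(seq)
        else:
            test.append(seq)
    return train, test


# Hypothetical toy sequences (not real RBD data).
reference = "ACDEF"
variants = ["ACDEF", "ACDEY", "ACWEF", "GCWEY", "GHWEY"]

train, test = split_by_distance(reference, variants, max_train_distance=1)
print("train:", train)
print("test:", test)
```

With a threshold of one mutation, the reference and the two single mutants form the training set, while the variants carrying three and four mutations are held out. This mimics evaluating a model on sequences far from anything it was trained on.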
Acknowledgment
This work was supported by the European Union's Horizon 2020 research and innovation program under grant agreement No. 874735 (VEO) and by the National Research, Development, and Innovation Office of Hungary within the framework of the MILAB Artificial Intelligence National Laboratory. Further support was provided by the Stipendium Hungaricum Program under the Tempus Public Foundation. The article was funded by the National Research, Development and Innovation Office in Hungary (RRF-2.3.1-21-2022-00006, Data-Driven Health Division of National Laboratory for Health Security).
References
[1] Fowler, Douglas M., and Stanley Fields. "Deep mutational scanning: a new style of protein science." Nature Methods 11.8 (2014): 801-807.
[2] Kauffman, Stuart A., and Edward D. Weinberger. "The NK model of rugged fitness landscapes and its application to maturation of the immune response." Journal of Theoretical Biology 141.2 (1989): 211-245.
[3] Aghazadeh, Amirali, et al. "Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions." Nature Communications 12.1 (2021): 5225.
[4] Starr, Tyler N., et al. "Deep mutational scans for ACE2 binding, RBD expression, and antibody escape in the SARS-CoV-2 Omicron BA.1 and BA.2 receptor-binding domains." PLoS Pathogens 18.11 (2022): e1010951.