The Internet Archive and URL-configured websites
Links are critical to deep-time knowledge preservation
Cheap and simple data blob storage is widely available and relatively easy to access and use. However, easily applying the right kind of compute is still laborious, with too many complex barriers, arcane commands, hard-to-understand costs, and psychological anxiety that interrupts flow state when prototyping, analyzing, and generally doing data-intensive research and engineering.
URLs can safely and effectively store user-created code or configuration, even credentials [1] that grant access to resources. URLs can also … This allows anyone with the link not only to view the created resources but also to edit them.
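To make the idea of "configuration stored in a URL" concrete, here is a minimal sketch, assuming the configuration is a JSON-serializable dict that is placed in the URL fragment (the part after `#`), which browsers do not send to the server in HTTP requests. The base URL and the function names `config_to_url` and `url_to_config` are hypothetical and are not the metapages format. This sketch only encodes; it does not encrypt, so by itself it is not a safe way to store credentials.

```python
import base64
import json
from urllib.parse import urlsplit

# Hypothetical base URL, used only for illustration (not a metapages endpoint).
BASE_URL = "https://example.org/app"


def config_to_url(config: dict, base_url: str = BASE_URL) -> str:
    """Serialize a config dict into the URL fragment."""
    # Canonical JSON (sorted keys, compact separators) so equal configs give equal URLs.
    raw = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # URL-safe base64 avoids '+' and '/'; trailing '=' padding is stripped.
    token = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return f"{base_url}#{token}"


def url_to_config(url: str) -> dict:
    """Recover the config dict from a URL produced by config_to_url."""
    token = urlsplit(url).fragment
    # Restore the padding that was stripped during encoding.
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    cfg = {"code": "print('hello')", "image": "python:3.12"}
    link = config_to_url(cfg)
    print(link)
    print(url_to_config(link))
```

Because the whole configuration travels inside the link, anyone who receives the URL receives everything needed to reconstruct (and then modify) the configured resource.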
We have a plan for the lifecycle of the organization.
There's excitement in the molecular biology scientific community around potentially transformative new AI models for molecular structure prediction, binding prediction, and molecular dynamics simulation. It feels like every week on LinkedIn there is a new, exciting generative model or molecular simulation prediction tool released. This sparks a desire among researchers to collaboratively explore and evaluate these extremely promising, cutting-edge tools.
Public competitions can be an excellent way to learn from each other and channel resources. Bits to Binders was one of a number of recent challenges and competitions. Four others and I joined this challenge as a team to generate a protein binder to a specific immune cell receptor; such a designed molecule has the potential to treat some cancers. Our team had access to GPU resources, the target, and a brand-new generative AI model for protein design: RFdiffusion. None of us knew each other beforehand, and we all came with different backgrounds, experience, and technology.
However, despite our enthusiasm, our team, and many like ours, encountered significant struggles. Within the group, we struggled to effectively share workflows and results. The competition organizers had done a great job of gathering sponsors and organizations who collectively provided a significant amount of valuable GPU resources. However, we could use only a tiny fraction of the available GPUs because of both mundane and complex technical and logistical challenges. There was simply no infrastructure to support this socio-technical phenomenon. We wanted a vibrant community, and the organizers did an absolutely fantastic job, but the outcome of the competition, specifically with respect to sharing methodologies, was hampered by the lack of suitable infrastructure.
Put simply, there is no universal, straightforward way to share a computational workflow that needs visualization and interactivity (because it deals with molecular structures), is connected to GPUs running AI models and simulations, and runs seamlessly on whatever GPU resources are available to you.
This missing connection means that innovations remain isolated in their groups and labs of origin, regardless of the desire to share them.
Recognizing these limitations, we propose a socio-technical initiative: a collaborative space dedicated to overcoming these barriers through new frameworks and tools that connect people's collaborative focus. At its heart, this initiative seeks to harness the very best aspects of human collaboration, shared innovation and collective learning, and in this particular space it does so via leaderboards of workflows. We aim to create a vibrant ecosystem where AI-driven molecular workflows can be seamlessly shared, evaluated, and integrated across diverse research teams with different kinds of compute, technical resources, and abilities.
A central part of our vision is developing a collective set of open-source tools that express workflows directly in the browser (metapage workflows), combined with existing, proven social structures for efficiently exploring and refining ideas: public competitions and leaderboards. Together, these can rapidly evaluate novel and powerful molecular dynamics software tools and spread them to target communities in a form those communities can use immediately.
The metapages platform and tools allow researchers to easily create persistent, durable, yet flexible leaderboards. These leaderboards aren't just static records; they are dynamic, adaptable workflows that researchers can easily manipulate, extend, and incorporate into their own workflows, in whole or in part. By leveraging a public compute grid, researchers can effortlessly connect their own or others' computational resources, whether a laptop, an HPC cluster, or a turn-key cloud solution. This makes shared workflows independent of where they run (as long as you have, or we provide, the necessary compute resources), improving accessibility and lowering the barriers to participation and collaboration.
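The "independent of where they run" property can be illustrated with a generic shared job queue: whoever submits a job never names a machine, and any connected worker (a laptop, an HPC node, a cloud VM) can pick it up. The sketch below is a simplified, single-process simulation of that pattern under stated assumptions; `Job`, `Worker`, `JobQueue`, and `drain` are hypothetical names and this is not the metapages compute grid API. Real workers would poll over the network and actually execute commands, and here workers simply take turns so the result is deterministic.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    job_id: str
    command: str  # what to run; the submitter does not choose a machine


@dataclass
class Worker:
    name: str  # e.g. "laptop", "hpc-node", "cloud-vm" (hypothetical labels)
    completed: List[str] = field(default_factory=list)

    def run(self, job: Job) -> str:
        # Placeholder: a real worker would execute job.command here.
        self.completed.append(job.job_id)
        return f"{job.job_id} ran on {self.name}"


class JobQueue:
    """A shared queue that decouples job submission from job execution."""

    def __init__(self) -> None:
        self._pending: deque = deque()

    def submit(self, job: Job) -> None:
        self._pending.append(job)

    def next_job(self) -> Optional[Job]:
        return self._pending.popleft() if self._pending else None


def drain(queue: JobQueue, workers: List[Worker]) -> List[str]:
    """Let workers take turns pulling jobs until the queue is empty."""
    log: List[str] = []
    turn = 0
    while (job := queue.next_job()) is not None:
        worker = workers[turn % len(workers)]
        log.append(worker.run(job))
        turn += 1
    return log
```

For example, submitting three jobs and draining with `[Worker("laptop"), Worker("hpc-node")]` runs the first and third jobs on the laptop and the second on the HPC node. Adding or removing workers changes where jobs run, but not how they are submitted.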
Here is an example, quickly put together to test a tool and shared among the team, that we can share with the entire world simply via a URL, or embed anywhere a URL can go: https://metapage.io/m/10fc3aea326e40f1873afe0ad48a5b4e
The outputs of this workflow tool can be connected to other workflows, and copied with a single click, replicating all the code and data.
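One way to picture "connecting outputs to other workflows" and "copying with a single click" is a small dataflow graph in which each step's inputs refer to named outputs of upstream steps, and a copy is an independent duplicate of the whole definition. The sketch below assumes that model; the step names, the `"step.output"` reference format, and the functions `run` and `fork_workflow` are hypothetical illustrations, not the metapage definition format. It uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to run steps in dependency order.

```python
import copy
from graphlib import TopologicalSorter


# Hypothetical steps: each takes a dict of inputs and returns a dict of outputs.
def generate(inputs):
    return {"sequences": ["MKT", "MKV"]}


def score(inputs):
    return {"scores": [len(s) for s in inputs["sequences"]]}


# Each input is wired to an upstream output using "step.output" notation.
workflow = {
    "generate": {"fn": generate, "inputs": {}},
    "score": {"fn": score, "inputs": {"sequences": "generate.sequences"}},
}


def run(workflow):
    """Execute steps in dependency order, passing outputs downstream."""
    # Each step depends on the upstream steps named in its input references.
    graph = {
        name: {ref.split(".", 1)[0] for ref in step["inputs"].values()}
        for name, step in workflow.items()
    }
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        step = workflow[name]
        resolved = {}
        for key, ref in step["inputs"].items():
            upstream, output_name = ref.split(".", 1)
            resolved[key] = outputs[upstream][output_name]
        outputs[name] = step["fn"](resolved)
    return outputs


def fork_workflow(workflow):
    """Copy a workflow so the copy can be edited without touching the original."""
    return copy.deepcopy(workflow)
```

Here `run(workflow)["score"]["scores"]` gives `[3, 3]`. A forked copy can gain a new step wired to `"score.scores"` while the original workflow stays unchanged.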
This initiative is housed within the Open Molecular Software Foundation, whose mission statement is a succinct superset of ours. Our particular focus is accelerating progress. We envision empowering researchers of all technical backgrounds to seamlessly integrate molecular dynamics AI models developed by one group into the workflows of another, and to rely on those tools because they have been (and will continue to be) evaluated by a community. This democratization of AI tools and knowledge will unlock previously constrained collaboration potential, fueling advances and discoveries across molecular biology and the broader scientific community.
We will follow up with a post about the technical differences between metapage workflows and other comparable stacks such as Jupyter Notebooks and workflow systems such as Nextflow.
https://foundry.adaptyvbio.com/competition
For a given leaderboard entry, the display is great, with an interactive view of the molecule.
But the design is only a text description, with much of the process unavailable, so other researchers don't really have access to the workflow.
References
Competition with dashboard
https://foundry.adaptyvbio.com/competition
https://beta.adaptyvbio.com/benchbb
Check out the workflow on the app
There are new bio AI models coming out every week. Last week it was BioEmu.
How to keep up?
Deciding where and how to publish your data-intensive scientific workflows can be difficult, and there is no clear solution.
We're excited to announce our second weekly science challenge, focusing on biological data visualization! This week, we're exploring gene expression data analysis, showcasing how Metapages can handle scientific workflows in your browser.
Metapages supports, aligns with, and hopes to extend the FAIR principles, with the caveat that metapages are about connecting code and compute to data, with our emphasis on code and compute. So some of these principles are not directly applicable, and we discuss here how they could be extended.
The POSI principles are a set of goals for promoting open scholarly infrastructure, and they guide the heart and direction of the metapage platform. Some aspects specific to our organization:
Metapage workflows require three things to run:
It’s one thing to have access to data, but it’s another to have the code and a reliable, reproducible way to run that code on the data. In other words, it’s not enough to just serve data; we also need principles around HOW we create the compute environments that process it.
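One common way to make "how we create compute environments" reproducible is to fingerprint everything that determines a run's result: the environment, the command, and the exact input data. The sketch below assumes containerized runs described by an image name, a command, and input file contents. The function `job_fingerprint` and its fields are hypothetical and do not describe the metapages implementation. Identical specifications yield the same ID, so a result can be recognized, cached, or re-run. Note that a Docker tag such as `python:3.12` can point to different images over time, so pinning an image by digest is more reproducible.

```python
import hashlib
import json
from typing import Dict, List


def job_fingerprint(image: str, command: List[str], inputs: Dict[str, bytes]) -> str:
    """Hash everything that determines a run's result into one stable ID."""
    spec = {
        # Ideally pinned by digest ("name@sha256:...") rather than a mutable tag.
        "image": image,
        "command": command,
        # Hash each input's contents so the spec stays small but still exact.
        "inputs": {name: hashlib.sha256(data).hexdigest() for name, data in inputs.items()},
    }
    # Canonical JSON: sorted keys and fixed separators make the hash order-independent.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Changing a single byte of any input, the command, or the image reference produces a different fingerprint. That is what allows two groups to confirm they ran the same computation on the same data.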
Our data expiration policy takes a long term view: some data never expires while our organization lives, some data lives a very long time, and some data expires relatively rapidly.
Binder is a way to share Jupyter notebooks, which means it inherits all the advantages and disadvantages of Jupyter notebooks.
Bits In Bio Global Mixer Presentation