The Quantitative Biosciences Institute (QBI) at the University of California, San Francisco (UCSF) and the Office for Science and Technology of the Embassy of France in the United States presented the “HealthAI Symposium,” a French-American innovation event held at UCSF on December 6-7, 2023.
Supported by renowned research institutions, the health symposium brought together leaders, industry experts, and researchers from the United States and France to discuss AI applications in the health sciences.
MLtwist’s Audrey Smith joined Laurence Calzone (Research Engineer, Computational Systems Biology of Cancer, Institut Curie), Michael Keiser (Associate Professor, Institute for Neurodegenerative Diseases, University of California, San Francisco), and Hocine Lourdani (Group Product Manager, Deepcell) in the Artificial Intelligence for Cell Biology panel discussion, moderated by Lisiena Hysenaj of Parvus Therapeutics.
Check out the full panel discussion (Day 1 of 2).
hello everyone welcome I’m Jacqueline Fabius the Chief Operating Officer for the Quantitative Biosciences Institute QBI
0:31
and I’d like to give you a warm welcome to today’s event uh today we’re kicking off our third joint event with the
0:39
scientific Department of the embassy of France and specifically the French
0:44
Consulate of San Francisco uh on the topic of research and health our first joint event was in
0:51
2020 it was virtual due to the pandemic and I’d like to say that we were slightly ahead of our time with the
0:57
topic being AI big data and health in 2021 it was still virtual um it was a
1:05
multi-day event titled reshaping science and health each year has been a great
1:10
success but I’m happy to welcome you to our first in-person event today in 2023
1:17
Health Ai and with this I’ll now pass the mic to our partner the French
1:22
Consulate of San Francisco represented by Emmanuel poak vour the attaché for
1:28
Science and
1:37
technology good afternoon everyone um I’m Emmanuel P I’m the attaché for Science
1:43
and Technology at the at the Consulate General of France in San Francisco it’s my great pleasure today uh to be your
1:50
co-host on France’s behalf I would like to thank our longtime partner qbi and
1:56
jacn for uh hosting us and co-organizing this event with us which is a French American event so you
2:03
will see a lot of French or French speaking people around and that’s normal I don’t think I need to convince
2:09
you that the topic of today’s Symposium is of utmost importance we’re proud that
2:15
France can contribute to these advances hand in hand with a renowned partner like QBI I’ll try to keep the speech as short
2:23
as possible but I have a few things to say um one of our main missions at the French office for Science and Technology
2:30
of the embassy of France uh along with all the scientific attaches in Atlanta
2:35
Boston uh Houston Chicago Los Angeles and Washington um our objective is to foster
2:42
scientific collaboration between French and US researchers and support students
2:47
and research mobility between our two countries especially at the doctoral level or early stage of the career it’s
2:55
not just by chance that the collaboration between France and the US is is one of the strongest and the most
3:02
enduring our two countries do have a lot to share in terms of knowledge and knowhow in all aspects of society
3:09
including science and research we also value the supportive
3:14
entrepreneurial entrepreneurial spirit that exists in the tech world and which
3:20
is particularly strong in Silicon Valley as some of you may know so incubators startups spin-off companies
3:27
and large Industries are the ones that truly turn the most mindblowing Research
3:32
into a reality that will impact and improve the lives of the greatest number we could not be more thankful to
3:39
see French and American actors of innovation come together in an event like this one and none of this would be
3:47
achievable without their kind participation so I really want to thank them
3:53
truthfully my final message for today is that this Symposium is a good example of
3:59
what what we can achieve with all of your support uh as well with the French and
4:05
the Francophile community so it’s only one example among many initiatives
4:11
supported by the French Consulate on an everyday basis our goal our job is to listen and to talk about
4:19
the scientific and technological subjects which are of crucial importance to you and to both our countries and the
4:26
future of Healthcare in the AI era is definitely one of them we aim to create
4:32
dedicated spaces in communities where scientists can meet share major scientific results and insights that
4:39
will impact the research the economy and the society significantly there are many ways in
4:45
which you can support our actions whether you are French or not so please
4:50
come and talk to us during these two days learn more about the things that we do and the many ways that you can get
4:56
involved and just a quick word to maybe mention the Photo Expo that you you may
5:02
have seen when you came in this room which was one of the initiatives of the office for Science and Technology last
5:09
summer and we actually had some laureates uh who won some prizes in this photo competition so next year please
5:16
participate without further ado I leave the floor to Tanja Kortemme who will also
5:22
introduce our first panel discussion and I wish you all a very fruitful and enjoyable Symposium thank you
5:34
yeah good afternoon um and welcome from me um as well I am Tanja Kortemme I am a
5:40
professor here at UCSF in bioengineering and therapeutic sciences I’m also the vice dean of research for the School of
5:48
Pharmacy and personally extremely excited about the Confluence of AI and
5:54
Discovery Science reaching all the way from really fundamental curiosity to to
6:00
impacts on our everyday life and so to me some of the most exciting advances
6:05
happen at the intersection of Science and Technology of the intersection between different disciplines in science
6:13
and at the intersection um between technology science and health care and
6:19
so I think that’s why to me um what we’re witnessing um all over the world
6:25
but also what we’ve been discussing in the last day and a half uh um in a sister Symposium on AI across biological
6:33
scales and what we are continuing to discuss today in the panel discussions um will be really at this exciting
6:41
Confluence um of advances in computer science in cell
6:48
biology in drug Discovery in health care and in clinical applications and so I’m
6:54
very excited to hear from experts in this area um how they are viewing all of
7:01
these um developments and what they can inform us on what the future holds in
7:07
these areas and so with this I’m going to introduce um our first um panel
7:15
moderator um Lisiena Hysenaj to introduce her panel on
7:22
um cell biology and Ai and also the panelists welcome thank you very much
7:36
so the first panel will be AI biology and we’ll have four
7:42
speakers the first one it’s Michael Keiser from UCSF he’s an associate professor at UCSF I think this
7:49
microphone is not working okay yeah
7:57
okay and the second speaker is uh Laurence Calzone she’s coming
8:05
from France from Institut Curie and then we have Hocine Lourdani who is an engineer on AI
8:12
and Audrey Smith from MLtwist so every one of the speakers will introduce themselves uh
8:19
for three minutes and then we’ll have a panel discussion together so maybe Michael you can
8:28
start
8:35
all right maybe we’ll put this all right well thank you so I’m an
8:42
associate professor here and my background is originally in computer science and over years I actually came
8:49
into this area through uh through UCSF itself I was once upon a time a bioinformatics graduate student here and
8:56
so the lab and we’re giving very brief intros um our lab focuses um on three major
9:03
application domains in bio medicine and biological research for us Ai and ml
9:10
questions are at the center of it and so we look at molecules and we spoke a little bit about that yesterday about
9:18
perturbations and here when I think of a perturbation it could be a molecule it could be an environmental Factor it
9:23
could be a functional genomic change to a biological system and so some of the areas we’ve been interested in have been
9:30
cell biology as relates to this panel also whole organisms and collaborations
9:35
in zebrafish and looking at neuroactivity and then imaging imaging can
9:41
span so many things in our area of Interest here I’m showing some examples of collaborations in the space of
9:46
digital pathology where we work on becoming a force multiplier or a way to help Pathologists operate at a scale
9:54
resolution and generalizability that otherwise would be impossible to do as a
9:59
human and so we were asked to show one or two examples I’m going to take the easy route and show two that were from
10:06
the presentation I gave yesterday the first one um is a generative AI model
10:12
where we were ingesting images these are microscopy images from cellular screens
10:18
for a neurodegenerative disease phenotype specifically the tau protein
10:23
and at the left what you have are two fi are two different channels of the same field of view of these cells
10:29
at the top you see yfp fused tow protein and at the bottom we have some nuclei
10:36
now as I mentioned at the time one of the big problems with live cell Imaging to do drug Discovery and other screens
10:42
at scale of hundreds of thousands or more perturbations is we can’t put antibodies in that system we can’t run
10:50
it in a fix cell format it doesn’t scale and so if you’re doing this in live cell and we also look at different time
10:55
points we’re stuck with only seeing all of the tow what we really want to see though is the misfolded tow or the
11:02
Tangled towel this is the tow that builds up pathogenetically and we see it in the brains of patients who have
11:08
passed away from Alzheimer’s disease and that’s only some of it and so what we were able to do was train a deep
11:15
learning method that could take data from screens that had been done years before and ingesting those two input
11:22
images on the left generate what we wished we would have had all those years ago and specifically focus on the tangled
11:29
tau happy to talk about that
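As a rough illustration of this kind of image-to-image setup (a sketch only, not the speaker’s actual model, data, or code), a small convolutional network can be trained to predict a held-out “wished-for” channel, such as an aggregate-specific tau signal, from the two channels a live-cell screen can actually acquire:

```python
# Illustrative sketch only (not the speakers' actual model): learn to predict a
# "wished-for" third channel from the two channels acquired in a live-cell screen.
import torch
import torch.nn as nn

class TwoChannelToOne(nn.Module):
    """Toy encoder-decoder: 2 input channels (e.g. total tau + nuclei) -> 1 output channel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = TwoChannelToOne()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# x: two-channel fields of view; y: the channel we wish we had acquired (placeholders here).
x, y = torch.randn(8, 2, 64, 64), torch.randn(8, 1, 64, 64)
for _ in range(10):                      # tiny training loop, purely for illustration
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```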
11:34
and then the other sort of result that illustrates an area of interest for us is a second generative approach and so this is a generative diffusion
11:42
model and what we’re able to do is generate fragment by fragment piece by
11:47
piece molecules within protein pockets and so combining these two approaches
11:54
you can imagine a whole pipeline of testing experimentation and assessment
11:59
in order to move forward understanding cellular diseases and Drug Discovery
12:05
thank you thank you very much and so now we are still in Academia and we’ll have
12:12
someone that comes from France L will introduce what she’s doing and who is
12:17
she thank you so my slide are not as beautiful as as what I hoped but um so
12:24
my name is Laurence Calzone I’m a researcher in the Curie Institute um I
12:29
coordinate together with Emmanuel Barillot who is in the audience the group of computational systems biology of
12:35
cancer um I’m a mathematical modeler and with the focus of uh on cancer
12:42
applications and what do I do what am I interested in is to build mechanistic model that describe the biology of the
12:48
disease and in our case in particular cancer but not only with the purpose to
12:53
understand the impact of the alterations suggest optimal drugs and predict drug response or drug synergies
13:00
in a patient specific manner um so where do I come from so I’m
13:07
a mathematician by training uh I uh I I started doing uh chemical kinetics in
13:14
particular and then I move to biology through Theory and I did a PhD in theoretical biology here in Virginia
13:20
Tech not here but in in the US uh so my first interest were to do some dynamical
13:26
modeling uh of uh of the cellular processes and uh and then when I moved to Curie uh the
13:32
the type of data we were dealing with could not be applied to ordinary differential equations really so this is
13:39
when we developed some some other formalism uh based on on both um
13:45
approaches but what we we really develop in the group are models of tumor Evolution and expansion uh taking into
13:52
account space and time so we have big data and a lot of parameters that we don’t have uh we also developed
13:59
some tools to model individual cell populations and we want to integrate of course all the omics data that we can into
14:05
our models so um I’m really interested in
14:11
mechanistic models and how to make the link with artificial intelligence models you know the we try to uh explain really
14:19
the type of results we get with AI models and we can do that from two perspectives first as an input to the
14:25
models so we learn from the data to build our mechanistic models of course with the stratification of patients but
14:32
also help us infer the models that we build both the networks and the the mathematical equations from this data we
14:40
want to integrate multimodel uh data as model parameters so again learn the parameters from this U from this data
14:48
and also extract features from the data which pathways or which genes really separate the response to a to a
14:54
treatment or not and then when we have this model we are also interested in in in uh um AI to um improve the models and
15:03
how do we do it uh we search for optimal parameter sets so we have a lot of parameters that we need to infer and
15:11
this is this is really difficult to do it by hand and of course we expect to to use um uh AI models like surrogate
15:19
models to mimic uh the complex and computationally heavy models that we build and we also use AI to generate synthetic
15:24
computational heavy models that we build and we also use AI to generate synthetic
15:30
data for model optimization we don’t have sometimes we don’t have enough data and this is one way to augment the data
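As a hedged illustration of the surrogate-model idea described just above (the simulator, parameters, and numbers below are invented for the example), a cheap regressor can be fit to input-output pairs from the heavy mechanistic model, and the large parameter search can then run against the surrogate, with only the best candidates re-checked in the full simulation:

```python
# Hypothetical example: fit a cheap surrogate that mimics an expensive mechanistic
# simulation so that a large parameter search can run on the surrogate instead.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def expensive_mechanistic_model(params):
    """Stand-in for a computationally heavy run (e.g. an ODE or agent-based simulation)."""
    k1, k2, k3 = params
    return np.sin(k1) + k2 ** 2 - 0.5 * k3        # placeholder scalar output

rng = np.random.default_rng(0)
thetas = rng.uniform(-2, 2, size=(500, 3))        # parameter sets we can afford to simulate
outputs = np.array([expensive_mechanistic_model(t) for t in thetas])

surrogate = GradientBoostingRegressor().fit(thetas, outputs)

# Screen many candidate parameter sets cheaply, then re-check only the best ones
# with the full mechanistic model.
candidates = rng.uniform(-2, 2, size=(100_000, 3))
best = candidates[np.argsort(surrogate.predict(candidates))[:10]]
```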
15:37
so we want to make the link between AI results and model outputs and really fill the gap between mechanistic models
15:43
and Ai and I think I will try to give this perspective in uh in the response to uh to the the questions that we will
15:49
we will talk about um the point is to explain the content of content of the black box as much as
15:57
possible and and I was also asked to give two uh two um um projects on which
16:04
we work in the lab the first one is um the point is you know it’s the holy grail to try to find
16:11
some biomarkers for a response prediction to to some treatments and here it’s about immunotherapy and we use
16:17
multiple data types to uh increase the the power to be predictive and explain
16:22
as much as possible uh why these patients respond or not and this of course opens the road to spatial modeling
16:28
which brings me to the second project uh which is my expertise is spatial
16:34
modeling of tumor Invasion we model both intra and extracellular interactions um to uh to simulate in
16:42
silico treatments so we start from data we infer the model we infer the parameters and you can see here an
16:49
example of of a tumor and and invasive um that starts invading the the tissue
16:56
and then we can play with these models and search for optimal drugs in silico that we can then test um in uh in
17:03
experiments and I will finish here okay thank you now
17:10
it’s Hocine he’s a PharmD that is going to explain to us more about what he has worked on for
17:17
some companies in the Silicon Valley on AI so he will introduce
17:22
himself sure hi everyone thank you every thank you for inviting me and thank you
17:27
for being in the room today really appreciate it quick introduction about myself my name is Hocine Lourdani I am a PharmD
17:34
indeed by training um started I am originally from France which I wouldn’t mention uh usually but feels relevant
17:42
today i started my career in in France actually in Pharma in biotech then went
17:47
to business school moved to the Bay Area a bit more than 10 years ago and since then mostly spent my time building AI
17:54
products and so I I’ll try to bring that perspective of a practitioner today and um and somebody who has led the
18:00
development and the commercialization of AI products in different Industries in
18:05
part Healthcare and uh and Life Sciences Industries so a couple of examples uh I
18:11
I extracted here uh is the work uh I’ve conducted with the teams at Arterys
18:16
and RapidAI where the goal was really to leverage AI in a clinical setting um
18:22
to build software medical devices to support uh clinicians in their diagnosis
18:28
and treatment decisions so very specific set of challenges are related to that
18:33
it’s really about the measurement of certain clinical uh factors the
18:39
detection of certain pathologies and that’s a very specific type of of challenges that I’m sure will come up in
18:45
our discussion most recently I I’ve worked at Deep Cell where we took a very
18:51
different approach and Deep Cell is a company focused on research especially on single-cell research
18:58
and we applied AI to study the morphology of cells so we built an end-to-end platform um able to take images of
19:06
cells in flow and extract High dimensional information from these images and help researchers characterize
19:14
and make sorting decisions U using um using the outputs of our foundation
19:20
model so very different set of of of challenges here and so I extracted just a few thoughts that may color my
19:27
intervention in this panel very high level and I won’t take too much time one is um I’m a product person
19:34
so I’m really focusing on building uh products and solving concrete problems
19:39
as we know here these days there’s a lot of hype a lot of excitement and for good reason around AI um but beyond buzzword
19:46
beyond the hype um I tend to focus on identifying very specific concrete problems and focus on the impact that AI
19:54
can have on solving that problem uh with an emphasis on the interpretability and the usability of
20:00
these models because the best model in the world is not very useful
20:05
if um the the end user cannot really use it in good conditions um the second
20:10
element I wanted I kind of touched on is that challenges in AI product development and commercialization
20:16
implementation are very different depending on the on the industry uh in particular as I mentioned clinical uh
20:23
environments versus researchers only research focused environments are present very different types of
20:29
challenges and that may uh color my intervention today and finally one point that I wanted to to insist on and
20:35
actually that resonates a little bit with what Laurence I believe was presenting is that one thing I’ve observed um recently in this and in
20:45
my my time working with biologists and researchers is that there is a persistent cultural difference and a
20:51
difference in approach between researchers in biology and Ai and the AI community at large and it focuses a lot
20:58
thought and my experience in explainability and in whether or not a model actually describes a mechanism and
21:06
of course this is a complex topic so I know this is way too high level but I want to emphasize here that sometimes
21:13
that question is very important and of course a model should be explainable and at the same time the value of a model
21:19
can be in its predictive power alone that happens and an example that I didn’t work on personally but I think
21:25
many of you in the room can relate to or or know of is AlphaFold for instance this model trained by Google DeepMind
21:32
to predict the 3D folding structure of a protein based on its sequence of amino acids there is ongoing work on
21:39
explaining exactly how the model works and how it’s making its predictions but the value of the model in unlocking
21:45
avenues for research um just by being that potent and that that efficient in
21:51
predicting the structure is is already huge and sometimes explainability is not um the the only
21:58
aspect that we should focus on so I’ll stop here I took already too much time thank you everyone and I look forward to the to the
22:05
discussion thank you and so now we continue with our fourth panelist and she’s
22:12
Audrey she’s a woman founder and she has built this company that is on AI
22:19
Services she will explain more about it hi everyone thank you for having me
22:24
so um two facts about me I’m French as you can probably be here uh and the
22:29
second one I’m the only non-scientist person in this panel so very uh very happy to be here very excited to be here
22:36
but I might offer another uh another point of view um so um I founded ml twist uh with my
22:44
co-founder three years ago and the idea is really uh to help soft a very um very
22:52
boring problem actually uh but that is like very persistent when you uh start training a model is um on uh the fact
23:00
that there are still like manual task that still need to happen when you need to create data processing or data
23:07
augmentation and when you try to not only build data pipelines for certain model but also maintain it because it
23:13
can break over and over again and that type of work actually um is taking 80%
23:20
of the time of any data scientist which is which is a lot and most of the time if you talk with data scientists what
23:26
they want to do is spend that time training a model not doing that janitorial work that we’re happy to take
23:33
over um if you look about like the yellow shun you’re going to see that as
23:39
of right now a lot of um Technologies happen a lot of technologies have like raised over the past decade uh but what
23:48
really is not really like you cannot see happening is that you need to build the
23:53
data pipeline between all those yellow boxes to make um to train your model and that’s something that still has not
23:59
been like really solved as of today and that’s what we focus on if you look at the work uh the manual
24:07
work if you have a technical person on hand in your company or in your uh in your work um in in Academia you’re going
24:15
to have to go through all those different steps uh to get your data ready for a machine learning model to um
24:22
ingest the data and get trained on the data that can take up uh like several
24:28
weeks as you can see um and we build a platform that can auto generate data
24:34
pipeline um and all the uh purple uh steps that you can see are done
24:40
automatically by our platform which um um help people like you or data
24:47
scientist to just focus on you know like the exciting stuff which is uh working
24:52
on the accuracy of the data or just training your model and and checking the performance of the model
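A toy sketch of the kind of pipeline glue being described (hypothetical steps, not MLtwist’s actual platform): each stage is a small function, and the chain can be re-run automatically whenever new raw data arrives or a format changes, so the data scientist only ever sees model-ready arrays:

```python
# Hypothetical pipeline sketch (not MLtwist's platform): validation, normalization
# and augmentation chained into one reproducible, re-runnable step.
import numpy as np

def validate(sample):
    assert sample.ndim == 3 and sample.dtype == np.uint8, "unexpected image format"
    return sample

def normalize(sample):
    return sample.astype(np.float32) / 255.0      # scale pixel values to [0, 1]

def augment(sample, rng):
    return np.flip(sample, axis=1) if rng.random() < 0.5 else sample   # random horizontal flip

def pipeline(raw_samples, seed=0):
    rng = np.random.default_rng(seed)
    for s in raw_samples:
        yield augment(normalize(validate(s)), rng)

raw = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]   # placeholder images
ready = list(pipeline(raw))                       # model-ready arrays for training
```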
24:59
um as you can see we have been uh lucky enough to partner and uh have also great
25:04
customers that we’re allowed to talk about so we are working with s National Laboratory but also UC Davis Health um
25:12
we are also working uh with Stanford HAI and um partnering with
25:20
Capgemini and and other uh names on on this slide um and so we have been able
25:26
to work work on a lot of different data types uh from what I was talking about
25:32
like images but also videos um it can be also text and it can be also DICOM uh we
25:39
have actually worked on DICOS which is a spin-off of DICOM um and that’s about it
25:46
that’s us in a nutshell oh you can you can see like a bit of our accomplishments too um
25:52
but yeah that’s that’s about it thank you thank you
26:00
so okay thank you very much everyone so as we saw in this panel we have uh
26:06
people from Academia that work really on building this model and making sure that we’ll have some discoveries from these
26:12
models and then people from industry that are making sure to create a product that can be used for everyone so now the
26:21
question I have it’s what is one of the
26:26
most impressive discoveries that we have had in the field of AI and cell biology the
26:33
recent years that we could not have had just with humans alone anyone wants to answer this
26:42
question maybe can you hear me yes maybe I’ll get started just because I just
26:47
reuse the example I took actually I mean in my presentation um I think Alpha fold
26:53
is is one of the most impressive developments that we’ve seen lately um
26:58
uh the it’s interesting to see that it’s a collaboration with academia initially with EMBL um in Europe to build the
27:06
database and then the development of the model um you know was conducted by
27:12
Google DeepMind um to your question Lisiena so the the problem of protein folding
27:19
and the 3D structure the prediction of a 3D structure of a protein is a well-known problem it’s been worked on for decades it’s been worked on
27:25
tirelessly by a number of groups and uh I’m sure many of you are in the room are aware of it so I won’t spend too much
27:32
time on it but the level of performance that was reached recently in 2018 with the first version
27:38
and then 2020 um is just something that we couldn’t have um reached without the
27:44
latest development of AI um now you know I when we say would would not be
27:50
achievable by humans I’m always a little bit you know not sure what we mean we always use mathematical modeling we
27:56
always use different tools and these are created by humans and AI are also created by humans so I would still you
28:02
know put that on on the credit of humanity uh but this is I think a very
28:08
solid example thank you so another thing this is like maybe in the mind of everyone it’s we have seen all the
28:15
Sci-Fi movies where we see that okay something it’s taken by a malicious
28:21
attack and some criminals took over our mind that in this case it will be our
28:27
machine learning and I was thinking this when I heard the talk of Michael where you were talking about deep learning uh
28:34
techniques that you’re using in the lab and how you can determine it’s like a tau protein that’s misfolded or all
28:40
this so how do you make sure that someone will not hack your system like how how are we sure
28:46
that what do you do to protect your system and someone imagine someone comes
28:52
hacks your system and then your students or PhD students or postdocs will use a
28:59
model that is not right and will claim discoveries that are
29:05
false ah well so this is it’s an interesting question too because in AI um there’s so
29:12
much of a history of releasing our code this is crucial if you want to publish
29:17
at a computer science conference um if you want to do any of these things you have to make what you’re doing available
29:24
and we’ve seen questions about this with things like ChatGPT and the question of guard rails or improper use but when we
29:31
think about drug Discovery um there have been papers where people have asked what about old techniques what about things
29:37
we’ve known for decades computationally how hard is it to misuse a model and that has always been
29:45
possible it hasn’t been the computation it’s not the computational barrier usually that’s kept someone from
29:51
making a very bad drug it’s been the synthesis the construction all of the other steps where the rubber meets the
29:58
road where there’s actually a lot of oversight and a lot of institutional support and that is one way that the
30:05
models themselves are less at the center of that question um but what you are
30:11
afraid on it’s not the the model it’s like this malicious attack don’t exist it just like we are scared of it I am
30:20
less afraid of models than I would be about how people choose to use things okay and so for me I’d always take that
30:26
answer to our organization do the universities the companies oversight
30:32
accountability that would be the place that I would ask because I can because the flip side of this is I remember
30:38
years and years and years ago I was um a student and someone was presenting on
30:44
some new technology or result a professor who was visiting and it wasn’t
30:50
published yet they were hoping to submit it to a really high-end journal and someone said but aren’t you afraid that
30:55
you’re by telling us all about how to do this you know someone’s going to scoop you or use it in some way you don’t like and he said young man um I
31:04
would be thrilled if someone would try to steal my work okay great so thank you
31:09
for your answer yeah no just one thing about I mean in Academia we share as much as possible so we we put on GitHub
31:17
everything we can whenever we can to make sure that also we get the credit for what we develop but also uh like
31:25
you said we we hope that people can use it and then make it better and then we all work for the same purpose okay thank
31:32
you so one question I’m having now because I was searching a little bit
31:37
about ml twist and your collaborations and so that I thought ml twist is a
31:42
service and it’s used only by companies but I saw in your website that you collaborate with Stanford and Berkeley
31:49
and I was pretty surprised about it can you explain how you help this big
31:54
academic institutions that generate a lot of discoveries in AI to improve
32:00
their data sure so um we have like several ways to work
32:07
with them or to partner with them uh one of them is actually using uh the
32:12
students at those universities that like our medical experts for instance and who
32:18
can help us uh do some labeling uh for some some models uh we have other things
32:24
like you said like for Berkeley Lab it was different because we were working on um uh a grant uh for the Department
32:33
of energy and we were able to collaborate with them on a plan to win
32:38
actually the phase one of that uh SBIR um and every every time so
32:44
we’re also working with Stanford HAI and that’s something else like they are working on uh creating some models some
32:51
llms uh internally and we are generating the data for them so um that’s that’s
32:57
like we are very very proud of that uh but like every every time we talk with an Academia it’s like going to be a
33:03
different need uh but we’re happy to explore any way we can partner with uh Academia okay thank you so basically we
33:11
we saw that Michael said that everything the discoveries are done and the models are not misused as people use them
33:18
properly and then we have some people to help label well the products in order to
33:23
generate even more discoveries now as a biologist or immunologist that I am the
33:29
question I am asking it’s I’ve heard all this different models and companies that
33:35
exist is there any way I can learn because I don’t know how to code I can learn how to use or where I can be
33:43
informed about these new technologies that are coming
33:54
out um it really
33:59
so I guess I’m coming at this question from a again product perspective I tend
34:05
to believe that not everybody needs to be trained on the technical aspects of AI and that is the role of companies
34:12
largely but also of course the contribution of the academic Community to come up with tools that are very easy
34:18
and intuitive to interact with that in the background do very complex things and use a lot of code and use a lot of
34:25
very complex mathematical concepts but from a user perspective just you know kind of shield all of that from
34:32
users um I don’t see a very I may not be correct and you may have a different
34:37
opinion I don’t see a very strong argument for AI to be any different than software engineering in general for
34:43
instance we’re not asking everybody to be able to code as you just said uh in research and I don’t see why that would
34:49
be different from AI um I think the argument is a little bit different when you think about students however uh when
34:55
we think about students in the fields of of biology I think the the the intersection of computational biology Ai
35:02
and my biology uh in general is reaching such a level that there is probably a
35:08
better argument there to say that the Next Generation in general should touch at least to some degree uh AI training
35:15
uh but it’s a field that’s evolving so fast anyway that you know it’s something that is a very complex you know topic and I don’t have the the expertise
35:22
on it thank you no I would say that uh the key for any project is collaboration so as a
35:29
cell biologist you don’t need to know AI but you need to be surrounded by people that can help you and I talk the same
35:35
language as you yeah so this is the question it’s like where do I find because every time I go Ai and then you
35:40
have this courses I’m like oh I’m lost what I do so the question was like where can I find this Char GDP for example was
35:47
all over and everyone was talking about it so I started to use it and it was easy and save my life but where is there
35:53
any website or something you well I I would emphasize the collaborate point because that’s what I was thinking too because actually I don’t think ChatGPT
36:00
is the right metaphor for how we often want to use AI for high stakes problems
36:06
because instead I I think of you know someone who uses the centrifuge without
36:11
training or the FACS machine early on and now it’s gummed up either you’re
36:16
working with someone to at least get yourself started or there’s a core facility or there’s a way to get the
36:22
training okay and the problem with AI is less frequently that we would gum up a
36:29
machine or do something to the rotor it’s more commonly that it would appear to work in a way that was misleading or
36:36
not useful because it is a pattern recognition thing it will always find patterns and exploit them so collaborate
36:44
collaborate thank you and so uh I’m wondering just like you are all open to
36:50
have a contact from someone from industry and build a new model or help them on a this is this
36:57
is great I think it’s a message we have to the show so I think we are I don’t
37:06
know about the questions or I have five moment okay perfect so you’ll tell me
37:12
whenever yeah I have more questions so another question that I had it’s okay
37:18
we’re in the same space we need to be trained do you think like a training of AI like how to use Ai and not to misuse
37:25
it should be mandatory in every company now or in every lab like the safety
37:34
training um I’m not sure exactly how it would look like but I think that it
37:40
would be very important to start educating people on the good and the bad
37:47
um starting by data bias uh which is like something that is like as human we
37:53
we all have biases but if we are aware of it if you let people know that they need to
38:00
pay attention to the data they’re going to be selecting for their model um then there might be a chance that we’re going
38:05
to reduce the amount of bias happening in the model um and then to go back to your question where you were talking
38:12
about like learning how to code on on what it means to do Ai and so on I think
38:17
it’s actually important to keep that separate just because you’re going to have people from different backgrounds
38:24
collaborating and they will fight also biases that way if you have everyone you
38:29
know educating the same way learning the same way in the end you’re going to have a lot of biases okay thank
38:37
you just again just to say that I mean in the company you need a wide range of
38:42
expertise again so you need to have people dedicated to that and the other ones knowing the limitations of of the
38:49
results and how to use these results is really key I think this if we have to all of us if we have to learn something
38:55
is the limitations and the biases to be clear about the biases
39:01
yeah and just one thing to say about it um because of the strength of all the
39:06
different expertises and opinions we don’t all need to be programmers or developers of AI what will happen more
39:12
and more is these tools in different ways make their ways into the workplace into the research environment all of
39:18
these and I worry that it seems like when you read articles when you think
39:24
about and you talk to people who have questions outside of this space it seems as though it’s categorical do I trust AI
39:30
or do I not trust AI um do I use a prediction or not but when you think
39:35
about things like ChatGPT which I liked that you brought up earlier we are used to speaking and writing and we can
39:43
critique it we can look at it we know sometimes when it’s wrong and if you try to look for that you find it more and
39:50
more and so I would say the training that would make the most sense is how to be a critical user of it yeah and what
39:56
techniques to have to test whether this thing is telling me what it so confidently is
40:03
presenting I completely agree with that point and to take it one step further I
40:08
think one one thing that we need to put on the agenda even more is the question
40:13
of Open Source uh in the industry in particular because indeed in Academia and we mentioned it earlier there is
40:19
this culture and this requirement to share your your results to share your
40:24
code to share your data and in the industry it’s not always the case and for good reasons of course if you
40:29
invested a lot of money in Training Systems you want to protect your IP you want to protect your first mover
40:35
advantage in the market and completely understand that but we need to find ways to still have enough data available and
40:41
enough source code available for researchers in Academia or or not to
40:47
actually validate these models and and um and test them and be critical about
40:53
their performance thank you so basically I’m more optimistic now on the use of
40:58
the AI and the models so now just to talk about cell biology in general what
41:04
is as as an immunologist I’m more like in characterizing the cell types and
41:11
it’s basically what you have done a little bit in Deep Cell so like will there be a moment where we will just recognize cells
41:19
by uh only one feature just for example
41:24
imaging or will we still do our flow cytometry with 300 different colors and
41:31
then analyze by ourselves and how how far is this I’ve heard also like in in
41:37
the same space I’ve heard also about wet lab so if you can explain this and where
41:44
we are in cell biology right now and what the Yeah I can draw from my recent
41:49
experience at Deep Cell to at least answer the first part of your question um so we are at the point right
41:56
now with the the performance of AI models on Cell Imaging and we can really extract High dimensional information in
42:03
a way that is consistent an improper way to to say that would be to sequence the morphology of cells and at at scale
42:11
right and that’s completely new and that really challenges or or subverts let’s
42:16
say the the existing methods in FACS and flow cytometry um where I think uh it’s very
42:23
interesting is that currently the way to characterize cell is based on the taxonomy in biology that is in evolution
42:30
but still very much rooted in surface markers cell markers in general and
42:36
getting out of that taxonomy initially is a little bit different when you start
42:41
next Frontier is really looking at phenotype for itself and really leverage that as a new omic basically uh type of
42:48
of data um and the next frti after that I think is multimodel you know research where you integrate that data with
42:54
multiomic information from different sources um so that’s that’s I think how you you get to a holistic view of a cell
43:02
that takes very different um you know lenses to analyze it did I did I answer your question I I think so maybe Michael
43:10
has something to add about I well I very much agree with what you’re saying and what I was also kind of reflecting on
43:16
and thinking about this is you ask kind of when we’re there um and I remember um
43:23
1965 a paper on strong inference quote from it is the measure of a method is its use and so we’re talking about
43:30
methods and technology and platforms AI or otherwise or combined in this hybrid system for what use and so for some uses
43:38
we might already be there right now and for some things we we cannot see them by
43:43
Imaging without the right marker and we don’t know that yet and so there’s this interest of multimodality how many
43:50
things we can bring together without breaking the bank on the experiment in the first place yeah and so I would
43:55
actually turn the question back to you and ask for what use um and what is sufficient and I I was thinking more
44:02
than Discovery and cell biology for example could be used in the biomicro space so for example I would imagine an
44:11
AI um like model that could help me
44:16
realize what cell types I have based on some phenotypes that I see on the
44:22
microscope or what can be the molecular pathway that
44:27
are affected on this phenotypes and I saw that you are doing something like
44:34
this maybe in the lab to link this maybe also in Pharmacology and this is what
44:40
how you see this feel and what are the discoveries in
44:47
there so this could be an hour conversation I’m sure others you know have thoughts
44:54
on too there’s big differences between supervised and
44:59
unsupervised we’ll go and see what’s there based on the information we are
45:04
already tooled up to collect say images and compared to I have a really
45:11
specific need a priori bringing in my human expertise I know that none of those image channels are are likely to
45:17
get the job done and if they did I’m actually kind of that would actually be a good counteract test to make sure I
45:23
couldn’t get that out, that maybe I need [remainder of this answer is inaudible]
[audience question, largely inaudible, drawing a comparison with cybersecurity: “…from cyber and so in cyber you never know… your computer… all the best practices…” The panel’s reply is also only partly audible: “…controls, now we need controls, because we will try to do what we ask and that’s the difference, and so that’s what in cyber… sec…”]
48:04
um so not at MLtwist because we are generating the data we’re not generating the models but that’s definitely
48:10
something that needs to happen uh for our customers so we work with defense
48:16
for instance and that’s definitely um mandatory to to have like
48:21
some ways of trying to break the model um but you can do it yourself you can just go on chat GPT and and then ask
48:29
questions again and again and you’re going to see at one point it’s going to break and you’re going to even see some
48:34
biases appearing like very quickly on what the pilot is it is a man or is it a
48:40
woman um or nurse um and and you’re going to see that this is actually all
48:45
coming from how the model was trained and what data was was you know used to
48:51
generate ChatGPT so that’s very important and that goes back to what you
48:57
were saying Michael that we need to be able to understand that whatever we see
49:02
that is AI stamped we need to criticize it we need to be able to look at it with
49:07
an eye of hey is it real or is it bias or can I really trust what what it does
49:15
or what it says um and that’s like something that should be done even uh at the school level like with our young
49:22
kids uh because with like everything that is generated right now we don’t know
49:27
what is true and we don’t know what is uh not true and that’s very important to train kids and students to keep that eye
49:36
uh out for for that
49:44
do no I agree but this is a scientific reasoning right you you want to see how far you can go and and find the
49:51
weaknesses of of your models and this is something we do uh even when we read an article we don’t we don’t take it as as
49:58
it is we want to question it and I guess this is what we need to keep in mind with AI models they’re not the truth
50:05
they they get closer to what we want and they answer a very specific question not Universal questions thank
50:13
so now as as we heard we should not be scared of
50:18
the AI they work in the same way as we work for experiments so they check every
50:24
step and and also they’re all working in a collaborative way so we should not be afraid
50:31
to go and ask them if they
50:37
industry and
50:49
then and everyone if you have any questions for the panel you should line
50:57
up there will be a microphone here so you can line up and ask
51:14
questions I start thank you so much for the panel uh so I have a question
51:19
because I see a lot of like academic or pure Tech kind of players how do you
51:25
work with the Biotech Industry how are your Solutions implemented do you feel
51:31
that there’s any bottleneck um from the pharmaceutical industry to like adopt all these great
51:39
Technologies right well so one of the best answers I can give is the second result I showed with the small molecules
51:46
is actually a joint development with Genentech and so jentech um Partners came to us and we worked on a close
51:53
collaboration for more than a year here where there’s both Mutual support the IP
51:59
is worked out ahead of time by the university and Genentech and by setting that road up nicely then what we’re able
52:05
to do is get real feedback because one of the things that stood out to me in that collaboration was we made the
52:11
molecules they looked pretty good but our our in the early version and our collaborators said yeah but that looks
52:17
really strained like super strained in practice we’ve tried a whole bunch of public techniques and and the molecule
52:24
it has an that’s just completely UN unworkable in practice and so it’s
52:30
having at least a champion on each side and people working together directly
52:35
would be my answer to you on that no I can just say that we are in
52:42
France we are promoting a lot this uh this PhD program where you have industry and uh and Academia working together so
52:50
we are really uh try to uh teach students to embrace space industry and
52:57
uh and think about all the the positive things you can you can have like you know access to clusters access to money
53:05
to experiments that we may not have in
53:10
Academia one more thing I would say is that a lot of the startups that are trying to bring innovation in the field
53:16
of AI are often spin-offs actually of academic institutions and so that in Industry uh Academia collaboration is
53:23
almost you know inherent to a lot of the Innovation that’s ongoing now your question was more towards Pharma and
53:29
biotech at a large scale I think the Pharma and biote can be a difficult
53:34
industry to penetrate initially when you’re startup they’re looking for usually a lot of validation a lot of
53:40
very early data however I think in the field of drug Discovery in particular AI applied to drug Discovery you see a lot
53:46
of different partnerships that are actually ongoing um AstraZeneca with BenevolentAI for instance is an example Owkin with
53:53
Sanofi is another one there are a number of partnerships of that nature so I think we’re going in
54:00
that direction more and more and pharma companies realize that they can embed uh AI um models uh and products very early
54:07
in their development so you you think that in five years from now there will be more and more drugs that are in phase
54:13
two phase three that have been like discovered thanks to AI definitely yeah
54:19
this is already happening they’re they’re already in the pipeline but very early so we’ll have to time will tell
54:26
thank you thank you I might because uh just a
54:32
quick comment before there is a lot of ethical consideration and I will be hosting the ethics and AI next week and
54:39
I see you know like it’s actually tomorrow and it’s very relevant but anyway my question was more as a cell biologist
54:46
and we we we generate more and more like big data set like omx data and and all of that and we build model and test
54:52
those models um but kind of on the other flip side where you were mentioning like
54:58
usually we want like an end product and the end prediction with AI but can we
55:03
flip things around the bit and use AI to help identify Gap in our knowledge and
55:10
how should we build framework to help actually scientists trying to make sense of those very large data set that are
55:16
very complicated and have a lot of connection and help hypothesis
55:21
generation if you have any opinion on that
55:27
all right I think there’s two levels to answer first answer the kind of glib one but one that we want to get to working
55:34
is all of the explainable AI techniques interpretable methods saliency mapping
55:40
occlusion or even building models that operate through rationale layers where you’re having it solve multiple tasks as
55:47
a series of steps that are fully differentiable all the way through and if the model works or does not work
55:53
through that that information gate then that tells you something because you can change what information the task has to
56:00
pass through and so that’s one category of answer a a technological answer an
56:05
architecture style answer in practice many of the saliency mapping methods
56:11
integrated gradients guided Grad-CAM either bring in their own artifacts or smear out signal and can only tell you
56:17
really basic things like is it looking at the dog or the background but trying
56:23
to understand about the ear the nose is iffy at that resolution and in molecules goodness gracious me we don’t even have
56:30
as much intuition and so that answer is where the field is trying to go methods-wise
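For readers who want to see what one of the attribution methods named here looks like in practice, this is a rough, self-contained approximation of integrated gradients written directly in PyTorch (a sketch for intuition, not a substitute for an audited attribution library):

```python
# Rough approximation of integrated gradients, written directly in PyTorch.
import torch

def integrated_gradients(model, x, baseline=None, target=0, steps=50):
    """Average gradients along the straight path baseline -> x, scaled by (x - baseline)."""
    baseline = torch.zeros_like(x) if baseline is None else baseline
    alphas = torch.linspace(0, 1, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)     # interpolated inputs, shape (steps, *x.shape)
    path.requires_grad_(True)
    model(path)[:, target].sum().backward()
    avg_grad = path.grad.mean(dim=0)
    return (x - baseline) * avg_grad              # per-feature attribution

# Toy classifier on a flattened 16-feature "image" (placeholder model and input).
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
attributions = integrated_gradients(model, torch.randn(16), target=1)
```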
56:35
the flip side answer is a simple one and it’s a logical answer instead which is this is just one
56:43
technique and if anything use very simple models wherever you can stand it
56:49
up swap in and out statistical models and other models and ask whether the bang for the buck is worth it to bring
56:55
in something that we have less of an understanding of and so that goes back to being scientists and the fact that
57:01
all of us here in the room have some have agency in making that type of decision without having to rely on on
57:09
hoping that a model that learns for reasons we don’t know can tell us something new that we didn’t know to look for so it’s considering it as parts
57:15
and swapping the parts no I I I would just say that maybe
57:21
uh one way to look at it is to use these models to validate an intuition that you may have so the question will be clear
57:27
it will not identify gaps but it will confirm or refute uh an intuition
57:33
that you may have one last thing we mentioned earlier
57:39
in the discussion the difference between supervised approaches and unsupervised approaches I think that plays a role in
57:45
your question when it comes to identifying gaps building taking an unsupervised approach to look for high
57:52
dimensional uh structure uh global and local structure in high dimensional space
57:57
using what is in many cases now known as Foundation models so models that are looking very
58:04
generally at at a field and then can be trained more specifically for a given application is an is an approach that’s
58:10
very relevant you can look you can really that that’s what I would say just from the the the ml training approach
58:16
just being unsupervised or self-supervised yeah helps a lot thank you I like that you said
58:23
intuition that likely scientist are not going to be replaced
58:30
by hi so my little thesis was in proteomics and omics data at Stockholm
58:38
University I wonder one thing how can you deal with the biophysical
58:45
properties of shape-shifting proteins like Spike because we had a lot of
58:51
problems to read the cryo data talking of a particular class of proteins that they
58:58
are able to change conformation because of
59:04
pH the short answer is no one’s dealing with that yet the ways you could go about trying
59:10
to start would be to think about your representation and because when we start
59:16
a model that in order to do any deep learning or machine learning training we
59:21
have to make a decision first and this gets tol twist for instance and thinking about what our representation of our
59:27
information is not just the file format but what is this this cloud of atoms
59:34
what is this as far as the model is going to ingest it by the time it sees it for the first time in the world and
59:41
so geometric neural networks might be good you can make graphs spatial graphs
59:46
in space and then who’s to say that a graph only has to operate in space it
59:51
can also go across time you can put edges together to have on embl so that’s something similar to like convolutional
59:57
approach but in multiple dimensions and now you’re convolving in non-cartesian space so that would be one way to do it
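A small sketch of the spatial-graph-plus-time idea (illustrative only, with made-up coordinates): nodes are atoms at two snapshots of a conformational ensemble, spatial edges connect atoms within a distance cutoff, and temporal edges link the same atom across snapshots; the resulting edge list could then feed a geometric or graph neural network:

```python
# Illustrative only: build a spatial graph over atoms at two time points and add
# temporal edges linking the same atom across snapshots.
import numpy as np

def spatial_edges(coords, cutoff=4.5):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.where((d < cutoff) & (d > 0))       # pairs closer than the cutoff
    return list(zip(i.tolist(), j.tolist()))

rng = np.random.default_rng(0)
frame_t0 = rng.normal(size=(30, 3))               # placeholder atom coordinates, snapshot 0
frame_t1 = frame_t0 + 0.3 * rng.normal(size=(30, 3))   # same atoms after a conformational change

n = len(frame_t0)
edges = spatial_edges(frame_t0)                                   # spatial edges at t0
edges += [(i + n, j + n) for i, j in spatial_edges(frame_t1)]     # spatial edges at t1
edges += [(i, i + n) for i in range(n)]                           # temporal edges t0 -> t1
# `edges` plus per-node features could now be fed to a geometric / graph neural network.
```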
1:00:02
you can use attention as well um so there’s a lot of methodological answers oh but yeah I want just to tell you that
1:00:08
out of desperation we started to look at possible binding sites for transcription
1:00:14
factors oh yeah well let’s continue this after because there’s a bunch of neat
1:00:20
methods that can be done with this thank you
1:00:32
good afternoon uh wonderful discussion for the panel my name is Bhushan I am a bioengineer and I develop biomaterials
1:00:39
and cell therapy uh I’m new to um AI in bioengineering so pardon my naive
1:00:45
question so I have two part question one is uh what’s the panel’s take on model
1:00:51
hallucination and how it impacts uh uh the the uh clinical data or clinical
1:00:57
output uh or in in output in cell biology as well and the second part of question second part of my question is
1:01:04
uh we are talking more about going high higher dimensional in terms of analyzing object or making uh new objects by the
1:01:13
by the means of generative AI uh but in terms of clinical translation since I work on translational side of a science
1:01:20
uh we want to have a simplistic system to be uh clinically translatable so what is the panel take on dimensionality
1:01:27
reduction uh going forward after we have a lot of higher dimensional data
1:01:34
in based on the nodding um the first part of your question sorry uh model
1:01:40
hallucination yes um so um my experience with it my humble experience
1:01:46
with it is related to generative AI specifically right I think that model Hallucination is not really a concept
1:01:52
that we would really use until really that spiked a lot with stable diffusion models and LLMs um so when it
1:01:59
comes to model hallucination I think what what we want to to be wary of or what we want to have in mind back to
1:02:04
what we said a bit earlier is explainability on the one hand and reproducibility on the other uh so being
1:02:10
able false discoveries is inherent to science right there are always false discoveries the fact that other
1:02:16
scientists other researchers around the world can reproduce the experiments reproduce um the methodology and find
1:02:22
new results is really what correct the the course and finding making sure that
1:02:28
we push as an industry and as an academic Community for the availability
1:02:33
again of the models themselves their code but also the availability of data sets uh to reproduce and create new data
1:02:40
sets in the future I think would help alleviate a lot this concern um so that
1:02:45
that’s really my take on it when it comes to D dimensionality reduction I’m not sure I followed completely your
1:02:51
sorry your question but um there there depending on the use case depending on what you try to optimize especially
1:02:57
between the local and the global structure that you want to preserve uh one visual visualizing the data there
1:03:03
are different methods that that you may want to use um you know in single cell UMAP is really something
1:03:09
that is used a lot in other fields you know other types of dimensionality reductions are used so uh my experience
1:03:15
with it as well is is really to to focus on what do you want to retain as a
1:03:20
priority from the latent space and really optimize for that and trying trying different methods if
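As a minimal sketch of "try different methods and check what structure they preserve" (illustrative only; the toy blob data and the umap-learn package are assumptions, not anything the panel used), one could compare a variance-preserving method against a neighborhood-preserving one on the same data:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
import umap  # pip install umap-learn

# Toy high-dimensional "single-cell-like" data: 3 clusters in 50 dimensions.
X, y = make_blobs(n_samples=600, n_features=50, centers=3, random_state=0)

# PCA emphasizes global variance structure.
X_pca = PCA(n_components=2).fit_transform(X)

# UMAP emphasizes local neighborhood structure.
X_umap = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1,
                   random_state=0).fit_transform(X)

# Both give a 2-D embedding; which to trust depends on whether local or global
# structure matters for the downstream (e.g. clinical) question.
print(X_pca.shape, X_umap.shape)  # (600, 2) (600, 2)
```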
1:03:29
So in the area of dimensionality reduction, one of the answers always used to be PCA, or a VAE, you know, a
1:03:35
variational autoencoder, where you try to compress information and then reconstruct it. Conceptually, in both
1:03:41
cases you're asking what the variance of your data is and getting a good representation of that variance, so you can build it back in the autoencoder
1:03:47
case. There have been interesting advances recently that might help in this space. If I understand
1:03:54
your concern correctly, it's a very large, sparse, and diverse feature space, because if you're thinking about medical
1:03:59
records, you're thinking about individuals, and we have a lot of different, spotty information about each individual. That's a
1:04:05
classic issue in machine learning, where you need to have the same features, the same inputs, for everybody.
1:04:11
There were early versions of this; there's something called Deep Patient, from seven
1:04:16
years ago maybe, that was an autoencoder for patients where you just add a fair amount of noise. It's
1:04:23
called a denoising autoencoder, where you inject random noise into the information
1:04:28
and you ask the model to fill in the gaps. In the modern era we call that masking, or self-supervised learning, and
1:04:35
so you can do self-supervised learning in an autoencoder context to help reduce your data and have a stable
1:04:40
representation for a lot of very different patients that have a lot of very different information.
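A minimal sketch of that denoising-by-masking idea (illustrative only; the toy patient matrix, the masking rate, and the PyTorch layer sizes are assumptions): randomly hide some inputs, train an autoencoder to reconstruct the hidden entries, and use the bottleneck as the reduced representation.

```python
import torch
import torch.nn as nn

# Toy "patient record" matrix: 1,000 patients x 200 sparse features.
torch.manual_seed(0)
x = (torch.rand(1000, 200) < 0.1).float() * torch.rand(1000, 200)

class MaskedAutoencoder(nn.Module):
    def __init__(self, n_feat=200, n_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_feat, 64), nn.ReLU(),
                                     nn.Linear(64, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 64), nn.ReLU(),
                                     nn.Linear(64, n_feat))

    def forward(self, inp):
        return self.decoder(self.encoder(inp))

model = MaskedAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    mask = (torch.rand_like(x) > 0.3).float()       # hide roughly 30% of entries
    recon = model(x * mask)                          # model only sees the corrupted input
    loss = ((recon - x) ** 2 * (1 - mask)).mean()    # score only the hidden entries
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    embedding = model.encoder(x)   # stable low-dimensional patient representation
print(embedding.shape)              # torch.Size([1000, 32])
```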
1:04:46
You might also think about, and this came up yesterday for those who were here, this idea of contrastive learning, which is an
1:04:52
approach where you look at pairs of information about the same patient, or you take pairs of information about
1:04:58
different patients that had the same condition, if that's what you care about, and you can begin, in sort of a
1:05:03
self-referential way, to build up a better condensed representation, also known as an embedding or latent space.
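A compact sketch of that pairing idea (illustrative only; the toy features, the pairing-by-condition rule, and the simple margin loss are assumptions): pull together the embeddings of records that share a condition and push apart those that do not.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(512, 200)                 # toy patient features
condition = torch.randint(0, 5, (512,))   # toy condition label, used only for pairing

encoder = nn.Sequential(nn.Linear(200, 64), nn.ReLU(), nn.Linear(64, 32))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(200):
    i = torch.randint(0, 512, (128,))
    j = torch.randint(0, 512, (128,))
    same = (condition[i] == condition[j]).float()   # 1 if the pair shares a condition
    zi = F.normalize(encoder(x[i]), dim=1)
    zj = F.normalize(encoder(x[j]), dim=1)
    sim = (zi * zj).sum(dim=1)                      # cosine similarity per pair
    # Contrastive objective: similar pairs get high similarity, dissimilar pairs low.
    loss = (same * (1 - sim) + (1 - same) * F.relu(sim - 0.2)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

embeddings = F.normalize(encoder(x), dim=1)  # condensed representation per patient
print(embeddings.shape)                       # torch.Size([512, 32])
```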
1:05:11
So that would be one way to think about dimensionality reduction. In the case of hallucination, absolutely, I mean, hallucination is generative; otherwise
1:05:18
it's just a bad prediction. And kind of the answer to both of those is: we're scientists, we go test it. So if you
1:05:26
believe a model is reliable enough to go test the predictions, that's an empirical answer to how we think about hallucination, and based on what we see
1:05:32
you can do active learning feedback loops in order to improve the model where it makes the most errors.
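As a rough sketch of such an active learning loop (illustrative only; the toy classifier, the uncertainty criterion, and the oracle labels are assumptions): repeatedly send the predictions the model is least sure about back for measurement, add the new labels, and retrain.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy setup: a small labeled pool plus a large pool of unlabeled candidates.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(range(50))          # indices we have already "measured"
unlabeled = list(range(50, 2000))

model = RandomForestClassifier(n_estimators=100, random_state=0)

for round_ in range(5):
    model.fit(X[labeled], y[labeled])
    # Uncertainty sampling: pick candidates whose predicted probability is nearest 0.5.
    proba = model.predict_proba(X[unlabeled])[:, 1]
    uncertainty = np.abs(proba - 0.5)
    query = [unlabeled[i] for i in np.argsort(uncertainty)[:20]]
    # "Run the experiment" on the queried items (here the oracle is just y).
    labeled.extend(query)
    unlabeled = [i for i in unlabeled if i not in query]
    print(f"round {round_}: {len(labeled)} labeled examples")
```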
1:05:39
Yeah, thank you. Thanks. Hi, my question is also in
1:05:47
connection with the former two questions, which were asked about the difficult problems in biology. So
1:05:54
it looks like once LLMs came in, everybody took out the simpler ones, and those are also important, because we
1:06:00
haven't really solved things like docking-based predictions. But what is the next frontier of biology beyond
1:06:06
that, and what is stopping us from going there? Is it something at the level of multiple models talking to
1:06:12
each other, or is it simply the compute power? And if you could talk about realistically teaching a
1:06:18
machine biochemistry, where are we on that?
1:06:25
[Music] Ah, okay.
1:06:31
So on the one hand, there's the relatively simple question of: hey, when
1:06:36
are we going to be able to actually make models that can predict if this will be a good drug, even if we already know what it is and we already have it in the
1:06:42
pocket? And even then, the issue is training data; it has just been terrible,
1:06:49
and we're so limited in having enough data to train models at this scale, even for really simple things. Say I want
1:06:56
to train a neural network to recapitulate something we know how to do, because it's an existing docking program
1:07:02
that doesn't give us the right answer, but at least there's a formula. What we've seen is that most state-of-the-art
1:07:08
models need 80,000 or more training examples to recapitulate a simple sum-of-log-energy-style docking function, which
1:07:15
is for sure way simpler than reality is. So we need way more than that, orders of magnitude more than 80,000 training
1:07:22
examples. We're not yet at a place where it feels like we can get that from experimental binding affinity, and so
1:07:28
maybe an intermediate solution would be some of the things called ABFE, absolute binding free energy perturbation
1:07:33
calculations. The problem is you can only do a few of those calculations per day right now using classical methods, and so
1:07:39
there's this problem then: we really want this as a way to speed things up, we want to make sure we can do it in a
1:07:44
generalizable way, and we don't have enough data.
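As an illustrative toy version of that "recapitulate a scoring formula" experiment (the synthetic energy terms, feature counts, and network size are assumptions, not the panelist's setup), one can ask how well a small neural network recovers an additive log-energy-style score as the training set grows:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

# Toy stand-in for a docking score: a sum of log-energy-style terms over
# pairwise-interaction features (e.g. ligand-pocket atom distances).
rng = np.random.default_rng(0)

def toy_docking_score(d):
    # Additive "physics-like" terms: short-range repulsion plus a log attraction.
    return (1.0 / d**2 - np.log1p(d)).sum(axis=1)

n_pairs = 30  # interaction features per pose
X_test = rng.uniform(1.0, 10.0, size=(5000, n_pairs))
y_test = toy_docking_score(X_test)

# How well does the surrogate recapitulate the formula as training data grows?
for n_train in (500, 5000, 80000):
    X = rng.uniform(1.0, 10.0, size=(n_train, n_pairs))
    y = toy_docking_score(X)
    model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=300,
                         random_state=0).fit(X, y)
    print(n_train, round(r2_score(y_test, model.predict(X_test)), 3))
```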
1:07:49
where we’re at and I think we just need a huge amount more labels in that space realistically um there’s ways we can
1:07:56
cheat and try to get ourselves to kind of build up a foundation using these these techniques we talked about that are Foundation model based that are
1:08:02
unsupervised that are self-supervised to maybe solve some of the basics of geometry first and then have the
1:08:08
learning process really focus on this really hard part of the question but there’s another sideways answer I want
1:08:14
to give to your whole question because I just answered a lot about binding affinities specifically which you mentioned at the end but wasn’t the
1:08:19
whole question I would actually say the biggest advances were about to see is
1:08:25
the final mile problem so many things um are were POS are possible right now so
1:08:31
many models are sitting on GitHub as we talked about and are shared and so few
1:08:37
people are able to use them or it’s the same people every time it’s the same Community it’s the computational people
1:08:43
who made the models who Honestly made models that weren’t solving the questions people care about more often
1:08:50
than not unless there’s that collaboration that relationship and so so that’s the final mile like in postal
1:08:55
delivery it’s really easy to get it to your ZIP code all the money and difficulty of a postal service is
1:09:00
getting it to your door so what is it necessary to get it to the bench and so my best answer is this people
1:09:06
collaboration question but if we can close some of that Gap and I know that’s a space that mlst is in and also the
1:09:12
application and use of it I think that’s where you’re going to see the biggest change if no single new model will developed today there’s so many uses of
1:09:19
the models we have right now that nobody is doing yet because of this friction
1:09:28
Very briefly, I think infrastructure actually is critical.
1:09:34
Everything that you just mentioned is brilliant. I think another frontier that we should
1:09:40
look forward to is integrating the different modalities of biology. Right now we have very
1:09:46
different lenses that we can use to look at biology: at the DNA level, at the RNA level, at the spatial transcriptomics
1:09:52
level, the proteomic level, etc. Putting all of them together, creating really a model, a mental model
1:10:00
and a biological mechanistic model, that can put all of these layers together is really the next frontier. That requires
1:10:06
data curation and data collection to a level that is extremely challenging, and it requires infrastructure that is
1:10:13
extremely difficult to scale and to put together. So that's really also something to look forward to:
1:10:19
the creation of these databases at scale, and intuitive tools for people to be able to access the data
1:10:24
without having to be computational people only, right? We need the biologists,
1:10:29
we need the non-AI-and-tech people, to actually look at that data.
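As a minimal sketch of the data plumbing that multimodal integration starts from (illustrative only; the table names, column names, and the simple inner join on sample IDs are assumptions), aligning per-sample DNA, RNA, and protein tables into one feature matrix might look like this:

```python
import pandas as pd

# Hypothetical per-sample tables for three modalities, each keyed by sample_id.
dna = pd.DataFrame({"sample_id": ["s1", "s2", "s3"],
                    "tp53_mutated": [1, 0, 1]})
rna = pd.DataFrame({"sample_id": ["s1", "s2", "s3"],
                    "MYC_expr": [8.2, 5.1, 9.7]})
prot = pd.DataFrame({"sample_id": ["s1", "s3"],
                     "EGFR_abundance": [0.7, 1.4]})

# Inner joins keep only samples measured in every modality; prefixes record
# which biological layer each feature came from.
merged = (dna.add_prefix("dna_").rename(columns={"dna_sample_id": "sample_id"})
             .merge(rna.add_prefix("rna_").rename(columns={"rna_sample_id": "sample_id"}),
                    on="sample_id")
             .merge(prot.add_prefix("prot_").rename(columns={"prot_sample_id": "sample_id"}),
                    on="sample_id"))

print(merged)  # one row per sample with DNA, RNA, and protein features side by side
```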
1:10:34
Well, thank you everyone for this great
1:10:40
[Applause]
discussion. Thank you. Now I invite you to
1:10:46
take a break, and then we will gather again at 3 for the second panel, about AI
1:10:53
and Drug Discovery.
1:11:22
Thank you.