VoE - Tanya_mixdown
===

[00:00:00] Tanya Berger Wolf: We say that our main product is connections. We generate connections among people, among ideas, data, foundations to applications, internal and external partners, and even frameworks of intellectual engagement, when we say that we not only focus on research, but research, scholarship and creative expressions. These are different ways of asking and answering questions.

[00:00:29] David Staley: Joining me today in the ASC Tech Studios is Tanya Berger Wolf, director of the Translational Data Analytics Institute, and a professor of computer science and engineering, electrical and computer engineering, as well as evolution, ecology, and organismal biology at The Ohio State University. Welcome to Voices, Dr. Berger Wolf.

[00:01:22] Tanya Berger Wolf: Thank you. Thank you for having me.

[00:01:23] David Staley: Well, I'd like to begin, first of all, with translational data analytics. And maybe we start with a definition. What does that term mean?

[00:01:31] Tanya Berger Wolf: Well, hopefully most people have recently heard about data and analytics, and maybe even translational, but in some different context.

[00:01:39] So every one of those words, I hope, is familiar. Data analytics is obviously the analysis of data and the analytics techniques that come with it, and we think of it a little more generally, as covering anybody who generates, touches, is excited by, is producing methods for analysis of, or is representing data in their research, scholarship, and creative expressions.

[00:02:02] Obviously, that's pretty much everybody today. The translational aspect of it refers to translating data into actionable insight. It goes beyond applied: in clinical sciences, people often refer to translational science as translating from bench to bedside.

[00:02:24] David Staley: Bench to bedside.

[00:02:26] Tanya Berger Wolf: Yes.

[00:02:26] David Staley: So, from the lab to...

[00:02:29] Tanya Berger Wolf: To clinical settings.
So, in the broader sense in which we use the same term, translational data analytics is translating from fundamental, foundational research and scholarship to applications that are beyond the university. We work with partners outside of the university: government, nonprofits, businesses, and healthcare organizations. We are also simply motivated by solving societal challenges and addressing societal questions with data.

[00:03:08] David Staley: And I want to get to some of those. But when we say data, I mean, is this any different from statistics?

[00:03:13] Tanya Berger Wolf: Well, we have faculty from all 15 colleges of the university, representing 60-plus disciplines, everybody from arts and humanities to statistics and computer science. When people think of the core of data analytics, what comes to mind is probably statistics and maybe, these days, artificial intelligence and machine learning. But certainly the applications of data analytics, the implications of data analytics, all the aspects of how we use data analytics in society to answer societal questions and to address societal challenges, go way beyond statistics. How do we bring data analytics to healthcare, to public health? We had a huge public health crisis. That's beyond statistics. Ethical and responsible data science and AI is not something that statisticians are equipped to address, or computer scientists who generate the computational approaches. This is something where we need to work in partnership with social scientists, with ethicists, with policy scholars, with legal scholars. These kinds of issues have moved very much beyond the boundaries of statistics and computer science.
We even have an art and data program and a joint postdoc, co-funded by the Translational Data Analytics Institute and the Global Arts and Humanities Discovery Theme, who looks at issues of identity and self-representation in the age of computer-generated art and text, which is something we're all familiar with now. We have scholars who look at the public discourse in 19th-century France as an example of a sort of public social media. These are all data. In general, one could think that history is a data science, right?

[00:05:24] David Staley: Oh, my colleagues might be very interested to learn that.

[00:05:28] Tanya Berger Wolf: Right? How else do we learn about what happened, if not through evidence and data?

[00:05:34] David Staley: It sounds like these scholars are not sort of working in isolation or working in their disciplinary silos. It sounds like all the projects involved are very interdisciplinary. Do I have that right?

[00:05:45] Tanya Berger Wolf: Absolutely. This is the mission of TDAI and institutes like it. We say that our main product is connections. We generate connections among people, among ideas, data, foundations to applications, internal and external partners, and even frameworks of intellectual engagement, when we say that we not only focus on research, but research, scholarship and creative expressions. These are different ways of asking and answering questions.

[00:06:16] David Staley: Mm-hmm. You started to talk about some of these. Tell us some of the social challenges that TDAI is addressing.

[00:06:24] Tanya Berger Wolf: TDAI has five strategic directions, because everybody needs a strategic direction. So we have two foundational, cross-cutting strategic directions, and then three that one would term applications. The foundational ones are, not surprisingly, foundations of data science and AI, and that really is closer to statistics, mathematics, and computer science.
And the other cross-cutting foundational direction is ethical and responsible data science and AI.

[00:06:55] We think that it is, these days, fundamental to any engagement with data and data analytics. And then our application directions are health; environment, sustainability, and climate; and smart mobility. And so when we talk about health, this goes beyond just clinical science. It is about individual and public health and wellbeing, and wellbeing, again, is not only what happens in the doctor's office.

[00:07:27] David Staley: Mm-hmm.

[00:07:28] Tanya Berger Wolf: It is where we live, how we live, who we live with, in what environment we live. How does the environment at home, at work, with the family, and outside impact our physical and mental health?

[00:07:45] David Staley: You arrived at Ohio State from the University of Illinois. What drew you to Ohio State?

[00:07:50] Tanya Berger Wolf: Well, there was this advertisement for the position of director of the Translational Data Analytics Institute, and I found this a really exciting and unique opportunity to enable interdisciplinary research at scale. And the scale at OSU is really impressive. The ability to really bring people together in a single environment, whether it's intellectual, virtual, or physical, is really exciting to me. This is how my brain works. This is how I do research, and this is, I think, where a lot of exciting research happens, but it's not very common in a university setting to have that opportunity.

[00:08:36] I joked when I came to OSU that my job as the director of the Translational Data Analytics Institute, the official title, is Chief Serendipity Facilitator. But unfortunately I came to OSU in January 2020. So the timing...

[00:08:51] David Staley: I was gonna ask.

[00:08:53] Tanya Berger Wolf: Yeah. The timing for serendipity facilitation was really unfortunate.
We found out that serendipity and creativity did not really happen on Zoom very well.

[00:09:03] David Staley: That, so...

[00:09:05] Tanya Berger Wolf: Yeah, it is really, really hard to make these unplanned connections happen, because you plan every meeting, you plan who is going to be at that meeting, what the agenda is going to be, and you cannot have these overlapping conversations. Only one person can talk at a time. Mm-hmm. Only one idea can be heard. You cannot have side conversations. You cannot have that spark of an idea happen. And we're slowly coming back. We've also learned a lot about how to move the needle a little bit on that creativity aspect of Zoom conversations and meetings.

[00:09:44] But I'm really, really excited that we're finally coming back in full 3D and can really help that serendipity and creativity happen by bringing people together, by maybe starting a conversation and letting people fly, letting people just do what we as researchers love doing, which is ask questions and see where the thought takes us.

[00:10:11] David Staley: I'd like to talk about your research. You are a computational ecologist, and I'd like you, first of all, to tell us what computational ecology is.

[00:10:21] Tanya Berger Wolf: Well, I'm a computer scientist, first of all. I did my training and my PhD as one, and in everything that I do, I still think as a computer scientist, but I speak ecology very, very well.

[00:10:35] And I've worked with biologists for the last 20-plus years. I'm also married to one. I blame him for all of that. So the questions, the challenges, the problems are coming from biologists, and the biologists that I work with in particular are ecologists. So they're interested in the study of organisms not only on their own, but in the context of their environment and evolution, with the emphasis on environment.

[00:11:07] And the specific type of biologist that I started working with was behavioral ecologists.
So they're interested in the behavior of animals in the context of their environment and often evolution. And so the first conversations that happened were mostly around understanding animal behavior using computational approaches.

[00:11:28] It grew beyond that. And so my research now is computational ecology, environment, and conservation. So it went beyond ecology. It still sits at the intersection of the specific type of biology, which is ecology and evolutionary biology, wildlife biology, as well as social sciences and computer science.

[00:11:50] David Staley: It sounds like you're talking about biology as data. Is that what ecology and biology are today? Is it really about a study of data?

[00:11:59] Tanya Berger Wolf: Oh, absolutely not.

[00:11:59] David Staley: And not, like, sort of living things, or...

[00:12:02] Tanya Berger Wolf: Absolutely not. The biologists that I work with really go out into the field and study those zebras and baboons and ants and birds and plants and everything else out in their natural habitats. Occasionally those natural habitats are cities, but they're still natural habitats. These are field biologists, and the fundamental biological understanding, the scientific method, hasn't changed. It is all, to quote Henri Poincaré, about observation and experiment, and we still go through this cycle of observing, then generating a hypothesis, and then conducting an experiment that tests that hypothesis. The quote that I really, really like is from Poincaré's book from the early 20th century.

[00:12:53] The book is Science and Method, which really lays down this kind of formal definition of what we today call the scientific method. And so he concludes this book saying, "The scientific method consists in observation and experiment." Shocking. "If the scientist had an infinity of time at his disposal, it would be sufficient to say to him, 'Look, and look carefully.'"
"But since he has no time to look at everything, and above all to look carefully," and, something I tell my students all the time, "since it is better not to look at all than to look carelessly, he is forced to make a selection. The first question, then, is to know how to make this selection." And so what I would argue is that data science, computer science, and technology altogether have not changed the scientific method in any way. What they have done is enable scientists to look at more things more carefully, and that has been my entire career. On the one hand, technology brings in more things to look at, whether it's the microscope, which increased the resolution and went down to the very minimal scale of the cell and the subcellular and so on, or the telescopes and the satellites, which increased the scale to the entire planet. Technology is only enabling scientists to look at more things. Computational methods, I would argue, are the ones that are helping scientists to look more carefully at those things, because at that level of things to look at, the human brain needs a little bit of help to find patterns, to find associations, and to test whether these patterns are significant, interesting, and important in the context of the deluge of all the other things to look at.

[00:14:42] And so computational approaches are actually good at that, but they are only a partner in the scientific exploration.

[00:14:52] David Staley: It sounds like you fell into this through serendipity. Is that a fair statement?

[00:14:56] Tanya Berger Wolf: Yes, it is. Very much so. As an undergraduate student, I was very much in math and computer science, but as I mentioned, my husband, Moha Wolf, is in ecology, and I spent many, many conversations with him and his colleagues where I walked away with a feeling: "Oh my God, there's gotta be a better way of answering this question."

[00:15:19] And it continued into my PhD, which was very theoretical computer science.
But towards the end of it, there were too many times where I felt that I could find a better way of answering these ecological questions. And so I figured I should actually try, and I did postdocs in comp..., and I was trying to find a name for this. What should I call this field?

[00:15:44] David Staley: Hmm.

[00:15:44] Tanya Berger Wolf: At the time, computational biology was really starting to become, well, mainstream. Everybody in the sciences at least knew about the Human Genome Project. They were training computational biologists. People were talking about that field. And I was like, okay, well, computational biology is mostly about molecular biology and the genome. What would you call something which is about the ecology type of biology? Well, it should be computational ecology, obviously. Right. And I had to explain it to everybody. That's the problem with coming up with a term: you then have to explain to everybody what it actually means. And I did a postdoc with ecologists, and while doing that, I also started talking to colleagues beyond my advisors, beyond my mentors, and found a wonderful collaborator, Dan Rubenstein at Princeton, who was asking questions about the behavior of zebras.

[00:16:39] And I'm like, "Yeah, okay, well, it's like this social network stuff," which was also all the rage at the time. Facebook was just beginning, you know, all these online social platforms. I'm like, "But I'm sure somebody has done it," because you need not the static version, but the dynamic one.

[00:16:55] David Staley: Mm-hmm.

[00:16:56] Tanya Berger Wolf: And it turns out that nobody had done it, so we had to develop it. We developed a whole framework of computational approaches and then applied it to understanding the behavior of zebras. And then, weirdly, I don't know why, it took my colleagues three years to actually get me out into the field to see those zebras.

[00:17:16] They kept on saying, "You gotta see your data."
"You gotta see your data." I'm like, "No, my data looks perfectly fine on a computer screen and in a CSV file." They're like, "No, no. You gotta see your data. You'll really get what we're asking." And so three years later I did go to Kenya and saw the zebras, and I got a completely different understanding of the questions that they were asking, the implicit assumptions that they were making, and why they were asking these questions in that particular way.

[00:17:42] In some cases it's because that's the data that they could get. And so we got into all these conversations: what are the questions that you would like to ask? Mm-hmm. And realizing that in some cases they didn't have enough data, they didn't have enough tools. So I am not a biologist. I do understand the questions. I do understand the biology and ecology. I speak it fluently now. But the way I answer questions, the way I pose questions, is still from a very computational perspective. And so I asked them, "How do you know who is whose zebra friend?" We were standing in the field and they were saying, "Okay, well, you see those zebras interacting? This is a social group and this is a social network. This is how we represent it." I'm like, "Wait, wait, wait. How do you know who's who here?" And they showed me. They showed me how they very carefully take a photograph. It has to be unobscured. It has to be from a particular side.

[00:18:38] They bring it back to the lab. They have to click on the outline of the zebra, and it has to be very carefully fit into the particular model and matched stripe for stripe. The whole process took 20 minutes. I'm an impatient engineer. Two minutes into it, I was like, "How long is this gonna take?" Five minutes later, I'm like, "This is taking forever.

[00:18:57] How long is this gonna take?" And they're like, "Patience, Tanya. Patience." I'm like, "This is nuts. This is insane. This should take two clicks and a couple of seconds."
They're like, "Oh, if you're saying that this is what it should take, why don't you do it?" I'm like, "You wanna bet?" Uh oh. Yep. And so this was a bet. I went to my PhD student at the time, Mayank Lahiri.

[00:19:20] And I said, "I just bet my reputation that we can recognize individual zebras from photographs with two clicks." Luckily, we also had an idea of how we might go about it. And, also very luckily, my colleague Dan Rubenstein, the zebra expert of the world, and I were planning to teach a course, because we thought, okay, our collaboration is fantastic, but we need to train the next generation of interdisciplinary scientists, of computational ecologists, and we need to take them to the field to see their data.

[00:19:58] So we planned what we then called the Field Computational Ecology Course, where we took computer science and ecology students. Hmm. Ecology and evolution and biology students. We gave them a background in both disciplines, took them to the field in Kenya, and told them, "You have to work together. You gotta do something amazing, something that neither of you can answer on your own.

[00:20:23] Go." And they did. The first project in that course was identifying individual zebras from photographs, and that kind of gave the foundation to a lot of work.

[00:20:35] David Staley: I'm interested in your current project. I'm gonna make certain I got the name right. Imageomics. Am I saying this right?

[00:20:40] Tanya Berger Wolf: Perfect.

[00:20:41] David Staley: Tell us about imageomics.

[00:20:44] Tanya Berger Wolf: Well, this is a perfect segue, because this ability to identify individual zebras from photographs was fundamental computer vision research that also had implications for fundamental biological research.

[00:20:59] Mm-hmm. And we also discovered, once we published a paper in an obscure computer vision conference, that it was very, very useful for conservation.
Within two months of publishing a paper in this obscure conference, we got about 70 requests: "Can you identify my species? Can you identify my species?"

[00:21:22] Everything from Hawaiian snails to whale sharks. And so we realized that we needed to build something that could be used. That's the translational aspect of research. Mm-hmm. Build something that could be used by practitioners: conservation biology practitioners, managers in the field, conservation managers in the field.

[00:21:41] Not only used for publishing computer vision papers, or even doing fundamental biological research, because the ability to identify an individual animal from a photograph allows you to track them, count them, and even infer their social network from photos, without putting collars or other sensors on them.

[00:22:01] It also allows you to bring in a lot of data that wasn't useful before, such as all the social media data: people go on vacations, take pictures of animals that they see there, and post them on all these social media platforms. This is the ability to bring in more things for scientists to look at, right, and also to let them look at it more carefully.

[00:22:26] Imageomics goes beyond this. It is the new field of science that we started establishing with the help of a National Science Foundation grant, in the Harnessing the Data Revolution program, which we received a year and a half ago. Imageomics is the field of science that extracts biological information from images, like genomics before it.

[00:22:51] Genomics is the field of science of extracting biological information from sequences, from text, using quantitative approaches for the most part. Imageomics is, in a similar way, going from images to biological traits, to phenotypes, and, for the first time, giving scientists this ability to look, and look carefully, at the natural world.

[00:23:17] Now, with the help of
recording devices at different scales, in different contexts, whether it's camera traps, autonomous vehicles underwater, on the ground, and in the air, or our phones (you know, everybody just takes pictures when they go on vacation), mm-hmm, it doesn't matter, or satellites, remote sensing, or microscopes, we have now

[00:23:42] the ability, or we are developing it, this is the hope, to go from looking to finding the biological traits, something that we may have even missed. So not only to be able to say, "This is the species of bird," which we can do right now. Mm-hmm. But, "This is the species of bird because it has a yellow belly, black behind the beak, and it wobbles when it walks."

[00:24:07] So for wobbling, you obviously need videos, not just still images. But yes, and maybe not only the traits that we already know. Maybe we've missed something: due to our inability to see, maybe, in the red-orange spectrum, we missed some traits, or because of our inability to quantify them, or even inattention, we didn't even think they could be important.

[00:24:32] Computationally, we actually can help scientists see things that they may not have seen before. Hmm. Or have no ability to see at all.

[00:24:41] David Staley: Like a microscope or telescope.

[00:24:43] Tanya Berger Wolf: Like a microscope or a telescope. There was a paper that showed up in December that showed that humans could not differentiate between particular phenotypes of a moth, because humans do not have enough acuity in the red-orange spectrum.

[00:24:59] Well, that's because, you know, this coloration did not evolve for human benefit. It evolved for birds, right? Because these are the predators of the moths. But computationally, algorithmically, we have no problem differentiating between those phenotypes at all.
When we tell kids, "Oh, we can identify individual zebras from photographs, using stripes as a fingerprint, essentially as a body print," right, and in fact we can identify any striped, spotted, wrinkled, or notched animal, using even the shape of a whale's fluke or the dorsal fin of a dolphin, and we are now moving on to facial recognition for primates and bears and other animals. But when we talk about zebras, a lot of kids will ask this question: "Oh, so are a baby zebra's stripes similar to its mom's?" Fantastic question.

[00:25:48] David Staley: It really is.

[00:25:48] Tanya Berger Wolf: Right? It really is a fantastic question. But the thing is that until now we had no ability to answer it, because we as humans are really good at quantifying the similarity of faces. This is what we evolved for, right? Because we use faces for kin recognition, for recognizing who is related to us versus not.

[00:26:10] We're really good at saying, "Oh look, the baby looks like its mom. The baby looks like its dad," or, "That person looks like that celebrity." But if I show you pairs of zebra pictures and I ask, you know, "Are those more similar to each other than these two?", not only can you not do it, you cannot learn to do it. But computationally, no problem whatsoever. So now we, for the first time, can start asking and answering questions like this: are zebra stripes and leopard spots random, or are they heritable? Is there some genetic component in that pattern? Are they used in any way for kin recognition? Do zebras use it?

[00:26:52] Or maybe it's just a genetic component altogether that has nothing to do with recognizing your relatives, but still is inherited from parents to kids.

[00:27:03] David Staley: You heard it here first, folks. A new science: imageomics. You are also a director and co-founder of the artificial intelligence for wildlife conservation software nonprofit Wild Me.
And I'd like to end by talking a little bit about what Wild Me is and what the goals of the project are.

[00:27:22] Tanya Berger Wolf: So we've come full circle to this translational aspect of science, going from the foundational, fundamental research of extracting biological traits from images to that ability to, for example, recognize an individual animal from photographs being useful for conservation.

[00:27:40] And so Wild Me is an artificial intelligence for wildlife conservation nonprofit. It takes, in a really fast cycle, the AI techniques that are being developed as a research

[00:27:54] David Staley: Hmm.

[00:27:54] Tanya Berger Wolf: project and applies them directly, developing well-engineered tools with good interfaces that can be used by the conservation community. So the prime platform, the main project of Wild Me, is Wildbook, which is a platform that uses individual animal ID from images, for about 70 species now and growing. Everything from marine to terrestrial: yes, now those Hawaiian snails are coming up, but also the weedy and leafy sea dragons and the biggest fish on earth,

[00:28:31] the whale shark, as well as the zebras and giraffes and leopards and even seals and many others. The ability to identify an individual animal from a photograph, as I mentioned, allows us not only to count animals for fun; it provides metrics for conservation, because conservation today has a data problem. We're in the middle of what has been termed the sixth extinction.

[00:29:01] David Staley: Mm-hmm.

[00:29:02] Tanya Berger Wolf: We're losing biodiversity at an unprecedented scale and rate. More than 10%, an estimated 10% of the Earth's species, are threatened with extinction, and yet we actually don't know very well what we're losing and how fast.
The UN says that biodiversity has a data problem, and while I'm not a biologist who will go out in the field and implement conservation policy or really help protect the endangered species, a data problem I can do something about, and this is what we're doing. We're bringing in lots and lots and lots more data sources. Every image ever. Today, images are probably the most abundant, readily available source of information about anything, including animals. We are providing tools to extract information from them, so that it can then lead to metrics that can be used not only to understand how many animals there are, what their range is, and where they're going, but also to estimate whether our conservation policies are actually working, whether the interventions that are being put in place by conservation and wildlife managers are doing the job that we thought they would to protect the biodiversity of the world.

[00:30:19] David Staley: When you say artificial intelligence, that sort of covers a lot. What sort of artificial intelligence are we talking about here?

[00:30:25] Tanya Berger Wolf: Machine learning, to be specific. So this is learning. It's very statistical learning.

[00:30:30] So this is the type of learning that we talk about today in the context of ChatGPT and DALL·E and all of these occasionally scary models

[00:30:39] David Staley: Mm-hmm.

[00:30:40] Tanya Berger Wolf: that take a lot of data and find patterns in those data. In this case, it is specifically learning from images. The types of methods that we have basically already exist off the shelf. We can take an image and find all the animals in it, or other objects of interest: that's detection. We can put a bounding box around each animal, even in very complex photographs where the animals are one behind the other, like a baby elephant hiding behind its mom, or a group of zebras where you have to start counting legs to figure out how many there are.
[00:31:17] And then going to species classification. So saying, "Oh, this is a Grevy's zebra, this is a savanna elephant, this is a whale shark, and this is a hawksbill turtle." And then going beyond that to individual identification, and even beyond that, today, to being able to say, "This is the head, this is an eye within the head, and these are the limbs, and this is maybe a shell of the turtle or a wing of a bird," and starting to learn associations, like, "It's a belly and it's an orange belly," right? Or, "It's a beak, and then there is a black spot behind the beak." These are semantic kinds of relationships that we as humans learn pretty early on in our childhood and take for granted throughout our adulthood. We need to teach computers how to do it. But these are the types of approaches, the types of methods, that are routinely used already today and that are being developed for models like ChatGPT and DALL·E and others.

[00:32:23] David Staley: Tanya Berger Wolf. Thank you.

[00:32:25] Tanya Berger Wolf: Thank you.