Michael recently sat down with Aaron Edell, President and CEO of GrayMeta, to help us make sense of all the new AI / ML technology entering our industry. They discuss the evolution of AI in the workplace, the limitations of this technology, and how you can use it to make your life easier.
Michael: Michael Kammes with Shift Media here at NAB 2023, and today we’re joined by Aaron Edell from GrayMeta. I’ve been looking forward to this interview, this entire NAB, because everyone’s been asking about AI and machine learning, and when I have questions about AI and machine learning, this is who I go to. So, Aaron, thanks for being here today. Let’s start out very simply. Tell me just what the heck GrayMeta is.
Aaron: Wow. That is actually a loaded question. So, when I started in tech in 2008, my first job was at a company called SAMMA Systems, a little startup based out of New York. It made a robot that moved videotapes into VTRs, and we digitized them. So, okay, set that aside. Keep that in your memory. Years later, we founded GrayMeta, based on the idea that we wanted to extract metadata from files and just make it available separately from the files. Now that I’m back at GrayMeta, we have three core products. We have the SAMMA system back, and it’s so much better than it used to be. We used to have eight massive servers and all this equipment. Now it’s, you know, one server, a little bit of equipment, and we can plow through hundreds of thousands, if not millions, of video cassette tapes at customers’ facilities.
And we work with partners. If they wanna do it with OPEX, they can buy the equipment from us. And it’s an Emmy award-winning technology. So there are a lot of really, really wonderful proprietary things that archivists love about migrating. So you’re not just digitizing. GrayMeta also has, I think, the world’s most accurate reference player and QC software tool, which runs both in the cloud and on-prem, which is pretty cool. The cloud thing is magic, as far as I’m concerned. I don’t know how you play back MXF-wrapped, OP1a, or, you know, encrypted DCP files off the cloud. Somehow we figured it out. And then we have Curio, which is our metadata creation platform that uses machine learning. As I kind of explained, the original vision was to take all these files and just create more metadata. So we really are across the media supply chain. And if you were to diagram it out, you would find GrayMeta’s products at different points.
Michael: That’s gotta mean that you have a bunch of announcements along the entire supply chain for NAB. So let’s hear about those.
Aaron: Yes. Well, I think the most exciting is that when I first came back to GrayMeta, which was really not long ago, one of the things I really pushed hard for was a new product or a repositioning of our product. So we were happy to actually be able to announce it at NAB and not just announce the product, but that we signed a deal with the Public Media Group of Southern California to buy it. So we’re announcing Curio Anywhere, our metadata machine learning management data platform, which is now available on-prem and can run on an appliance or a local cluster as well as in the cloud. So there are hybrid applications, there are on-prem applications, but I think the most important thing is that all the machine learning now can just run locally to where you’re processing the metadata, and that saves a lot of money, a lot of time.
You know, our product, and we’re gonna be kind of expanding on this in the future, allows you to train these models further. We kind of use the word tune: you tune the models further to be more accurate with your content, using your own content as training data. So we’re really excited to announce that at NAB. We’ve got a whole lot of other features that we’ve added to Iris as well. We can support down to the line. You can get some QC data; it used to be the whole frame, but now we can actually look at individual lines in a video. Curio now also supports sidecar audio files. Its timecode is frame-accurate, which was really important, obviously, for a lot of customers. So you can export a shot list or an EDL right out of Curio with, you know, all of, let’s say, the gun violence locations at timecode-accurate places, or all of the places where a certain celebrity or known face appears, in a timecode-accurate timeline, which you can then just pull into your nonlinear editor.
Michael: We’ll talk about this more later, but I want anyone watching right now to understand just how important the ability to localize machine learning and AI is. That keeps your content secure. You don’t have to pay the tax of using a cloud provider to do their cognitive services. So we’ll talk about more of that later, but you need to understand just how important that is. So the main product GrayMeta is offering does that. Can you explain some of the features that Curio has in terms of AI and ML?
Aaron: Yes. So the way I like to describe it is you tell Curio where your storage locations are, and it walks through these storage locations. And for every file, really, I mean, it doesn’t just have to be a video file, but video is kind of the most obvious one. It will apply all of these different machine-learning models to that video file. So face recognition, logo detection, speech-to-text, OCR, natural language processing, you know, there are other models like tech cues, simple things like that. You know, tech cues is a really interesting one because detecting color bars, right? Color bars come in all shapes and sizes, ironically, which they shouldn’t, ’cause it’s color bars.
Michael: Well, NTSC, no, never the same color.
Aaron: Exactly. Yes. But the kind of general concept of color bars is something that, for machine learning, is so easy to detect. But I think what’s really my favorite aspect is what we’re doing with faces right now. And this is, again, going to expand. Let’s say you process a million hours of content, like you’re a public television station in Los Angeles, and there are scientists and artists who you’ve interviewed in the past, maybe not part of a global celebrity recognition database that you get from the big cloud vendors or other vendors, but they’re important. And you want to be able to search by them. So Curio will process all of that content, and it’ll say, I found all these faces. Who is this? Right? You just type in the name, and it immediately trains the model.
So you don’t have to reprocess all 1 million hours of content. It will just update right there on the spot instantly. So that’s really powerful, I think, because a lot of folks are concerned that they need to, that the machine learning model needs to tell you who it is. But it doesn’t. It just needs to tell a person that you need to tag this. It’s about helping people do their jobs better. So we also have a lot of customers who have some great use cases. I think reality television is one of the big ones.
They have 24 cameras running 24 hours a day, every day for a week. And that’s thousands and thousands and thousands of hours of content. One use case I heard recently was we have a reality show where people try not to laugh, right? I guess things that are funny happen, and they’re not supposed to laugh. And so when they were trying to put together a trailer, they wanted to find all the moments that people were laughing amongst hundreds of thousands of hours of content. So we could solve that immediately. That’s very easy. Just here are all the points where people were smiling. So I’m really excited about some of the simpler things, some of the simpler use cases, which involve not just tagging everything perfectly a hundred percent the first time but helping people do their jobs better and saving them so much time.
Because imagine you’re an editor, and you’re trying to find that moment where Brad Pitt is holding a gun or something like that amongst your entire archive, or really just any moment of an interview. Let’s say you’re a news organization: you’ve interviewed folks in the past, and maybe somebody passed away, and you need to pull together the footage you have quickly. Machine learning can help you find those moments. So customers use Curio, and they just search for a person. It pulls it up wherever it is, right? It could be stored anywhere in any place, as long as Curio has had a pass at it. It pulls those moments up. Here’s the bit in that moment, in that file, you can watch it and make sure it’s what you want, and then pull it down. It’s a simple use case, but it’s really powerful.
Michael: Some of the other use cases I’ve talked at length about are things like Frankenbiting. Being able to take something that takes 30 seconds to say, getting it down to 10 seconds by using different words that that person has spoken through different places. That used to be a tedious procedure where you’d have to go back through transcripts, which you had to pay someone to do. Now you can type in those words into something like Curio, find those timestamps in a video, localize that section of video, and string together a Frankenbite without having to spend hours trying to find those words.
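As a rough illustration of the lookup Michael describes, the sketch below indexes a word-level transcript so that each word of a target phrase can be traced back to the clips and timestamps where it was spoken. The transcript layout, the sample rows, and the function names are all hypothetical; this is not Curio's API or data model.

```python
from collections import defaultdict

# Hypothetical word-level transcript rows: (clip_id, word, start_s, end_s).
# Real speech-to-text output will look different; this layout is assumed.
TRANSCRIPT = [
    ("interview_01", "We", 12.0, 12.2),
    ("interview_01", "love", 12.2, 12.6),
    ("interview_01", "this", 12.6, 12.9),
    ("interview_02", "this", 45.1, 45.4),
    ("interview_02", "show", 45.4, 45.9),
]


def build_word_index(rows):
    """Map each lowercased word to every (clip_id, start_s, end_s) where it is spoken."""
    index = defaultdict(list)
    for clip_id, word, start_s, end_s in rows:
        key = word.lower().strip(".,!?\"'")
        index[key].append((clip_id, start_s, end_s))
    return index


def candidates_for(phrase, index):
    """Return, for each word of the target phrase, the places it could be cut from."""
    return [(word, index.get(word.lower(), [])) for word in phrase.split()]


index = build_word_index(TRANSCRIPT)
for word, hits in candidates_for("we love this show", index):
    print(word, hits)
```

An editor would still choose between the candidate hits by ear, since matching words alone says nothing about intonation or background sound.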
Aaron: Yes. There’s a term for not doing that, which is called Watchdown. I just learned this recently. It’s where you as an editor, and I hope this is the right term, but I read about it in an article from Netflix editors, but they’re trying to put together trailers. They just have to watch every hour of everything they own for the moments that they want. And, yeah, nobody should have to do that. You don’t need to do that.
Michael: There’s that great line. It’s kind of cliche at this point, but when an editor’s putting something together, and the producer or director isn’t thrilled with the shot, and they say, “Didn’t we do a better take of that?” or “Didn’t we have a take of someone saying that,” and like, no, because you’ve sorted everything, and here is everything that was absolutely done despite your memory. So there are a lot of misconceptions, right? AI has been really hyped right now. I almost wish we had used the word AI five or six years ago because it was cognitive services. Which is not really a sexy term, but there are a lot of misconceptions about AI and what AI and ML are. Can you maybe shed some light on what those misconceptions are and the truth, obviously?
Aaron: Yeah, absolutely. So, first of all, the word artificial intelligence is quite old and can literally apply to anything. Your kids are artificially intelligent. Think about that. You’ve created your children, and they are intelligent, right? So it’s, I mean, just like any buzzword that gets thrown around a lot. There are a lot of different meanings. The one misconception that is my favorite is that AI is going to take over the world, or general artificial intelligence is, you know, ten years away, five years away, and they’re gonna kill us all, and that’s it—end of humans. I cannot tell you how far away we are from that. Think about how hard it is to, like, find an email sometimes, right? I mean, computers, you have to tell them what to do.
There’s no connection between the complexity of a computer and the complexity of a human brain, right? There just isn’t. One of my favorite examples is the course I took that ended up being very important to my career, but I had no idea at the time, back in college a million years ago, which was called, ironically, What Computers Can’t Do. And my favorite example was, imagine you’ve trained a robot with all the knowledge in the world. You then tell the robot to go make coffee. It will never be able to do that because it doesn’t know how to ignore all the knowledge in the world. It’s at the same time thinking, oh, that pen over there is blue, and you know, the date that America was founded, and, um, all of these facts and just information that it has built into it. It doesn’t know how to just make coffee. It doesn’t know how to filter all that stuff out.
Michael: No pun intended.
Aaron: No pun intended. Yes. I think that’s something that’s unique to humans: our ability to actually ignore and to just say, yeah, I’m just gonna make coffee. And you barely even think about it, right?
I think even the most advanced artificial neural network supercomputers are probably the equivalent of maybe 1% of a crow’s brain, right? So in terms of complexity, again, we’re not talking about the contextual things that humans learn. So that’s my favorite misconception: that artificial intelligence is going to destroy us all and be as smart or smarter than us.
Now, the difference is machine learning. So the term machine learning, it’s a subset of artificial intelligence. It’s usually solved by using neural networks. These are all different things, right? So we’re drilling down now. Now, a neural network is specifically designed to mimic how a human thinks and how a brain works. And it is a bit mysterious. We had a machine learning model in the past where it would learn what you liked and it was based on what things you clicked on a website or something like that.
It would surface more things, and we built this very clever UI that would show it learning as it went along. So let’s say you have millions of people on your website, and they’re clicking things. We have no idea how it’s working. We don’t know. The neural network is drawing connections between nodes and just trying to get from when input is x, I want the output to be y. And you’re just saying, figure out how to do that in the fastest, most efficient way in between. And that’s what humans do. When we learn new words as babies, the first thing we do is make a sound. And then we get feedback from people, “Nope, that’s not bath, that’s ball.” And your brain goes, okay, and tries again, and it’s a little bit better. It tries again. And that’s our neural network building. So in that sense, machine learning models can operate similarly, but they’re so much less complex. The most complex thing in the universe is the human brain. There’s just nothing like it. And I don’t think we’re anywhere close to that. So I don’t think anybody needs to worry about artificial general intelligence taking over, Skynet, taking over and launching nuclear missiles and killing us all in spite of it making for good movies.
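The "try, get feedback, adjust, try again" loop Aaron describes can be sketched with the smallest possible model: a single unit with one weight and one bias, nudged after every example in proportion to its error. This is a toy under stated assumptions (made-up data following y = 2x + 1, a hand-picked learning rate), not the model GrayMeta built.

```python
def train(samples, learning_rate=0.05, epochs=500):
    """Fit y ~ w * x + b by repeatedly guessing and correcting on each example."""
    w, b = 0.0, 0.0  # start out knowing nothing
    for _ in range(epochs):
        for x, y in samples:
            prediction = w * x + b      # make a guess
            error = prediction - y      # feedback: how far off was it?
            w -= learning_rate * error * x  # nudge the weight toward a better guess
            b -= learning_rate * error      # nudge the bias the same way
    return w, b


# Assumed toy data: when the input is x, we want the output to be 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
print(round(w, 3), round(b, 3))
```

Nobody tells the loop that the rule is "double it and add one"; it only ever sees inputs, desired outputs, and its own error. Real networks stack many such units, which is part of why their internal connections are hard to interpret.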
Michael: I think you put a lot of people’s minds at ease, but there are a lot of creatives in our industry who are seeing things like Stable Diffusion. And reporters are seeing something like ChatGPT being a front end to create factual articles. Folks are worried about their jobs being eliminated. And I think one point to remember is that everyone’s job is constantly evolving, right? There’s always been change. But what would you say to the creatives in our industry who are concerned about AI taking their jobs?
Aaron: Your concerns are valid in the sense that, you know, I would never presume to go and tell somebody you should not be worried about anything. It’ll be fine. I don’t know. But throughout my whole career in AI, it’s been true that it should make your job easier. I mean, it’s supposed to be used to make your life easier. So I’ll give you some examples. When I took over as CEO of GrayMeta, one of the things I wanted to do was some marketing, right? And I didn’t have a whole lot of time, and I wanted to get some catchphrases for the website or write things in a succinct way. So I used ChatGPT, and I said, Hey, here’s all the things our products do. These are all the things I wanna talk about. Help me summarize it. Give me ten sentences, bam, bam, bam, bam, bam. They were great.
Now, that could have been somebody’s job, I suppose, but that was also my job. So I could have sat there all day and tried to come up with that myself, but I didn’t have time. So it made my job a lot easier as a marketing person. I used Midjourney to create interesting images to post on LinkedIn and those sorts of things. But there is no person at my company whose job it was to create interesting images. Our only alternative was to go and find some license-free images that you can find on the web and post them.
So there were no jobs being lost for us. It only made us, as a small company, more productive. Now, I think that the other part of what we’ve always talked about with machine learning is scale. And, you know, even Midjourney, Midjourney’s very cool. But if you look, if you zoom in a little bit, eyeballs are [off center], fingers are weird. And I’m sure that will improve over time. And we all need to be very careful about understanding what we see on the internet, when we see images that are fake, when there’s music that’s fake, and when there is artificially created content or content that’s written by artificial intelligence. But I still think that humans are needed because I don’t think we’re ever going to get the creativity that humans have. It’s the same kind of example I was talking about earlier.
You can’t train a robot to make coffee if you give it all the knowledge in the world. I don’t think you can train an AI to be an artist in the same way that humans can. There’s just something about the human experience and the way that we process information that just can’t be replicated. But it can make an artist’s job easier, you know? So, you know, add some snow or change the color of this image. Or, as somebody who needs to acquire art or creative images, it helps me to at least give a creative person some examples. Like, can I have an image that sort of looks like these things? As a prompt.
I do think there’s evolution in the jobs. I think that if we all try to think about it as a way that makes our jobs more efficient, saving us time, and maybe I like to think of it as just taking over the laborious parts of our job. I mean, think about logging. Editors logging, that was my first job as an intern at KGO Television, watching tapes and logging every second, right?
Michael: And to some extent, that’s still done. Unscripted still does that.
Aaron: Yes. Yes, and they shouldn’t. I don’t think they have to. I think any logger, aside from like an internship, would appreciate editing a log that’s been created by AI instead of actually creating it. So imagine you have your log, and all your job is, is to make sure it’s right. That’s such a better use of human brains. We’re so good at seeing something and then saying, yes, that’s correct, or that’s not correct. Or that’s a, or that’s b, instead of having to come up with the information ourselves from scratch. That just takes so much more time and is so much more annoying. Those laborious aspects take up brainpower that could just be put to better use. So that’s how I see it. I’m sure there are examples of people losing their job because the team just wants to use Midjourney or something like that. And, yeah, I mean, that sucks. Nobody should have to lose their job over that. But I think that’s the same thing, like we no longer use horses, right? We started having cars, like, you don’t need horses or people to take care of your horses anymore.
Michael: Right. But now you need mechanics.
Aaron: Now you need mechanics. Exactly. So now we need people who can make good use of these machine learning models, and now we need people who know how to train them and understand them. And it should all kind of float around and work out towards a future where our jobs are different.
Michael: You used the term laborious, and I think what most people need to realize is that the tasks that AI is going to do are the things that we don’t really want to do, right? At HPA, we saw that Avatar: The Way of Water had something like a thousand different deliverable packages, each of which had its own deliverables inside of it. And that’s tens of thousands of hours of work that, at some point, we can automate. So we don’t need to do that. So you can move on and make another film. So if there were jobs, let’s say for the next 12 months, because the AI landscape is changing so quickly, until NAB 2024, what tasks would you say, AI do this, humans do this?
Aaron: Let’s take content moderation, for example. So you’re distributing your titles to a location where there can’t be nudity, or there can’t be gun violence, or there can’t be something, or even certain words can’t be spoken. I would tell AI to go and process all that with content moderation that is trained to detect those things. But I wouldn’t just trust it. I would tell the human to review it. So now your job, instead of going through every single hour of every single file that you’re sending over there and checking, is to just review and check for false positives or false negatives. That should save you 80% of your time, assuming that the models are 80% accurate. Right? So that’s an example of something that I would say: humans review, machines go and process. That’s simple. I would say that’s the best human-in-the-loop aspect of any kind of machine-learning pipeline or ecosystem.
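The "machines process, humans review" split can be sketched as a small routing step: model detections for the restricted categories go into a queue for a person to confirm or reject, and everything else is skipped. The `Detection` fields, the labels, the file names, and the confidence threshold below are illustrative assumptions, not Curio's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One hypothetical model hit: a labelled time range in a file."""
    file: str
    start_s: float
    end_s: float
    label: str
    confidence: float


def build_review_queue(detections, blocked_labels, min_confidence=0.5):
    """Keep only hits for blocked labels that a human should confirm or reject."""
    queue = [
        d for d in detections
        if d.label in blocked_labels and d.confidence >= min_confidence
    ]
    return sorted(queue, key=lambda d: (d.file, d.start_s))


def review_fraction(queue, total_seconds):
    """Share of the total running time that a reviewer actually has to watch."""
    flagged_seconds = sum(d.end_s - d.start_s for d in queue)
    return flagged_seconds / total_seconds


# Made-up detections for two hypothetical files.
detections = [
    Detection("ep1.mxf", 10.0, 25.0, "gun_violence", 0.91),
    Detection("ep1.mxf", 300.0, 305.0, "logo", 0.99),
    Detection("ep2.mxf", 60.0, 90.0, "nudity", 0.62),
    Detection("ep2.mxf", 500.0, 510.0, "nudity", 0.20),
]
queue = build_review_queue(detections, {"nudity", "gun_violence"})
fraction = review_fraction(queue, total_seconds=3600.0)
print(len(queue), fraction)
```

Reviewing only the queue catches false positives. Catching false negatives takes something extra, such as lowering `min_confidence` or spot-checking unflagged footage, and both cost reviewer time.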
Michael: I am immensely grateful that you’ve come here today. I wanna do another webinar, another video with you. We’ll find a way to make that happen. If you’re interested in AI and ML as it pertains to our industry – M&E, check out GrayMeta. He’s Aaron. I’m Michael Kammes with Shift Media here at NAB 2023. And thanks for watching.