AI vision technology enables machines to perceive and understand the visual world much like how humans see. A combination of computer vision and AI techniques, it can detect and recognize visual elements and analyze attributes like color, shape, motion, and context within images and videos.
By leveraging Microsoft solutions like Azure Cloud and Azure OpenAI Service, California-based Chooch provides AI vision capabilities for a wide range of applications across various industries, enabling machines to accurately interpret and understand visual data. Its recently launched ImageChat infuses large language models (LLMs) with AI vision, which clients can use to connect with image and video data lakes for forensic, training, and analytic needs across live and stored visual content.
I spoke with Chooch’s co-founder and CEO Emrah Gultekin about the staggering amount of visual data we face every day, how AI can help us make sense of it, and what other startups can learn from the advancements in computer and AI vision.
Capitalizing on an explosion of visual data
Emrah doesn’t mince words when it comes to explaining the technological conundrum Chooch is tackling.
“The problem is there’s an explosion of cameras and visual data in the world today,” Emrah tells me. “If you had everyone on Earth reviewing this data, there wouldn’t be enough people to do it. What we’re doing is automating the detection and recognition of events in live streams and historic content by using computer vision AI.”
To accomplish this, Chooch integrates large-scale generative AI vision models and fuses them with LLMs to enable new reasoning and more accurate contextual comprehension for edge- and cloud-hosted applications.
“Our journey with computer vision AI has mainly been around building software infrastructure, but our main innovations have been this ability to place lightweight inference engines in self-hosted and edge environments and fuse the traditional computer vision models with LLMs,” Emrah explains. “The same explosion you see on the language front is also happening with computer vision, and the complex problem of fusing the two is what we’re solving.”
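The fusion Emrah describes can be pictured as a simple pipeline: a vision model detects objects in a frame, its structured output is serialized into text, and an LLM reasons over that text in context. The sketch below is a minimal illustration of that pattern, not Chooch's actual architecture; `stub_detector` and `stub_llm` are hypothetical stand-ins for real models, used only to exercise the plumbing.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    """A single object found by the vision model in one frame."""
    label: str
    confidence: float

def detections_to_prompt(scene: str, detections: List[Detection]) -> str:
    """Serialize detector output into a text prompt an LLM can reason over."""
    lines = [f"- {d.label} (confidence {d.confidence:.2f})" for d in detections]
    return (
        f"Camera feed: {scene}\n"
        "Objects detected in the current frame:\n" + "\n".join(lines) + "\n"
        "Does this frame describe a safety event? Answer yes or no, with a reason."
    )

def fuse(detector: Callable[[bytes], List[Detection]],
         llm: Callable[[str], str],
         frame: bytes, scene: str) -> str:
    """Run the vision model, then hand its findings to the LLM for reasoning."""
    return llm(detections_to_prompt(scene, detector(frame)))

# Hypothetical stand-ins for real models, used only to exercise the pipeline.
def stub_detector(frame: bytes) -> List[Detection]:
    return [Detection("person", 0.97), Detection("hard hat", 0.91)]

def stub_llm(prompt: str) -> str:
    return "no: a person wearing a hard hat is expected on this site"

print(fuse(stub_detector, stub_llm, b"<jpeg bytes>", "construction site, gate 3"))
```

In a production system the stubs would be replaced by an edge-hosted inference engine and a cloud-hosted LLM endpoint; the point of the pattern is that the two sides meet through a shared, language-shaped interface.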
Entrepreneurs can find limitless avenues to utilize computer vision in today’s increasingly monitored world. Emrah points out the technology’s power to enable security and safety officers to analyze images and data from public spaces, workplaces, airports, and industrial sites, aiding in threat detection and response. Industries such as manufacturing and distribution are leveraging computer vision to improve efficiency and mitigate human error. The Chooch AI platform enhances accuracy and speed in visual processes, including defect analysis and quality control, ensuring safer workplace conditions.
Building AI products responsibly
To build successful AI vision solutions, Emrah tells other startups, cooperation between the visual and language sides of AI is key. The two fields are closely related, as both rely on the ability to extract meaning from data. A visual AI system trying to extract meaning from a scene or series of frames needs to understand the context of the objects' names and descriptions. Similarly, a language AI system trying to understand a sentence needs to grasp the meaning of its words and the relationships between them.
“Vision isn’t as impactful without language,” Emrah says. “My advice to startups is to experiment with the multimodal aspect of AI because now we have the capability. Getting technical people together on the computer vision side and the LLM side is a challenge, however, because they have traditionally not spoken the same language. But this is no longer about just one piece of AI, it’s about audio, language, transcription, translation, tabular data, computer vision—we all have to come together because the impact on the client is so much bigger.”
Partnering with Microsoft to focus on building the best solution
Before embarking on a new AI era, Chooch had to overcome issues common to AI startups, such as the lack of both initial infrastructure and a tech stack. Emrah says they had to build much of their stack themselves, and take an iterative, trial-and-error approach to inferencing and to analyzing their progress in this uncharted territory.
Partnering with Microsoft has been critical, Emrah tells me, because of the company's industry leadership in computational power. Chooch uses Azure Machine Learning, Azure Cognitive Services, and Azure IoT Hub and Edge to ingest data from edge devices.
“We are intrinsically aligned in terms of doubling down on the AI market and AI for Good,” Emrah says. “Compared to Microsoft’s competitors, we received a lot of support on what we were building. We were also able to leverage the infrastructure and GTM resources Microsoft provided as soon as our relationship began.”
Chooch has been a member of the Microsoft for Startups Pegasus Program since late 2022, and Emrah says he appreciates how Microsoft gives companies the flexibility to focus on developing top-tier solutions that benefit the entire partner ecosystem.
“Microsoft’s CTO, Kevin Scott, said it perfectly,” Emrah recalls. “’Don’t worry about your infrastructure, please—just build good products.’”
Microsoft for Startups Founders Hub members receive Azure cloud credits that can be used toward Azure OpenAI Service or OpenAI to help build their product. Sign up now to become a member.