An applied CS lab making a real impact in academia
ABOUT US
Who we are: Our story
Cyrion Labs was founded to bridge the gap between cutting-edge AI research and real-world applications. Our team of researchers, engineers, and innovators works at the intersection of artificial intelligence, machine learning, and computational sciences to drive ethical and impactful technological advancements.
We focus on computational research for social good, partnering with institutions to create AI-driven solutions that address societal challenges, such as safer internet access for students, bias reduction in AI, and accessibility tools for underserved communities. Our research also extends into natural language processing and generative AI, exploring the frontiers of AI-generated content, language models, and human-computer interaction.
Collaboration is at the core of our mission. We work with universities, independent researchers, and industry partners to produce high-impact studies, with plans to publish in leading conferences such as NeurIPS, CVPR, and ICLR. Our work has already contributed to large-scale projects, including a partnership with a public school district of 66,000 students to develop safer internet access solutions.
Our Specialities
What we do
PROJECTS
Explore our repository of projects by our talented team here at Cyrion
GenECA: A General-Purpose Framework for Real-Time Adaptive Multimodal Embodied Conversational Agents
We present GenECA, a general-purpose framework for real-time multimodal interaction with embodied conversational agents. GenECA is the first ECA system able to deliver context-aware speech and well-timed animations in real time without relying on human operators. Its modular design supports a wide variety of applications, such as education, customer service, and therapy.
Accepted at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025
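To give a flavor of the modular design described above, here is a minimal, illustrative sketch of a turn-based agent pipeline in Python. The module names, interfaces, and placeholder logic are assumptions made for exposition only; they are not the GenECA API.

# Illustrative-only sketch of a modular ECA turn (perception -> dialogue -> animation).
# Module names and the trivial placeholder logic are assumptions, not GenECA code.
from typing import Protocol

class Module(Protocol):
    def process(self, state: dict) -> dict: ...

class Perception:
    def process(self, state: dict) -> dict:
        state["user_emotion"] = "neutral"   # stand-in for real audio/vision analysis
        return state

class Dialogue:
    def process(self, state: dict) -> dict:
        state["reply"] = f"I hear you ({state['user_emotion']})."
        return state

class Animation:
    def process(self, state: dict) -> dict:
        state["gesture"] = "nod"            # would be timed against the reply in a real agent
        return state

def run_turn(modules: list[Module], utterance: str) -> dict:
    # Each module reads and enriches a shared state dict; modules can be swapped per application.
    state = {"utterance": utterance}
    for module in modules:
        state = module.process(state)
    return state

print(run_turn([Perception(), Dialogue(), Animation()], "Hello!"))

Because every stage shares the same interface, an education deployment could swap in a tutoring dialogue module while leaving the rest of the pipeline unchanged.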
Towards Leveraging Semantic Web Technologies for Automated UI Element Annotation
This paper details a Chrome extension for automated web UI element annotation using Semantic Web technologies. The system is designed primarily to support Visual Language Model (VLM)-based web agents, providing them with semantically distilled information about the current web landscape. It uses NLP models, vector embeddings, and FAISS for fast similarity search to produce structured JSON annotations from unstructured visual web content. The implementation covers DOM observation, API interactions for semantic annotation and embedding generation, and BLEU-based evaluation. Evaluations show near-real-time performance, showcasing the system's potential for assisting VLM-first web development.
Accepted at the IEEE International Conference on Inventive Computation Technologies 2025
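As a rough illustration of the embedding-plus-FAISS similarity search described in the abstract above, the sketch below maps a DOM element's text to the closest label in a small, hypothetical ontology and emits a JSON-style record. The label set, encoder model, and output fields are assumptions for exposition, not the extension's actual configuration.

# Minimal sketch of the embedding + FAISS lookup step; labels and model are illustrative assumptions.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

# Hypothetical Semantic Web vocabulary for UI elements.
ontology_labels = ["navigation link", "search box", "submit button", "product image"]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
label_vecs = encoder.encode(ontology_labels, normalize_embeddings=True)

# Inner product over L2-normalized vectors equals cosine similarity.
index = faiss.IndexFlatIP(label_vecs.shape[1])
index.add(np.asarray(label_vecs, dtype=np.float32))

def annotate(element_text: str) -> dict:
    """Map raw DOM text to the closest ontology label as a structured annotation."""
    query = encoder.encode([element_text], normalize_embeddings=True)
    scores, ids = index.search(np.asarray(query, dtype=np.float32), 1)
    return {"text": element_text,
            "label": ontology_labels[int(ids[0][0])],
            "score": float(scores[0][0])}

print(annotate("Add to cart"))  # expected to land near "submit button"

In the real extension, records like this would be produced as the DOM changes and handed to a VLM-based web agent rather than printed.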
VIZ: Virtual & Physical Navigation System for the Visually Impaired
While existing navigation tools—both physical and digital—are common, they remain fragmented and often lack proper accessibility support. VIZ addresses these gaps using generative AI to mimic human reasoning, enabling complex digital tasks through voice input and human-like memory via vector databases and similarity search. VIZ offers comprehensive scene descriptions, real-time collision avoidance, voice and vibration-based navigation, and memory-based task execution. Built with a custom 3D-printed chassis, it features LiDAR for obstacle detection, a microphone and speaker for voice interaction, motors for tactile feedback, and a Raspberry Pi for processing. Its digital assistant follows a three-step model: determining user intent, leveraging a modified ReAct architecture, and dynamically refining responses—effectively mimicking human decision-making to enhance digital accessibility.
Accepted at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025
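The three-step assistant model mentioned above (intent detection, a modified ReAct loop, and response refinement) can be pictured with the toy skeleton below. The function names, keyword-based intent check, and stubbed actions are illustrative assumptions only and are not the VIZ implementation.

# Toy skeleton of an intent -> ReAct-style loop -> refine pipeline; all logic is a placeholder.
from dataclasses import dataclass

@dataclass
class Step:
    thought: str
    action: str
    observation: str

def determine_intent(utterance: str) -> str:
    # Placeholder: VIZ uses generative AI plus vector-database memory for this step.
    return "navigate" if "go to" in utterance.lower() else "describe_scene"

def react_loop(intent: str, max_steps: int = 3) -> list[Step]:
    # ReAct-style loop: interleave a reasoning step with an action and its observation.
    steps: list[Step] = []
    for i in range(max_steps):
        thought = f"step {i}: choose the next action for intent '{intent}'"
        action = "read_lidar" if intent == "navigate" else "caption_camera"
        observation = f"result of {action}"
        steps.append(Step(thought, action, observation))
        if observation:  # stand-in for a real task-completion check
            break
    return steps

def refine_response(steps: list[Step]) -> str:
    # Dynamically refine the spoken reply from the accumulated trace.
    return "Okay. " + "; ".join(step.observation for step in steps)

print(refine_response(react_loop(determine_intent("Go to the kitchen"))))

In the actual device, the observations would come from LiDAR, the camera, and the vector memory, and the refined reply would be delivered by voice or vibration.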
PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
PhysNav-DG is a novel navigation framework combining classical sensor fusion with vision-language models for accurate and transparent decision-making. Its dual-branch architecture predicts navigation actions and generates chain-of-thought explanations. A context-aware Adaptive Kalman Filter refines state estimation using sensor data and semantic cues from models like LLaMA 3.2 11B and BLIP-2. We also introduce MD-NEX, a multi-domain benchmark spanning indoor, driving, and social navigation with ground-truth actions and human-rated explanations. PhysNav-DG boosts navigation success by over 20%, with grounded (score: 0.87) and clear (4.5/5) explanations—bridging semantic reasoning and geometric planning for safer autonomous systems.
Accepted to DG-ERBF at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025
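For readers unfamiliar with adaptive filtering, the toy one-dimensional example below shows one way a Kalman update can be made context-aware: the measurement-noise term is inflated when the semantic cue from the vision-language model is uncertain. The scaling rule, constants, and variable names are illustrative assumptions, not the formulation used in PhysNav-DG.

# Toy 1-D adaptive Kalman update: low semantic confidence inflates measurement noise R.
def adaptive_kalman_update(x, P, z, R_base, semantic_conf, H=1.0, Q=0.01):
    """One predict + update step for a scalar state; returns the new estimate and variance."""
    # Predict (identity motion model kept deliberately simple).
    x_pred = x
    P_pred = P + Q
    # Adapt measurement noise: trust the sensor reading less when the VLM cue is uncertain.
    R = R_base / max(semantic_conf, 1e-3)
    # Standard Kalman gain and update equations.
    K = P_pred * H / (H * P_pred * H + R)
    x_new = x_pred + K * (z - H * x_pred)
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

x, P = 0.0, 1.0
for z, conf in [(0.9, 0.95), (1.4, 0.30), (1.1, 0.90)]:
    x, P = adaptive_kalman_update(x, P, z, R_base=0.1, semantic_conf=conf)
    print(f"estimate={x:.3f}, variance={P:.3f}")

The low-confidence middle measurement pulls the estimate far less than the two high-confidence ones, which is the qualitative behavior a context-aware filter aims for.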
We're always looking for talented researchers, no matter your background, education, or age.