A color-coded breakdown of an AK-47 asset from Counter-Strike: Global Offensive, highlighting its computational efficiency and label accuracy.

Automatic Mask Labeling with Segment Anything and Visual Question Answering

This software combines the Segment Anything Model (SAM) with Visual Question Answering (VQA) models for gaming applications. It segments assets from different video games into their parts, then uses Large Language Models (LLMs) to identify and label each part.

Project Overview

Can you provide a brief overview of the project you've been working on?

The research evaluated the performance of the SAM and VQA models, focusing on how well they integrate and how much of the image segmentation and labeling workflow they can automate. Through testing on assets from real video games, the system takes a first step toward the automated labeling of gaming assets. This not only speeds up the development process but also paves the way for more creative and detailed customization in video games, opening new possibilities for developers and players alike.

Purpose of the Project

What inspired or motivated you to choose this particular project?

The demand for customizable and visually striking game assets is ever-growing. Traditionally, studios provide 2D representations of these assets, which undergo UV mapping to facilitate adjustments in 3D space. This process often requires manual intervention and can be time-consuming, hindering the creative flow of developers. The current pipeline for identifying and labeling the parts of an asset is a meticulous, manual, multi-step process.

A diagram illustrating a proposed pipeline for processing with the FastSAM system in a technical presentation.

Technical Details

Could you explain the technical aspects of your project? What software, tools do you use?

The project combines the Segment Anything Model (SAM) with Visual Question Answering (VQA) models.

This image shows a flowchart diagram describing a process for analyzing uploaded assets, producing black and white masks, cropping parts, running a model, and labeling results.
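
The cropping and labeling steps in the flowchart can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: images and masks are plain nested lists rather than real image arrays, the helper names (bounding_box, crop_part, label_parts) are hypothetical, and ask_vqa stands in for whatever VQA model is called, since the source does not name one. The question string is also an assumption.

```python
from typing import Callable, List, Optional, Tuple

Mask = List[List[int]]    # binary mask: 1 = pixel belongs to the part, 0 = background
Image = List[List[int]]   # grayscale image as rows of pixel values


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return the inclusive (top, left, bottom, right) box of nonzero pixels, or None if the mask is empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_part(image: Image, mask: Mask) -> Image:
    """Crop the image to the mask's bounding box and zero out pixels outside the mask."""
    box = bounding_box(mask)
    if box is None:
        return []
    top, left, bottom, right = box
    return [
        [image[r][c] if mask[r][c] else 0 for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


def label_parts(
    image: Image,
    masks: List[Mask],
    ask_vqa: Callable[[Image, str], str],  # hypothetical stand-in for the VQA model call
) -> List[str]:
    """Crop each masked part and ask the VQA model to name it."""
    question = "What part of the object is this?"  # assumed prompt, not from the source
    labels = []
    for mask in masks:
        crop = crop_part(image, mask)
        labels.append(ask_vqa(crop, question) if crop else "empty")
    return labels
```

In a real pipeline the masks would come from the segmentation model and ask_vqa would wrap the chosen VQA model. Keeping the model behind a plain callable makes it easy to swap models when comparing them.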

Challenges and Solutions

Were there any significant challenges you encountered during the project, and how did you overcome them? Can you share a specific problem-solving moment that stands out in your project?

Applying these foundation models to UV mapping images was the most difficult part, because these images are unlike the data the models were trained on. The absence of similar data in their training sets leads to a higher rate of false positives and identification failures. Handling the unique features of UV mapping images effectively calls for targeted training or significant adaptation of these models.

Learning and Takeaways

What key lessons or skills have you gained from working on this project?

AI cannot yet substitute for humans in many fields. Dedicated research is needed to extend its capabilities in specialized domains, which would pave the way for substantial enhancements and broader applications in future game development and in other specialized fields.

Future Development

Do you have plans for further development or improvement of your project in the future?

GPT-4 integration, merging masks that belong to the same part, and fine-tuning.
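
As an illustration of the mask-merging idea, the sketch below unions all masks that received the same label. It is only one possible approach under stated assumptions: masks are nested lists of 0/1 with identical dimensions, labels are compared as exact strings, and the function name merge_masks_by_label is hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = pixel belongs to the part, 0 = background


def merge_masks_by_label(masks: List[Mask], labels: List[str]) -> Dict[str, Mask]:
    """Combine masks that share a label into one mask per label (pixel-wise OR)."""
    merged: Dict[str, Mask] = {}
    for mask, label in zip(masks, labels):
        if label not in merged:
            # Copy the rows so the caller's mask is never modified.
            merged[label] = [row[:] for row in mask]
        else:
            merged[label] = [
                [a | b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(merged[label], mask)
            ]
    return merged
```

Exact string matching is the simplest choice here. Labels produced by a language model may vary in wording ("barrel" versus "gun barrel"), so a real implementation would likely need to normalize labels before merging.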