Microsoft's Phi Silica small language model is now multimodal

Microsoft's small language model (SLM) Phi Silica can now understand images as well as text, making it multimodal. This is a significant update because the model is integrated into Windows 11 on Copilot+ PCs and runs locally on the Neural Processing Unit (NPU) built into the processor. Currently, the model only works with Snapdragon X processors, but support for AMD and Intel chips is coming soon.

Rather than adding a separate bulky model for image analysis, Microsoft engineers combined existing components. The foundation remains the already deployed Phi Silica model. For computer vision, they're using the Florence image encoder, which is also used in the preview versions of Windows Recall and AI-enhanced search. Finally, they developed a small additional "Projector" module (just 80 million parameters) that converts Florence's visual data into a format Phi Silica can understand.
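The arrangement described above (image encoder, small projector, existing language model) can be illustrated with a toy sketch. This is not Microsoft's implementation: the article does not describe the projector's internals, so a single linear layer is assumed here, and the dimensions, function names, and random values are all hypothetical. The sketch only shows the general idea of mapping image features into the width the language model expects and placing them in front of the text tokens.

```python
import random

# Hypothetical sizes chosen for illustration only; the real models are far larger.
VISION_DIM = 8   # assumed width of the image encoder's output features
TEXT_DIM = 6     # assumed width of the language model's token embeddings


def make_projector(in_dim, out_dim, seed=0):
    """Build a random linear projector (weights and bias) as plain lists."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(features, projector):
    """Map each image feature vector from VISION_DIM to TEXT_DIM."""
    weights, bias = projector
    return [
        [sum(w * x for w, x in zip(row, patch)) + b
         for row, b in zip(weights, bias)]
        for patch in features
    ]


def build_input_sequence(image_features, text_embeddings, projector):
    """Prepend projected image vectors to the text token embeddings."""
    return project(image_features, projector) + text_embeddings


# Stand-ins for the encoder output (4 image patches) and 3 prompt tokens.
rng = random.Random(1)
image_features = [[rng.random() for _ in range(VISION_DIM)] for _ in range(4)]
text_embeddings = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(3)]

projector = make_projector(VISION_DIM, TEXT_DIM)
sequence = build_input_sequence(image_features, text_embeddings, projector)
print(len(sequence), len(sequence[0]))  # 7 vectors, each TEXT_DIM wide
```

In this kind of design only the projector has to be trained from scratch, which is consistent with the article's point that the encoder and the language model are reused as they are.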

This approach allowed Microsoft to avoid deploying another large model on users' devices, saving disk space and reducing computational load. Reusing existing components also reduced training costs and time.
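To put the disk-space argument in perspective, the size of the 80-million-parameter projector can be estimated with simple arithmetic. The article does not say which numeric precision the projector is stored in, so the precisions below are assumptions listed only for comparison.

```python
PROJECTOR_PARAMS = 80_000_000  # parameter count given in the article

# Assumed storage precisions; the article does not state which one is used.
BITS_PER_PARAM = {"float32": 32, "float16": 16, "int8": 8, "int4": 4}


def footprint_mb(params, bits):
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return params * bits / 8 / 1_000_000


footprints = {name: footprint_mb(PROJECTOR_PARAMS, bits)
              for name, bits in BITS_PER_PARAM.items()}

for name, megabytes in footprints.items():
    print(f"{name}: {megabytes:.0f} MB")
```

Under any of these assumptions the projector's weights occupy at most a few hundred megabytes.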

The primary practical application of this new technology is generating image descriptions for blind or visually impaired users. The multimodal Phi Silica can create image descriptions at varying levels of detail for the Narrator feature.

Phi Silica generates brief image descriptions (135 characters on average) in about 4 seconds, while more detailed descriptions (400-500 characters) take about 7 seconds. The current version is optimized for English, but support for other languages is planned.
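The two timings quoted above allow a rough back-of-the-envelope reading. Assuming that latency grows linearly with output length (an assumption, not something the article states) and taking 450 characters as the midpoint of the 400-500 range, a line through the two points separates a fixed cost from a per-character cost. The figures are estimates derived from two approximate data points, not measurements.

```python
# (characters, seconds) pairs taken from the article.
SHORT = (135, 4.0)
DETAILED = (450, 7.0)  # 450 is an assumed midpoint of the 400-500 range


def fit_line(p1, p2):
    """Return (slope, intercept) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


seconds_per_char, fixed_seconds = fit_line(SHORT, DETAILED)
chars_per_second = 1 / seconds_per_char

print(f"fixed cost: {fixed_seconds:.1f} s")
print(f"generation rate: {chars_per_second:.0f} characters per second")
```

Under these assumptions, roughly 2.7 seconds are spent regardless of description length (plausibly image encoding and prompt processing), and text is then produced at about 105 characters per second.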

Example descriptions of an image showing a map of the island of Oahu

Brief description:

The image shows a map of Hawaii's Oahu island, marking various locations including Kahuku Point, Kawela Bay, Kahuku, and other points and settlements like Pearl City and Ewa Beach.

Detailed description:

The image is a map of Hawaii's Oahu island showing various locations and geographical features. Key places include Kahuku Point, Kawela Bay, Kahuku and Kauai. Other notable locations: Laie, Koolau, Hauula, Punaluu, Mokulēia, Anahulu, Waialua, Kaaawa and Kaukonahua. The Pacific Ocean is also visible. Among other mentioned places are Whitmore Village, Mt. Kala, Kualoa Point, Wahiawa and Kaneohe. The image also shows various towns and cities like Mililani, Ahuimanu, Pearl City and Waipio. Other notable locations include Kahulu, Nānākuli, Koolau and Waimanalo.

Additional technical information about the Phi Silica update is available on Microsoft's official blog.
