<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[PeperNoten]]></title><description><![CDATA[Obsidian digital garden]]></description><link>http://github.com/dylang/node-rss</link><image><url>site-lib/media/favicon.png</url><title>PeperNoten</title><link></link></image><generator>Webpage HTML Export plugin for Obsidian</generator><lastBuildDate>Mon, 06 Apr 2026 20:55:55 GMT</lastBuildDate><atom:link href="site-lib/rss.xml" rel="self" type="application/rss+xml"/><pubDate>Mon, 06 Apr 2026 20:55:54 GMT</pubDate><ttl>60</ttl><dc:creator></dc:creator><item><title><![CDATA[Vision-Speech Models: Teaching Speech Models to Converse about Images]]></title><description><![CDATA[MoshiVis adapts a speech LLM (Moshi) to understand images using lightweight cross-attention modules trained on mixed image-text and image-speech data. 🎙️ Demonstrates text-to-audio knowledge transfer despite distribution shift, enabling real-time visual conversations without speech-paired training data.]]></description><link>vision-speech-models-teaching-speech-models-to-converse-about-images.html</link><guid isPermaLink="false">Research/Vision-Speech Models Teaching Speech Models to Converse about Images.md</guid><pubDate>Wed, 19 Mar 2025 00:00:00 GMT</pubDate><enclosure url="thumbnails/fig_2503.15633.png" length="0" type="image/png"/><content:encoded>&lt;figure&gt;&lt;img src=&quot;thumbnails/fig_2503.15633.png&quot;&gt;&lt;/figure&gt;</content:encoded></item></channel></rss>