    IGeeKphone China Phone, Tablet PC, VR, RC Drone News, Reviews

    Visual ChatGPT: It Absorbs All the AI Drawing Skills

    By Farrukh Ahmad, March 11, 2023

    Visual ChatGPT connects ChatGPT to a series of Visual Foundation Models so that users can send and receive images while chatting. For example, ask it, "Can you generate a picture of a cat for me?" and it immediately replies with both text and an image.

    It can also edit the picture in response to a follow-up text command, such as "replace the cat with a dog."

    It can understand images as well. For example, send it a picture and ask, "What color is the motorcycle?" It answers: black.

    This visual version of ChatGPT was proposed by senior researchers at Microsoft Research Asia (MSRA). By combining ChatGPT with multiple visual models and adding a Prompt Manager, they enabled ChatGPT to handle a variety of visual tasks.

    The work became popular as soon as it was released, and its GitHub repository has passed 1.5k stars. In short, it feels like a merger of GPT and DALL-E.

    It can both read text and draw pictures. As one commenter put it:

    Isn’t this the ultimate meme maker?

    Is the trick all in the prompt?

    In effect, Visual ChatGPT lets ChatGPT handle multimodal information. Training a multimodal model from scratch is a lot of work, so the researchers instead decided to combine existing visual models with ChatGPT.

    The key to making this work is an intermediary between ChatGPT and the visual models. For this, they proposed the concept of a Prompt Manager.

    It has three main functions:

    First, it tells ChatGPT explicitly what each visual model does and specifies its input and output formats.

    Second, it converts different kinds of visual information, such as PNG images, depth images, and mask matrices, into a language format that ChatGPT can understand.

    Third, it manages the history of the visual models' outputs, the call priority among different models, and conflicts between them, so that ChatGPT can receive generated content iteratively until the output satisfies the user.
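
    The three functions above can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the researchers' code: the names Tool, PromptManager, system_prompt, describe_image, record, and build_prompt are hypothetical, and the text format of the prompt is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical description of one visual model (not from the paper)."""
    name: str
    description: str
    input_format: str
    output_format: str


@dataclass
class PromptManager:
    """Illustrative stand-in for the Prompt Manager idea described above."""
    tools: list[Tool]
    history: list[str] = field(default_factory=list)

    def system_prompt(self) -> str:
        # Function 1: state each model's role and its input/output formats.
        lines = ["You can use the following tools:"]
        for t in self.tools:
            lines.append(
                f"- {t.name}: {t.description} "
                f"Input: {t.input_format}. Output: {t.output_format}."
            )
        return "\n".join(lines)

    def describe_image(self, path: str, kind: str = "image") -> str:
        # Function 2: represent visual data as text the language model can read.
        # Here that is just a tagged file path (an assumption for this sketch).
        return f"[{kind} file: {path}]"

    def record(self, tool_name: str, result_path: str) -> None:
        # Function 3: keep earlier results so later steps can refer to them.
        self.history.append(f"{tool_name} -> {self.describe_image(result_path)}")

    def build_prompt(self, user_text: str) -> str:
        # Combine tool descriptions, prior results, and the new user message.
        parts = [self.system_prompt()]
        if self.history:
            parts.append("Previous results:\n" + "\n".join(self.history))
        parts.append("User: " + user_text)
        return "\n\n".join(parts)
```

    In this sketch, a follow-up request such as "replace the cat with a dog" would be sent to the language model together with the recorded path of the earlier cat image, which is how the model could know which picture to edit.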

    The resulting Visual ChatGPT workflow looks like this:

    When the user sends a picture, the content first goes to the Prompt Manager, which converts it into language for ChatGPT to assess. If ChatGPT finds that the question can be answered without calling a visual model, it gives the output directly (the first answer in the example).

    For the second question, ChatGPT needs a visual model to analyze the content, so the visual model runs. The process then iterates until ChatGPT judges that no further visual model calls are needed, and only then is the result output.
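
    The iterate-until-done behavior described above can be sketched as a simple loop. Everything here is an assumption made for illustration: run_turn, fake_decide, and the "vqa" tool are hypothetical names, fake_decide is a hard-coded stand-in for ChatGPT's judgment, and the max_steps cap is a safeguard added for the sketch rather than something the article describes.

```python
from typing import Callable

# A decision is either ("final", answer) or ("call", tool_name, tool_input).
Decision = tuple
DecideFn = Callable[[str, list[str]], Decision]


def run_turn(
    decide: DecideFn,
    tools: dict[str, Callable[[str], str]],
    user_text: str,
    max_steps: int = 5,  # assumed safeguard against endless tool calls
) -> str:
    """Call visual models until the decider says no more calls are needed."""
    observations: list[str] = []
    for _ in range(max_steps):
        decision = decide(user_text, observations)
        if decision[0] == "final":
            # No (further) visual model is needed: output the answer.
            return decision[1]
        _, tool_name, tool_input = decision
        result = tools[tool_name](tool_input)
        # Feed the tool result back as text for the next decision.
        observations.append(f"{tool_name}: {result}")
    return "Stopped: step limit reached."


def fake_decide(user_text: str, observations: list[str]) -> Decision:
    """Hard-coded stand-in for the language model's decision."""
    if "color" in user_text and not observations:
        return ("call", "vqa", user_text)
    if observations:
        return ("final", observations[-1].split(": ", 1)[1])
    return ("final", "No visual model needed.")


# Hypothetical visual question answering tool that always answers "black".
demo_tools = {"vqa": lambda question: "black"}
```

    With these stand-ins, run_turn(fake_decide, demo_tools, "What color is the motorcycle?") makes one tool call and then returns the tool's answer, while a question that needs no visual model is answered on the first pass.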

    According to the paper, Visual ChatGPT integrates 22 different visual models, including Stable Diffusion, BLIP, and pix2pix. To verify its capabilities, the researchers also ran a large number of zero-shot experiments.

    As the examples at the beginning show, Visual ChatGPT has a strong ability to understand images, and it can keep generating and modifying pictures according to the user's needs.

    The researchers also note that the work still has limitations. For example, the quality of the generated results depends mainly on the performance of the underlying visual models, and the heavy use of prompt engineering slows down generation to some extent.

    A single request may also invoke multiple models, which further hurts real-time performance. Finally, the privacy and security of input images need stronger protection.

    MSRA Veterans Step Up

    The research comes from a team at Microsoft Research Asia. The corresponding author is Duan Nan.

    He is the chief researcher of MSRA and research manager of its Natural Language Computing Group, as well as a part-time doctoral supervisor at the University of Science and Technology of China, a part-time professor at Tianjin University, and an outstanding member of CCF.

    His research focuses mainly on natural language processing, code intelligence, multimodal intelligence, and machine reasoning. He joined MSRA in 2006 and has been there for more than 16 years.

    The first author, Chenfei Wu, is also a senior researcher. According to his LinkedIn profile, he joined Microsoft in 2012 and has worked there for 11 years. He is currently a software engineer.
