Boosting Open-Source Unified Multimodal Understanding and Generation
Boogu-Image-0.1 is a competitive, Apache-2.0-licensed open-source model family for unified image generation and editing. Its variants, including Base, Turbo, and Edit among others, provide stable, practical capabilities for high-quality text-to-image generation, fast generation, image editing, and Chinese-English text rendering, with performance that matches top closed-source models in many scenarios.
Closed-source multimodal understanding and generation systems like Nano Banana Pro and GPT-Image-2 achieve remarkable performance not because of a single model, but through a highly unified suite of system capabilities. Even so, with training compute far more limited than that of closed-source systems, we find that systematically improving a model's understanding ability, data quality, and training pipeline can still significantly improve image generation and editing performance. Notably, our training data is roughly one order of magnitude smaller than that of some existing open-source models. We hope our empirical study and open-source release will help advance the open-source ecosystem for unified multimodal understanding and generation.
Boogu accurately understands photography prompts and generates high-quality images with natural lighting, coherent composition, and faithful details. Even in more complex real-world scenes, it keeps subjects, backgrounds, and spatial relationships consistent. We want text-to-image generation to go beyond being merely “correct” and produce visuals that feel more realistic and engaging.
Boogu supports a wide range of text-heavy visual designs, from posters and stamps to documents, interfaces, brand guides, and handwritten boards. It aims for readable structure, stable typography, and robust bilingual rendering across diverse layouts.
Boogu handles diverse stylized generation scenarios. The goal is not just style transfer, but stable, attractive, and prompt-aware creative generation.
Different models have their own strengths, so it is difficult to give a single objective answer about which model is better. Relative rankings also shift from one benchmark to another. Still, Boogu demonstrates competitive performance across many scenarios and benchmarks.
Boogu Arena. Since we could not evaluate on LM Arena directly, we created Boogu Arena. The leaderboard below reports Arena-style preference results across leading closed-source and open-source image generation systems. Across all evaluated models, the Boogu-Image-0.1 family ranks among the very top. We welcome teams with questions about the results to contact us so that we can work toward more objective, fair, and reproducible evaluation.
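For readers unfamiliar with how arena-style preference votes become a leaderboard, the following is a minimal sketch using sequential Elo updates. This is an assumption for illustration only: the text above does not state which aggregation method Boogu Arena uses, and the model names, the `k` factor, and the starting rating are all hypothetical placeholders.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo ratings from (winner, loser) pairwise preference votes.

    `k` and `base` are illustrative defaults, not values from Boogu Arena.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        # Expected win probability of the winner under the Elo logistic model.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # The winner gains exactly what the loser gives up (zero-sum update).
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def leaderboard(votes):
    """Return (model, rating) pairs sorted from highest to lowest rating."""
    ratings = elo_ratings(votes)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical votes: each tuple means "the first model's image was preferred".
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
]
ranking = leaderboard(votes)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

Sequential Elo depends on vote order, so a production leaderboard would more likely fit all votes jointly (for example with a Bradley-Terry model) and report confidence intervals.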
We believe evaluation of image generation systems should also take inference time into account. However, because different models run on different hardware platforms and serving environments, we do not provide a direct latency comparison here. Notably, on high-performance hardware, the raw Boogu-Image-0.1-Turbo model can run a single inference in under 1 second.
Evaluation setup. Boogu Arena follows the spirit of LM Arena-style evaluation. We use an LLM to generate a large set of diverse user personas, then ask each persona to produce a number of image generation prompts, resulting in more than 1K test prompts in total. We will release these prompts publicly for community reproduction and review.
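The persona-then-prompt procedure described above can be sketched as follows. This is a hedged illustration, not the pipeline actually used: the `llm` callable, the request wording, the `fake_llm` stand-in, and the deduplication step are all assumptions introduced here.

```python
from typing import Callable, Dict, List


def build_prompt_set(
    llm: Callable[[str], List[str]],
    n_personas: int,
    prompts_per_persona: int,
) -> List[Dict[str, str]]:
    """Generate personas first, then ask each persona for image prompts."""
    personas = llm(f"Write {n_personas} diverse user personas, one per line.")[:n_personas]
    records: List[Dict[str, str]] = []
    seen = set()
    for persona in personas:
        request = (
            f"As this user: {persona}\n"
            f"Write {prompts_per_persona} image generation prompts, one per line."
        )
        for prompt in llm(request)[:prompts_per_persona]:
            # Assumed step: drop empty and case-insensitive duplicate prompts.
            key = prompt.strip().lower()
            if key and key not in seen:
                seen.add(key)
                records.append({"persona": persona, "prompt": prompt.strip()})
    return records


def fake_llm(request: str) -> List[str]:
    """Deterministic stand-in for a real LLM call, for illustration only."""
    if "personas" in request:
        return ["a wedding photographer", "a middle-school teacher", "an indie game artist"]
    persona = request.splitlines()[0].split(": ", 1)[1]
    return [f"{persona} wants a poster", f"{persona} wants a portrait"]


prompt_set = build_prompt_set(fake_llm, n_personas=3, prompts_per_persona=2)
for record in prompt_set:
    print(record["persona"], "->", record["prompt"])
```

Keeping the persona alongside each prompt makes it possible to check later whether the test set is balanced across user types.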
Our result in the Boogu Arena visual comparison.
Open-source baseline from a leading text-to-image evaluation setting.
Open-source baseline from a leading arena-style setting.
Strong proprietary baseline for visual preference comparison.
Qwen-Image-Bench. Qwen-Image-Bench is a recent high-quality benchmark, released after we froze our T2I training data. Compared with long-standing benchmarks, it is less affected by common issues such as data leakage, making it a useful testbed for modern image generation models. On this benchmark, Boogu-Image-0.1 achieves top-tier performance among the evaluated open-source models. Due to time constraints, the evaluation does not yet cover all available open-source baselines.
Parameter efficiency on Qwen-Image-Bench. Boogu-Image-0.1 (10B) achieves the highest Overall score (53.58) among the open-source models compared, outperforming larger counterparts such as Qwen Image 2512 (20B, 52.06) and HunyuanImage 3.0 (80B, 50.81). This suggests that competitive benchmark performance can be obtained without scaling to substantially larger parameter counts.
| Model | License | Quality ↑ | Aesthetics ↑ | Alignment ↑ | Real-world Fidelity ↑ | Creative Generation ↑ | Overall ↑ |
|---|---|---|---|---|---|---|---|
| GPT Image 2 | Closed | 58.65 | 67.53 | 65.85 | 57.38 | 75.23 | 64.69 |
| Nano Banana 2.0 | Closed | 54.77 | 61.08 | 62.40 | 54.28 | 67.05 | 59.82 |
| GPT Image 1.5 | Closed | 55.14 | 60.88 | 61.72 | 53.95 | 66.35 | 59.65 |
| Nano Banana Pro | Closed | 55.67 | 60.26 | 61.25 | 54.07 | 66.23 | 59.45 |
| Qwen Image 2.0 Pro | Closed | 54.39 | 58.67 | 59.28 | 51.83 | 64.94 | 57.84 |
| Seedream 5.0 | Closed | 52.55 | 58.40 | 58.90 | 51.92 | 65.29 | 57.22 |
| Seedream 4.5 | Closed | 54.41 | 58.72 | 57.31 | 51.69 | 60.64 | 56.78 |
| Seedream 4.0 | Closed | 54.01 | 58.81 | 56.64 | 51.05 | 58.15 | 56.21 |
| FLUX 2 Max | Closed | 53.64 | 56.85 | 57.35 | 49.35 | 56.50 | 55.33 |
| FLUX 2 Pro | Closed | 52.30 | 56.94 | 57.01 | 47.29 | 56.18 | 54.57 |
| GPT Image 1 | Closed | 52.34 | 55.09 | 56.28 | 48.14 | 55.78 | 54.07 |
| Boogu-Image-0.1 | Apache-2.0 | 51.19 | 55.42 | 55.78 | 48.01 | 55.55 | 53.58 |
| Qwen Image 2512 | Apache-2.0 | 51.76 | 54.74 | 52.72 | 47.00 | 50.19 | 52.06 |
| Imagen 4.0 Ultra | Closed | 50.90 | 54.25 | 54.02 | 45.59 | 51.14 | 51.99 |
| HunyuanImage 3.0 | Other | 50.35 | 53.57 | 52.00 | 44.31 | 49.12 | 50.81 |
| Imagen 4.0 | Closed | 50.16 | 52.68 | 51.64 | 44.84 | 47.94 | 50.29 |
| Qwen Image | Apache-2.0 | 48.44 | 52.25 | 50.72 | 43.16 | 47.30 | 49.23 |
| Kling Image 2.1 | Closed | 49.11 | 50.15 | 49.18 | 44.74 | 44.67 | 48.26 |
| GLM Image | Apache-2.0 | 49.26 | 50.64 | 47.90 | 44.69 | 45.23 | 48.19 |
About ImgEdit. We include ImgEdit_O as a supplementary reference. In our observations, this benchmark does not always align well with human visual judgment and has limited coverage of In-Context Generation scenarios. As a result, it may not fully reflect the real user experience of current image editing models and may underestimate the performance of some closed-source models in interactive use cases. We therefore suggest careful consideration before using ImgEdit as a primary benchmark going forward; the results are kept here mainly for comparison with prior work.
| Model | Open Source | ImgEdit_O ↑ |
|---|---|---|
| Boogu-Image-0.1-Edit | ✓ | 4.64 |
| JoyAI | ✓ | 4.57 |
| FireRed-Image-Edit | ✓ | 4.56 |
| Qwen-Image-Edit-2511 | ✓ | 4.51 |
| LongCat-Image-Edit | ✓ | 4.50 |
| Nano Banana Pro | ✗ | 4.37 |
| FLUX.2 [Dev] | ✓ | 4.35 |
| Seedream 4.5 | ✗ | 4.32 |
| Qwen-Image-Edit-2509 | ✓ | 4.31 |
| Seedream 4.0 | ✗ | 4.30 |
| Nano Banana | ✗ | 4.29 |
| Step1X-Edit-v1.2 | ✓ | 3.95 |
The Boogu-Image-0.1 family offers a full suite of models covering generation, editing, and versatile foundation use cases. We look forward to growing the family together with the open-source community.
Our report focuses on a set of practical observations that are already familiar to strong image-generation teams, yet are still under-discussed in public technical reports.
The full technical report is coming soon.