Style of Studio Ghibli Flux.1-D

Create Viral Videos with Style of Studio Ghibli Flux.1-D
Transform your social media presence with Style of Studio Ghibli Flux.1-D on ClipDreamer. This powerful model is perfect for generating unique content that will make your videos stand out. Simply set your creative direction, and let ClipDreamer handle the rest.
Why Use Style of Studio Ghibli Flux.1-D for Your Videos?
- Consistent style that matches Style of Studio Ghibli Flux.1-D's unique aesthetic
- High-quality visuals that capture viewers' attention
- Perfect for creating viral-worthy short-form content
- Seamless integration with ClipDreamer's video generation pipeline
Get started with Style of Studio Ghibli Flux.1-D today and transform your creative vision into engaging videos that your audience will love to watch and share.
See Style of Studio Ghibli Flux.1-D in Action
Watch how Style of Studio Ghibli Flux.1-D transforms basic prompts into engaging video content. Our AI-powered platform makes it easy to create professional-quality videos in minutes.
Creator Description
Overview
There's no need to introduce Studio Ghibli and its world-famous art style. There are already some great Flux models that reproduce this style (I especially love and recommend this one). Here is my attempt to create a Ghibli-style LoRA. Although I didn't succeed in making the best Ghibli LoRA ever, as I originally planned, the result still isn't too bad (sometimes). There is definitely room for improvement, though: I'm not satisfied with the current anatomy error rate. I'm working on the next version, and while I can't provide a precise timeline, it will definitely be updated and improved.

Usage
The trigger phrase is "In style of Studio Ghibli", but the LoRA also works without any trigger words, although I didn't extensively test that case. Using "anime" or "Miyazaki" also triggers style changes.
Recommended inference settings:
- Model: flux1-dev (fp8_e4m3fn)
- Text Encoder: t5xxl_fp16
- Sampler: euler
- Scheduler: normal
- Steps: 24
- Flux Guidance: 4
- LoRA Strength: 1
All images that I am publishing in the gallery were created using simple text-to-image (without inpainting, ControlNet, upscaling, etc.), for the purpose of showcasing the raw capabilities (as well as the limitations and weak points) of the model.

Training
The LoRA was fine-tuned on a single RTX 3090 with 954 high-quality images (1080p resolution) sourced from the official Ghibli site. The images were captioned using Joy Caption Pre Alpha (later versions didn't exist at the time), hosted locally. The Joy Caption prompt used was: "A descriptive caption for this image:\n". All captions were prefixed with the phrase "In style of Studio Ghibli." The captions were then manually reviewed: I corrected errors (there were a lot...), added some missing details, etc. Characters and locations were not tagged. Images from specific films were given an additional caption, "Scene from '...' film" (see below for details).
Out of the 954 images, the breakdown was as follows:
- 50 images from "Nausicaä of the Valley of the Wind" - additionally prefixed with "Scene from 'Nausicaa' film."
- 50 images from "Castle in the Sky" - additionally prefixed with "Scene from 'Laputa' film."
- 50 images from "My Neighbor Totoro" - additionally prefixed with "Scene from 'Totoro' film."
- 50 images from "Kiki's Delivery Service" - additionally prefixed with "Scene from 'Kiki's Delivery Service' film."
- 50 images from "Only Yesterday" - additionally prefixed with "Scene from 'Only Yesterday' film."
- 50 images from "Porco Rosso" - additionally prefixed with "Scene from 'Porco Rosso' film."
- 50 images from "Ocean Waves" - additionally prefixed with "Scene from 'Ocean Waves' film."
- 50 images from "Pom Poko" - additionally prefixed with "Scene from 'Pom Poko' film."
- 28 images from "On Your Mark" - additionally prefixed with "Scene from 'On Your Mark' film."
- 50 images from "Whisper of the Heart" - additionally prefixed with "Scene from 'Whisper Of The Heart' film."
- 50 images from "Princess Mononoke" - additionally prefixed with "Scene from 'Mononoke' film."
- 50 images from "Spirited Away" - additionally prefixed with "Scene from 'Spirited Away' film."
- 50 images from "Howl's Moving Castle" - additionally prefixed with "Scene from 'Howl's Moving Castle' film."
- 50 images from "Tales from Earthsea" - additionally prefixed with "Scene from 'Earthsea' film."
- 50 images from "Ponyo on the Cliff by the Sea" - additionally prefixed with "Scene from 'Ponyo' film."
- 50 images from "Arrietty" - additionally prefixed with "Scene from 'Arrietty' film."
- 50 images from "From Up on Poppy Hill" - additionally prefixed with "Scene from 'Poppy Hill' film."
- 50 images from "The Wind Rises" - additionally prefixed with "Scene from 'Wind Rises' film."
- 50 images from "When Marnie Was There" - additionally prefixed with "Scene from 'Marnie' film."
- 26 images from "The Boy and the Heron" - additionally prefixed with "Scene from 'The Boy And The Heron' film."

I plan to reconsider the structure of the dataset and recapture it from scratch for version 0.2.

LoRA training ran for 26000 steps (weights were saved every 250 steps). At that point, the model had stopped improving, and anatomy errors (like phantom limbs) became noticeable. I then spent several days selecting the best LoRA checkpoint. My goal was to find the right balance between style, variety, and error rate. I mostly tested using long, complex prompts with multiple characters and detailed interactions - ones that were likely to fail - and observed which checkpoint failed less. I deliberately (and wrongly) didn't automate my testing and relied on a "click-wait-and-hate" pipeline. Eventually, I settled on the checkpoint at 16250 steps. The 6000- and 9000-step LoRAs weren't bad either, but the 16250-step one felt more "mature", "vintage", and "diverse" (I didn't want a "too cozy" Ghibli LoRA).

Just for reference, here is a comparison of how LoRAs at different steps performed (with the same seed). The prompt was: "In style of Studio Ghibli. Scene from 'Totoro' film. This image is a digitally created scene from a Japanese animated film. The scene features three characters: two young girls and an elderly woman, sitting on a woven mat under a large tree with dense foliage. The background is lush with greenery, including tall trees and vibrant flowers, creating a serene, natural setting. One girl, who appears to be about four years old, wears a yellow dress with white accents and has pigtails tied with red ribbons. She holding a corn cob and smiling happily. Another girl, slightly older, in a white shirt and blue shorts, sits beside her to the left. She has dark hair and a calm expression. The elderly woman, seated to the right, wears a traditional Japanese kimono with a lavender pattern. She has white hair and a gentle smile, holding a bunch of leafy greens.
In front of them, on the woven mat, are various vegetables like carrots, tomatoes, and cucumbers, arranged in a basket. The scene exudes a sense of peaceful coexistence with nature, emphasizing simplicity and harmony."

After testing, I realized I had made a lot of mistakes, including:
- I left too much unnecessary clutter in the captions generated by Joy Caption (like "This image is a digitally created scene...", "The scene exudes a sense of peaceful...", etc.). I think CogVLM2 or Qwen2 might be better suited for captioning images for a style LoRA, but more testing is needed. (I still believe captioning with complex natural-language prompts is better for style LoRAs.)
- I wanted to make the LoRA a bit unpredictable and varied, perhaps even "weird," so it could generate images with high artistic variety. I kind of succeeded, but I feel that this weirdness sometimes negatively affects coherence and anatomy (mangled hands, etc.). I thought this might be due to overfitting, but even lower-step LoRAs (6000-9000 steps) had these errors. Maybe I shouldn't have included "Pom Poko" screencaps in the dataset (even though I carefully checked the captions to prevent feature blending between human and supernatural creatures).
- I should and will explore other trainers besides AI-Toolkit. While it "just works" and produces great models, relying solely on it may be due to status quo bias.
- And even if it is not, sticking to the default settings might not have been the best choice anyway.

The LoRA was trained on Windows 11 using AI-Toolkit with the following hyperparameters (actually, all defaults except for the resolution):
- Rank: 32
- Alpha: 32
- Batch Size: 1
- Steps: 16250
- Learning Rate: 1e-4
- Save every: 250
- Resolution: 1024, 768
- Optimizer: adamw8bit

Thanks for using this LoRA, or for reading this text to the end! As mentioned at the beginning, I hope to improve this model as I gain more experience with fine-tuning Flux.
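For readers who run Flux outside ComfyUI, the recommended settings map onto a Hugging Face diffusers call roughly as follows. This is a sketch under assumptions, not the author's setup: the LoRA filename is hypothetical, diffusers' default flow-match Euler scheduler stands in for the euler/normal sampler, and `guidance_scale` plays the role of Flux Guidance.

```python
# Approximate diffusers equivalent of the recommended ComfyUI settings.
# The LoRA filename below is a placeholder, not the actual release name.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # flux1-dev base model
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("ghibli_style_lora.safetensors")  # hypothetical path
pipe.enable_model_cpu_offload()  # helps fit 24 GB cards like the RTX 3090

image = pipe(
    "In style of Studio Ghibli. A girl rides a broom over a seaside town.",
    num_inference_steps=24,                  # Steps: 24
    guidance_scale=4.0,                      # Flux Guidance: 4
    joint_attention_kwargs={"scale": 1.0},   # LoRA strength: 1
).images[0]
image.save("ghibli_test.png")
```

The trigger phrase leads the prompt, mirroring how the training captions were prefixed.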