Course Outline
Introduction to Gemini 3 Multimodality
- Capabilities across text, images, audio, and video
- Model selection and endpoint overview
- Key concepts in multimodal reasoning
Working with Text and Structured Inputs
- Prompting strategies for text generation
- Metadata, context windows, and embeddings
- Text-based orchestration of multimodal tasks
Image Understanding and Visual Workflows
- Image analysis and interpretation with Gemini 3
- Creating visual search and tagging tools
- Building image-to-text and text-to-image interactions
Audio Input Processing
- Speech recognition and transcription workflows
- Audio event detection and interpretation
- Integrating audio with text and visual inputs
Video Intelligence and Scene Analysis
- Frame-by-frame and continuous video reasoning
- Building summarization and highlight extraction tools
- Video-based automation and content workflows
Designing Multimodal Application Architectures
- Combining multiple input types in a single pipeline
- Latency, cost, and computational considerations
- Best practices for scalable multimodal systems
Prototyping Multimodal Applications
- Hands-on creation of multimodal prototypes
- Rapid iteration with prompt engineering
- Testing and refining user experience flows
Deploying Multimodal Solutions
- Deployment strategies and environment setup
- Monitoring real-world performance
- Security and compliance considerations
Summary and Next Steps
Prerequisites
- An understanding of modern AI concepts
- Experience with Python or JavaScript
- Familiarity with REST APIs
Audience
- Designers
- Content creators
- Technical product teams
14 Hours
Testimonials (1)
The fluency, atmosphere, and subject of the presentation
Lukasz Kowalczyk - Allegro Sp. z o.o.
Course - Google Gemini AI for Data Analysis