Edit and enhance images based on descriptive instructions
Generate audio responses from text or audio