Human–AI Co-Creation in Melody Writing
Role: Principal Investigator
Tools: Music Transformer, Anticipatory Transformer (Stanford CRFM), Python (TensorFlow, PyTorch, Jupyter), HuggingFace, pretty_midi, FluidSynth, GitHub, MIDI evaluation tools, DAW
Status: In progress. Targeting submission to ISMIR 2026
Abstract
This project investigates real-time human–AI co-creation in symbolic melody composition, focusing on how interactive, loop-based workflows influence creative flow, stylistic variety, and perceptions of authorship. Building on advances in symbolic sequence modeling such as the Music Transformer (Huang et al., 2018) and more recent anticipatory architectures (Stanford CRFM), the system enables musicians to iteratively guide AI-generated melody continuations without requiring programming expertise. Rather than treating AI as a passive generator, the prototype functions as an anticipatory musical partner, responding to both recent material and projected future structure. Exploratory pilot sessions suggest that this co-creation loop fosters faster ideation, broader stylistic exploration, and a stronger sense of creative agency; a structured comparison against AI-only and human-only workflows is underway.
Research Focus
The aim is to understand how artificial intelligence can function as a collaborative partner in melody writing. Specifically, the project explores the co-creative dynamic between human musical intuition and AI-generated suggestion, emphasizing systems that support musical flow while preserving stylistic intent. This work draws on the Music Technology Group’s principles of human-centered music system design (Serra, 2011) and incorporates anticipatory modeling techniques to deepen interactive potential.
Goals
Prototype a loop-based melody co-writing system using symbolic generation, playback, and user steering
Examine how structured back-and-forth with an anticipatory AI model affects musical flow, authorship perception, and stylistic iteration
Lay the technical foundation for future musician-friendly tools that support creative control without requiring programming expertise
Methods
A co-creation environment was built using a modified Music Transformer pipeline and Stanford’s music-small-800k anticipatory model. The system extends seed melodies by predicting continuations that align not only with past content but also with plausible future phrasing. Implemented in Python with PyTorch and HuggingFace Transformers, the interface supports seed-based generation, accept/reject loop regeneration, and real-time playback, with FluidSynth handling audio rendering. A simplified sketch of the generation loop follows the feature list below.
The prototype (run_anticipation.py) allows musicians to:
Input a seed melody (MIDI)
Generate multiple AI continuations
Accept, reject, or regenerate suggestions
Audition and save results in audio or MIDI format
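The core generation step can be sketched as follows. This is a simplified illustration rather than the actual run_anticipation.py implementation: it assumes the stanford-crfm/music-small-800k checkpoint loads as a standard causal language model through HuggingFace, and encode_midi_to_tokens is a hypothetical stand-in for the symbolic tokenization handled by the anticipation toolkit.

    import torch
    from transformers import AutoModelForCausalLM

    # Load the anticipatory checkpoint as a causal LM (assumption: the released
    # weights are GPT-2-style and compatible with AutoModelForCausalLM).
    model = AutoModelForCausalLM.from_pretrained("stanford-crfm/music-small-800k")
    model.eval()

    def continue_melody(seed_tokens, n_candidates=3, max_new_tokens=128):
        """Sample several candidate continuations of a tokenized seed melody."""
        ids = torch.tensor([seed_tokens])
        with torch.no_grad():
            out = model.generate(
                ids,
                do_sample=True,                  # sampling keeps the candidates varied
                max_new_tokens=max_new_tokens,
                num_return_sequences=n_candidates,
            )
        # Strip the seed prefix so each result contains only the new material.
        return [seq[len(seed_tokens):].tolist() for seq in out]

    # Accept/reject loop (encode_midi_to_tokens is a hypothetical helper):
    # seed_tokens = encode_midi_to_tokens("seed.mid")
    # candidates = continue_melody(seed_tokens)
    # ...audition each candidate, then accept, reject, or regenerate...

In the prototype itself, accepted continuations are rendered to audio through FluidSynth or written back out as MIDI.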
Pilot testing is planned as a within-subjects design. Three trained musicians will complete the same 4-bar melody prompt under three conditions:
Human-only composition
AI-only generation
Human–AI co-creation using the prototype
Key Challenges
Capturing and preserving musical intent through symbolic prompts that translate human expression into machine-readable form, especially in melodic phrasing and stylistic nuance
Balancing the anticipatory model’s autonomy with the musician’s creative control, allowing space for surprise while supporting intentional direction
Developing a working prototype that enables symbolic co-creation, with an accessible, musician-friendly interface for playback, iteration, and regeneration planned as future work
Optimizing the prototype for local CPU-based performance and reproducible results while supporting responsive auditioning of AI-generated continuations (a minimal configuration sketch follows this list)
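As one illustration of the last point, a CPU-only, reproducible configuration might look like the sketch below; the seed value and thread count are placeholder choices rather than settings taken from the current prototype.

    import random
    import numpy as np
    import torch

    SEED = 42                        # placeholder seed for reproducible sampling
    random.seed(SEED)
    np.random.seed(SEED)
    torch.manual_seed(SEED)

    torch.set_num_threads(4)         # keep generation responsive on a laptop CPU
    device = torch.device("cpu")     # force CPU so outputs are comparable across machines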
Exploratory Pilot Evaluation and Early Results
Exploratory pilot sessions with three trained musicians suggest that the co-creative prototype encourages musical exploration and supports early signs of creative flow. Participants described the system as “responsive,” “suggestive,” and “generative without taking over.” At times, however, the AI-generated continuations strayed from the user’s original intent, occasionally obscuring the musical identity of the seed. These informal impressions are consistent with Louie et al. (2020), who emphasize the importance of iterative human–AI loops in preserving creative agency. Although comparative testing is still underway, preliminary outputs showed varied pitch-class use and phrasing, echoing Serra’s (2011) call for expressive, user-guided systems in music technology.
Artifacts
GitHub Repository: https://github.com/jaredwatkins/music-transformer-ai-writer
Research Context and Contribution
This project contributes to ongoing research in symbolic music generation and interactive AI systems. By centering on co-creative workflows and user-guided iteration, it complements prior work in generative modeling while emphasizing human-centered design. The primary contribution is an interactive symbolic melody co-writing framework that operationalizes human–AI back-and-forth as a measurable design variable in co-creative music systems, aligning with the evaluation and design principles championed by Serra (2011).
Next Steps
Future work will expand testing to a larger participant pool, incorporate non-musicians to assess accessibility, and conduct computational analysis of melodic structure. Metrics such as pitch-class variety, interval n-grams, and rhythmic variance will be applied to quantify stylistic differences across conditions.
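A rough sketch of how these metrics could be computed from a monophonic MIDI file with pretty_midi is shown below; the exact definitions (n-gram order, handling of rests and ties) remain to be finalized.

    from collections import Counter
    import numpy as np
    import pretty_midi

    def melodic_metrics(midi_path, n=2):
        """Compute simple stylistic metrics for a monophonic melody."""
        pm = pretty_midi.PrettyMIDI(midi_path)
        notes = sorted(pm.instruments[0].notes, key=lambda note: note.start)
        pitches = [note.pitch for note in notes]
        onsets = [note.start for note in notes]

        pitch_class_variety = len({p % 12 for p in pitches})        # distinct pitch classes used
        intervals = [b - a for a, b in zip(pitches, pitches[1:])]   # melodic intervals in semitones
        interval_ngrams = Counter(zip(*[intervals[i:] for i in range(n)]))
        iois = np.diff(onsets)                                      # inter-onset intervals
        rhythmic_variance = float(np.var(iois)) if len(iois) else 0.0

        return {
            "pitch_class_variety": pitch_class_variety,
            "interval_ngrams": interval_ngrams,
            "rhythmic_variance": rhythmic_variance,
        }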
Research Trajectory
This project forms the foundation of a longer-term research trajectory.
Phase 1 (Current Work):
Build a symbolic melody co-creation prototype
Conduct pilot sessions with trained musicians
Generate initial findings on creative flow, authorship, and stylistic variation
Phase 2:
Expand participant variety across genres and skill levels
Refine interface to include user-controlled musical parameters
Apply computational metrics to analyze melodic structure
Phase 3:
Evaluate system in real-time performance and collaborative contexts
Develop and validate an evaluation framework for co-creative systems
Publish findings on human–AI interaction and design methodology in music generation
References
Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Shazeer, N., Simon, I., Hawthorne, C., … & Eck, D. (2018). Music Transformer: Generating music with long-term structure. International Conference on Learning Representations (ICLR).
arXiv: https://arxiv.org/abs/1809.04281
Louie, R., Coenen, A., Huang, C.-Z. A., Terry, M., & Cai, C. J. (2020). Novice-AI music co-creation via AI-steering tools for deep generative models. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1–13). Association for Computing Machinery. https://doi.org/10.1145/3313831.3376739
Serra, X. (2011). A Multicultural Approach in Music Information Research. Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR).
PDF: https://ismir2011.ismir.net/papers/OS2-1.pdf
Appendix A: Exploratory Pilot Summary
An initial exploratory pilot was conducted to gather early feedback on the co-creation tool and to begin observing how trained musicians interact with symbolic melody generation in practice.
Participants and Procedure
Three trained musicians were asked to compose a short, original 2-bar melody and input it into the prototype co-creation tool. The tool then generated a continuation using the adapted Music Transformer model. Participants listened to and reflected on the AI-generated output but did not complete comparative conditions (e.g., human-only or AI-only workflows).
Observations and Preliminary Takeaways
Participants noted that the tool produced outputs that were stylistically aligned with their seed melodies but often introduced surprising phrasings.
Informal comments suggested that using the tool sparked new compositional ideas and led to unexpected variations.
The musicians engaged with the playback loop to explore multiple outputs, suggesting early signs of creative flow.
These takeaways are informal and not derived from a structured evaluation. They will inform the next phase of pilot testing.
Appendix B: Planned Pilot Test Design
A formal pilot will follow a within-subjects design, enabling structured comparison across three workflow conditions. Each participant will be given the same 4-bar melody prompt and asked to complete the following tasks:
Condition 1: Human-only Composition
The participant expands on the 4-bar prompt using their own compositional process, with no AI involvement.
Output: A completed melody created entirely by the human.
Condition 2: AI-only Generation
The 4-bar prompt is input directly into the Music Transformer model without any changes or human intervention.
Output: A melody continuation generated entirely by the AI.
Condition 3: Human–AI Co-Creation (Iterative)
The participant modifies or elaborates on the 4-bar prompt and submits this version to the co-creation tool. The AI returns a continuation, which the participant may then revise and resubmit. This back-and-forth loop continues for up to three iterations, simulating an interactive co-compositional workflow (a schematic sketch of this loop follows the condition description).
Output: A melody shaped through iterative exchanges between human and AI contributions.
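For reference, the iteration cap for this condition can be expressed as a simple session driver. The functions revise_melody, generate_continuation, and participant_is_satisfied are hypothetical placeholders for the participant’s edits, the prototype’s generation step, and the stopping decision; they are not part of the current codebase.

    MAX_ITERATIONS = 3  # the back-and-forth loop is capped at three exchanges

    def co_creation_session(prompt_midi):
        """Run one Condition 3 session and log each human-AI exchange."""
        current = prompt_midi
        history = []
        for i in range(MAX_ITERATIONS):
            revised = revise_melody(current)               # participant edits or elaborates (hypothetical)
            continuation = generate_continuation(revised)  # prototype returns an AI continuation (hypothetical)
            history.append({"iteration": i + 1, "human": revised, "ai": continuation})
            current = continuation
            if participant_is_satisfied(continuation):     # participants may stop early (hypothetical)
                break
        return history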
The outputs and participant reflections from each condition will be compared to assess differences in creative flow, perceived authorship, and stylistic variation. The structured comparison will be supported by a post-session questionnaire and brief reflections gathered after each iteration in Condition 3.
This structure aligns with the project’s Phase 1 goals and is designed to evaluate not just the quality of musical output, but also the collaborative process of co-creation. It lays the groundwork for more advanced studies of human–AI musical interaction in later phases.