| --- |
| title: CoDA Fine-tuning |
| emoji: π |
| colorFrom: blue |
| colorTo: purple |
| sdk: gradio |
| sdk_version: 4.44.0 |
| app_file: app.py |
| pinned: false |
| license: apache-2.0 |
| hf_oauth: true |
| hf_oauth_scopes: |
| - read-repos |
| - write-repos |
| --- |
| |
| # CoDA Model Fine-tuning Space |
|
|
| This Space allows you to fine-tune the **Salesforce/CoDA-v0-Instruct** text generation diffusion model on the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset. |
|
|
| ## Features |
|
|
| - π― **Full Fine-tuning**: Complete parameter fine-tuning (not LoRA) |
| - π¬ **ChatML Format**: Processes conversation data with question-answer pairs |
| - π **Auto Upload**: Automatically uploads trained model to your Hugging Face account |
| - π **Progress Tracking**: Real-time training progress updates |
| - π **OAuth Integration**: Secure authentication via Hugging Face login |
|
|
| ## How to Use |
|
|
| 1. **Login**: Click the "Sign in with Hugging Face" button |
| 2. **Configure**: Adjust training parameters (epochs, batch size, learning rate) |
| 3. **Train**: Click "Start Training" (requires GPU - upgrade Space to GPU tier) |
| 4. **Resume**: If training is interrupted, check "Resume from last checkpoint" and restart |
| 5. **Upload**: After training completes, click "Upload to Hugging Face Hub" |
|
|
| ### Persistence |
|
|
| This Space supports checkpoint persistence: |
| - Training checkpoints are saved every 500 steps |
| - If interrupted, you can resume from the last checkpoint |
| - For Docker deployment: Mount `/data` volume for full persistence |
| - On Spaces: Checkpoints persist within the same session and across rebuilds if using persistent storage tier |
|
|
| ## Requirements |
|
|
| - **Hardware**: GPU (T4, A10G, or better) strongly recommended |
| - **Account**: Hugging Face account with write permissions |
| - **Time**: Training takes several hours depending on configuration |
|
|
| ## About the Model |
|
|
| **CoDA (Code Diffusion with Autoregressive)** is a 1.7B parameter bidirectional diffusion model developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA uses discrete denoising for text generation. The Instruct version is pre-tuned for instruction following, making it ideal for fine-tuning on conversational data. |
|
|
| ### Model Configuration |
|
|
| ```json |
| { |
| "architectures": ["CoDALanguageModel"], |
| "hidden_size": 2048, |
| "num_hidden_layers": 28, |
| "num_attention_heads": 16, |
| "vocab_size": 151936, |
| "max_position_embeddings": 40960 |
| } |
| ``` |
|
|
| ## Dataset |
|
|
| The training uses the **baseten-admin/gpt-oss120b-generated-perfectblend** dataset: |
| - **Format**: Conversational data in ChatML format |
| - **Column**: `conversations` (list of role-content pairs) |
| - **Split**: Uses `train` split with 90/10 train/eval split |
|
|
| ## Training Details |
|
|
| - **Optimizer**: AdamW |
| - **Precision**: FP16 (on GPU) |
| - **Gradient Accumulation**: 4 steps |
| - **Gradient Checkpointing**: Enabled for memory efficiency |
| - **Max Sequence Length**: 2048 tokens |
|
|
| ## Citation |
|
|
| If you use this Space or the CoDA model, please cite: |
|
|
| ```bibtex |
| @article{coda2023, |
| title={CoDA: Bidirectional Code Diffusion}, |
| author={Salesforce AI Research}, |
| journal={arXiv preprint}, |
| year={2023} |
| } |
| ``` |
|
|
| ## License |
|
|
| Apache 2.0 |
|
|