Spaces:

llaa33219
/

train3

Paused

App Files Files Community

train3 / README.md

llaa33219

Upload 4 files

6d15327 verified 7 months ago

preview code

raw

history blame contribute delete

3.1 kB

	---
	title: CoDA Fine-tuning
	emoji: 🚀
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	hf_oauth: true
	hf_oauth_scopes:
	- read-repos
	- write-repos
	---

	# CoDA Model Fine-tuning Space

	This Space allows you to fine-tune the Salesforce/CoDA-v0-Instruct text generation diffusion model on the baseten-admin/gpt-oss120b-generated-perfectblend dataset.

	## Features

	- 🎯 Full Fine-tuning: Complete parameter fine-tuning (not LoRA)
	- 💬 ChatML Format: Processes conversation data with question-answer pairs
	- 🔄 Auto Upload: Automatically uploads trained model to your Hugging Face account
	- 📊 Progress Tracking: Real-time training progress updates
	- 🔐 OAuth Integration: Secure authentication via Hugging Face login

	## How to Use

	1. Login: Click the "Sign in with Hugging Face" button
	2. Configure: Adjust training parameters (epochs, batch size, learning rate)
	3. Train: Click "Start Training" (requires GPU - upgrade Space to GPU tier)
	4. Resume: If training is interrupted, check "Resume from last checkpoint" and restart
	5. Upload: After training completes, click "Upload to Hugging Face Hub"

	### Persistence

	This Space supports checkpoint persistence:
	- Training checkpoints are saved every 500 steps
	- If interrupted, you can resume from the last checkpoint
	- For Docker deployment: Mount `/data` volume for full persistence
	- On Spaces: Checkpoints persist within the same session and across rebuilds if using persistent storage tier

	## Requirements

	- Hardware: GPU (T4, A10G, or better) strongly recommended
	- Account: Hugging Face account with write permissions
	- Time: Training takes several hours depending on configuration

	## About the Model

	CoDA (Code Diffusion with Autoregressive) is a 1.7B parameter bidirectional diffusion model developed by Salesforce AI Research. Unlike traditional autoregressive models, CoDA uses discrete denoising for text generation. The Instruct version is pre-tuned for instruction following, making it ideal for fine-tuning on conversational data.

	### Model Configuration

	```json
	{
	"architectures": ["CoDALanguageModel"],
	"hidden_size": 2048,
	"num_hidden_layers": 28,
	"num_attention_heads": 16,
	"vocab_size": 151936,
	"max_position_embeddings": 40960
	}
	```

	## Dataset

	The training uses the baseten-admin/gpt-oss120b-generated-perfectblend dataset:
	- Format: Conversational data in ChatML format
	- Column: `conversations` (list of role-content pairs)
	- Split: Uses `train` split with 90/10 train/eval split

	## Training Details

	- Optimizer: AdamW
	- Precision: FP16 (on GPU)
	- Gradient Accumulation: 4 steps
	- Gradient Checkpointing: Enabled for memory efficiency
	- Max Sequence Length: 2048 tokens

	## Citation

	If you use this Space or the CoDA model, please cite:

	```bibtex
	@article{coda2023,
	title={CoDA: Bidirectional Code Diffusion},
	author={Salesforce AI Research},
	journal={arXiv preprint},
	year={2023}
	}
	```

	## License

	Apache 2.0