---
license: apache-2.0
language:
- en
base_model:
- laituan245/molt5-base
tags:
- chemistry
---

**EMNLP 2025 main**: "Bridging the Gap Between Molecule and Textual Descriptions via Substructure-aware Alignment"

[GitHub](https://github.com/Park-ing-lot/MolBridge) | [Paper](https://arxiv.org/abs/2510.26157)

This is the MolBridge-Gen model checkpoint before fine-tuning on the ChEBI-20 dataset.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Tokenizer from the MolT5 base model; weights from the MolBridge-Gen checkpoint.
tokenizer = AutoTokenizer.from_pretrained("laituan245/molt5-base", model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained("PhTae/MolBridge-Gen-Base")

# Prepend the task prompt to a canonicalized SMILES string.
canonicalized_smiles = "CC(=O)N[C@@H](CCCN=C(N)N)C(=O)[O-]"
input_text = "Provide a whole descriptions of this molecule: " + canonicalized_smiles

tokens = tokenizer(input_text, return_tensors="pt", padding="longest", truncation=True)
gen_results = model.generate(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"],
    num_beams=5,
    max_new_tokens=512,
)
description = tokenizer.decode(gen_results[0], skip_special_tokens=True)
print(description)
```
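The example above assumes the input SMILES string is already canonicalized. A minimal sketch of how one could canonicalize an arbitrary SMILES string first, using RDKit (the `canonicalize` helper is hypothetical, not part of MolBridge; `Chem.MolToSmiles` emits canonical SMILES by default):

```python
# Hypothetical helper, not part of MolBridge; assumes RDKit is installed.
from rdkit import Chem


def canonicalize(smiles: str) -> str:
    """Parse a SMILES string and re-emit it in RDKit's canonical form."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    return Chem.MolToSmiles(mol)


# Canonicalization is idempotent: re-canonicalizing gives the same string.
print(canonicalize("CC(=O)N[C@@H](CCCN=C(N)N)C(=O)[O-]"))
```

The canonical string can then be fed into the prompt template shown in the snippet above.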