Paul S PRO
SuperPauly
AI & ML interests
None yet
Recent Activity
liked a model 2 days ago
talkie-lm/talkie-1930-13b-it
liked a model 2 days ago
openai/privacy-filter
liked a model 4 days ago
BidirLM/BidirLM-Omni-2.5B-Embedding
Organizations
None yet
Evaluation Methods & Metrics
- RubricBench: Aligning Model-Generated Rubrics with Human Standards
  Paper • 2603.01562 • Published • 63
- T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
  Paper • 2603.03790 • Published • 121
- SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
  Paper • 2505.20411 • Published • 96
- SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
  Paper • 2602.23866 • Published • 88
Py
Demixing Models & Datasets
- Moisesdb: A dataset for source separation beyond 4-stems
  Paper • 2307.15913 • Published
- Music Source Separation with Band-Split RoPE Transformer
  Paper • 2309.02612 • Published • 1
- Hybrid Transformers for Music Source Separation
  Paper • 2211.08553 • Published • 1
- nvidia/RE-USE
  Audio-to-Audio • Updated • 6.94k • 65
Agent Loops, Character, Work Ethics & Behavior
- Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing
  Paper • 2512.23611 • Published • 7
- Context as a Tool: Context Management for Long-Horizon SWE-Agents
  Paper • 2512.22087 • Published • 4
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 61
- Very Large-Scale Multi-Agent Simulation in AgentScope
  Paper • 2407.17789 • Published • 41
Sample Upscaling & Denoising.