Mindscape-Aware RAG (MiA-RAG)

This repository provides the inference implementation for MiA-Emb (Mindscape-Aware Embedding), the retriever component in the MiA-RAG framework.

MiA-RAG introduces explicit global context awareness via a Mindscape—a document-level semantic scaffold constructed by hierarchical summarization. By conditioning both retrieval and generation on the same Mindscape, MiA-RAG enables globally grounded retrieval and more coherent long-context reasoning.

✨ Key Features

Mindscape as Global Semantic Scaffold
Builds a Mindscape through hierarchical bottom-up summarization (chunk summaries → global summary) and uses it as persistent global memory.
Mindscape-Aware Capabilities
Supports the three core benefits for long-context understanding:
- Enriched Understanding: fill in missing context and resolve underspecified meanings
- Selective Retrieval: bias retrieval toward the active topic’s semantic frame
- Integrative Reasoning: interpret retrieved evidence within a coherent global context
Dual-Granularity Retrieval
- Chunk Retrieval for narrative passages (standard RAG)
- Node Retrieval for knowledge graph entities (GraphRAG-style)
State-of-the-Art Retrieval Performance
Strong results on long-context benchmarks such as NarrativeQA and DetectiveQA, outperforming strong baselines including Qwen3-Embedding and SitEmb.

🚀 Usage

Installation

pip install torch transformers>=4.53.0

1) Initialization

MiA-Emb-0.6B is initialized from Qwen3-Embedding-0.6B.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Configuration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Inference Parameters
residual = True         # Enable residual connection logic
residual_factor = 0.5   # Balance between local and global
node_delimiter = "<|repo_name|>"  # Special token for Node tasks

# Load Tokenizer (base)
tokenizer = AutoTokenizer.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B",
    trust_remote_code=True,
    padding_side="left"
)

# Load Model
model = AutoModel.from_pretrained(
    "MindscapeRAG/MiA-Emb-0.6B",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map={"": 0}
)

2) Chunk Retrieval

Use this mode to retrieve narrative text chunks. A Global Summary is injected into the prompt as the “Mindscape”.

def get_query_prompt(query, summary="", residual=False):
    """Construct input prompt with global summary (Eq. 5 in paper)."""
    task_desc = "Given a search query with the book's summary, retrieve relevant chunks or helpful entities summaries from the given context that answer the query"
    summary_prefix = "\n\nHere is the summary providing possibly useful global information. Please encode the query based on the summary:\n"

    # Insert PAD token to capture residual embedding before the summary
    middle_token = tokenizer.pad_token if residual else ""

    return (
        f"Instruct: {task_desc}\n"
        f"Query: {query}{middle_token}{summary_prefix}{summary}{node_delimiter}"
    )

def encode_chunk(texts, is_query=False, residual=False):
    batch = tokenizer(
        texts,
        max_length=4096,
        padding=True,
        truncation=True,
        return_tensors="pt"
    ).to(model.device)

    outputs = model(**batch)

    # 1) Main Embedding (Last Token)
    emb_main = last_token_pool(outputs.last_hidden_state, batch["attention_mask"])

    # 2) Residual Embedding (PAD Token)
    emb_res = None
    if residual and is_query:
        emb_res = extract_residual_token(outputs, batch, tokenizer.pad_token_id)

    emb_main = F.normalize(emb_main, p=2, dim=-1)
    emb_res = F.normalize(emb_res, p=2, dim=-1) if emb_res is not None else None
    return emb_main, emb_res


# --- Example ---
query = "Who is the protagonist?"
global_summ = "A summary of the entire book..."
chunk = "Harry looked at the scar on his forehead."

# Encode
q_emb, q_res = encode_chunk(
    [get_query_prompt(query, global_summ, residual=True)],
    is_query=True,
    residual=True
)
c_emb, _ = encode_chunk([chunk], is_query=False)

# Score Fusion
score = q_emb @ c_emb.T
if q_res is not None:
    score = (1 - residual_factor) * score + residual_factor * (q_res @ c_emb.T)

print(f"Chunk Similarity: {score.item():.4f}")

3) Node Retrieval

MiA-Emb can retrieve knowledge graph entities (Nodes). This mode extracts embeddings from the <|repo_name|> token position.

Candidate format: Entity Name : Entity Description

Example: Mary Campbell Smith : Mary Campbell Smith is mentioned as the translator...

def encode_node_query(texts, residual=True, node_delimiter="<|repo_name|>"):
    batch = tokenizer(texts, padding=True, return_tensors="pt").to(model.device)
    outputs = model(**batch)

    # 1) Node Main Embedding: extract from <|repo_name|> position
    node_id = tokenizer.encode(node_delimiter, add_special_tokens=False)[0]
    q_emb_node = extract_specific_token(outputs, batch, node_id)

    # 2) Residual Embedding: extract from [PAD] position
    q_emb_res = extract_residual_token(outputs, batch, tokenizer.pad_token_id) if residual else None

    q_emb_node = F.normalize(q_emb_node, p=2, dim=-1)
    q_emb_res = F.normalize(q_emb_res, p=2, dim=-1) if q_emb_res is not None else None
    return q_emb_node, q_emb_res


# --- Example ---
query = "Who is the protagonist?"
global_summ = "A summary of the entire book..."

# 1) Encode Query (Node Token)
q_emb_node, q_emb_res = encode_node_query(
    [get_query_prompt(query, global_summ, residual=True)],
    residual=True
)

# 2) Encode Entity Candidate
entity_text = "Harry Potter : The main protagonist of the series..."
n_emb, _ = encode_chunk([entity_text], is_query=False)

# 3) Score Fusion
final_score = (1 - residual_factor) * (q_emb_node @ n_emb.T)
if q_emb_res is not None:
    final_score = final_score + residual_factor * (q_emb_res @ n_emb.T)

print(f"Node Similarity: {final_score.item():.4f}")

📜 Citation

If you find this work useful, please cite:

@misc{li2025mindscapeawareretrievalaugmentedgeneration,
      title={Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding}, 
      author={Yuqing Li and Jiangnan Li and Zheng Lin and Ziyan Zhou and Junjie Wu and Weiping Wang and Jie Zhou and Mo Yu},
      year={2025},
      eprint={2512.17220},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.17220}, 
}

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for MindscapeRAG/MiA-Emb-0.6B

Base model

Qwen/Qwen3-0.6B-Base

Finetuned

Qwen/Qwen3-Embedding-0.6B

Finetuned

(91)

this model