Prompt-Tuning Yields Inconsistent Privacy Gains; Dedicated Training Essential for Research-Agent Privacy, Study Reveals
June 18, 2026
Prompt-tuning meant to deter leaky queries yields inconsistent privacy gains and can reduce task performance, indicating that explicit privacy-focused training protects more effectively.
The work, by Gurung et al. (2026), is available on arXiv as 2605.30727.
Researchers built a controlled benchmark of 1,001 multi-hop chains combining private local facts and web documents to study leakage and safer behavior.
PA-DR, the study's privacy-aware training method, uses situational rewards for precise credit assignment, reaching similar task performance with 5–6x fewer generated samples than outcome-only RL.
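To illustrate why per-step ("situational") rewards assign credit more precisely than a single outcome reward, here is a toy Python sketch. The article does not give PA-DR's reward formula; `outcome_only_credit`, `situational_credit`, and the simple additive scheme are hypothetical, chosen only to show how a per-step signal can single out one bad step.

```python
def outcome_only_credit(step_rewards: list[float], outcome_reward: float) -> list[float]:
    # Outcome-only RL: every step in the episode receives the same
    # episode-level reward, regardless of what that step did.
    return [outcome_reward for _ in step_rewards]


def situational_credit(step_rewards: list[float], outcome_reward: float) -> list[float]:
    # Hypothetical per-step scheme: each step gets the shared outcome plus its
    # own local reward, so a single leaky step can be penalised on its own.
    return [outcome_reward + r for r in step_rewards]


# Three search steps; the second one (hypothetically) leaked a private detail.
step_rewards = [0.0, -1.0, 0.5]
print(outcome_only_credit(step_rewards, 1.0))  # [1.0, 1.0, 1.0]
print(situational_credit(step_rewards, 1.0))   # [1.0, 0.0, 1.5]
```

Under outcome-only credit the leaky step looks as good as the others, while the per-step version gives it less credit. This is one intuition for why finer-grained credit could need fewer samples, not a reproduction of the paper's result.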
Leakage is measured through the mosaic effect along three dimensions (intent leakage, answer leakage, and full-information leakage), based on what an observer can infer from query logs and responses.
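A minimal sketch of tallying leakage rates along the three dimensions, assuming each episode has already been judged by an observer that sees only the outbound queries and responses. The `EpisodeJudgement` fields, `leakage_rates`, and the "any" aggregate are assumptions for illustration; the article does not say how the dimensions are combined into a single headline leakage figure.

```python
from dataclasses import dataclass


@dataclass
class EpisodeJudgement:
    # Hypothetical per-episode labels from an observer that sees only the
    # agent's outbound queries and responses, never the private documents.
    intent_leaked: bool
    answer_leaked: bool
    full_info_leaked: bool


def leakage_rates(judgements: list[EpisodeJudgement]) -> dict[str, float]:
    # Fraction of episodes flagged on each dimension, plus an assumed
    # "any" aggregate counting episodes that leak on at least one dimension.
    n = len(judgements)
    if n == 0:
        raise ValueError("need at least one judged episode")
    return {
        "intent": sum(j.intent_leaked for j in judgements) / n,
        "answer": sum(j.answer_leaked for j in judgements) / n,
        "full_information": sum(j.full_info_leaked for j in judgements) / n,
        "any": sum(
            j.intent_leaked or j.answer_leaked or j.full_info_leaked
            for j in judgements
        ) / n,
    }


episodes = [
    EpisodeJudgement(True, False, False),
    EpisodeJudgement(True, True, False),
    EpisodeJudgement(False, False, False),
    EpisodeJudgement(True, True, True),
]
print(leakage_rates(episodes))
```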
PA-DR (Privacy-Aware Deep Research) combines task rewards with a learned privacy reward, reducing leakage from 34.0% to 9.9% while improving strict chain success from 48.7% to 58.7%.
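The article does not specify how PA-DR combines its task and privacy rewards. As a hedged illustration only, the sketch below subtracts a weighted leak estimate from the task reward; `combined_reward`, `leak_probability` (standing in for the learned privacy reward), and `privacy_weight` are hypothetical names.

```python
def combined_reward(task_reward: float, leak_probability: float,
                    privacy_weight: float = 1.0) -> float:
    # task_reward: e.g. 1.0 if the multi-hop chain was solved, else 0.0.
    # leak_probability: hypothetical output of a learned privacy model in [0, 1].
    # privacy_weight: hypothetical trade-off coefficient between the two terms.
    if not 0.0 <= leak_probability <= 1.0:
        raise ValueError("leak_probability must be in [0, 1]")
    return task_reward - privacy_weight * leak_probability


# With a heavy privacy weight, a leaky success scores below a safe failure.
print(combined_reward(1.0, 0.75, privacy_weight=2.0))  # -0.5
print(combined_reward(0.0, 0.0, privacy_weight=2.0))   # 0.0
```

The weight controls the trade-off seen in the baseline results: at zero, the sketch reduces to a task-only reward. For scale, the reported figures correspond to a 24.1-point drop in leakage (about a 71% relative reduction) and a 10.0-point gain in strict chain success.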
MosaicLeaks highlights a privacy risk in agents that combine private local documents with external web queries: outbound queries can carry clues that, pieced together, reveal sensitive information.
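The mosaic effect can be illustrated with a crude lexical check in which a private fragment counts as exposed once all of its words appear in what an observer has seen. The helpers (`tokens`, `per_query_exposures`, `mosaic_exposures`) and the example queries are hypothetical; real leakage judgements must also catch paraphrases and inference, which token overlap cannot.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; a deliberately crude normaliser.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def exposed(fragment: str, observed: set[str]) -> bool:
    # A fragment is "exposed" if every one of its tokens was observed.
    ft = tokens(fragment)
    return bool(ft) and ft <= observed


def per_query_exposures(queries: list[str], fragments: list[str]) -> list[str]:
    # Fragments revealed by at least one query on its own.
    return sorted({f for q in queries for f in fragments if exposed(f, tokens(q))})


def mosaic_exposures(queries: list[str], fragments: list[str]) -> list[str]:
    # Fragments revealed once the observer pools every logged query.
    pooled = set().union(*(tokens(q) for q in queries))
    return sorted(f for f in fragments if exposed(f, pooled))


private = ["Acme Corp merger"]
leaky = ["Acme Corp quarterly earnings", "merger approval timeline"]
rewritten = ["typical merger approval timeline", "antitrust review process"]

print(per_query_exposures(leaky, private))  # []
print(mosaic_exposures(leaky, private))     # ['Acme Corp merger']
print(mosaic_exposures(rewritten, private)) # []
```

Neither leaky query exposes the fragment alone, but the pooled log does; the rewritten session issues the same number of queries without exposing it.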
MosaicLeaks is a controlled benchmark, so broader real-world deployments require further study. The work argues that prompting alone cannot ensure privacy without dedicated training.
Baseline results show that optimizing for task performance alone can increase leakage, underscoring the trade-off between solving tasks and protecting privacy.
Experiments show that privacy-focused training changes what the queries contain so they no longer expose private fragments, cutting leakage while maintaining accuracy rather than simply making the agent query the web less.
Summary based on 1 source
Source

Hugging Face • Jun 18, 2026
MosaicLeaks: Can your research agent keep a secret?