AMCS: need duplicate audit and cleanup tools #25

Open
opened 2026-04-12 18:05:12 +00:00 by sgcommand · 0 comments
Member

Situation

I tried to clean up duplicate thoughts and projects from the assistant side.

Right now AMCS gives us enough recon to suspect duplicates, but not enough breaching gear to clean them up safely.

In other words: we can see the minefield. We just do not have the stick to poke it with.

What is currently exposed

Useful bits we could reach:

  • list projects
  • list thoughts
  • search thoughts
  • summarize thoughts
  • thought stats
  • update thought
  • create project
  • link thoughts

That is enough for partial inspection.
It is not enough for safe duplicate detection and cleanup.

Tools needed

Project operations

  1. Project search/list with stronger filters

    • exact name
    • normalized name
    • created_at / updated_at / last_active_at
    • sort controls
  2. Project get by id

    • full canonical project record
    • metadata
    • linked thought counts/details
  3. Project update

    • rename
    • edit description/metadata
    • archive/unarchive
  4. Project delete/archive

    • safe destructive path
    • preferably dry-run or dependency preview
  5. Project merge

    • move/reassign thoughts from one project to another
    • preserve history/links
    • optional alias of old name
  6. Duplicate-project finder

    • exact duplicates
    • normalized duplicates
    • fuzzy typo matches
    • similarity/confidence score
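A minimal sketch of how the duplicate-project finder could score candidates. The `Project` record, field names, and thresholds here are assumptions for illustration, not the real AMCS schema; exact matches, normalized matches, and fuzzy typo matches each get a confidence score a human can review:

```python
# Hypothetical duplicate-project finder sketch. Project shape and the
# 0.85 fuzzy threshold are illustrative assumptions, not AMCS internals.
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class Project:
    id: str
    name: str

def normalize(name: str) -> str:
    # Collapse case and whitespace so "Road Map" and "road  map" compare equal.
    return " ".join(name.lower().split())

def find_duplicate_projects(projects, fuzzy_threshold=0.85):
    """Return (project_a, project_b, score) candidates, highest score first."""
    candidates = []
    for i, a in enumerate(projects):
        for b in projects[i + 1:]:
            if a.name == b.name:
                candidates.append((a, b, 1.0))        # exact duplicate
            elif normalize(a.name) == normalize(b.name):
                candidates.append((a, b, 0.95))       # normalized duplicate
            else:
                score = SequenceMatcher(
                    None, normalize(a.name), normalize(b.name)).ratio()
                if score >= fuzzy_threshold:
                    candidates.append((a, b, score))  # fuzzy typo match
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```

The score doubles as the confidence field the report tool would emit, so reviewers can sort by how suspicious each pair is.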

Thought operations

  1. Thought list with full filtering/pagination

    • by project
    • by person
    • by topic
    • by type
    • by created/updated windows
    • include archived/deleted flags
  2. Thought get by id

    • full content
    • metadata
    • relationships
    • provenance
  3. Thought delete/archive

    • explicit safe delete/archive
    • reversible if possible
  4. Thought merge

    • merge duplicate thoughts into canonical target
    • preserve provenance/source ids
    • re-home links automatically
  5. Duplicate-thought finder

    • exact text duplicates
    • normalized-content duplicates
    • near-duplicate semantic matches
    • metadata-collision matches
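For the exact and normalized cases, a duplicate-thought finder can be as simple as bucketing by a normalized-content fingerprint. This sketch assumes thoughts arrive as `(id, content)` pairs, which is not the real AMCS shape; semantic near-duplicate matching would need embeddings and is deliberately left out:

```python
# Hypothetical sketch: group thoughts by a normalized-content hash so that
# exact and formatting-only duplicates land in the same bucket.
import hashlib
from collections import defaultdict

def fingerprint(text: str) -> str:
    # Case-fold and collapse whitespace before hashing, so thoughts that
    # differ only in capitalization or spacing fingerprint identically.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def duplicate_thought_groups(thoughts):
    """thoughts: iterable of (thought_id, content) pairs.
    Returns lists of ids sharing a fingerprint (only groups of 2+)."""
    buckets = defaultdict(list)
    for thought_id, content in thoughts:
        buckets[fingerprint(content)].append(thought_id)
    return [ids for ids in buckets.values() if len(ids) >= 2]
```

Each returned group is a merge candidate: pick one id as canonical, fold the rest into it, and keep the source ids as provenance.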

Metadata cleanup

  1. Metadata normalization/remap tools

    • case normalization
    • bulk remap values (example: and -> )
    • preview before apply
  2. Audit/report tool

    • produce candidate duplicate report
    • confidence score
    • recommended canonical item
    • dry-run summary of changes
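The preview-before-apply behavior could look like the sketch below: one function that only reports what would change, and a separate function that mutates. The record shape and mapping are illustrative assumptions, not the AMCS metadata model:

```python
# Hypothetical preview-before-apply remap. Records are assumed to be
# (record_id, metadata_dict) pairs for illustration only.
def preview_remap(records, field, mapping):
    """Dry run: return (record_id, old_value, new_value) for every record
    the remap would touch, without mutating anything."""
    changes = []
    for record_id, metadata in records:
        old = metadata.get(field)
        if old in mapping and mapping[old] != old:
            changes.append((record_id, old, mapping[old]))
    return changes

def apply_remap(records, field, mapping):
    """Apply the same mapping in place; run preview_remap first and review."""
    for _, metadata in records:
        if metadata.get(field) in mapping:
            metadata[field] = mapping[metadata[field]]
```

Keeping preview and apply as two calls over the same mapping means the dry-run output is exactly what the destructive step will do, nothing more.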

Why this matters

Without these tools, the assistant can only do half the job:

  • spot suspicious duplicates
  • but not safely resolve them

That leads to either:

  • manual cleanup by a human
  • or dangerous guess-delete behavior

And I know this is shocking, but I am voting against dangerous guess-delete behavior.

Concrete symptom seen

We could access AMCS and inspect some data.
We also saw at least one metadata inconsistency already:

  • and counted separately

So the problem is not theoretical. The cleanup need is real.

Nice-to-haves

  • dry-run on all destructive actions
  • bulk operations
  • archive vs hard delete distinction
  • canonicalization helpers
  • docs stating which matching/filtering operations are exact vs fuzzy

Desired workflow

  1. Find duplicate project candidates
  2. Review candidates
  3. Merge/archive redundant projects
  4. Find duplicate thought candidates
  5. Review candidates
  6. Merge/archive/delete safely
  7. Normalize metadata values
  8. Emit cleanup report
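Step 8's cleanup report could take a shape like the following. Every field name here is an illustrative assumption; the point is a single dry-run artifact a human can sign off on before any destructive action runs:

```python
# Hypothetical cleanup-report shape for the end of the workflow.
# All field names are assumptions, chosen to match the steps above.
import json

def cleanup_report(project_merges, thought_merges, metadata_remaps):
    """project_merges: (from_id, into_id, confidence) tuples.
    thought_merges: (duplicate_ids, canonical_id) tuples.
    metadata_remaps: (field, old_value, new_value) tuples."""
    report = {
        "dry_run": True,
        "project_merges": [
            {"from": src, "into": dst, "confidence": conf}
            for src, dst, conf in project_merges
        ],
        "thought_merges": [
            {"duplicates": dups, "canonical": canonical}
            for dups, canonical in thought_merges
        ],
        "metadata_remaps": [
            {"field": field, "old": old, "new": new}
            for field, old, new in metadata_remaps
        ],
    }
    return json.dumps(report, indent=2)
```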

Summary

We have recon.
We need the breaching kit.
