LLM vs. MCP: where exactly should the tester focus? Since the LLM in the backend uses MCP, is it enough to test the LLM, or should we test both?
What strategies exist to systematically test prompt variations and ensure consistent agent behavior?
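One common approach is to parametrize a test over semantically equivalent prompt phrasings and assert the same behavioral invariants for each one. A minimal sketch follows; the run_agent helper, the prompts, and the expected response schema are hypothetical placeholders, not a specific tool's API.

```python
import json
import pytest

# Hypothetical helper: calls the agent under test and returns its raw reply.
from my_agent import run_agent  # assumption: swap in your own agent client

# Semantically equivalent phrasings of the same request.
PROMPT_VARIANTS = [
    "Cancel order 1234 and confirm the cancellation.",
    "Please cancel my order #1234 and let me know once it's done.",
    "I want order 1234 cancelled - confirm when finished.",
]

@pytest.mark.parametrize("prompt", PROMPT_VARIANTS)
def test_cancellation_is_consistent(prompt):
    reply = run_agent(prompt)

    # Invariant 1: the agent returns valid JSON with the expected fields.
    payload = json.loads(reply)
    assert payload["action"] == "cancel_order"
    assert payload["order_id"] == "1234"

    # Invariant 2: no destructive side actions beyond the one requested.
    assert payload.get("extra_actions", []) == []
```

The point of the pattern is that the assertions check behavior (what the agent decided to do), not exact wording, so harmless variation in the agent's phrasing does not fail the test.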
As AI takes over more parts of the testing lifecycle, should we start treating test engineers as prompt engineers?
How should we prioritize evaluation metrics for LLM agents: accuracy versus safety, coherence versus relevance?
Do existing text-overlap metrics (e.g., BLEU, ROUGE) still matter, or do we need new benchmarks for agent behavior?
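For context, BLEU and ROUGE score n-gram overlap against a reference text, which says little about whether an agent actually completed a task. A behavior-level metric instead scores the agent's trajectory, for example whether the required tool calls happened in the right order. The sketch below is illustrative only; the trajectory structure and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str        # name of the tool/MCP call the agent made
    success: bool    # whether that call succeeded

def task_success(trajectory: list[Step], required_tools: list[str]) -> bool:
    """Behavior-level check: every required tool was called successfully,
    in order (assumed trajectory format, for illustration)."""
    calls = iter(s.tool for s in trajectory if s.success)
    # 'tool in calls' consumes the iterator, so this checks an ordered subsequence.
    return all(tool in calls for tool in required_tools)

# Example: a refund flow must look up the order before issuing the refund.
trajectory = [Step("lookup_order", True), Step("issue_refund", True)]
print(task_success(trajectory, ["lookup_order", "issue_refund"]))  # True
```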
In 2 years, will QA roles evolve more into AI supervisors than testers? Is that a future you’re ready for?
For teams new to testing LLM agents, which practices should be prioritized first?
Would the training data be provided to the QAs by the devs, or would the QAs be responsible for creating the test data?