Discussion on Ensuring Quality in Data & AI by Bharath Hemachandran | Testμ 2023

In today's fast-paced digital landscape of artificial intelligence (AI) and machine learning (ML), ensuring the quality of AI systems has become paramount. :robot:

Join in to learn about the unique challenges in Data and AI, how quality in Data & AI differs from traditional software quality, and much more. :gear:

Still not registered? Hurry up and grab your free tickets: Register Now!

If you have already registered and are up for the session, feel free to post your questions in the thread below :point_down:

Check out some of the Q&As picked during the session:

How are we able to have the required traceability for AI-related testing to requirements?

Bharath provided a concise approach to ensuring traceability from testing to requirements. He outlined five key considerations: regulatory compliance, ethical implications, infrastructure suitability, explainability of models, and a comprehensive definition of 'done'. By working through these considerations, he showed how to ensure that every test traces back to a stated requirement.
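To make that traceability concrete, here's a minimal sketch of a requirements-to-tests mapping in Python. The requirement IDs and test names are hypothetical, purely to illustrate the idea of flagging requirements with no linked tests:

```python
# Hypothetical traceability matrix: each requirement (including regulatory,
# ethical, and explainability requirements) maps to the tests that cover it.
traceability = {
    "REQ-001: comply with data-retention regulation": ["test_retention_policy"],
    "REQ-002: model decisions must be explainable": ["test_feature_attributions"],
    "REQ-003: infrastructure sized for peak load": ["test_load_at_peak"],
    "REQ-004: no disparate impact across user groups": [],  # gap!
}

# A requirement with no linked tests is a traceability gap worth flagging.
for requirement, tests in traceability.items():
    if not tests:
        print("UNCOVERED REQUIREMENT:", requirement)
```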

How do you see the interplay between model quality and overall system quality, and what strategies can be employed to ensure both are maintained?

Bharath answered this question by discussing the effect of model quality on the overall system. He highlighted three aspects of model quality: the algorithms used, the quality of the training data, and the model's explainability. To ensure both model and system quality, Bharath recommended using test oracles to evaluate and track model versions over time, which helps maintain good models and makes any changes understandable.
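As one illustration of the oracle idea (a sketch, not the speaker's exact method), here's a minimal metric-based oracle in Python with scikit-learn. The thresholds and the baseline value are assumptions; in practice the baseline would come from wherever you track model versions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative thresholds -- real values depend on your domain and risk appetite.
MIN_ACCURACY = 0.90      # absolute floor the model must meet
MAX_REGRESSION = 0.03    # allowed drop versus the previous tracked version

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
new_accuracy = accuracy_score(y_test, model.predict(X_test))

baseline_accuracy = 0.95  # hypothetical: would normally come from a model registry

# The "oracle": deterministic pass/fail checks around a statistical model.
assert new_accuracy >= MIN_ACCURACY, f"Accuracy {new_accuracy:.3f} below floor"
assert baseline_accuracy - new_accuracy <= MAX_REGRESSION, "Model regressed vs. baseline"
print(f"Model version passed oracle checks (accuracy={new_accuracy:.3f})")
```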

What are some of the most common challenges that arise when applying the traditional software quality methodologies to the systems?

Bharath walked through the challenges that arise when applying traditional software quality methodologies to data and AI systems. Shifting from a tester-centric approach to involving all stakeholders in quality ownership is crucial. Governance and ethical concerns must be addressed upfront, as they cannot easily be changed later.

Choosing the right approach and not overcomplicating solutions is important. Data quality must be a priority, even for seemingly simple problems. Effective communication with newer roles, such as data scientists and analysts, is key.
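To show what prioritizing data quality can look like in practice, here's a minimal sketch of upfront data checks using pandas. The columns and rules are hypothetical, chosen only to illustrate the kinds of checks involved:

```python
import pandas as pd

# Hypothetical training data -- column names and rules are illustrative only.
df = pd.DataFrame({
    "age": [34, 29, None, 51, 29],
    "income": [52000, 61000, 48000, -100, 61000],
})

issues = []
if df.isna().any().any():
    issues.append(f"missing values per column:\n{df.isna().sum()}")
if df.duplicated().any():
    issues.append(f"{df.duplicated().sum()} duplicate row(s)")
if (df["income"] < 0).any():
    issues.append("negative income values (out of valid range)")

for issue in issues:
    print("DATA QUALITY ISSUE:", issue)
```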

Now, let’s have a look at a few unanswered questions from the session.

What are some common pitfalls when creating a data pipeline for an AI data model that will later be used for input/output validation?

What is the essential roadmap to start the journey in data science?

What metrics should one use to create non-functional tests for AI/ML?

Are there specific cloud services or tools that are particularly well-suited for ensuring AI quality throughout the development lifecycle?

What should our approach be to start testing a generative AI tool (e.g., a tool based on ChatGPT)?

What are the steps needed to maintain quality in all phases of the life cycle?

What is the role of data and data quality in the development of AI systems?

In ETL testing, what approach should a QA follow to ensure they are not replicating the same data processing on their side just for validation?

What should the mindset of QA be given present-day trends in AI-powered software solutions?

How do we mine data from complex applications using AI?

How is data quality ensured, especially when data comes from different sources and formats and transformations are applied to form a golden dataset?


What are the different tools that can be used to ensure quality from the perspective of different roles?

Hi there,

If you couldn’t catch the session live, don’t worry! You can watch the recording here:

Additionally, we’ve got you covered with a detailed session blog:

Hey LambdaTest,

I have been part of this actively engaging session. It's important to address common pitfalls when constructing a data pipeline for AI data models, particularly in terms of input/output validation.

These pitfalls encompass challenges such as overlooking data quality issues, which may involve missing or inconsistent data. Additionally, failing to account for data drift as distributions evolve over time can lead to issues. Proper documentation of data transformations is another crucial aspect often neglected. Furthermore, it’s essential to consider potential biases or fairness concerns within the data, as they can significantly affect the performance and reliability of the model.
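To make the data-drift pitfall concrete, here's a minimal sketch using SciPy's two-sample Kolmogorov-Smirnov test to compare a feature's training distribution with what the pipeline sees in production. The feature values and significance threshold are assumptions for illustration:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical feature values: training data vs. data seen in production.
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
live_feature = rng.normal(loc=0.4, scale=1.2, size=5000)  # distribution has shifted

# Two-sample KS test: a small p-value suggests the distributions differ.
statistic, p_value = ks_2samp(train_feature, live_feature)
ALPHA = 0.01  # illustrative significance threshold

if p_value < ALPHA:
    print(f"Possible data drift detected (KS={statistic:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected")
```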

I trust this explanation clarifies these considerations.

Hey LambdaTest,

Having been an engaged participant in this enlightening session, I’m pleased to share insights on behalf of the speaker.

The steps outlined below form a fundamental roadmap for embarking on a journey in data science.

To begin, it's essential to establish a strong foundation by learning fundamental concepts in statistics, mathematics, and programming. Subsequently, honing skills in data handling, including manipulation, cleaning, and analysis, becomes crucial. Exploring machine learning algorithms and techniques comes next.

However, practical application is key. Applying your knowledge through hands-on projects with real-world datasets solidifies your understanding and proficiency. And remember, the journey is continuous. Staying updated with evolving industry trends and consistently expanding your skills will ensure your progress and success in the field of data science.
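As one example of what a first hands-on project could look like, here's a minimal sketch using scikit-learn's bundled Iris dataset (any small, clean starter dataset would do):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Load a small, well-known dataset -- ideal for a first project.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a simple baseline model and evaluate it honestly on held-out data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```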

I hope this roadmap is a valuable guide for your data science journey.