Validation of Plot Results #447

Open
senbjoern opened this issue Oct 15, 2024 · 1 comment
Labels
question 🙋 Further information is requested

Comments

@senbjoern

Congrats to all contributors on this great Python library.

I find the QQ-plot very helpful, but before I can use it I need to confirm that its results are reproducible with other software.

Can anyone give me a hint about how to validate the QQ-plot results, including the confidence interval (e.g. at a 0.95 confidence level)?
Or could the library even be extended with sample datasets and settings, so that results can be checked against other software?

With this dataset I get different plots from Minitab and Pingouin:

dataset=[8.60, 6.35, 6.26, 7.62, 6.29, 10.61, 6.46, 6.12, 7.68, 10.35]

It seems to be a general difficulty:

https://stackoverflow.com/questions/75795580/i-got-slightly-different-qq-plots-in-r-and-python-for-the-same-data-which-one-s

Many thanks

@raphaelvallat added the question 🙋 Further information is requested label on Oct 15, 2024
@raphaelvallat
Owner

Hi @senbjoern,

Thank you for opening the issue.

Pingouin uses the scipy.stats.probplot function to generate the QQ-plot. As indicated in the Stack Overflow post that you shared, SciPy and R use different methods for calculating quantiles, which in turn affects the regression line and the confidence intervals. However, it's hard to say which one is more accurate or reliable. I think this comment from the post explains it very nicely:

The fitted lines are different because they use different methods to calculate quantiles. I don't know that we can say which one is "better" or "more reliable" - they're just different.
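
To make the difference concrete, here is a minimal sketch (not Pingouin's internal code) that computes the theoretical quantiles and the fitted line both ways for the dataset from this issue. The `ppoints` and `r_style_qqline` helpers are hypothetical names written only for this illustration. They re-implement the plotting positions documented for R's `ppoints` and the quartile-based line documented for R's `qqline`, so treat them as assumptions about R's behavior rather than a verified port. Minitab is not covered, since its method is not described in this thread.

```python
import numpy as np
from scipy import stats

data = np.array([8.60, 6.35, 6.26, 7.62, 6.29, 10.61, 6.46, 6.12, 7.68, 10.35])
n = data.size

# SciPy: theoretical quantiles come from Filliben's estimate of the
# order-statistic medians; the line is a least-squares fit through the points.
(scipy_theo, ordered), (scipy_slope, scipy_intercept, _) = stats.probplot(
    data, dist="norm"
)


def ppoints(n):
    """Hypothetical helper mirroring R's documented ppoints() positions."""
    a = 3 / 8 if n <= 10 else 1 / 2
    i = np.arange(1, n + 1)
    return (i - a) / (n + 1 - 2 * a)


def r_style_qqline(sample):
    """Hypothetical helper: line through the first and third quartiles."""
    # np.quantile defaults to linear interpolation (R's "type 7").
    y1, y3 = np.quantile(sample, [0.25, 0.75])
    x1, x3 = stats.norm.ppf([0.25, 0.75])
    slope = (y3 - y1) / (x3 - x1)
    return slope, y1 - slope * x1


r_theo = stats.norm.ppf(ppoints(n))
r_slope, r_intercept = r_style_qqline(data)

print("max |diff| in theoretical quantiles:", np.max(np.abs(scipy_theo - r_theo)))
print("SciPy line:   slope=%.3f intercept=%.3f" % (scipy_slope, scipy_intercept))
print("R-style line: slope=%.3f intercept=%.3f" % (r_slope, r_intercept))
```

With only 10 observations, the two sets of plotting positions differ most in the tails, and a least-squares line and a quartile-based line respond differently to the few large values in this sample. Both effects shrink as the sample size grows.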

Now, if you'd like, feel free to provide an example of the Minitab output (preferably on at least 20 or 30 samples) so that we can better understand the differences and compare it to both R and Python.

Thanks,
Raphael
