Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

external/error_report: Add Infinity wait error reporting feature to T… #2329

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dr-venkman
Copy link
Contributor

Add Infinity wait error reporting feature, where callstack of threads in a seemingly endless wait are reported to a server. This can also be used to inspect threads that are possibly deadlocked.

Following are the summarized details of the feature add-on:

  1. Definition of an appropriate Kconfig for the infinity wait feature. Also, turning on active cpu load monitoring when this feature is requested, which is necessary.
  2. Implementation of the infinity wait policy under external/error_report
  3. Replacing all scheduler-based APIs at external/error_report with apporiate ioctls, and implementing a os/drivers/error_report to place the scheduler API calls instead.
  4. Implementing a scenario for an infinitely waiting thread pair in the sample error_report_demo application
  5. Moving the error_report_init API call to os_bringup function from its previous location.

@dr-venkman dr-venkman force-pushed the add_infinite_thread_wait_errreport branch 2 times, most recently from 9127a4b to 332af3c Compare October 11, 2018 10:41
@seinfra
Copy link

seinfra commented Oct 11, 2018

Target : [9127a4bbcd96fe3f775c36dc8698367a40cc24b8] -

1 similar comment
@seinfra
Copy link

seinfra commented Oct 11, 2018

Target : [9127a4bbcd96fe3f775c36dc8698367a40cc24b8] -

@seinfra
Copy link

seinfra commented Oct 11, 2018

Target : [332af3c61c068f3cf2ca9b6b69d3aeec2ba4e447] - Code Rule Check (C++) OK.

@seinfra
Copy link

seinfra commented Oct 11, 2018

Target : [332af3c61c068f3cf2ca9b6b69d3aeec2ba4e447] - Code Rule Check Result:
apps/examples/err_report_demo/err_report_main.c:281: ERROR: [BRC_M_FTN] open brace '{' following function declarations go on the next line
apps/examples/err_report_demo/err_report_main.c:287: ERROR: [BRC_M_FTN] open brace '{' following function declarations go on the next line
apps/examples/err_report_demo/err_report_main.c:293: ERROR: [BRC_M_FTN] open brace '{' following function declarations go on the next line
os/drivers/error_report/error_report_drv.c:73: ERROR: [SPC_M_KWD] space required before the open parenthesis '('
os/drivers/error_report/error_report_drv.c:82: ERROR: [SPC_M_KWD] space required before the open parenthesis '('
os/drivers/error_report/error_report_drv.c:85: ERROR: [BRC_M_FTN] open brace '{' following function declarations go on the next line
os/drivers/error_report/error_report_drv.c:122: ERROR: [BRC_M_FTN] open brace '{' following function declarations go on the next line
os/drivers/error_report/error_report_drv.c:144: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:174: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:178: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:193: ERROR: [BRC_M_SMT] else should follow close brace '}'

@dr-venkman dr-venkman force-pushed the add_infinite_thread_wait_errreport branch from 332af3c to 02c140c Compare October 11, 2018 10:56
@seinfra
Copy link

seinfra commented Oct 11, 2018

Target : [02c140ca1275e20cb97dcfacc371e6cc6d6096c1] - Code Rule Check Result:
os/drivers/error_report/error_report_drv.c:146: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:176: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:180: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:195: ERROR: [BRC_M_SMT] else should follow close brace '}'

@seinfra
Copy link

seinfra commented Oct 11, 2018

Target : [02c140ca1275e20cb97dcfacc371e6cc6d6096c1] - Code Rule Check (C++) OK.

wifi_manager_deinit();
error_report_deinit();
pthread_cancel(thread1);
pthread_cancel(thread2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should call pthread_join or pthread_detach to prevent memory leak.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I will implement as you suggest.

@@ -285,6 +349,10 @@ static void error_report_test(const char *endpoint)
/* Verify multiple errors across WiFi Manager */
error_report_multiple(endpoint);

/* Verify Infinite Wait */
#if CONFIG_ERROR_REPORT_INFINITE_WAIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#ifdef

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed below. Thank you.

ERR_REPORT_TEST_WAIT;
sem_init(&g_sem1, 0, 0);
sem_init(&g_sem2, 0, 0);
if ((r = pthread_create(&thread1, NULL, (pthread_startroutine_t) prv_thread1, NULL)) != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove a space after type cast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Will modify as suggested.

config ERROR_REPORT_INFINITE_WAIT
bool "Report error for infinitely waiting threads"
default n
select SCHED_CPULOAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you make a dependancy with SCHED_CPULOAD?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done because the feature needs CPULOAD monitoring for it to work.

@@ -43,40 +46,147 @@ typedef struct {
uint16_t front;
int8_t q_pending;
pthread_mutex_t err_mutex;
#if CONFIG_ERROR_REPORT_INFINITE_WAIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#ifdef
Please check all of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. Will carry out the changes.

#include <stdint.h>
#include <errno.h>
#include <debug.h>
#include <apps/builtin.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this isn't needed. Thanks for pointing out.

if (taskname == NULL) {
return -1;
}
for (i = 0; i < sizeof(g_exceptional_waits)/sizeof(g_exceptional_waits[0]); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, spaces added for readability.

waitdata->ncalls++;
}
waitdata->task_state = tcb->task_state;
waitdata->entry = prv_fetch_taskaddr(waitdata->pid);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sched_gettcb is called twice in this case. Line 252 and here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I factored the prv_fetch_taskaddr to include a macro RETURN_ADDR_FROM_TCB for fetching task addr given a tcb pointer. So, the above statement can be replaced by a macro call to RETURN_ADDR_FROM_TCB.

static void prv_tcb_handler(FAR struct tcb_s *tcb, void *args)
{
int pos = 0;
if (tcb->pid < 5) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 5? for idle, hpwork, lpwork, and so on? Can you guarantee those are always enabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the condition check, as these tasks are now considered as exceptional cases, and will have a larger time window for reporting purposes. Thanks again.

if ((tcb->flags & TCB_FLAG_TTYPE_MASK) == TCB_FLAG_TTYPE_PTHREAD) {
task_addr = (unsigned long)e.pthread;
} else {
task_addr = (unsigned long)e.main;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use prv_fetch_taskaddr which you made?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you already got the tcb structure so that it is little different between here and prv_fetch_taskaddr.
Then, you had better refactor codes to make a function for duplicated operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored using the RETURN_ADDR_FROM_TCB macro call.

} else {
task_addr = (unsigned long)e.main;
}
saved_pc = tcb->xcp.regs[13];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it enough to debug only with regs[13]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question. The tcb structure provides many fields to debug, of which regs[13] points to the stack. Assuming that developers would prefer something similar to a gdb> backtrace, we figured this would be a great place to start with. Please feel free to share any additional fields in tcb structure that can simplify debugging.

@sunghan-chang
Copy link
Contributor

@dr-venkman Can you find the root cause with provided information by error report?

} else if (g_thd_waitq[pos].pid) {
/* Collision resolution, find the slot using linear search */
int i;
for (i = pos+1; i != pos; i = (i+1)%CONFIG_MAX_TASKS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please insert spaces between operator and operands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. Will modify as suggested.

else {
/* Collision resolution, find the next empty slot */
int i;
for (i = pos+1; i != pos; i = (i+1)%CONFIG_MAX_TASKS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please insert spaces between operator and operands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. Will modify as suggested.

{
int ret = -EINVAL;


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra empty line can be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed empty lines for readability. Thank you.

*
* Description:
* Register task management path, ERROR_REPORT_DRVPATH
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description for parameter could be added here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added detailed description for the function, including the parameter.

@dr-venkman dr-venkman requested a review from pillip8282 October 12, 2018 01:23
@dr-venkman
Copy link
Contributor Author

dr-venkman commented Oct 12, 2018

@dr-venkman Can you find the root cause with provided information by error report?

That is a broader area of work, as root cause analysis might involve more pieces of information, and possibly fault diagnosis algorithms as well. For now, this commit provides the callstack of threads that are inferrred to be waiting infinitely (a large elapsed time interval, in practice) long in time. I hope that answers sufficiently. Please let me know if you need further details.

@dr-venkman dr-venkman force-pushed the add_infinite_thread_wait_errreport branch from 02c140c to 5e4b199 Compare October 12, 2018 05:14
Copy link
Contributor Author

@dr-venkman dr-venkman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed most, if not all of the comments. Thanks a lot for your inputs.

@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [5e4b199422c6ff46071e28d137e4144cced26c4f] - Code Rule Check Result:
os/drivers/error_report/error_report_drv.c:94: ERROR: [SPC_M_KWD] space required before the open parenthesis '('
os/drivers/error_report/error_report_drv.c:148: ERROR: [BRC_M_SMT] that open brace { should be on the previous line
os/drivers/error_report/error_report_drv.c:148: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:179: ERROR: [BRC_M_SMT] that open brace { should be on the previous line
os/drivers/error_report/error_report_drv.c:179: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:184: ERROR: [BRC_M_SMT] that open brace { should be on the previous line
os/drivers/error_report/error_report_drv.c:184: ERROR: [BRC_M_SMT] else should follow close brace '}'
os/drivers/error_report/error_report_drv.c:200: ERROR: [BRC_M_SMT] that open brace { should be on the previous line
os/drivers/error_report/error_report_drv.c:200: ERROR: [BRC_M_SMT] else should follow close brace '}'

@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [5e4b199422c6ff46071e28d137e4144cced26c4f] - Code Rule Check (C++) OK.

@dr-venkman dr-venkman force-pushed the add_infinite_thread_wait_errreport branch from 5e4b199 to 3a1891a Compare October 12, 2018 05:28
@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [3a1891a81abbe6845f40f94301b427c7359ffa3f] - Code Rule Check OK.

@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [3a1891a81abbe6845f40f94301b427c7359ffa3f] - Code Rule Check (C++) OK.

@dr-venkman dr-venkman force-pushed the add_infinite_thread_wait_errreport branch 2 times, most recently from 26bf49d to 9e23482 Compare October 12, 2018 06:32
@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [9e23482628431960553b8d26a4a2fa565e5af6af] - Code Rule Check OK.

@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [9e23482628431960553b8d26a4a2fa565e5af6af] - Code Rule Check (C++) OK.

@dr-venkman dr-venkman force-pushed the add_infinite_thread_wait_errreport branch from 9e23482 to ae8ecb8 Compare October 12, 2018 08:12
@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [ae8ecb84561dd9ccfe2f6fc6bb14750be02dede2] - Code Rule Check OK.

@seinfra
Copy link

seinfra commented Oct 12, 2018

Target : [ae8ecb84561dd9ccfe2f6fc6bb14750be02dede2] - Code Rule Check (C++) OK.

…izenRT

Add Infinity wait error reporting feature, where callstack of threads in a seemingly endless wait are reported to a server. This can also be used to inspect threads that are possibly deadlocked.

Following are the summarized details of the feature add-on:
1. Definition of an appropriate Kconfig for the infinity wait feature. Also, turning on active cpu load monitoring when this feature is requested, which is necessary.
2. Implementation of the infinity wait policy under `external/error_report`
3. Replacing all scheduler-based APIs at `external/error_report` with apporiate `ioctls`, and implementing a `os/drivers/error_report` to place the scheduler API calls instead.
4. Implementing a scenario for an infinitely waiting thread pair in the sample `error_report_demo` application
5. Moving the `error_report_init` API call to `os_bringup` function from its previous location.
@dr-venkman dr-venkman force-pushed the add_infinite_thread_wait_errreport branch from ae8ecb8 to bb3c4ed Compare October 15, 2018 01:46
@seinfra
Copy link

seinfra commented Oct 15, 2018

Target : [bb3c4ed] - Code Rule Check (C++) OK.

@seinfra
Copy link

seinfra commented Oct 15, 2018

Target : [bb3c4ed] - Code Rule Check OK.

uint32_t *ptr;
ptr = (uint32_t *)tcb->xcp.regs[13];
while ((void *)ptr < tcb->adj_stack_ptr && (waitdata->ncalls < CONFIG_ERROR_REPORT_BACKTRACE_MAX_DEPTH)) {
if (*ptr >= 0x04000000 && *ptr <= 0x04800000) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's cypress's DDR address right?
do not use numbers directly.

@juitem
Copy link
Contributor

juitem commented Oct 16, 2018

I think this method has several restrictions.

  1. This thread should be top most priority.
  2. Kernel / platform / network lockups cannot be reported. Only Application level lock ups can be detected. Which can be obtained with 'ps' command in TASH

Thread is newly created and checking CPU loads which seems too much overhead compared with adding implentation on scheduler.
I think we can discuss convenient methods

@seinfra
Copy link

seinfra commented Apr 24, 2019

Target : [bb3c4ed] - Code Rule Check OK.

@seinfra
Copy link

seinfra commented Apr 24, 2019

Target : [bb3c4ed] - Code Rule Check (C++) OK.

1 similar comment
@seinfra
Copy link

seinfra commented Apr 24, 2019

Target : [bb3c4ed] - Code Rule Check (C++) OK.

@seinfra
Copy link

seinfra commented Nov 12, 2019

Target : [bb3c4ed] - Code Rule Check OK.

@seinfra
Copy link

seinfra commented Nov 12, 2019

Target : [bb3c4ed] - Code Rule Check (C++) OK.

@tizen-build
Copy link

Target : [bb3c4ed] - Code Rule Check OK.

1 similar comment
@tizen-build
Copy link

Target : [bb3c4ed] - Code Rule Check OK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants