-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
external/error_report: Add Infinity wait error reporting feature to T… #2329
base: master
Are you sure you want to change the base?
external/error_report: Add Infinity wait error reporting feature to T… #2329
Conversation
9127a4b
to
332af3c
Compare
Target : [9127a4bbcd96fe3f775c36dc8698367a40cc24b8] - |
1 similar comment
Target : [9127a4bbcd96fe3f775c36dc8698367a40cc24b8] - |
Target : [332af3c61c068f3cf2ca9b6b69d3aeec2ba4e447] - Code Rule Check (C++) OK. |
Target : [332af3c61c068f3cf2ca9b6b69d3aeec2ba4e447] - Code Rule Check Result: |
332af3c
to
02c140c
Compare
Target : [02c140ca1275e20cb97dcfacc371e6cc6d6096c1] - Code Rule Check Result: |
Target : [02c140ca1275e20cb97dcfacc371e6cc6d6096c1] - Code Rule Check (C++) OK. |
wifi_manager_deinit(); | ||
error_report_deinit(); | ||
pthread_cancel(thread1); | ||
pthread_cancel(thread2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should call pthread_join
or pthread_detach
to prevent memory leak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I will implement as you suggest.
@@ -285,6 +349,10 @@ static void error_report_test(const char *endpoint) | |||
/* Verify multiple errors across WiFi Manager */ | |||
error_report_multiple(endpoint); | |||
|
|||
/* Verify Infinite Wait */ | |||
#if CONFIG_ERROR_REPORT_INFINITE_WAIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#ifdef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed below. Thank you.
ERR_REPORT_TEST_WAIT; | ||
sem_init(&g_sem1, 0, 0); | ||
sem_init(&g_sem2, 0, 0); | ||
if ((r = pthread_create(&thread1, NULL, (pthread_startroutine_t) prv_thread1, NULL)) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove a space after type cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. Will modify as suggested.
config ERROR_REPORT_INFINITE_WAIT | ||
bool "Report error for infinitely waiting threads" | ||
default n | ||
select SCHED_CPULOAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you make a dependancy with SCHED_CPULOAD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done because the feature needs CPULOAD
monitoring for it to work.
external/error_report/error_report.c
Outdated
@@ -43,40 +46,147 @@ typedef struct { | |||
uint16_t front; | |||
int8_t q_pending; | |||
pthread_mutex_t err_mutex; | |||
#if CONFIG_ERROR_REPORT_INFINITE_WAIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#ifdef
Please check all of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out. Will carry out the changes.
#include <stdint.h> | ||
#include <errno.h> | ||
#include <debug.h> | ||
#include <apps/builtin.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this isn't needed. Thanks for pointing out.
if (taskname == NULL) { | ||
return -1; | ||
} | ||
for (i = 0; i < sizeof(g_exceptional_waits)/sizeof(g_exceptional_waits[0]); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, spaces added for readability.
waitdata->ncalls++; | ||
} | ||
waitdata->task_state = tcb->task_state; | ||
waitdata->entry = prv_fetch_taskaddr(waitdata->pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sched_gettcb is called twice in this case. Line 252 and here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I factored the prv_fetch_taskaddr
to include a macro RETURN_ADDR_FROM_TCB
for fetching task addr given a tcb pointer. So, the above statement can be replaced by a macro call to RETURN_ADDR_FROM_TCB
.
static void prv_tcb_handler(FAR struct tcb_s *tcb, void *args) | ||
{ | ||
int pos = 0; | ||
if (tcb->pid < 5) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 5? for idle, hpwork, lpwork, and so on? Can you guarantee those are always enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the condition check, as these tasks are now considered as exceptional cases, and will have a larger time window for reporting purposes. Thanks again.
if ((tcb->flags & TCB_FLAG_TTYPE_MASK) == TCB_FLAG_TTYPE_PTHREAD) { | ||
task_addr = (unsigned long)e.pthread; | ||
} else { | ||
task_addr = (unsigned long)e.main; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use prv_fetch_taskaddr
which you made?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you already got the tcb structure so that it is little different between here and prv_fetch_taskaddr
.
Then, you had better refactor codes to make a function for duplicated operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored using the RETURN_ADDR_FROM_TCB
macro call.
} else { | ||
task_addr = (unsigned long)e.main; | ||
} | ||
saved_pc = tcb->xcp.regs[13]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it enough to debug only with regs[13]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the question. The tcb structure provides many fields to debug, of which regs[13] points to the stack. Assuming that developers would prefer something similar to a gdb> backtrace
, we figured this would be a great place to start with. Please feel free to share any additional fields in tcb structure that can simplify debugging.
@dr-venkman Can you find the root cause with provided information by error report? |
} else if (g_thd_waitq[pos].pid) { | ||
/* Collision resolution, find the slot using linear search */ | ||
int i; | ||
for (i = pos+1; i != pos; i = (i+1)%CONFIG_MAX_TASKS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please insert spaces between operator and operands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out. Will modify as suggested.
else { | ||
/* Collision resolution, find the next empty slot */ | ||
int i; | ||
for (i = pos+1; i != pos; i = (i+1)%CONFIG_MAX_TASKS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please insert spaces between operator and operands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out. Will modify as suggested.
{ | ||
int ret = -EINVAL; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra empty line can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed empty lines for readability. Thank you.
* | ||
* Description: | ||
* Register task management path, ERROR_REPORT_DRVPATH | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description for parameter could be added here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added detailed description for the function, including the parameter.
That is a broader area of work, as root cause analysis might involve more pieces of information, and possibly fault diagnosis algorithms as well. For now, this commit provides the callstack of threads that are inferrred to be waiting infinitely (a large elapsed time interval, in practice) long in time. I hope that answers sufficiently. Please let me know if you need further details. |
02c140c
to
5e4b199
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed most, if not all of the comments. Thanks a lot for your inputs.
Target : [5e4b199422c6ff46071e28d137e4144cced26c4f] - Code Rule Check Result: |
Target : [5e4b199422c6ff46071e28d137e4144cced26c4f] - Code Rule Check (C++) OK. |
5e4b199
to
3a1891a
Compare
Target : [3a1891a81abbe6845f40f94301b427c7359ffa3f] - Code Rule Check OK. |
Target : [3a1891a81abbe6845f40f94301b427c7359ffa3f] - Code Rule Check (C++) OK. |
26bf49d
to
9e23482
Compare
Target : [9e23482628431960553b8d26a4a2fa565e5af6af] - Code Rule Check OK. |
Target : [9e23482628431960553b8d26a4a2fa565e5af6af] - Code Rule Check (C++) OK. |
9e23482
to
ae8ecb8
Compare
Target : [ae8ecb84561dd9ccfe2f6fc6bb14750be02dede2] - Code Rule Check OK. |
Target : [ae8ecb84561dd9ccfe2f6fc6bb14750be02dede2] - Code Rule Check (C++) OK. |
…izenRT Add Infinity wait error reporting feature, where callstack of threads in a seemingly endless wait are reported to a server. This can also be used to inspect threads that are possibly deadlocked. Following are the summarized details of the feature add-on: 1. Definition of an appropriate Kconfig for the infinity wait feature. Also, turning on active cpu load monitoring when this feature is requested, which is necessary. 2. Implementation of the infinity wait policy under `external/error_report` 3. Replacing all scheduler-based APIs at `external/error_report` with apporiate `ioctls`, and implementing a `os/drivers/error_report` to place the scheduler API calls instead. 4. Implementing a scenario for an infinitely waiting thread pair in the sample `error_report_demo` application 5. Moving the `error_report_init` API call to `os_bringup` function from its previous location.
ae8ecb8
to
bb3c4ed
Compare
Target : [bb3c4ed] - Code Rule Check (C++) OK. |
Target : [bb3c4ed] - Code Rule Check OK. |
uint32_t *ptr; | ||
ptr = (uint32_t *)tcb->xcp.regs[13]; | ||
while ((void *)ptr < tcb->adj_stack_ptr && (waitdata->ncalls < CONFIG_ERROR_REPORT_BACKTRACE_MAX_DEPTH)) { | ||
if (*ptr >= 0x04000000 && *ptr <= 0x04800000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's cypress's DDR address right?
do not use numbers directly.
I think this method has several restrictions.
Thread is newly created and checking CPU loads which seems too much overhead compared with adding implentation on scheduler. |
Target : [bb3c4ed] - Code Rule Check OK. |
Target : [bb3c4ed] - Code Rule Check (C++) OK. |
1 similar comment
Target : [bb3c4ed] - Code Rule Check (C++) OK. |
Target : [bb3c4ed] - Code Rule Check OK. |
Target : [bb3c4ed] - Code Rule Check (C++) OK. |
Target : [bb3c4ed] - Code Rule Check OK. |
1 similar comment
Target : [bb3c4ed] - Code Rule Check OK. |
Add Infinity wait error reporting feature, where callstack of threads in a seemingly endless wait are reported to a server. This can also be used to inspect threads that are possibly deadlocked.
Following are the summarized details of the feature add-on:
external/error_report
external/error_report
with apporiateioctls
, and implementing aos/drivers/error_report
to place the scheduler API calls instead.error_report_demo
applicationerror_report_init
API call toos_bringup
function from its previous location.