Skip to content

bluetooth/service: fix crash by removing g_mutex destroy in cleanup#492

Open
zhongzhijie1 wants to merge 875 commits into
open-vela:devfrom
zhongzhijie1:global_mutex_fix
Open

bluetooth/service: fix crash by removing g_mutex destroy in cleanup#492
zhongzhijie1 wants to merge 875 commits into
open-vela:devfrom
zhongzhijie1:global_mutex_fix

Conversation

@zhongzhijie1
Copy link
Copy Markdown
Contributor

bug: v/87524

g_mutex is a statically initialized global mutex shared across processes in NuttX flat build. When bluetoothd exits, manager_cleanup() called uv_mutex_destroy(&g_mutex), which invokes nxsem_destroy and clears sem.flags (SEM_TYPE_MUTEX). Subsequently, vappxms_main calls manager_find_instance() -> uv_mutex_lock(), triggering DEBUGASSERT(NXSEM_IS_MUTEX) failure in nxmutex_trywait and crashing the system.

Remove uv_mutex_destroy() from manager_cleanup() since the mutex is shared across process boundaries and must remain valid for the lifetime of the system.

expliyh and others added 30 commits December 23, 2025 23:48
bug: v/81606

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81606

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81606

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81606

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81606

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81606

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80279

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80279

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81606

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80270

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80268

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80268

Rootcause: unref conn before use
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81589

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80268

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80258

Rootcause: ARRAY_SIZE redefined
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80268

Rootcause: profile direct connect api has changed
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80268

Rootcause: should not call unref at this time
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80269

Rootcause: add logs to print call number when sync
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81504

return -EINPROGRESS dial callback and send OK after dial response

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81567

remove unused functions

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81589

remove global var g_sal_ag_sync_conn and sync  call by addr

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81504
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/80268

Rootcause: Not unregister callbacks when cleanup.
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81522

When event_id == BT_AVRCP_EVT_VOLUME_CHANGED, flag = true is set only if both CONFIG_BLUETOOTH_AVRCP_ABSOLUTE_VOLUME and CONFIG_BLUETOOTH_AVRCP_CONTROL are enabled.
Otherwise in the else branch, flag = true is set only if CONFIG_BLUETOOTH_AVRCP_TARGET is enabled. All configurations are enabled, treating flag as always true, making the if (!flag) condition unreachable.

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81702

Fixed memory leak caused by premature return.

Signed-off-by: liuxiang18 <liuxiang18@xiaomi.com>
bug: v/80258

Rootcause: attributes in bt_sdp_discover_params may be modified by ZBlue SDP. Using const could cause a crash in some cases
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81752

Rootcause: audio_connect should not be called in bluetoothd task
Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
bug: v/81968

Rootcause: disconnected_callback not called caused connect info not cleared in connection manager module

Signed-off-by: YuhengLi <liyuheng@xiaomi.com>
… an asynchronous API.

bug: v/81682

When priv dynamically allocates memory successfully but fails later due to other reasons before reaching the assignment ins->priv = priv;, the memory allocated to priv cannot be freed in bt_socket_async_client_deinit, leading to a resource leak.

Signed-off-by: jialu <jialu@xiaomi.com>
…se functions

bug: v/80811

Rootcause: In certain scenarios, users of `euv_pipe` must ensure all UV requests have completed execution before releasing resources. Consequently, it is necessary to notify users that `euv_pipe` has been fully released after its close operation is completed, thereby permitting subsequent operational procedures to proceed. Support for the close callback has therefore been added.

Signed-off-by: chejinxian1 <chejinxian1@xiaomi.com>
zhongzhijie1 and others added 5 commits March 20, 2026 21:47
bug: v/87917

Map BT_LE_ADDR_TYPE_UNKNOWN to BT_ADDR_LE_PUBLIC instead of
BT_ADDR_LE_RANDOM to maintain compatibility with legacy stack
behavior where the default address type was public.

Signed-off-by: zhongzhijie1 <zhongzhijie1@xiaomi.com>
bug: v/-88168

Rootcause: Adjusting the volume in sniff mode is relatively slow; need to exit sniff before adjusting the volume.

Signed-off-by: zhangyuan20 <zhangyuan20@xiaomi.com>
bug: v/84601

rootcause:
1: The old interface forcibly disconnects ACLs, causing profile
connection compatibility issues.
2: The old interface port forcibly releases and cleans up local
resources before disconnecting, especially in sniffing scenarios,
ending before entering active mode, causing a 30-second
timeout on the phone.

in turn, increases the connection disconnection time by 300 milliseconds.

Signed-off-by: Kai Cheng <chengkai@xiaomi.com>
bug: v/82380

The previous call operation path could block receiving and eventually lead to a
crash.
Signed-off-by: liyuheng <liyuheng@xiaomi.com>
bug: v/88151

In zblue_on_connected, when the device acts as ACP (acceptor), it only creates the a2dp_info and waits passively. It never initiates bt_a2dp_discover, so if the remote side also does not initiate the discover/set_config flow, the A2DP connection stalls -- signaling channel is up but no stream is ever configured.

Additionally, in bt_a2dp_discover_endpoint_cb, the a2dp_info->role remains SEP_INVALID for ACP connections because the role was only assigned during the connect initiation path. The SNK set_config branch also had an unnecessary int_acp == A2DP_INT guard, which blocked ACP-initiated discover from completing the configuration.

Fix:

Added a 2-second service_loop_timer in the ACP path of zblue_on_connected. If the remote does not send set_config within 2s, the local side proactively initiates bt_a2dp_discover.
The timer is cancelled in zblue_on_config_req (remote drove the flow first) and in a2dp_info_destroy (cleanup on disconnect).
In bt_a2dp_discover_endpoint_cb, the local role is now derived from the remote's sep_info->tsep: BT_AVDTP_SOURCE -> local SEP_SNK, BT_AVDTP_SINK -> local SEP_SRC.
Removed the int_acp == A2DP_INT restriction on the SEP_SNK set_config branch so ACP-triggered discover can also complete codec negotiation.

Signed-off-by: jialu <jialu@xiaomi.com>
chengkai15
chengkai15 previously approved these changes Mar 26, 2026
Copy link
Copy Markdown
Contributor

@chengkai15 chengkai15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #492 Review — fix crash by removing g_mutex destroy

概述: 1 文件 1+/2-,移除 manager_cleanup() 中的 uv_mutex_destroy(&g_mutex)

✅ 优点

  • 根因准确:静态初始化的全局 mutex 不应在进程退出时销毁
  • 修复正确:g_mutex 跨进程共享,必须在系统生命周期内有效
  • 影响最小:仅移除一行销毁调用

📋 结论

修复正确。APPROVE

jialu522 and others added 4 commits March 27, 2026 14:13
…lf sent).

bug: v/87852

This commit performs an architecture-level refactoring of cs_ras.c and cs_ras.h. The core change replaces the shared global buffer and static arrays used for Real-time and On-demand mode data storage with independent dynamically-allocated linked list queues, resolving memory safety issues and concurrent processing defects in the original architecture.

Signed-off-by: jialu <jialu@xiaomi.com>
bug: v/88558

Add conditional compilation for Bluetooth LE Channel Sounding (CS)
feature across framework API, socket IPC, Zephyr SAL, CS profiles,
and le_cs tool.

Signed-off-by: jialu <jialu@xiaomi.com>
bug: v/87941

bt_sal_a2dp_source_send_data calls net_buf_add_mem(media_packet_buf, &buf[AVDTP_RTP_HEADER_LEN], nbytes) without validating whether nbytes exceeds the available space in media_packet_buf. The buffer is allocated from bt_a2dp_tx_pool with a data size of CONFIG_ZBLUE_A2DP_SOURCE_BUF_SIZE (default 660 bytes). After bt_a2dp_stream_create_pdu reserves protocol headers (STREAM_DATA_RESERVED, i.e. AVDTP_RTP_HEADER_LEN = 12 bytes), the actual usable payload space is CONFIG_ZBLUE_A2DP_SOURCE_BUF_SIZE - STREAM_DATA_RESERVED (648 bytes). Zephyr's net_buf_add_mem only has an __ASSERT_NO_MSG check which is stripped in release builds. If nbytes exceeds the tailroom, a buffer overflow occurs, corrupting adjacent memory and potentially causing hard faults or data corruption.

Fix: Add a length check before buffer allocation using CONFIG_ZBLUE_A2DP_SOURCE_BUF_SIZE - STREAM_DATA_RESERVED as the maximum payload limit. When nbytes exceeds this limit, log an error and return BT_STATUS_PARM_INVALID, avoiding unnecessary buffer allocation and out-of-bounds writes.

Signed-off-by: jialu <jialu@xiaomi.com>
bug: v/85832

Fix defects causing bttool resource leak during BT enable/disable
stress test:

1. Store bttool_t pointer in g_bttool_loop->data so TURNING_OFF callback
   can access the async queue (previously always 0, cleanup was skipped)
2. Replace do_in_thread_loop with bttool_uninit() in TURNING_OFF callback
   to send _uninit command via uv_async_queue_send, ensuring bt_tool_uninit
   runs on g_bttool_loop thread (mirrors bttool_quit pattern). Guard with
   CONFIG_LIBUV_EXTENSION only.
3. Add re-entry guard in bt_tool_uninit to prevent double cleanup on
   repeated BT disable cycles

Signed-off-by: chejinxian1 <chejinxian1@xiaomi.com>
@github-actions
Copy link
Copy Markdown

❌ CLA Signature Required

@zhongzhijie1 Some contributors need to sign the CLA:

  • guanyi3@xiaomi.comNeeds to sign CLA

Please:

  1. Sign the CLA at: https://www.openvela.com/#/community/cla
  2. After signing, comment /check-cla to recheck

📋 View detailed check results: Action Run #23779300704


💡 Tip: All contributors must sign the CLA before the PR can be merged.

expliyh and others added 7 commits March 31, 2026 14:02
bug: v/87902

Rootcause: When ACL disconnects during SDP discovery, the SDP client
only invokes the disconnected callback, not the func callback. HFP HF
SAL did not register a disconnected callback, causing sal_conn to be
orphaned in g_sal_hf_conn_list and the upper layer state machine to
get stuck. Add the disconnected callback to clean up sal_conn and
notify the upper layer of the disconnection.

Signed-off-by: liyuheng <liyuheng@xiaomi.com>
bug: v/87902

Rootcause: When ACL disconnects during SDP discovery, the SDP client
only invokes the disconnected callback, not the func callback. HFP AG
SAL did not register a disconnected callback, so the upper layer never
received a PROFILE_STATE_DISCONNECTED notification and could get stuck.
Add the disconnected callback to notify the upper layer of the
disconnection.

Signed-off-by: liyuheng <liyuheng@xiaomi.com>
…nnected

bug: v/87902

Use sal_conn->addr directly instead of an intermediate bt_address_t*
pointer variable. The sal_conn object remains valid until bt_list_remove
at the end of the function, so direct access is safe and cleaner.

Signed-off-by: liyuheng <liyuheng@xiaomi.com>
bug: v/87902

Refactor the HFP AG SAL connection establishment to align with the
HFP HF implementation:

- Remove global g_conn_params that serialized all connections through
  a single slot, preventing parallel outgoing connections.
- Split do_ag_connect into do_ag_sdp_discover (SDP phase) and
  do_ag_slc_connect (SLC phase), matching HF's do_hf_sdp_discover
  and do_hf_slc_connect.
- Create sal_conn early at SDP discover time so the connection is
  tracked from the start, enabling proper cleanup on SDP failure.
- Use service_loop_work to dispatch do_ag_slc_connect from
  zblue_on_sdp_done, instead of direct synchronous call.
- Add find_connection_by_context to look up sal_conn by bt_conn*.
- Fix zblue_on_ag_disconnected to use sal_conn->addr directly and
  call bt_list_remove after callbacks.
- Fix zblue_on_ag_connected to handle incoming connections and
  simultaneous connection collision consistently with HF.

Signed-off-by: liyuheng <liyuheng@xiaomi.com>
bug: v/5823

CM_RECONNECT_INTERVAL was changed from 8s to 12s for power optimization,
but CM_RECONNECT_TIMES was not updated accordingly, resulting in a 45-minute
reconnect window instead of the designed 30 minutes.

Fix by deriving CM_RECONNECT_TIMES from CM_RECONNECT_INTERVAL directly,
so future interval changes are automatically reflected.

Signed-off-by: liuxiang18 <liuxiang18@xiaomi.com>
bug: v/87639

Keep the ACL link in active mode during SCO connection to reduce
control latency for HFP commands such as call termination.

Also fix audio_on_exit to call bt_pm_idle instead of bt_pm_busy,
ensuring consistent idle state when SCO is not established.

Signed-off-by: Zihao Gao <gaozihao@xiaomi.com>
bug: v/88088

bt_att_create_pdu uses BT_ATT_TIMEOUT (K_SECONDS(30)) for ATT_RESPONSE
and ATT_NOTIFICATION ops, which can block the calling thread up to 30s
when the buf pool is exhausted.

Refactor bt_sal_gatt_server_send_notification,
bt_sal_gatt_server_send_indication and bt_sal_gatt_server_send_response
to copy the payload into a heap-allocated request struct and dispatch
via service_loop_work, so the framework thread returns immediately and
the blocking stack call runs in the service worker thread.

- rootcause: bt_gatt_notify_cb/bt_gatt_indicate/bt_gatt_send_read_rsp
  all call bt_att_create_pdu which may block K_SECONDS(30) on buf pool
  exhaustion, stalling the framework thread
- Add sal_gatts_notify_req_t and sal_gatts_rsp_req_t with flexible array
  member to carry the value copy inline
- First four fields of both structs mirror sal_adapter_req_t so
  sal_invoke_async can cast and dispatch safely

Signed-off-by: zhongzhijie1 <zhongzhijie1@xiaomi.com>
@zhongzhijie1
Copy link
Copy Markdown
Contributor Author

/check-cla

bug: v/87524

g_mutex is a statically initialized global mutex shared across processes
in NuttX flat build. When bluetoothd exits, manager_cleanup() called
uv_mutex_destroy(&g_mutex), which invokes nxsem_destroy and clears
sem.flags (SEM_TYPE_MUTEX). Subsequently, vappxms_main calls
manager_find_instance() -> uv_mutex_lock(), triggering
DEBUGASSERT(NXSEM_IS_MUTEX) failure in nxmutex_trywait and crashing
the system.

Remove uv_mutex_destroy() from manager_cleanup() since the mutex is
shared across process boundaries and must remain valid for the lifetime
of the system.

Signed-off-by: guanyi3 <guanyi3@xiaomi.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 2, 2026

✅ CLA Verification Complete

@zhongzhijie1 All contributors have signed the CLA!

  • guanyi3@xiaomi.com

📋 View detailed check results: Action Run #23881443515

Your pull request can now proceed with the review process! 🎉

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 3, 2026

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions Bot added the Stale label May 3, 2026
@liujinye-sys liujinye-sys dismissed stale reviews from chengkai15, gzh-terry, and huangyulong3 May 9, 2026 06:44

The merge-base changed after approval.

@github-actions github-actions Bot removed the Stale label May 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.