feat: replace Flask backend with Node.js/Express#269
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Replaces the previous Flask-based backend with a Node.js/Express API while introducing supporting configuration/docs and automation around submodule syncing.
Changes:
- Added an Express server (
backend/) implementing the existing API endpoints usingmysql2/promise. - Added containerization and environment variable examples for the new backend.
- Added repo/docs updates (README, third-party notice, submodule + workflow).
Reviewed changes
Copilot reviewed 9 out of 1349 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/index.js | Implements Express API endpoints and MySQL querying logic replacing Flask routes |
| backend/config.js | Defines supported distro/table mapping and paging constant used by the backend |
| backend/Dockerfile | Adds a container build for running the new Node backend |
| backend/.env.example | Example env vars for backend DB + port configuration |
| .env.example | Example env vars at repo root (overlaps with backend example) |
| .gitmodules | Adds distro_data submodule reference |
| .github/workflows/updatesubmodules.yml | Adds a workflow to sync submodule changes via SSH |
| README.md | Updates project introduction and repository tour |
| THIRDPARTY.txt | Adds third-party license notice text |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const DISTRO_BIT_MAP = {}; | ||
| let bitFlag = 1; | ||
| for (const distroName of Object.keys(SUPPORTED_DISTROS)) { | ||
| const versions = Object.keys(SUPPORTED_DISTROS[distroName]).sort(); | ||
| for (const version of versions) { | ||
| if (!DISTRO_BIT_MAP[distroName]) { | ||
| DISTRO_BIT_MAP[distroName] = {}; | ||
| } | ||
| DISTRO_BIT_MAP[distroName][version] = bitFlag; | ||
| bitFlag *= 2; | ||
| } | ||
| } |
There was a problem hiding this comment.
bitFlag is a Number that will exceed Number.MAX_SAFE_INTEGER once enough versions are added, causing silent precision loss. Since you later compare via BigInt(...), build and store the bitflags as BigInt from the start (e.g., initialize bitFlag as 1n and multiply by 2n) so DISTRO_BIT_MAP values remain exact.
|
|
||
| let allRows = []; | ||
| for (const table of tables) { | ||
| let query; | ||
| let params; | ||
| if (exactMatch) { | ||
| query = `SELECT packageName, description, version, osName FROM ${table} WHERE packageName = ?`; | ||
| params = [searchTerm]; | ||
| } else { | ||
| query = `SELECT packageName, description, version, osName FROM ${table} WHERE packageName REGEXP ?`; | ||
| params = [searchTerm]; | ||
| } | ||
| const [rows] = await pool.query(query, params); | ||
| allRows = allRows.concat(rows); | ||
| } | ||
|
|
||
| const totalLength = allRows.length; | ||
| let results = []; | ||
| let lastPage = Math.ceil(totalLength / MAX_RECORDS_TO_SEND); | ||
|
|
||
| if (totalLength <= MAX_RECORDS_TO_SEND) { | ||
| results = allRows; | ||
| } else { | ||
| const startIdx = pageNumber * MAX_RECORDS_TO_SEND; | ||
| let endIdx; | ||
| if (pageNumber === 0) { | ||
| endIdx = MAX_RECORDS_TO_SEND; | ||
| lastPage = 1; // Replicating the weird Python logic where last_page is 1 if page_number is 0? | ||
| // Actually the Python code says: last_page = 1 #math.ceil(totalLength/MAX_RECORDS_TO_SEND) | ||
| } else { | ||
| endIdx = totalLength; | ||
| lastPage = 1; | ||
| } | ||
| results = allRows.slice(startIdx, endIdx); |
There was a problem hiding this comment.
This pulls all matching rows from all tables into memory and only paginates afterward, which can become a major DB+memory bottleneck. Consider pushing pagination down to SQL (e.g., UNION ALL across tables with LIMIT/OFFSET, or apply LIMIT per table plus a merge strategy) so the server never loads an unbounded result set.
| let allRows = []; | |
| for (const table of tables) { | |
| let query; | |
| let params; | |
| if (exactMatch) { | |
| query = `SELECT packageName, description, version, osName FROM ${table} WHERE packageName = ?`; | |
| params = [searchTerm]; | |
| } else { | |
| query = `SELECT packageName, description, version, osName FROM ${table} WHERE packageName REGEXP ?`; | |
| params = [searchTerm]; | |
| } | |
| const [rows] = await pool.query(query, params); | |
| allRows = allRows.concat(rows); | |
| } | |
| const totalLength = allRows.length; | |
| let results = []; | |
| let lastPage = Math.ceil(totalLength / MAX_RECORDS_TO_SEND); | |
| if (totalLength <= MAX_RECORDS_TO_SEND) { | |
| results = allRows; | |
| } else { | |
| const startIdx = pageNumber * MAX_RECORDS_TO_SEND; | |
| let endIdx; | |
| if (pageNumber === 0) { | |
| endIdx = MAX_RECORDS_TO_SEND; | |
| lastPage = 1; // Replicating the weird Python logic where last_page is 1 if page_number is 0? | |
| // Actually the Python code says: last_page = 1 #math.ceil(totalLength/MAX_RECORDS_TO_SEND) | |
| } else { | |
| endIdx = totalLength; | |
| lastPage = 1; | |
| } | |
| results = allRows.slice(startIdx, endIdx); | |
| const whereClause = exactMatch ? 'packageName = ?' : 'packageName REGEXP ?'; | |
| const params = [searchTerm]; | |
| const tableCounts = []; | |
| let totalLength = 0; | |
| for (const table of tables) { | |
| const countQuery = `SELECT COUNT(*) AS totalCount FROM ${table} WHERE ${whereClause}`; | |
| const [countRows] = await pool.query(countQuery, params); | |
| const tableCount = countRows[0]?.totalCount || 0; | |
| tableCounts.push({ table, count: tableCount }); | |
| totalLength += tableCount; | |
| } | |
| let results = []; | |
| let lastPage = Math.ceil(totalLength / MAX_RECORDS_TO_SEND); | |
| if (totalLength > 0) { | |
| let startIdx = 0; | |
| let rowsToFetch = totalLength; | |
| if (totalLength > MAX_RECORDS_TO_SEND) { | |
| startIdx = pageNumber * MAX_RECORDS_TO_SEND; | |
| if (pageNumber === 0) { | |
| rowsToFetch = MAX_RECORDS_TO_SEND; | |
| lastPage = 1; // Replicating the weird Python logic where last_page is 1 if page_number is 0? | |
| // Actually the Python code says: last_page = 1 #math.ceil(totalLength/MAX_RECORDS_TO_SEND) | |
| } else { | |
| rowsToFetch = Math.max(totalLength - startIdx, 0); | |
| lastPage = 1; | |
| } | |
| } | |
| if (rowsToFetch > 0 && startIdx < totalLength) { | |
| let remainingOffset = startIdx; | |
| let remainingToFetch = rowsToFetch; | |
| for (const { table, count } of tableCounts) { | |
| if (remainingToFetch <= 0) { | |
| break; | |
| } | |
| if (count === 0) { | |
| continue; | |
| } | |
| if (remainingOffset >= count) { | |
| remainingOffset -= count; | |
| continue; | |
| } | |
| const tableOffset = remainingOffset; | |
| const tableLimit = Math.min(count - tableOffset, remainingToFetch); | |
| const dataQuery = `SELECT packageName, description, version, osName FROM ${table} WHERE ${whereClause} LIMIT ? OFFSET ?`; | |
| const [rows] = await pool.query(dataQuery, [...params, tableLimit, tableOffset]); | |
| results = results.concat(rows); | |
| remainingToFetch -= rows.length; | |
| remainingOffset = 0; | |
| } | |
| } |
| const totalLength = allRows.length; | ||
| let results = []; | ||
| let lastPage = Math.ceil(totalLength / MAX_RECORDS_TO_SEND); | ||
|
|
||
| if (totalLength <= MAX_RECORDS_TO_SEND) { | ||
| results = allRows; | ||
| } else { | ||
| const startIdx = pageNumber * MAX_RECORDS_TO_SEND; | ||
| let endIdx; | ||
| if (pageNumber === 0) { | ||
| endIdx = MAX_RECORDS_TO_SEND; | ||
| lastPage = 1; // Replicating the weird Python logic where last_page is 1 if page_number is 0? | ||
| // Actually the Python code says: last_page = 1 #math.ceil(totalLength/MAX_RECORDS_TO_SEND) | ||
| } else { | ||
| endIdx = totalLength; | ||
| lastPage = 1; | ||
| } | ||
| results = allRows.slice(startIdx, endIdx); | ||
| } |
There was a problem hiding this comment.
The pagination metadata is internally inconsistent: lastPage is always forced to 1 when totalLength > MAX_RECORDS_TO_SEND, and for pageNumber > 0 you set endIdx = totalLength, which returns all remaining rows rather than one page. This also makes more_available (computed later) misleading. Recommendation: compute endIdx = startIdx + MAX_RECORDS_TO_SEND, calculate last_page consistently from totalLength, and derive more_available from whether endIdx < totalLength.
| res.json({ | ||
| total_packages: totalLength, | ||
| current_page: pageNumber, | ||
| last_page: lastPage, | ||
| more_available: totalLength !== results.length, | ||
| packages: results | ||
| }); |
There was a problem hiding this comment.
more_available: totalLength !== results.length does not reflect whether there are additional pages; it compares the total count to the size of the current response, which will be true on any non-zero page even when you’re already at the end. Compute more_available based on paging indices (e.g., startIdx + results.length < totalLength) so clients can rely on it.
| }); | ||
| } catch (err) { | ||
| console.error(err); | ||
| res.status(500).json({ error: err.message }); |
There was a problem hiding this comment.
Returning err.message can leak internal details (e.g., SQL errors, schema/table names, host info) to clients. Prefer returning a generic error message (and optionally a request correlation id), while keeping the detailed error logged server-side.
| res.status(500).json({ error: err.message }); | |
| res.status(500).json({ error: 'Internal server error' }); |
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: ssh and git module update | ||
| uses: appleboy/ssh-action@master |
There was a problem hiding this comment.
Using an action from a floating ref (@master) is a supply-chain risk because the workflow can change without review. Pin this to a specific release tag or (preferably) a commit SHA to make builds auditable and reduce the risk of unexpected upstream changes.
| uses: appleboy/ssh-action@master | |
| uses: appleboy/ssh-action@v1.0.3 |
| | Repository | Description | | ||
| | -------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| | [Software Discovery Tool](https://github.com/openmainframeproject/software-discovery-tool) | The main repository of the Software Discovery Tool contains the core functionalities and codebase of the tool. It enables you to search and discover open source software for `zArchitecture/s390x`. You'll find the source code, documentation, and guidelines for contributing to the project in this repository. | | ||
| | [Software Discovery Tool Data](https://github.com/openmainframeproject/software-discovery-tool-data) | The Software Discovery Tool Data repository holds the necessary data for the software available in the tool. It contains information such as the names and versions of the software that you can discover using the Software Discovery Tool. This data is regularly updated to provide you with the most comprehensive and up-to-date software inventory. | | ||
| | [Software Discovery Tool Deploy](https://github.com/openmainframeproject/software-discovery-tool-deploy) | The Software Discovery Tool Deploy repository serves as the deployed version of the main Software Discovery Tool repository. It utilizes the main repository as a submodule and integrates the data repository as a sub-submodule. This deployment repository enables you to quickly set up and run the Software Discovery Tool in your environment. Read more about it [here](https://gist.github.com/rachejazz/de39c09612788635d5d0f491dcf8571a). | |
There was a problem hiding this comment.
The markdown table rows start with ||, which creates an extra empty column and renders incorrectly in most markdown viewers. Use a single leading | per row (and ensure consistent trailing pipes) so the table displays as intended.
| # Add user credentials to .env file | ||
| DB_HOST=localhost | ||
| DB_USER=sdtreaduser | ||
| DB_PASSWORD=UPDATEME | ||
| DB_NAME=sdtDB |
There was a problem hiding this comment.
There are now two .env.example files (repo root and backend/.env.example) with different values/keys (e.g., backend includes PORT). This is likely to confuse setup and deployment. Recommend consolidating to a single source of truth (preferably under backend/) or clearly documenting which file is used in which context and keeping the variable sets aligned.
| @@ -0,0 +1,11 @@ | |||
| The Angular, Bootstrap and JQuery dependencies are received and made available under the MIT license, as follows: | |||
There was a problem hiding this comment.
Corrected spelling/capitalization of 'JQuery' to 'jQuery' (the official project capitalization).
|
|
||
| Angular (Copyright 2010-2017 Google, Inc.) | ||
| Bootstrap (Copyright 2012-2017 the AngularUI Team) | ||
| JQuery (Copyright JS Foundation and other contributors) |
There was a problem hiding this comment.
Corrected spelling/capitalization of 'JQuery' to 'jQuery' (the official project capitalization).
c89ba4f to
67b9b11
Compare
|
@vmuralictr Can you take a look at this PR from @TusharB-07 to see if it's aligned with what you presented at our last meeting? |
vmuralictr
left a comment
There was a problem hiding this comment.
Hi @TusharB-07 thanks for the contribution! I tested this locally and the backend starts cleanly, distro bitmaps load correctly, and curl-based searches work. Good foundation.
A few things need to be fixed before merging:
The API response shape is not identical to Flask. Flask returned packages as arrays but this returns objects, so search results render empty in the UI because SearchBar.jsx uses array indexing like pkg[0], pkg[1]. Either match Flask's format or update SearchBar.jsx to use object keys.
The database_build.py script hardcodes /opt/software-discovery-tool so local setup doesn't work out of the box. Please add local dev setup instructions to the README.
bitFlag is a JS Number which will silently overflow as more distros are added. Since BigInt is already used in getTables(), initialize bitFlag as 1n and multiply by 2n throughout.
All matching rows are loaded into memory before paginating. With Debian having 60,000+ rows, a broad search can be very heavy. SQL-level LIMIT/OFFSET would be better.
The two .env.example files at repo root and backend/ have different keys which is confusing. Please consolidate to one.
Items 1 to 3 are the blockers. Happy to discuss — thanks again!
036d26b to
54f1ac7
Compare
|
Hi @vmuralictr, thanks for the detailed review. All three blockers are resolved. Please let me know if you’d like any further changes. |
vmuralictr
left a comment
There was a problem hiding this comment.
Please Find the following Points
- React front end is Pointing to http://localhost:3000/
- There is "undefined" string in the endpoint "http://localhost:3000/undefined/getSupportedDistros"
- Can you check all the other End points and Add the swagger documentation for the same ?
|
@TusharB-07 Do you have some time to take a look at the comments from @vmuralictr above? We're working on a strategy to Dockerize bringing up both the front and backends, so we're hoping to have this component finished up soon 😄 |
|
Hi @pleia2 — apologies for the delay; I’ve been busy preparing for my end-semester exams next week. I've now pushed a new commit addressing all of @vmuralictr's review comments and requested a re-review. Here’s a quick summary of the fixes:
Also, I’d be happy to contribute to the Dockerization effort once this is merged — feel free to assign me. |
|
hi @TusharB-07 ,
|
|
@vmuralictr Are these errors in the front-end, rather than the back-end that is being replaced here? This is a big PR and I just want to make sure we're being very specific about what's being fixed in each change, so we don't make it bigger than it has to be 😄 But confirming that the whole thing works is essential to merging this change (warnings are OK if they're front-end only, as long as it works). |
1eec2eb to
f993ab5
Compare
|
hi @vmuralictr, I've pushed the latest fixes:
|
vmuralictr
left a comment
There was a problem hiding this comment.
Approving — a few things worth addressing in a follow-up:
config/supported_distros.py and backend/config.js are duplicates — consolidate to a single source of truth
Wildcard CORS (app.use(cors())) should be restricted before production deployment
database_build.py still requires Python — a JS equivalent would complete the migration
Please resolve the 6 merge conflicts with master before merging
Good work overall. Happy to re-review once conflicts are resolved.
04a6205 to
73b03b8
Compare
- replace Flask backend with Express + MySQL API implementation - move distro mapping to a single JSON source of truth - restrict CORS via ALLOWED_ORIGINS and add backend tests - add Node.js database build script and update docs - remove deprecated Flask source tree Signed-off-by: TusharB-07 <tusharnbiswas07@gmail.com>
73b03b8 to
98b6d00
Compare
|
hi @vmuralictr all the blocking items from your review have been addressed. |
pleia2
left a comment
There was a problem hiding this comment.
Hi @TusharB-07, thanks for hanging in there through this MASSIVE change!
I think we're close to the finish line.
Today I was testing this in a production style environment (real virtual machine, localhost can't be assumed) running Ubuntu 24.04. I've noted the issues I've run into inline.
And please restore bin/config_build.py to automate the process of creating config/distros.json (it'll be very broken because distros.json has a new name, but it's been broken for a long time and we'll just follow up in existing issue #260 later).
| @@ -0,0 +1,13 @@ | |||
| { | |||
| "name": "software-discovery-tool", | |||
| "version": "1.0.0", | |||
There was a problem hiding this comment.
Using the package of node from Ubuntu 24.04, I had to add this line:
"type": "module",
|
|
||
| ```bash | ||
| sudo -u www-data cp src/config/supported_distros.py.example src/config/supported_distros.py | ||
| python3 ./bin/package_build.py RHEL_8_Package_List.json |
There was a problem hiding this comment.
You should add that after these .json files are pulled down, you need to update config/distros.json accordingly (or run bin/config_build.py)
| sudo mysql_secure_installation | ||
| sudo apt install mariadb-server | ||
| sudo mariadb-secure-installation | ||
| ``` |
There was a problem hiding this comment.
Database read only user creation needs to be added back to the documentation here: https://github.com/openmainframeproject/software-discovery-tool/blob/master/docs/Installation.md#log-in-to-mariadb-with-the-root-account-you-set-and-create-the-read-only-user-with-a-password-changed-below-and-database


Summary
Replaces the Flask backend with a lightweight Node.js/Express server.
Since React took over the frontend (#248), Flask was only used as a
thin MySQL interface — this replaces it with a simpler Node.js solution.
Changes
backend/directory with Express + mysql2.env.examplefor environment variablesHow to Run
cd backendcp .env.example .envand fill in DB credentialsnpm installnode index.jsBreaking Changes
None — API response shapes are identical to Flask.
Closes #268