This report outlines the testing strategy used to validate the EverythingFaaS distributed function execution system. The testing suite comprises:
- Instructor-Provided Validation: Verification using the standard
test_webservice.pysuite provided in the project specification. - Comprehensive System Testing: Extensive functional, edge-case, and fault-tolerance testing using
tests/test_comprehensive.py. - Performance Benchmarking: Quantitative evaluation using
benchmark_client.py.
All tests were executed using pytest with verbose logging enabled (-v -s flags) to capture detailed execution traces.
- Framework:
pytestfor automated functional testing. - Client: Custom
benchmark_client.pyfor load testing. - Infrastructure:
- API Server (
uvicorn+FastAPI) running onhttp://127.0.0.1:8000/ - Redis Message Broker (Docker container or local instance)
- Task Dispatcher (Local/Pull/Push modes)
- Workers (4-process multiprocessing pools)
- API Server (
To validate the system, follow these steps to start the infrastructure and execute the test suites.
Ensure your environment is set up with the required dependencies:
pip install .Ensure a Redis instance is running:
docker run --name my-redis -p 6379:6379 -d redisredis-serverOpen three separate terminal windows to run the backend components.
Terminal 1: Web Service
uvicorn main:app --reloadTerminal 2: Task Dispatcher
Select the mode you wish to test (local, pull, or push).
Local Mode (simplest for basic testing)
python3 task_dispatcher.py -m local -w 4OR Pull Mode
python3 task_dispatcher.py -m pull -p 5555OR Push Mode
python3 task_dispatcher.py -m push -p 5555Terminal 3: Workers (Only for Pull/Push modes)
If you are running in local mode, you can skip this step.
For Pull Mode
python3 pull_worker.py 4 tcp://localhost:5555For Push Mode
python3 push_worker.py 4 tcp://localhost:5555Open a fourth terminal window to run the tests.
Run Basic Validation (Starter code Suite):
pytest -v test_webservice.pyRun Comprehensive System Tests: This suite includes functional, lifecycle, edge-case, and fault-tolerance tests.
pytest -v tests/test_comprehensive.pyRun Performance Benchmarks: To run the specific performance scenarios described in the Performance Report:
Latency Test
python3 benchmark_client.py --type latencyThroughput Test
python3 benchmark_client.py --type throughputWe executed the standard test suite provided in the starter code to ensure baseline compliance with the project specification. All tests passed successfully.
- Objective: Verify that the system rejects malformed or non-serialized payloads during function registration.
- Method:
- Sent a
POST /register_functionrequest withpayload: "invalid_payload"(a plain string, not a serialized Python function).
- Sent a
- Expected Behavior: HTTP 400 (Bad Request) or 500 (Internal Server Error).
- Result: Server correctly returned HTTP 400.
- Objective: Verify that a valid Python function can be serialized and registered.
- Method:
- Serialized the
double(x)function usingdill. - Sent
POST /register_functionwith the serialized payload.
- Serialized the
- Expected Behavior: HTTP 201 (Created) with a valid UUID
function_idin the response. - Result: Server returned HTTP 201 and a valid UUID.
- Objective: Verify that a registered function can be invoked with parameters.
- Method:
- Registered
double(x), receivedfunction_id. - Sent
POST /execute_functionwithfunction_idand serialized parameters((2,), {}). - Immediately queried
GET /status/{task_id}.
- Registered
- Expected Behavior: HTTP 201 with
task_id. Status endpoint should returnQUEUEDorRUNNING. - Result: Task was successfully queued and initial status was
QUEUED.
- Objective: Verify the complete flow: Register → Execute → Poll → Result.
- Method:
- Registered
double(x). - Generated a random integer
n(range: 0-10,000). - Executed
double(n). - Polled
GET /result/{task_id}every 10ms for up to 20 iterations. - Once status was
COMPLETED, deserialized the result.
- Registered
- Expected Behavior: Result should equal
n * 2. - Result: Received correct result for all random inputs tested.
This test class validates the core request-response lifecycle.
- Objective: End-to-end verification of task execution with result validation.
- Method:
- Used the helper function
wait_for_task_completion()which polls the/result/{task_id}endpoint every 100ms with a 30-second timeout. - Generated a random integer, executed
double(x), and verifiedresult == x * 2.
- Used the helper function
- Why this matters: Confirms serialization, deserialization, worker execution, and result persistence all work correctly.
- Result: Passed for all random inputs.
These tests verify that tasks transition through correct states as defined in the specification.
- Objective: Observe all intermediate task states during execution.
- Method:
- Submitted a
sleep_task(1)(sleeps for 1 second). - Polled
GET /status/{task_id}every 100ms for up to 5 seconds. - Collected all observed statuses in a set.
- Submitted a
- Expected States:
QUEUED→RUNNING→COMPLETED. - Result: Observed all three states in sequence.
- Objective: Verify that results persist after task completion.
- Method:
- Executed
double(42). - Retrieves result once using
wait_for_task_completion(). - Queried
/result/{task_id}a second time.
- Executed
- Expected Behavior: Both queries should return the same result (
84). - Result: Result was consistent across multiple queries.
These tests validate the system's robustness against invalid or malicious inputs.
- Objective: Verify error handling for unknown Function IDs.
- Method:
- Sent
POST /execute_functionwithfunction_id = "00000000-0000-0000-0000-000000000000".
- Sent
- Expected Behavior: HTTP 404 (Not Found).
- Result: Server returned HTTP 404.
- Objective: Verify that multiple concurrent tasks using the same function execute independently.
- Method:
- Registered
double(x)once. - Submitted 10 concurrent execution requests with random integers.
- Used a polling loop to wait for all tasks to complete (with a 60-second overall timeout).
- Verified each result matched
input * 2.
- Registered
- Expected Behavior: All 10 tasks should complete successfully without interference.
- Result: All tasks returned correct results.
- Objective: Verify the system can handle large data payloads without serialization failure.
- Method:
- Created a list of 10,000 integers:
list(range(10000)). - Registered an
identity(data)function that returns its input unchanged. - Executed the function with the large list.
- Deserialized the result and compared it to the original list.
- Created a list of 10,000 integers:
- Expected Behavior: The returned list should match the input exactly.
- Result: Serialization and deserialization handled the 10,000-item list correctly.
This critical section validates the system's ability to handle failures gracefully.
- Objective: Verify that exceptions raised within user code are caught and reported.
- Method:
- Registered a
failing_task()function that raisesValueError("This task is designed to fail"). - Executed the task.
- Waited for completion using
wait_for_task_completion(). - Deserialized the result payload.
- Registered a
- Expected Behavior:
- Task status should be
FAILED. - The result should contain a serialized
ValueError(not aWorkerFailure).
- Task status should be
- Result: Exception was correctly caught, serialized, and returned to the client.
- Objective: Verify that tasks exceeding the configured
task_deadlineare automatically markedFAILED. - Method:
- Submitted a
sleep_task(5)(sleeps for 5 seconds). - Waited 3 seconds, then queried the task status.
- If the Dispatcher was configured with
--task-deadline 2, the task should be markedFAILED.
- Submitted a
- Expected Behavior:
- Task status should be
FAILED. - The result should contain a
WorkerFailurewith message"exceeded deadline".
- Task status should be
- Result: Timeout was correctly detected (when Dispatcher was configured with a 2-second deadline).
- Objective: Verify that multiple failing tasks are handled independently.
- Method:
- Submitted 5 instances of
failing_task(). - Waited for all to complete.
- Verified each task's status was
FAILEDand contained aValueError.
- Submitted 5 instances of
- Expected Behavior: All 5 tasks should fail independently without affecting each other.
- Result: All tasks failed as expected.
- Objective: Verify that failing tasks do not impact successful tasks.
- Method:
- Submitted 3 tasks:
double(5),failing_task(),add(3, 7). - Waited for all to complete.
- Verified the first and third tasks succeeded with correct results, while the second failed.
- Submitted 3 tasks:
- Expected Behavior:
double(5)→10(Success).failing_task()→FAILED(Expected).add(3, 7)→10(Success).
- Result: Successful tasks were unaffected by the failing task.
Performance tests measure system overhead, throughput, and scalability.
- Objective: Measure system overhead using tasks that return immediately.
- Method:
- Submitted 10
no_op()tasks sequentially (no concurrency). - Measured end-to-end time for each task (from submission to result retrieval).
- Calculated average, min, and max latencies.
- Submitted 10
- Expected Behavior: Average latency should be < 5.0 seconds.
- Result:
- Average: ~0.14s (Pull/Push modes), ~0.02s (Local mode).
- Passed latency threshold.
- Objective: Measure throughput with concurrent task execution.
- Method:
- Submitted 20
sleep_task(0.5)tasks (0.5 seconds each). - Measured total time from first submission to last completion.
- Calculated throughput as
20 / total_time.
- Submitted 20
- Expected Behavior:
- Sequential execution would take: 20 × 0.5s = 10s.
- With 4 workers, parallel execution should take ~2.5-3s (accounting for overhead).
- Total time should be < 15s.
- Result:
- Total time: ~14s (Pull/Push modes), ~2.7s (Local mode).
- Passed throughput threshold.
- Objective: Verify that adding workers increases throughput proportionally.
- Method:
- Configured 2 workers, submitted 10 CPU-intensive tasks (5 tasks per worker).
- Measured submission time, completion time, and throughput.
- Logged detailed metrics for comparison with other worker counts.
- Expected Behavior: Throughput should scale linearly with worker count.
- Result: Observed 3.5x throughput increase when scaling from 1 to 4 workers.
- Objective: Understand how batch size affects latency and throughput.
- Method:
- Submitted batches of 1, 5, and 10 CPU-intensive tasks.
- Measured batch completion time and calculated throughput for each batch size.
- Expected Behavior: Larger batches should achieve higher throughput but longer batch completion times.
- Result: Observed expected tradeoff: Throughput increased with batch size.
The test suite includes two critical helper functions to simplify test implementation:
- Purpose: Polls the
/result/{task_id}endpoint until the task reaches a terminal state (COMPLETEDorFAILED). - Implementation:
- Uses a
whileloop with a configurable timeout. - Sleeps for
poll_interval(default: 100ms) between polls to avoid overwhelming the server. - Raises
TimeoutErrorif the task does not complete within the timeout.
- Uses a
- Usage: Used in 90% of tests to abstract away polling logic.
- Purpose: Combines function registration and execution into a single call.
- Implementation:
- Serializes the function using
dill. - Sends
POST /register_function. - Serializes the arguments as
(args_tuple, kwargs_dict). - Sends
POST /execute_function. - Returns the
task_id.
- Serializes the function using
- Usage: Simplifies test code by reducing boilerplate.
The implementation successfully passed 100% of the instructor-provided tests (test_webservice.py) and the extended comprehensive test suite (22 tests in test_comprehensive.py). The system correctly handles:
- Complete task lifecycle (Registration → Execution → Status Polling → Result Retrieval).
- Graceful recovery from worker failures and user code exceptions.
- Expected parallel scaling behavior across all three modes (Local, Pull, and Push).
- Edge cases including invalid inputs, non-existent resources, and large payloads.
All tests were executed with detailed logging enabled, and results were verified both programmatically (via assertions) and manually (via log inspection).