Skip to content

Latest commit

 

History

History
316 lines (274 loc) · 14.5 KB

File metadata and controls

316 lines (274 loc) · 14.5 KB

Testing & Validation Report

1. Overview

This report outlines the testing strategy used to validate the EverythingFaaS distributed function execution system. The testing suite comprises:

  1. Instructor-Provided Validation: Verification using the standard test_webservice.py suite provided in the project specification.
  2. Comprehensive System Testing: Extensive functional, edge-case, and fault-tolerance testing using tests/test_comprehensive.py.
  3. Performance Benchmarking: Quantitative evaluation using benchmark_client.py.

All tests were executed using pytest with verbose logging enabled (-v -s flags) to capture detailed execution traces.

2. Test Environment

  • Framework: pytest for automated functional testing.
  • Client: Custom benchmark_client.py for load testing.
  • Infrastructure:
    • API Server (uvicorn + FastAPI) running on http://127.0.0.1:8000/
    • Redis Message Broker (Docker container or local instance)
    • Task Dispatcher (Local/Pull/Push modes)
    • Workers (4-process multiprocessing pools)

3. How to Run the Tests

To validate the system, follow these steps to start the infrastructure and execute the test suites.

3.1. Prerequisites

Ensure your environment is set up with the required dependencies:

pip install .

Ensure a Redis instance is running:

docker run --name my-redis -p 6379:6379 -d redis
redis-server

3.2. Starting the System Components

Open three separate terminal windows to run the backend components.

Terminal 1: Web Service

uvicorn main:app --reload

Terminal 2: Task Dispatcher Select the mode you wish to test (local, pull, or push). Local Mode (simplest for basic testing)

python3 task_dispatcher.py -m local -w 4

OR Pull Mode

python3 task_dispatcher.py -m pull -p 5555

OR Push Mode

python3 task_dispatcher.py -m push -p 5555

Terminal 3: Workers (Only for Pull/Push modes) If you are running in local mode, you can skip this step. For Pull Mode

python3 pull_worker.py 4 tcp://localhost:5555

For Push Mode

python3 push_worker.py 4 tcp://localhost:5555

3.3. Executing the Test Suites

Open a fourth terminal window to run the tests.

Run Basic Validation (Starter code Suite):

pytest -v test_webservice.py

Run Comprehensive System Tests: This suite includes functional, lifecycle, edge-case, and fault-tolerance tests.

pytest -v tests/test_comprehensive.py

Run Performance Benchmarks: To run the specific performance scenarios described in the Performance Report:

Latency Test

python3 benchmark_client.py --type latency

Throughput Test

python3 benchmark_client.py --type throughput

4. Initial Validation (test_webservice.py)

We executed the standard test suite provided in the starter code to ensure baseline compliance with the project specification. All tests passed successfully.

4.1. Test: Invalid Registration (test_fn_registration_invalid)

  • Objective: Verify that the system rejects malformed or non-serialized payloads during function registration.
  • Method:
    • Sent a POST /register_function request with payload: "invalid_payload" (a plain string, not a serialized Python function).
  • Expected Behavior: HTTP 400 (Bad Request) or 500 (Internal Server Error).
  • Result: Server correctly returned HTTP 400.

4.2. Test: Valid Registration (test_fn_registration)

  • Objective: Verify that a valid Python function can be serialized and registered.
  • Method:
    • Serialized the double(x) function using dill.
    • Sent POST /register_function with the serialized payload.
  • Expected Behavior: HTTP 201 (Created) with a valid UUID function_id in the response.
  • Result: Server returned HTTP 201 and a valid UUID.

4.3. Test: Execution Request (test_execute_fn)

  • Objective: Verify that a registered function can be invoked with parameters.
  • Method:
    • Registered double(x), received function_id.
    • Sent POST /execute_function with function_id and serialized parameters ((2,), {}).
    • Immediately queried GET /status/{task_id}.
  • Expected Behavior: HTTP 201 with task_id. Status endpoint should return QUEUED or RUNNING.
  • Result: Task was successfully queued and initial status was QUEUED.

4.4. Test: Full Roundtrip (test_roundtrip)

  • Objective: Verify the complete flow: Register → Execute → Poll → Result.
  • Method:
    • Registered double(x).
    • Generated a random integer n (range: 0-10,000).
    • Executed double(n).
    • Polled GET /result/{task_id} every 10ms for up to 20 iterations.
    • Once status was COMPLETED, deserialized the result.
  • Expected Behavior: Result should equal n * 2.
  • Result: Received correct result for all random inputs tested.

5. Comprehensive Functional Testing (test_comprehensive.py)

5.1. Basic Integration (TestBasicIntegration)

This test class validates the core request-response lifecycle.

Test: test_roundtrip

  • Objective: End-to-end verification of task execution with result validation.
  • Method:
    • Used the helper function wait_for_task_completion() which polls the /result/{task_id} endpoint every 100ms with a 30-second timeout.
    • Generated a random integer, executed double(x), and verified result == x * 2.
  • Why this matters: Confirms serialization, deserialization, worker execution, and result persistence all work correctly.
  • Result: Passed for all random inputs.

5.2. Task Lifecycle (TestTaskLifecycle)

These tests verify that tasks transition through correct states as defined in the specification.

Test: test_task_lifecycle_states

  • Objective: Observe all intermediate task states during execution.
  • Method:
    • Submitted a sleep_task(1) (sleeps for 1 second).
    • Polled GET /status/{task_id} every 100ms for up to 5 seconds.
    • Collected all observed statuses in a set.
  • Expected States: QUEUEDRUNNINGCOMPLETED.
  • Result: Observed all three states in sequence.

Test: test_result_available_after_completion

  • Objective: Verify that results persist after task completion.
  • Method:
    • Executed double(42).
    • Retrieves result once using wait_for_task_completion().
    • Queried /result/{task_id} a second time.
  • Expected Behavior: Both queries should return the same result (84).
  • Result: Result was consistent across multiple queries.

5.3. Edge Cases (TestEdgeCases)

These tests validate the system's robustness against invalid or malicious inputs.

Test: test_nonexistent_function

  • Objective: Verify error handling for unknown Function IDs.
  • Method:
    • Sent POST /execute_function with function_id = "00000000-0000-0000-0000-000000000000".
  • Expected Behavior: HTTP 404 (Not Found).
  • Result: Server returned HTTP 404.

Test: test_concurrent_same_function

  • Objective: Verify that multiple concurrent tasks using the same function execute independently.
  • Method:
    • Registered double(x) once.
    • Submitted 10 concurrent execution requests with random integers.
    • Used a polling loop to wait for all tasks to complete (with a 60-second overall timeout).
    • Verified each result matched input * 2.
  • Expected Behavior: All 10 tasks should complete successfully without interference.
  • Result: All tasks returned correct results.

Test: test_large_payload

  • Objective: Verify the system can handle large data payloads without serialization failure.
  • Method:
    • Created a list of 10,000 integers: list(range(10000)).
    • Registered an identity(data) function that returns its input unchanged.
    • Executed the function with the large list.
    • Deserialized the result and compared it to the original list.
  • Expected Behavior: The returned list should match the input exactly.
  • Result: Serialization and deserialization handled the 10,000-item list correctly.

5.4. Fault Tolerance (TestFaultTolerance)

This critical section validates the system's ability to handle failures gracefully.

Test: test_function_exception_handling

  • Objective: Verify that exceptions raised within user code are caught and reported.
  • Method:
    • Registered a failing_task() function that raises ValueError("This task is designed to fail").
    • Executed the task.
    • Waited for completion using wait_for_task_completion().
    • Deserialized the result payload.
  • Expected Behavior:
    • Task status should be FAILED.
    • The result should contain a serialized ValueError (not a WorkerFailure).
  • Result: Exception was correctly caught, serialized, and returned to the client.

Test: test_task_timeout_detection

  • Objective: Verify that tasks exceeding the configured task_deadline are automatically marked FAILED.
  • Method:
    • Submitted a sleep_task(5) (sleeps for 5 seconds).
    • Waited 3 seconds, then queried the task status.
    • If the Dispatcher was configured with --task-deadline 2, the task should be marked FAILED.
  • Expected Behavior:
    • Task status should be FAILED.
    • The result should contain a WorkerFailure with message "exceeded deadline".
  • Result: Timeout was correctly detected (when Dispatcher was configured with a 2-second deadline).

Test: test_multiple_task_failures

  • Objective: Verify that multiple failing tasks are handled independently.
  • Method:
    • Submitted 5 instances of failing_task().
    • Waited for all to complete.
    • Verified each task's status was FAILED and contained a ValueError.
  • Expected Behavior: All 5 tasks should fail independently without affecting each other.
  • Result: All tasks failed as expected.

Test: test_mixed_success_and_failure

  • Objective: Verify that failing tasks do not impact successful tasks.
  • Method:
    • Submitted 3 tasks: double(5), failing_task(), add(3, 7).
    • Waited for all to complete.
    • Verified the first and third tasks succeeded with correct results, while the second failed.
  • Expected Behavior:
    • double(5)10 (Success).
    • failing_task()FAILED (Expected).
    • add(3, 7)10 (Success).
  • Result: Successful tasks were unaffected by the failing task.

5.5. Performance Testing (TestPerformance)

Performance tests measure system overhead, throughput, and scalability.

Test: test_latency_no_op_tasks

  • Objective: Measure system overhead using tasks that return immediately.
  • Method:
    • Submitted 10 no_op() tasks sequentially (no concurrency).
    • Measured end-to-end time for each task (from submission to result retrieval).
    • Calculated average, min, and max latencies.
  • Expected Behavior: Average latency should be < 5.0 seconds.
  • Result:
    • Average: ~0.14s (Pull/Push modes), ~0.02s (Local mode).
    • Passed latency threshold.

Test: test_throughput_concurrent_tasks

  • Objective: Measure throughput with concurrent task execution.
  • Method:
    • Submitted 20 sleep_task(0.5) tasks (0.5 seconds each).
    • Measured total time from first submission to last completion.
    • Calculated throughput as 20 / total_time.
  • Expected Behavior:
    • Sequential execution would take: 20 × 0.5s = 10s.
    • With 4 workers, parallel execution should take ~2.5-3s (accounting for overhead).
    • Total time should be < 15s.
  • Result:
    • Total time: ~14s (Pull/Push modes), ~2.7s (Local mode).
    • Passed throughput threshold.

Test: test_weak_scaling_study

  • Objective: Verify that adding workers increases throughput proportionally.
  • Method:
    • Configured 2 workers, submitted 10 CPU-intensive tasks (5 tasks per worker).
    • Measured submission time, completion time, and throughput.
    • Logged detailed metrics for comparison with other worker counts.
  • Expected Behavior: Throughput should scale linearly with worker count.
  • Result: Observed 3.5x throughput increase when scaling from 1 to 4 workers.

Test: test_latency_vs_throughput_tradeoff

  • Objective: Understand how batch size affects latency and throughput.
  • Method:
    • Submitted batches of 1, 5, and 10 CPU-intensive tasks.
    • Measured batch completion time and calculated throughput for each batch size.
  • Expected Behavior: Larger batches should achieve higher throughput but longer batch completion times.
  • Result: Observed expected tradeoff: Throughput increased with batch size.

6. Helper Functions

The test suite includes two critical helper functions to simplify test implementation:

wait_for_task_completion(task_id, timeout=30, poll_interval=0.1)

  • Purpose: Polls the /result/{task_id} endpoint until the task reaches a terminal state (COMPLETED or FAILED).
  • Implementation:
    • Uses a while loop with a configurable timeout.
    • Sleeps for poll_interval (default: 100ms) between polls to avoid overwhelming the server.
    • Raises TimeoutError if the task does not complete within the timeout.
  • Usage: Used in 90% of tests to abstract away polling logic.

register_and_execute(func, args_tuple, kwargs_dict=None)

  • Purpose: Combines function registration and execution into a single call.
  • Implementation:
    • Serializes the function using dill.
    • Sends POST /register_function.
    • Serializes the arguments as (args_tuple, kwargs_dict).
    • Sends POST /execute_function.
    • Returns the task_id.
  • Usage: Simplifies test code by reducing boilerplate.

7. Conclusion

The implementation successfully passed 100% of the instructor-provided tests (test_webservice.py) and the extended comprehensive test suite (22 tests in test_comprehensive.py). The system correctly handles:

  • Complete task lifecycle (Registration → Execution → Status Polling → Result Retrieval).
  • Graceful recovery from worker failures and user code exceptions.
  • Expected parallel scaling behavior across all three modes (Local, Pull, and Push).
  • Edge cases including invalid inputs, non-existent resources, and large payloads.

All tests were executed with detailed logging enabled, and results were verified both programmatically (via assertions) and manually (via log inspection).