About the project

We've built aiusage.ai, a SaaS platform that acts as a middleware layer between users and the Anthropic Claude API. Users provide their own Claude API key, and our backend processes requests through proprietary infrastructure designed to reduce token consumption. We're pre-launch and need thorough third-party QA before opening to paying customers.

What we need tested

1. Functional testing (all user flows)
- Signup, login, password reset, session handling
- API key submission, encryption at rest, and deletion flows
- Credit purchase flow across all three tiers ($10, $25, $50) and custom amounts
- End-to-end request flow: user prompt → our backend → Claude API → response
- Streaming, tool use, and vision request handling across Sonnet and Haiku models
- Edge cases: invalid keys, expired keys, rate limits, malformed requests, network failures

2. Pricing and savings claim verification (critical)

Our landing page states that users save ~$190/month and get "~20× cheaper" Claude bills. We need independent verification:
- Run identical workloads through (a) the direct Claude API and (b) aiusage
- Measure actual token consumption and cost on the user's Anthropic bill in both cases
- Test across varied workloads: short prompts, long context, streaming, tool use, vision
- Document the measured savings ratio with raw data
- Flag any discrepancy between marketed claims and measured results

3. Security testing
- API key handling: verify AES-256-GCM encryption at rest, audit the key lifecycle, and test for leakage in logs, error messages, and network traffic
- Authentication and session security (OWASP Top 10)
- Input validation, injection attacks, XSS, CSRF
- PII scrubbing verification: confirm the claim that prompts are scrubbed server-side before indexing
- Penetration testing of the key storage and retrieval path
- TLS configuration, security headers, CSP

4. Performance and reliability
- Latency added by the middleware layer vs. direct Claude API calls
- Throughput under concurrent load (target: 100 concurrent users)
- Streaming performance and time-to-first-token
- Error handling and graceful degradation when the Claude API is slow or returns errors

5. Cross-browser and responsive testing
- Chrome, Safari, Firefox, Edge (latest two versions)
- Mobile Safari and Chrome on iOS/Android
- All documented flows work at viewport widths from 375px to 1920px

6. Content and compliance review
- All pricing claims on the site match actual measured behavior
- Terms and Privacy pages are present and reflect actual practices
- No broken links, typos, or inconsistencies across pages

Deliverables
- Test plan document (week 1) for our approval before execution
- Weekly progress reports, with findings logged in a shared tracker (Linear, Jira, or GitHub Issues)
- Bug reports with reproduction steps, severity, and evidence (screenshots, logs, network captures)
- Final QA report with: executive summary, all findings by severity, pricing-claim verification data with raw numbers, security audit summary, and a go/no-go recommendation
- Regression testing after our team fixes critical and high-severity issues

Milestones and payment
- Milestone 1 ($4,000): approved test plan + environment setup
- Milestone 2 ($6,000): functional + cross-browser testing complete, bugs filed
- Milestone 3 ($6,000): security testing + pricing verification complete, with data
- Milestone 4 ($4,000): performance testing + final report + regression pass

Requirements for bidders
- Team lead with 5+ years of QA experience on SaaS products
- At least one team member with API security / penetration testing experience (OSCP, CEH, or an equivalent portfolio)
- Experience testing LLM-based or API-middleware products is a strong plus
- Portfolio of past QA reports (redacted is fine)
- References from two previous clients

How to apply

In your proposal, please include:
- Your team's relevant experience (links to past work)
- A proposed test plan outline (your approach only, not the full plan)
- Tools you'll use (test management, security scanners, load testing)
- Your approach to verifying the pricing/savings claim specifically
- Timeline and team composition

Signing an NDA is required before accessing the staging environment.
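To make the pricing-verification task (section 2) concrete, here is a minimal sketch of the savings calculation we expect in the final report, given token counts measured on the Anthropic bill for each run. The per-million-token prices and token counts below are placeholder assumptions for illustration, not actual Anthropic rates or measured data:

```python
# Sketch: compute measured cost and savings ratio from recorded token usage.
# Prices and counts here are PLACEHOLDERS; report results using the actual
# rates and token counts from the user's Anthropic bill.

def run_cost(input_tokens: int, output_tokens: int,
             in_price_per_mtok: float, out_price_per_mtok: float) -> float:
    """Dollar cost of one workload run, given token counts and per-MTok prices."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000

def savings_ratio(direct_cost: float, middleware_cost: float) -> float:
    """How many times cheaper the middleware run was (e.g. 20.0 for '20x')."""
    return direct_cost / middleware_cost

# Illustration with assumed prices ($3/MTok input, $15/MTok output):
direct = run_cost(1_000_000, 50_000, in_price_per_mtok=3.0, out_price_per_mtok=15.0)
via_mw = run_cost(100_000, 50_000, in_price_per_mtok=3.0, out_price_per_mtok=15.0)
print(f"direct=${direct:.2f} middleware=${via_mw:.2f} "
      f"ratio={savings_ratio(direct, via_mw):.2f}x")
```

The raw numbers feeding this calculation (per-run token counts from both paths) are exactly the "raw data" deliverable named above.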
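For the key-leakage checks in section 3: Anthropic API keys begin with a recognizable `sk-ant-` prefix, so captured logs, error bodies, and network traces can be scanned mechanically. A minimal sketch (the sample log lines are fabricated for illustration):

```python
import re

# Anthropic API keys start with the "sk-ant-" prefix; flag anything in
# captured output that looks like one.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}")

def find_key_leaks(lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, matched_fragment) for every suspected key leak."""
    leaks = []
    for i, line in enumerate(lines, start=1):
        for match in KEY_PATTERN.finditer(line):
            leaks.append((i, match.group()))
    return leaks

# Example: a debug log that accidentally echoes the submitted key.
sample_log = [
    "INFO  request accepted for user 42",
    "DEBUG forwarding with header x-api-key=sk-ant-api03-EXAMPLEKEY123456",
    "ERROR upstream timeout",
]
print(find_key_leaks(sample_log))  # expect one hit on line 2
```

The same scan applies to HAR files and proxy captures when auditing the key storage and retrieval path.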
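For the performance targets in section 4, bidders may find this harness shape useful: fire N concurrent requests, record per-request latency, and summarize p50/p95. `send_request` is a stub standing in for a real HTTP call to the staging environment; running the same harness against the direct Claude API and against aiusage gives the middleware's added latency as the difference:

```python
import asyncio
import statistics
import time

async def send_request(user_id: int) -> float:
    """Issue one request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    await asyncio.sleep(0)  # stand-in for the real HTTP call under test
    return time.perf_counter() - start

async def run_load(concurrent_users: int) -> list[float]:
    """Fire all requests concurrently and collect their latencies."""
    return await asyncio.gather(*(send_request(u) for u in range(concurrent_users)))

def summarize(latencies: list[float]) -> dict[str, float]:
    """p50/p95 summary for the performance report."""
    ordered = sorted(latencies)
    return {
        "p50": statistics.median(ordered),
        "p95": ordered[max(0, int(round(0.95 * len(ordered))) - 1)],
    }

latencies = asyncio.run(run_load(100))  # 100 = the concurrency target above
print(summarize(latencies))
```

For streaming endpoints, the same structure works with the timer stopped at the first received token to capture time-to-first-token rather than total latency.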