AI Moderation Integration Tests

This document describes the AI moderation integration tests, which call the live OpenAI API to exercise real moderation behavior.

Overview

The integration tests are designed to:

Running Integration Tests

Integration tests are marked with #[ignore] to prevent them from running during normal test execution, since they take longer to execute due to network requests.

Prerequisites

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="your-api-key-here"
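As a rough illustration of how an ignored test can depend on the API key, the sketch below reads OPENAI_API_KEY and skips when it is absent. The helper names `api_key_from` and `openai_api_key` and the test name are hypothetical and are not taken from the bhcli source; only the `#[ignore]` attribute and the environment variable name come from this document.

```rust
use std::env;

/// Hypothetical helper: treats a missing or blank value as "no key".
fn api_key_from(value: Option<String>) -> Option<String> {
    value
        .map(|v| v.trim().to_string())
        .filter(|v| !v.is_empty())
}

/// Reads the key from the environment variable named in this document.
fn openai_api_key() -> Option<String> {
    api_key_from(env::var("OPENAI_API_KEY").ok())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    #[ignore] // skipped by plain `cargo test`; runs with `-- --ignored`
    fn test_ai_moderation_integration_example() {
        let key = match openai_api_key() {
            Some(key) => key,
            None => {
                eprintln!("OPENAI_API_KEY is not set; skipping");
                return;
            }
        };
        // A real test would send a moderation request here.
        assert!(!key.is_empty());
    }
}
```

Returning early when the key is missing is one possible design; a project may instead prefer to fail loudly so that a misconfigured environment is not mistaken for a passing run.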

Running Individual Tests

# Test harmful message handling
cargo test test_ai_moderation_integration_harmful -- --ignored

# Test safe message handling  
cargo test test_ai_moderation_integration_safe -- --ignored

# Test consistency across multiple requests
cargo test test_ai_moderation_integration_consistency -- --ignored

# Test performance/response time
cargo test test_ai_moderation_integration_performance -- --ignored

# Test edge cases and borderline content
cargo test test_ai_moderation_integration_edge_cases -- --ignored

# Test specific prompt engineering examples
cargo test test_ai_moderation_integration_prompt_variations -- --ignored

Running All Integration Tests

# Run all integration tests with detailed output
cargo test test_ai_moderation_integration -- --ignored --nocapture

Test Descriptions

1. `test_ai_moderation_integration_harmful`

Tests messages that should be moderated:

Expected: Most should be moderated, especially in strict mode.

2. `test_ai_moderation_integration_safe`

Tests messages that should be allowed:

Expected: Should be allowed, especially in lenient mode.

3. `test_ai_moderation_integration_consistency`

Tests the same harmful message multiple times to check consistency.

Expected: Should get consistent results for clear violations.
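One way to quantify "consistent results" is to measure how often repeated runs agree with the majority decision. The sketch below is an assumption about how such a check could look, not the project's actual test; `agreement_ratio` is a hypothetical helper, and each `bool` stands for "the message was moderated" in one run.

```rust
/// Fraction of runs that agree with the majority decision.
/// An empty slice is treated as fully consistent.
fn agreement_ratio(decisions: &[bool]) -> f64 {
    if decisions.is_empty() {
        return 1.0;
    }
    let flagged = decisions.iter().filter(|&&d| d).count();
    let majority = flagged.max(decisions.len() - flagged);
    majority as f64 / decisions.len() as f64
}
```

A test for a clear violation could then require a ratio of 1.0, while a looser threshold might be more realistic for borderline content.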

4. `test_ai_moderation_integration_performance`

Measures AI response time for moderation requests.

Expected: Response within 10 seconds (varies with API load).
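A response-time check of this kind can be sketched with `std::time::Instant`. The 10-second budget comes from the expectation above; the `timed` and `within_budget` helpers are hypothetical, and the closure passed to `timed` stands in for whatever call performs the moderation request.

```rust
use std::time::{Duration, Instant};

/// Response-time budget stated in this document.
const MAX_RESPONSE_TIME: Duration = Duration::from_secs(10);

/// Runs `f` and returns its result together with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// True when the elapsed time fits within the budget.
fn within_budget(elapsed: Duration) -> bool {
    elapsed <= MAX_RESPONSE_TIME
}
```

Because latency varies with API load, a failure here may point to network conditions rather than a regression.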

5. `test_ai_moderation_integration_edge_cases`

Tests borderline content:

Expected: Most should be allowed as they're innocent.

6. `test_ai_moderation_integration_prompt_variations`

Tests specific examples from our AI prompt to validate prompt engineering.

Expected: Should match the examples in our system prompt exactly.

Key Insights from Integration Tests

AI Behavior Observations

  1. Empty messages, single characters, emojis

- Strict: Very cautious, may flag borderline content
- Balanced: Tries to balance safety with free speech
- Lenient: Only flags clear violations
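The three strictness levels could be modeled as an enum so that tests can iterate over them or parse them from configuration. This is only a sketch: the `Strictness` type and its methods are hypothetical names, not the types used in bhcli, and the descriptions simply restate the list above.

```rust
use std::str::FromStr;

/// Hypothetical model of the three moderation levels described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strictness {
    Strict,
    Balanced,
    Lenient,
}

impl Strictness {
    /// Short description, restating this document's wording.
    fn description(self) -> &'static str {
        match self {
            Strictness::Strict => "very cautious, may flag borderline content",
            Strictness::Balanced => "tries to balance safety with free speech",
            Strictness::Lenient => "only flags clear violations",
        }
    }
}

impl FromStr for Strictness {
    type Err = String;

    /// Parses a level name case-insensitively, ignoring surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "strict" => Ok(Strictness::Strict),
            "balanced" => Ok(Strictness::Balanced),
            "lenient" => Ok(Strictness::Lenient),
            other => Err(format!("unknown strictness level: {}", other)),
        }
    }
}
```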


Common Test Failures and Their Meaning


Recommendations

For Development

For Production

For Tuning

Cost Considerations

Troubleshooting

API Key Issues

# Check if API key is set
echo $OPENAI_API_KEY

# Test API access
curl -H "Authorization: Bearer $OPENAI_API_KEY" https://api.openai.com/v1/models

Rate Limiting

The tests include delays between requests to reduce the chance of hitting rate limits. If you still hit them, you may need to:

Unexpected Results

Some results may seem counterintuitive but reflect the AI's interpretation.

Future Improvements