INDX
Security & Privacy-Preserving RAG

Mitigate the risk of leaking personal and confidential information with PII filtering and access control. A detailed look at enterprise compliance measures, with implementation examples using Pinecone and Presidio.

Katsuya Ito
CEO
8 min

As RAG (Retrieval-Augmented Generation) systems become more widespread, protecting confidential corporate information and personal data has become a critical challenge. This article walks through concrete countermeasures and practical implementation patterns.

Security Risks in RAG Systems

Major Risk Factors

  • Data leakage: Unintended exposure of confidential information during retrieval and generation
  • PII (Personally Identifiable Information) exposure: Disclosure of personal details such as names, addresses, and phone numbers
  • Lack of access control: Users retrieving data they are not authorized to see
  • Information leakage from logs and caches: Confidential information persisting in system logs and caches

Compliance Requirements Faced by Enterprises

```typescript
interface ComplianceRequirements {
  gdpr: boolean        // EU General Data Protection Regulation
  ccpa: boolean        // California Consumer Privacy Act
  hipaa: boolean       // Health Insurance Portability and Accountability Act
  pci_dss: boolean     // Payment Card Industry Data Security Standard
  sox: boolean         // Sarbanes-Oxley Act
}
```

Defensive Security Measures

1. PII Filtering System

PII detection and masking using Microsoft Presidio:

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Initialize the PII analyzer and anonymizer
# (the default English NLP engine requires the spaCy model en_core_web_lg)
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def sanitize_input(text: str) -> str:
    """Detect PII in the input text and replace it with entity-type placeholders."""
    # Detect PII entities
    results = analyzer.analyze(
        text=text,
        entities=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"],
        language="en"
    )

    # Anonymize detected PII (the default operator replaces each span with <ENTITY_TYPE>)
    anonymized_result = anonymizer.anonymize(
        text=text,
        analyzer_results=results
    )

    return anonymized_result.text

# Usage example
user_query = "John Smith's phone number is 555-123-4567"
safe_query = sanitize_input(user_query)
# Expected result: "<PERSON>'s phone number is <PHONE_NUMBER>"
```
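Presidio is a Python library (it can also be deployed as a separate HTTP service), so a TypeScript RAG backend would typically call it remotely. The final masking step is simple enough to show directly, though. The sketch below is a simplified stand-in, not Presidio's algorithm: the regex detectors (`PATTERNS`, `detectPii`) are hypothetical and far weaker than Presidio's recognizers, while `maskSpans` replaces each detected span with an `<ENTITY_TYPE>` placeholder, matching the output format shown above.

```typescript
// A detected PII span: [start, end) character offsets in the text
interface PiiSpan {
  entityType: string
  start: number // inclusive
  end: number   // exclusive
}

// Deliberately simple, hypothetical patterns; they miss many real-world formats
const PATTERNS: Record<string, RegExp> = {
  EMAIL_ADDRESS: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  PHONE_NUMBER: /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g
}

function detectPii(text: string): PiiSpan[] {
  const spans: PiiSpan[] = []
  for (const entityType of Object.keys(PATTERNS)) {
    // Fresh RegExp per call so no lastIndex state leaks between calls
    const re = new RegExp(PATTERNS[entityType].source, 'g')
    let m: RegExpExecArray | null
    while ((m = re.exec(text)) !== null) {
      spans.push({ entityType, start: m.index, end: m.index + m[0].length })
    }
  }
  return spans
}

// Replace spans with <ENTITY_TYPE> placeholders, right to left so that
// offsets of spans further left stay valid after each replacement
function maskSpans(text: string, spans: PiiSpan[]): string {
  const sorted = [...spans].sort((a, b) => b.start - a.start)
  let result = text
  let boundary = Infinity // start of the leftmost span masked so far
  for (const span of sorted) {
    if (span.end > boundary) continue // overlaps an already-masked span: skip
    result = result.slice(0, span.start) + `<${span.entityType}>` + result.slice(span.end)
    boundary = span.start
  }
  return result
}
```

Overlapping spans are resolved crudely here (the later-starting span wins); a production system needs a deliberate conflict policy, which is one of the things a mature tool like Presidio already provides.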

2. Access Control System

Role-Based Access Control (RBAC) implementation:

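```typescript
interface UserPermissions {
  userId: string
  roles: string[]
  departments: string[]
  securityLevel: number
}

interface DocumentMetadata {
  id: string
  securityLevel: number
  department: string
  classification: 'public' | 'internal' | 'confidential' | 'restricted'
}

interface SearchResult {
  metadata: DocumentMetadata
  content: string
  score: number
}

// Backing document service (application-specific)
interface DocumentService {
  getAccessibleDocuments(criteria: {
    userId: string
    securityLevel: number
    departments: string[]
  }): Promise<string[]>
}

abstract class SecureRAGSystem {
  constructor(protected documentService: DocumentService) {}

  async searchWithPermissions(
    query: string,
    userPermissions: UserPermissions
  ): Promise<SearchResult[]> {
    // 1. Resolve the IDs of documents this user may access
    const accessibleDocs = await this.filterByPermissions(userPermissions)

    // 2. Sanitize the query (e.g., mask PII) before it reaches the search backend
    const secureQuery = await this.sanitizeQuery(query)

    // 3. Search only within the accessible documents
    const results = await this.performSecureSearch(secureQuery, accessibleDocs)

    // 4. Re-check each result before returning it (defense in depth)
    return this.filterResponse(results, userPermissions)
  }

  private async filterByPermissions(
    permissions: UserPermissions
  ): Promise<string[]> {
    return this.documentService.getAccessibleDocuments({
      userId: permissions.userId,
      securityLevel: permissions.securityLevel,
      departments: permissions.departments
    })
  }

  // Application-specific steps
  protected abstract sanitizeQuery(query: string): Promise<string>
  protected abstract performSecureSearch(
    query: string,
    documentIds: string[]
  ): Promise<SearchResult[]>
  protected abstract filterResponse(
    results: SearchResult[],
    permissions: UserPermissions
  ): SearchResult[]
}
```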
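The `filterResponse` step above is left to the application. One possible rule, assumed here purely for illustration, combines the fields of `UserPermissions` and `DocumentMetadata`: the user's clearance must be at least the document's security level, and non-public documents also require membership in the owning department. The interfaces are repeated so the snippet is self-contained; `canAccess` is a hypothetical helper name.

```typescript
// Interfaces repeated from above so this snippet is self-contained
interface UserPermissions {
  userId: string
  roles: string[]
  departments: string[]
  securityLevel: number
}

interface DocumentMetadata {
  id: string
  securityLevel: number
  department: string
  classification: 'public' | 'internal' | 'confidential' | 'restricted'
}

// Illustrative access rule (an assumption, not a standard):
// 1. the user's clearance must be at least the document's security level
// 2. non-public documents also require membership in the owning department
function canAccess(user: UserPermissions, doc: DocumentMetadata): boolean {
  if (doc.securityLevel > user.securityLevel) return false
  if (doc.classification === 'public') return true
  return user.departments.includes(doc.department)
}

// Post-retrieval filter: drop any result the user may not see,
// even if the vector search returned it
function filterResponse<T extends { metadata: DocumentMetadata }>(
  results: T[],
  user: UserPermissions
): T[] {
  return results.filter(result => canAccess(user, result.metadata))
}
```

Running this check after retrieval, even when the index query was already filtered, means a misconfigured search filter still cannot return documents the user is not entitled to. `roles` is not consulted in this sketch; role-based overrides would be an additional branch in `canAccess`.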

3. Data Encryption and Secure Storage

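Encrypting confidential content before storing it in Pinecone, while keeping the fields needed for access-control filtering in plaintext: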

```typescript
// Written against the official @pinecone-database/pinecone client; check your client version
import { Pinecone } from '@pinecone-database/pinecone'
import { encrypt, decrypt } from './encryption-utils'

// UserPermissions is the interface defined in the access control section

interface SecureVector {
  id: string
  values: number[]
  metadata: {
    content: string       // confidential: encrypted before upload
    department: string    // plaintext: used for metadata filtering
    securityLevel: number // plaintext: used for metadata filtering
  }
}

interface DecryptedMatch {
  id: string
  score?: number
  content: string
}

class SecurePineconeService {
  private client: Pinecone
  private encryptionKey: string

  constructor(apiKey: string, encryptionKey: string) {
    this.client = new Pinecone({ apiKey })
    this.encryptionKey = encryptionKey
  }

  async upsertSecureVectors(
    indexName: string,
    vectors: SecureVector[]
  ): Promise<void> {
    // Encrypt the confidential payload; keep access-control fields in
    // plaintext so Pinecone can filter on them server-side.
    // Only the listed fields are stored, so no other plaintext metadata leaks.
    // Note: the embedding values themselves are stored unencrypted.
    const records = vectors.map(vector => ({
      id: vector.id,
      values: vector.values,
      metadata: {
        content: encrypt(vector.metadata.content, this.encryptionKey),
        department: vector.metadata.department,
        securityLevel: vector.metadata.securityLevel
      }
    }))

    await this.client.index(indexName).upsert(records)
  }

  async querySecure(
    indexName: string,
    queryVector: number[],
    userPermissions: UserPermissions
  ): Promise<DecryptedMatch[]> {
    // Access control filter: documents at or below the user's clearance
    // level and belonging to one of the user's departments
    const filter = {
      $and: [
        { securityLevel: { $lte: userPermissions.securityLevel } },
        { department: { $in: userPermissions.departments } }
      ]
    }

    const results = await this.client.index(indexName).query({
      vector: queryVector,
      filter,
      topK: 10,
      includeMetadata: true
    })

    // Decrypt each match's content; skip matches without stored content
    return (results.matches ?? []).flatMap(match => {
      const encrypted = match.metadata?.content
      if (typeof encrypted !== 'string') return []
      return [{
        id: match.id,
        score: match.score,
        content: decrypt(encrypted, this.encryptionKey)
      }]
    })
  }
}
```
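The `./encryption-utils` module is not shown in the original. One possible sketch uses Node's built-in `crypto` module with AES-256-GCM, the algorithm named in the security configuration of this article. It assumes the key is passed as a 64-character hex string; the payload layout (IV, then authentication tag, then ciphertext, base64-encoded) is an arbitrary choice for this sketch.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto'

const ALGORITHM = 'aes-256-gcm'
const IV_LENGTH = 12  // 96-bit nonce, the recommended size for GCM
const TAG_LENGTH = 16 // 128-bit authentication tag

function toKey(keyHex: string): Buffer {
  const key = Buffer.from(keyHex, 'hex')
  if (key.length !== 32) {
    throw new Error('AES-256 requires a 32-byte key (64 hex characters)')
  }
  return key
}

// Returns base64(iv || authTag || ciphertext)
export function encrypt(plaintext: string, keyHex: string): string {
  // Fresh random nonce per message; never reuse a nonce with the same key
  const iv = randomBytes(IV_LENGTH)
  const cipher = createCipheriv(ALGORITHM, toKey(keyHex), iv)
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString('base64')
}

export function decrypt(payload: string, keyHex: string): string {
  const data = Buffer.from(payload, 'base64')
  const iv = data.subarray(0, IV_LENGTH)
  const tag = data.subarray(IV_LENGTH, IV_LENGTH + TAG_LENGTH)
  const ciphertext = data.subarray(IV_LENGTH + TAG_LENGTH)
  const decipher = createDecipheriv(ALGORITHM, toKey(keyHex), iv)
  // final() throws if the ciphertext or tag has been tampered with
  decipher.setAuthTag(tag)
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8')
}
```

Load the key from a secrets manager or KMS rather than hard-coding it. Because this payload carries no key identifier, rotating keys on a schedule would require adding a key ID to the format so that older records can still be decrypted.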

Audit and Log Management

Security Logging System

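```typescript
import { createHash } from 'node:crypto'

interface SecurityLog {
  timestamp: Date
  userId: string
  department: string // required for the per-department breakdown below
  action: 'search' | 'access' | 'download' | 'share'
  resourceId: string
  securityLevel: number
  result: 'allowed' | 'denied'
  riskLevel: 'low' | 'medium' | 'high'
}

interface ComplianceReport {
  totalAccesses: number
  unauthorizedAttempts: number
  highRiskEvents: number
  departmentBreakdown: Record<string, number>
}

// Storage and alerting backends are application-specific
interface SecureLogStorage {
  store(entry: SecurityLog & { hash: string }): Promise<void>
  query(filter: { timestamp: { $gte: Date; $lte: Date } }): Promise<SecurityLog[]>
}

interface AlertService {
  sendSecurityAlert(event: SecurityLog): Promise<void>
}

class SecurityAuditService {
  constructor(
    private secureLogStorage: SecureLogStorage,
    private alertService: AlertService
  ) {}

  async logSecurityEvent(event: SecurityLog): Promise<void> {
    // Store the event together with a content hash for tamper detection
    await this.secureLogStorage.store({
      ...event,
      hash: this.generateEventHash(event)
    })

    // Send an alert for high-risk events
    if (event.riskLevel === 'high') {
      await this.alertService.sendSecurityAlert(event)
    }
  }

  async generateComplianceReport(
    startDate: Date,
    endDate: Date
  ): Promise<ComplianceReport> {
    const logs = await this.secureLogStorage.query({
      timestamp: { $gte: startDate, $lte: endDate }
    })

    return {
      totalAccesses: logs.length,
      unauthorizedAttempts: logs.filter(log => log.result === 'denied').length,
      highRiskEvents: logs.filter(log => log.riskLevel === 'high').length,
      departmentBreakdown: this.aggregateByDepartment(logs)
    }
  }

  // SHA-256 over the serialized event
  private generateEventHash(event: SecurityLog): string {
    return createHash('sha256').update(JSON.stringify(event)).digest('hex')
  }

  private aggregateByDepartment(logs: SecurityLog[]): Record<string, number> {
    const counts: Record<string, number> = {}
    for (const log of logs) {
      counts[log.department] = (counts[log.department] ?? 0) + 1
    }
    return counts
  }
}
```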
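`generateEventHash` above hashes each event independently, which only reveals a modified entry if the hashes are kept somewhere an attacker cannot also rewrite. A common strengthening is a hash chain: each entry's hash also covers the previous entry's hash, so editing or deleting any entry breaks the chain from that point on. The in-memory `HashChainedLog` class below is a hypothetical sketch of that idea, not part of the original design.

```typescript
import { createHash } from 'node:crypto'

interface ChainedEntry {
  payload: string  // serialized log event
  prevHash: string // hash of the previous entry ('' for the first entry)
  hash: string     // sha256(prevHash + payload), hex-encoded
}

const GENESIS_HASH = ''

function sha256Hex(input: string): string {
  return createHash('sha256').update(input).digest('hex')
}

// Hypothetical in-memory append-only log with hash chaining
class HashChainedLog {
  private readonly entries: ChainedEntry[] = []

  append(event: object): ChainedEntry {
    const payload = JSON.stringify(event)
    const last = this.entries[this.entries.length - 1]
    const prevHash = last ? last.hash : GENESIS_HASH
    const entry: ChainedEntry = { payload, prevHash, hash: sha256Hex(prevHash + payload) }
    this.entries.push(entry)
    return entry
  }

  // Returns the index of the first entry whose link is broken, or -1 if intact
  verify(): number {
    let expectedPrev = GENESIS_HASH
    for (let i = 0; i < this.entries.length; i++) {
      const entry = this.entries[i]
      if (entry.prevHash !== expectedPrev || entry.hash !== sha256Hex(entry.prevHash + entry.payload)) {
        return i
      }
      expectedPrev = entry.hash
    }
    return -1
  }

  // Exposes the raw entries (e.g., for persistence or tamper tests)
  getEntries(): ChainedEntry[] {
    return this.entries
  }
}
```

A chain alone does not reveal truncation of the most recent entries; periodically anchoring the latest hash in a separate system (for example, write-once storage) closes that gap.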

Implementation Best Practices

1. Secure Development Lifecycle

```typescript
// Security configuration example (all durations are in milliseconds)
const securityConfig = {
  // Encryption settings
  encryption: {
    algorithm: 'AES-256-GCM',
    keyRotationInterval: 90 * 24 * 60 * 60 * 1000, // 90 days
  },

  // Access control
  accessControl: {
    sessionTimeout: 30 * 60 * 1000, // 30 minutes
    maxFailedAttempts: 3,
    lockoutDuration: 15 * 60 * 1000 // 15 minutes
  },

  // Logging settings
  logging: {
    retentionPeriod: 365 * 24 * 60 * 60 * 1000, // 1 year
    logLevel: 'info',
    sensitiveDataMasking: true
  }
}
```
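The `maxFailedAttempts` and `lockoutDuration` settings imply an account lockout mechanism. A minimal in-memory sketch follows; the `LoginLockout` class and its method names are hypothetical, and a real deployment would persist this state (for example in a shared cache) so it survives restarts and applies across instances.

```typescript
// Hypothetical in-memory lockout tracker (state is lost on restart)
class LoginLockout {
  private readonly state = new Map<string, { failures: number; lockedUntil: number }>()
  private readonly maxFailedAttempts: number
  private readonly lockoutDurationMs: number

  constructor(maxFailedAttempts: number, lockoutDurationMs: number) {
    this.maxFailedAttempts = maxFailedAttempts
    this.lockoutDurationMs = lockoutDurationMs
  }

  // True while the account is inside an active lockout window
  isLocked(userId: string, now: number = Date.now()): boolean {
    const entry = this.state.get(userId)
    return entry !== undefined && entry.lockedUntil > now
  }

  recordFailure(userId: string, now: number = Date.now()): void {
    const entry = this.state.get(userId) ?? { failures: 0, lockedUntil: 0 }
    entry.failures += 1
    if (entry.failures >= this.maxFailedAttempts) {
      // Lock the account and reset the counter for the next window
      entry.lockedUntil = now + this.lockoutDurationMs
      entry.failures = 0
    }
    this.state.set(userId, entry)
  }

  // A successful login clears the failure history
  recordSuccess(userId: string): void {
    this.state.delete(userId)
  }
}
```

With the configuration above, this would be constructed as `new LoginLockout(3, 15 * 60 * 1000)`, and callers would check `isLocked` before verifying credentials. The sketch never expires old failures, so three failures spread over weeks still trigger a lockout; a time window on the counter is a common refinement.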

2. Security Testing

```typescript
// Assumes `secureRAG` (a configured SecureRAGSystem that also exposes a
// query() method returning generated content) and `userPermissions`
// are created in a beforeAll() fixture
describe('Security Tests', () => {
  test('should mask PII in responses', async () => {
    const query = "Tell me about John Smith's information"
    const response = await secureRAG.query(query, userPermissions)

    expect(response.content).not.toContain('John Smith')
    expect(response.content).toContain('<PERSON>')
  })

  test('should deny access to unauthorized documents', async () => {
    const lowLevelUser: UserPermissions = {
      userId: 'test-user',
      roles: ['viewer'],
      departments: ['sales'],
      securityLevel: 1
    }
    // Assumes every test document matching this query is above level 1
    // or belongs to a department other than sales
    const query = "Tell me about the confidential project"

    const results = await secureRAG.searchWithPermissions(query, lowLevelUser)

    expect(results).toHaveLength(0)
  })
})
```

3. Continuous Security Monitoring

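```typescript
// Minimal incident shape assumed for illustration;
// SecurityLog is the interface from the logging section
interface SecurityIncident {
  userId: string
  description: string
}

abstract class SecurityMonitor {
  // Application-specific dependencies
  protected abstract anomalyDetector: { analyze(logs: SecurityLog[]): Promise<SecurityIncident[]> }
  protected abstract getRecentSecurityLogs(): Promise<SecurityLog[]>
  protected abstract handleSecurityIncident(incident: SecurityIncident): Promise<void>

  // Run continuously or on a short schedule
  async monitorRealTime(): Promise<void> {
    // Detect abnormal access patterns
    const suspiciousActivities = await this.detectAnomalies()

    for (const activity of suspiciousActivities) {
      await this.handleSecurityIncident(activity)
    }
  }

  private async detectAnomalies(): Promise<SecurityIncident[]> {
    // Score recent security logs with an anomaly detection model (e.g., machine learning)
    const recentLogs = await this.getRecentSecurityLogs()
    return this.anomalyDetector.analyze(recentLogs)
  }
}
```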
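The `anomalyDetector` is described as machine-learning-based. Before a trained model is in place, a rule-based baseline can already catch the obvious patterns. The hypothetical `detectAnomalies` function below is that kind of stand-in, not an ML method: within a time window it flags users with repeated denied requests, or with request volume far above the median user. The thresholds (`maxDenied`, `volumeFactor`) are illustrative defaults, not recommendations.

```typescript
// Minimal event shape assumed for this sketch
interface AccessEvent {
  userId: string
  timestamp: number // epoch milliseconds
  result: 'allowed' | 'denied'
}

interface Incident {
  userId: string
  reason: string
}

// Rule-based baseline (not machine learning)
function detectAnomalies(
  events: AccessEvent[],
  now: number,
  windowMs: number,
  maxDenied = 5,
  volumeFactor = 10
): Incident[] {
  // Count total and denied requests per user inside (now - windowMs, now]
  const perUser = new Map<string, { total: number; denied: number }>()
  for (const e of events) {
    if (e.timestamp <= now - windowMs || e.timestamp > now) continue
    const counts = perUser.get(e.userId) ?? { total: 0, denied: 0 }
    counts.total += 1
    if (e.result === 'denied') counts.denied += 1
    perUser.set(e.userId, counts)
  }

  // Median per-user volume (upper median for an even number of users)
  const totals = Array.from(perUser.values(), c => c.total).sort((a, b) => a - b)
  const median = totals.length > 0 ? totals[Math.floor(totals.length / 2)] : 0

  const incidents: Incident[] = []
  for (const [userId, c] of perUser) {
    if (c.denied >= maxDenied) {
      incidents.push({ userId, reason: `${c.denied} denied requests in window` })
    } else if (median > 0 && c.total > median * volumeFactor) {
      incidents.push({ userId, reason: `${c.total} requests vs. median ${median}` })
    }
  }
  return incidents
}
```

The per-user counts computed here also make reasonable input features if a learned detector replaces these fixed rules later.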

Compliance Implementation

GDPR Compliance Implementation

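```typescript
abstract class GDPRCompliantRAG {
  async handleDataSubjectRequest(
    requestType: 'access' | 'rectification' | 'erasure',
    userId: string,
    corrections: Record<string, unknown> = {}
  ): Promise<void> {
    switch (requestType) {
      case 'access': {
        // Right of access (Art. 15); a machine-readable export
        // also supports the right to data portability (Art. 20)
        const userData = await this.exportUserData(userId)
        await this.sendUserDataExport(userData)
        break
      }

      case 'erasure':
        // Right to erasure ("right to be forgotten", Art. 17):
        // delete source records and any vectors derived from them
        await this.deleteAllUserData(userId)
        await this.removeFromVectorDatabase(userId)
        break

      case 'rectification':
        // Right to rectification (Art. 16); embeddings derived from
        // the old data should be refreshed as well
        await this.updateUserData(userId, corrections)
        break
    }
  }

  async ensureDataMinimization(): Promise<void> {
    // Data minimization principle (Art. 5(1)(c)): delete data no longer needed
    const unnecessaryData = await this.identifyUnnecessaryData()
    await this.safelyDeleteData(unnecessaryData)
  }

  // Application-specific storage operations
  protected abstract exportUserData(userId: string): Promise<unknown>
  protected abstract sendUserDataExport(data: unknown): Promise<void>
  protected abstract deleteAllUserData(userId: string): Promise<void>
  protected abstract removeFromVectorDatabase(userId: string): Promise<void>
  protected abstract updateUserData(userId: string, corrections: Record<string, unknown>): Promise<void>
  protected abstract identifyUnnecessaryData(): Promise<string[]>
  protected abstract safelyDeleteData(ids: string[]): Promise<void>
}
```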
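For `removeFromVectorDatabase`, erasure is much easier if every vector can be traced back to its data subject. One approach, assumed here, is to encode the subject in the vector ID (for example `user-42#doc-7#0`) and delete by ID prefix. The `VectorStore` interface, `eraseDataSubject`, and `InMemoryVectorStore` below are hypothetical; Pinecone serverless indexes, for instance, support listing record IDs by prefix, which a real `listIdsByPrefix` could build on (check the current client documentation for the exact calls).

```typescript
// Hypothetical abstraction over a vector database
interface VectorStore {
  listIdsByPrefix(prefix: string): Promise<string[]>
  deleteIds(ids: string[]): Promise<void>
}

// ID convention assumed for this sketch: `${subjectId}#${documentId}#${chunkIndex}`
function vectorId(subjectId: string, documentId: string, chunkIndex: number): string {
  return `${subjectId}#${documentId}#${chunkIndex}`
}

// Deletes every vector derived from one data subject's records, in batches
async function eraseDataSubject(
  store: VectorStore,
  subjectId: string,
  batchSize = 100
): Promise<number> {
  const ids = await store.listIdsByPrefix(`${subjectId}#`)
  for (let i = 0; i < ids.length; i += batchSize) {
    await store.deleteIds(ids.slice(i, i + batchSize))
  }
  return ids.length
}

// In-memory implementation, useful for tests
class InMemoryVectorStore implements VectorStore {
  private readonly ids = new Set<string>()

  add(id: string): void {
    this.ids.add(id)
  }

  has(id: string): boolean {
    return this.ids.has(id)
  }

  async listIdsByPrefix(prefix: string): Promise<string[]> {
    return Array.from(this.ids).filter(id => id.startsWith(prefix))
  }

  async deleteIds(ids: string[]): Promise<void> {
    for (const id of ids) this.ids.delete(id)
  }
}
```

The `#` separator matters: without it, erasing `user-4` would also match IDs belonging to `user-42`. Copies of the data in backups, caches, and logs also need to be covered by the erasure process.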

Conclusion

Building secure RAG systems requires the following key elements:

  • Defense in depth: Combining PII filtering, access control, and encryption
  • Continuous monitoring: Real-time security monitoring and anomaly detection
  • Regulatory compliance: Meeting the requirements of regulations such as GDPR and CCPA
  • Security by design: Building in security from the design phase onward

Combining tools such as Pinecone (metadata filtering for access control) and Presidio (PII detection and anonymization) with encryption and audit logging helps you build RAG systems that meet enterprise security requirements. Regular security audits and ongoing improvements then keep those systems safe and reliable over time.

Tags

Security
Privacy
PII
Pinecone
Presidio