Dynamic Multi-Tenant Access Control in the Spring Ecosystem: Combining GCP OIDC Identity with ArangoDB Graph Authorization

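In simple scenarios, access control in a multi-tenant system often degenerates into adding a tenant_id column to every business table. That works when the relationships among tenants, users, and roles are fixed and simple. Real SaaS platforms demand far more: a user may belong to several tenants at once and hold a different role in each; the permission model may need inheritance along an organizational hierarchy; temporary cross-tenant grants may even appear. At that point the traditional RBAC (Role-Based Access Control) model built on a relational database quickly becomes bloated, and query performance drops sharply under the weight of many JOINs and recursive queries.

The core of the problem is that permissions are essentially a network of relationships, and simulating a graph with a relational database is itself a compromise. This leads to our first architectural decision: stop storing authorization data in a relational database and adopt a data model that is more native to the problem domain to carry the complex authorization logic.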

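Option A: A recursive-query model on a relational database

This is the most common starting point. We typically design a handful of core tables: tenants, users, roles, permissions, user_tenant_roles (linking users, tenants, and roles), and role_permissions (linking roles and permissions).

To check whether user U has permission to perform action P on resource R within tenant T, the SQL query looks roughly like this: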

SELECT COUNT(*)
FROM user_tenant_roles utr
JOIN role_permissions rp ON utr.role_id = rp.role_id
JOIN permissions p ON rp.permission_id = p.id
WHERE utr.user_id = :userId
  AND utr.tenant_id = :tenantId
  AND p.resource_key = :resourceKey
  AND p.action = :action;

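The problem with this model is its rigidity. Once we introduce "role groups" or "organizational units" through which permissions can be inherited, we must use recursive common table expressions (recursive CTEs) to walk the inheritance chain. The queries become very complex, and performance is a concern for deeply nested organizational structures. In real projects such queries are a frequent performance bottleneck, and the database optimizer often struggles to optimize them effectively.

Option B: A graph-native authorization model on ArangoDB

ArangoDB is a multi-model database, and its graph capabilities offer a very different approach. We can model every entity in the authorization system as a graph vertex and every relationship between them as an edge.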

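Vertex collections:
- Users: stores user information; the key attribute is the unique subject ID supplied by the OIDC provider.
- Tenants: stores tenant information.
- Roles: defines roles such as ADMIN and EDITOR.
- Resources: defines protected resource entities such as Project and Document.
Edge collections:
- MEMBER_OF: connects Users to Tenants and indicates that a user is a member of a tenant.
- HAS_ROLE: connects a MEMBER_OF edge to a Roles vertex. The key design choice is that the role is attached to the membership (MEMBER_OF) rather than linking the user to the role directly, so a user can hold different roles in different tenants.
- CAN_ACCESS: connects Roles to Resources and carries the permission attributes (such as READ and WRITE).

This model naturally expresses the core authorization question: who (User), where (Tenant), in what capacity (Role), can do what (Permission) to what (Resource). A permission check becomes a graph traversal, which is exactly what a graph database is good at.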

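Final Choice and Architecture Overview

We decided on Option B. The technology stack is as follows:

Authentication: Google Cloud Identity Platform serves as the OIDC provider. It handles user sign-up, sign-in, multi-factor authentication, and the rest of the identity management flow, and finally issues a standard ID Token (JWT) to our application.
Application framework and security: Spring Framework, specifically the OAuth 2.0 Resource Server module of Spring Security 6 (the configuration below uses jakarta.servlet and requestMatchers, which belong to Spring Security 6 / Spring Boot 3), to receive and validate the JWTs issued by GCP.
Authorization: ArangoDB. During Spring Security's authorization step, we take the user information parsed from the JWT, query the permission graph in ArangoDB, and make the final access decision.
Deployment environment: Google Cloud (GCP), for straightforward integration with Identity Platform.

The overall flow is shown in the following sequence diagram: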

sequenceDiagram
    participant User
    participant Browser
    participant SpringApp as Spring Boot App (on GKE)
    participant GCIP as Google Cloud Identity Platform
    participant ArangoDB as ArangoDB (on GCE/Managed)

    User->>Browser: Open the application
    Browser->>SpringApp: Request protected resource
    SpringApp-->>Browser: 302 redirect to GCIP sign-in page
    Browser->>GCIP: User enters credentials
    GCIP-->>Browser: Authentication succeeds, returns ID Token (JWT)
    Browser->>SpringApp: Request resource again with JWT (Authorization: Bearer ...)
    SpringApp->>GCIP: Fetch public keys (JWKS URI) to verify JWT signature
    GCIP-->>SpringApp: Return public keys
    SpringApp->>SpringApp: Validate JWT (signature, issuer, audience, expiry)
    alt JWT valid
        SpringApp->>ArangoDB: AQL graph query to check user permission
        ArangoDB-->>SpringApp: Return result (permitted / not permitted)
        alt Permitted
            SpringApp-->>Browser: Return 200 with resource data
        else Not permitted
            SpringApp-->>Browser: Return 403 Forbidden
        end
    else JWT invalid
        SpringApp-->>Browser: Return 401 Unauthorized
    end

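Core Implementation Details

1. Spring Security configuration: JWT validation

First, we configure the Spring Boot application as an OAuth 2.0 resource server. It needs to know whose JWTs to trust. This is done through the issuer-uri setting in application.yml: Spring Security uses the issuer URI to fetch the provider's OpenID Connect discovery metadata, reads the JWKS (JSON Web Key Set) endpoint from it, and retrieves the public keys used to verify signatures from that endpoint.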

application.yml

spring:
  security:
    oauth2:
      resourceserver:
        jwt:
          # Obtain from the GCP Identity Platform console
          # Format: https://securetoken.google.com/YOUR-GCP-PROJECT-ID
          issuer-uri: "https://securetoken.google.com/my-gcp-project-id"
          # Expected value of the token's aud claim; for Identity Platform ID tokens this is the GCP project ID
          audiences:
            - "my-gcp-project-id"
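
To make concrete what the issuer-uri and audiences settings ask the resource server to check, here is a minimal sketch of the claim comparison in plain Java. It is not Spring Security's validator: ClaimCheck, the Claims record, and isAcceptable are hypothetical names introduced for illustration, and signature verification is deliberately left out.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical helper that illustrates the claim checks; not Spring Security's own validator. */
final class ClaimCheck {

    /** Minimal view of the claims that matter here; the record and its shape are assumptions. */
    record Claims(String issuer, List<String> audience, Instant expiresAt) {}

    /** Returns true only if issuer, audience, and expiry all pass. */
    static boolean isAcceptable(Claims claims, String expectedIssuer,
                                String expectedAudience, Instant now) {
        if (claims == null || claims.issuer() == null
                || claims.audience() == null || claims.expiresAt() == null) {
            return false; // a missing claim is treated as a failed check
        }
        boolean issuerMatches = expectedIssuer.equals(claims.issuer());
        boolean audienceMatches = claims.audience().contains(expectedAudience);
        boolean notExpired = now.isBefore(claims.expiresAt());
        return issuerMatches && audienceMatches && notExpired;
    }
}
```

With the values from the YAML above, a token whose iss is https://securetoken.google.com/my-gcp-project-id and whose aud contains my-gcp-project-id passes until it expires; a token issued for a different project fails on both issuer and audience.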

SecurityConfig.java

package com.example.graphauth.config;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.method.configuration.EnableMethodSecurity;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.http.SessionCreationPolicy;
import org.springframework.security.web.SecurityFilterChain;
import org.springframework.security.web.access.AccessDeniedHandler;

import jakarta.servlet.http.HttpServletResponse;

@Configuration
@EnableWebSecurity
@EnableMethodSecurity(prePostEnabled = true) // Enables method-level security annotations such as @PreAuthorize
public class SecurityConfig {

    private static final Logger log = LoggerFactory.getLogger(SecurityConfig.class);

    @Bean
    public SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception {
        http
            .csrf(csrf -> csrf.disable()) // CSRF protection is usually disabled for stateless API services
            .sessionManagement(session -> session.sessionCreationPolicy(SessionCreationPolicy.STATELESS)) // Stateless sessions
            .authorizeHttpRequests(authorize -> authorize
                .requestMatchers("/api/public/**").permitAll() // Public API
                .anyRequest().authenticated() // Every other API requires authentication
            )
            .oauth2ResourceServer(oauth2 -> oauth2.jwt(jwt -> {})) // Configure as a JWT resource server
            .exceptionHandling(exceptions -> exceptions
                .authenticationEntryPoint((request, response, authException) -> {
                    // Custom response for unauthenticated requests
                    log.warn("Authentication failed: {}", authException.getMessage());
                    response.sendError(HttpServletResponse.SC_UNAUTHORIZED, "Authentication required");
                })
                .accessDeniedHandler(accessDeniedHandler()) // Custom response for insufficient permissions
            );

        return http.build();
    }
    
    @Bean
    public AccessDeniedHandler accessDeniedHandler() {
        return (request, response, accessDeniedException) -> {
            // In production, log more detail, including request information
            log.warn("Access denied for user on resource {}: {}", request.getRequestURI(), accessDeniedException.getMessage());
            response.sendError(HttpServletResponse.SC_FORBIDDEN, "You do not have permission to access this resource");
        };
    }
}

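This configuration does several key things:

It enables the JWT-based resource server.
It sets the session management policy to STATELESS, because every request carries a JWT and the server does not need to keep session state.
It enables method-level security annotations, which is how we plug in our custom authorization logic.
It provides basic error handling that returns friendlier messages for 401 and 403 responses and logs them.

2. ArangoDB data model and entities

We use spring-data-arangodb to simplify interaction with the database.

Vertex entities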

package com.example.graphauth.db.model;

import com.arangodb.springframework.annotation.Document;
import com.arangodb.springframework.annotation.ArangoId;
import org.springframework.data.annotation.Id;

@Document("users")
public class UserNode {
    @Id // spring-data id
    private String id;

    @ArangoId // ArangoDB's internal id
    private String arangoId;

    // OIDC 'sub' claim, an immutable identifier for the user
    private String subjectId; 
    private String email;
    // ... other user attributes
}

@Document("tenants")
public class TenantNode {
    @Id
    private String id;
    @ArangoId
    private String arangoId;
    
    private String name;
    private String domain; // e.g., acme.myapp.com
}

@Document("roles")
public class RoleNode {
    @Id
    private String id;
    @ArangoId
    private String arangoId;

    private String name; // e.g., TENANT_ADMIN, PROJECT_EDITOR
}

// ResourceNode is a generalized concept; in practice there can be many resource types
@Document("resources")
public class ResourceNode {
    @Id
    private String id;
    @ArangoId
    private String arangoId;

    private String resourceType; // e.g., "Project", "BillingInfo"
    private String resourceKey; // A unique key for this resource instance
}

Edge entities

package com.example.graphauth.db.model;

import com.arangodb.springframework.annotation.Edge;
import com.arangodb.springframework.annotation.From;
import com.arangodb.springframework.annotation.To;
import org.springframework.data.annotation.Id;

import java.util.Set;

@Edge("member_of")
public class MemberOfEdge {
    @Id
    private String id;

    @From
    private UserNode user;

    @To
    private TenantNode tenant;

    // Attributes can be attached to the edge
    private String status; // e.g., "active", "invited"
}

@Edge("has_role")
public class HasRoleEdge {
    @Id
    private String id;

    // Key point: this edge starts from another edge; ArangoDB can store such a reference (revised below)
    @From
    private MemberOfEdge membership;

    @To
    private RoleNode role;
}


@Edge("can_access")
public class CanAccessEdge {
    @Id
    private String id;

    @From
    private RoleNode role;

    @To
    private ResourceNode resource;

    // Permitted actions are defined directly on the edge
    private Set<String> actions; // e.g., {"READ", "WRITE", "DELETE"}
}

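One detail worth noting is that the @From of HasRoleEdge points to MemberOfEdge. This is awkward to express directly in a relational database, but in ArangoDB an edge's _from and _to fields hold document identifiers, which may refer to any document, including another edge. However, a more common practice is to model the user, tenant, and role relationship as a hypergraph, or to make the membership a vertex of its own. To keep things simple, we instead assume a more direct traversal path here, in which a role is an attribute of a user within a specific tenant.

A flatter model that better matches how AQL traversals work is:

User -- MEMBER_OF {tenantId: "t1"} --> Tenant
User -- HAS_ROLE {tenantId: "t1", role: "ADMIN"} --> Role

Let us adopt this second, flatter, easier-to-query model and store tenantId directly on the HAS_ROLE edge.

The revised HAS_ROLE edge: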

@Edge("has_role")
public class HasRoleEdge {
    @Id
    private String id;

    @From
    private UserNode user;

    @To
    private RoleNode role;

    // ID of the tenant in which the role applies, stored as an edge attribute
    private String tenantId;
}

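This model is easier to query, as the AQL below shows.

3. Core authorization logic: the AQL query and the custom permission evaluator

We create a PermissionService that encapsulates the AQL query and call it from @PreAuthorize through a SpEL expression to get declarative authorization. A custom method annotation such as @CheckPermission could wrap that expression, but it is not shown in this article.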

PermissionService.java

package com.example.graphauth.service;

import com.arangodb.ArangoCursor;
import com.arangodb.ArangoDatabase;
import com.arangodb.model.AqlQueryOptions;
import com.arangodb.util.MapBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

import java.util.Map;

@Service("permissionService")
public class PermissionService {

    private static final Logger log = LoggerFactory.getLogger(PermissionService.class);
    private final ArangoDatabase arangoDatabase;

    public PermissionService(ArangoDatabase arangoDatabase) {
        this.arangoDatabase = arangoDatabase;
    }

    /**
     * The core authorization check method.
     * Checks if a user (identified by OIDC subject ID) has a specific action permission
     * on a resource within a given tenant.
     *
     * @param subjectId  The user's unique identifier from the JWT 'sub' claim.
     * @param tenantId   The ID of the tenant context.
     * @param resourceType The type of the resource (e.g., "Project").
     * @param resourceKey  The specific key of the resource instance.
     * @param action     The action to be performed (e.g., "READ", "DELETE").
     * @return true if permission is granted, false otherwise.
     */
    public boolean hasPermission(String subjectId, String tenantId, String resourceType, String resourceKey, String action) {
        // In production code, validate that none of the input parameters is null
        
        final String aqlQuery = """
            FOR user IN users
                FILTER user.subjectId == @subjectId
                
                // 1. Find all roles this user has within the specified tenant.
                // This traverses User -> Role via the 'has_role' edge, filtering by tenantId on the edge.
                FOR role, roleEdge IN 1..1 OUTBOUND user has_role
                    FILTER roleEdge.tenantId == @tenantId
                    
                    // 2. From those roles, find all accessible resources.
                    // This traverses Role -> Resource via the 'can_access' edge.
                    FOR resource, accessEdge IN 1..1 OUTBOUND role can_access
                        // 3. Filter for the specific resource and required action.
                        FILTER resource.resourceType == @resourceType
                        FILTER resource.resourceKey == @resourceKey
                        FILTER @action IN accessEdge.actions
                        
                        // 4. If we find at least one valid path, we have permission.
                        // LIMIT 1 is a crucial optimization; we stop as soon as we find a match.
                        LIMIT 1
                        RETURN true
            """;

        Map<String, Object> bindVars = new MapBuilder()
                .put("subjectId", subjectId)
                .put("tenantId", tenantId)
                .put("resourceType", resourceType)
                .put("resourceKey", resourceKey)
                .put("action", action)
                .get();

        log.debug("Executing AQL for permission check: subjectId={}, tenantId={}", subjectId, tenantId);
        try (ArangoCursor<Boolean> cursor = arangoDatabase.query(aqlQuery, bindVars, new AqlQueryOptions(), Boolean.class)) {
            // If the cursor has any result, it means the query returned 'true', so permission is granted.
            return cursor.hasNext();
        } catch (Exception e) {
            log.error("Error executing AQL permission check", e);
            // In a security context, a failed query must default to denying access
            return false;
        }
    }
}

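This AQL query is the heart of the architecture, and it shows the declarative nature of graph queries well:

It starts from a known user vertex, located by the OIDC subjectId.
It traverses one level outward along has_role edges while filtering on the edge's tenantId attribute, which yields exactly the roles the user holds in the current tenant.
From each role vertex found, it traverses one more level outward along can_access edges.
It filters on the resource vertex's type and key, and on whether the actions array on the can_access edge contains the required action.
LIMIT 1 is the key performance optimization. As soon as one path satisfying the conditions is found, the query can stop and return instead of enumerating every remaining path. Note that the query does not itself verify that the resource belongs to @tenantId; that holds only if roles are defined per tenant or the resource carries its own tenant reference.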
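
The steps above can be mirrored in plain Java over in-memory edge lists, which is handy for reasoning about the traversal or for unit-testing callers without a database. This is only a sketch: InMemoryPermissionGraph and its records are illustrative names, and roles are identified by name rather than by ArangoDB document ID.

```java
import java.util.List;
import java.util.Set;

/** In-memory stand-in for the permission graph; every name here is illustrative. */
final class InMemoryPermissionGraph {

    /** Mirrors a has_role edge: user -> role, scoped by tenantId on the edge. */
    record HasRole(String subjectId, String roleName, String tenantId) {}

    /** Mirrors a can_access edge: role -> resource, with the allowed actions on the edge. */
    record CanAccess(String roleName, String resourceType, String resourceKey, Set<String> actions) {}

    private final List<HasRole> hasRoleEdges;
    private final List<CanAccess> canAccessEdges;

    InMemoryPermissionGraph(List<HasRole> hasRoleEdges, List<CanAccess> canAccessEdges) {
        this.hasRoleEdges = List.copyOf(hasRoleEdges);
        this.canAccessEdges = List.copyOf(canAccessEdges);
    }

    boolean hasPermission(String subjectId, String tenantId,
                          String resourceType, String resourceKey, String action) {
        return hasRoleEdges.stream()
                // Start at the user and keep only has_role edges for this tenant.
                .filter(e -> e.subjectId().equals(subjectId) && e.tenantId().equals(tenantId))
                // Follow can_access edges from each matching role.
                .flatMap(e -> canAccessEdges.stream()
                        .filter(a -> a.roleName().equals(e.roleName())))
                // Match resource and action; anyMatch stops at the first hit, like LIMIT 1.
                .anyMatch(a -> a.resourceType().equals(resourceType)
                        && a.resourceKey().equals(resourceKey)
                        && a.actions().contains(action));
    }
}
```

For example, a user who is ADMIN in tenant t1 and VIEWER in tenant t2 can delete a project in the context of t1 but only read it in the context of t2, because the tenant filter selects a different role in each case.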

4. Using declarative authorization in the controller

Now we can plug this service into Spring Security's authorization decisions. The most elegant way is to use @PreAuthorize.

package com.example.graphauth.controller;

import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.security.core.annotation.AuthenticationPrincipal;
import org.springframework.security.oauth2.jwt.Jwt;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/tenants/{tenantId}/projects")
public class ProjectController {

    // ... inject ProjectService

    @GetMapping("/{projectKey}")
    @PreAuthorize("@permissionService.hasPermission(" +
                  "authentication.principal.subject, " + // subjectId from the Jwt principal
                  "#tenantId, " +                         // tenantId from @PathVariable
                  "'Project', " +                         // hard-coded resource type
                  "#projectKey, " +                       // projectKey from @PathVariable
                  "'READ'" +                              // hard-coded required action
                  ")")
    public String getProject(
            @PathVariable String tenantId,
            @PathVariable String projectKey,
            @AuthenticationPrincipal Jwt jwt) { // Spring Security injects the Jwt principal
        
        // This code runs only if the @PreAuthorize check passes
        // jwt.getSubject() provides the current user's unique ID
        return String.format("Project data for %s in tenant %s, accessed by user %s", 
                              projectKey, tenantId, jwt.getSubject());
    }

    @DeleteMapping("/{projectKey}")
    @PreAuthorize("@permissionService.hasPermission(authentication.principal.subject, #tenantId, 'Project', #projectKey, 'DELETE')")
    public void deleteProject(
            @PathVariable String tenantId, 
            @PathVariable String projectKey) {
        
        // Business logic for deleting the project goes here
    }
}

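Through SpEL (Spring Expression Language), the @PreAuthorize annotation can call the hasPermission method on the permissionService bean directly. It gathers parameters from different contexts (the authenticated principal, URL path variables, and hard-coded strings) and passes them to our authorization check. The code stays clean and the intent explicit, and the authorization logic is decoupled from the business logic.

Extensibility and Limitations of the Architecture

Extensibility

The real strength of this architecture lies in its extensibility.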

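Complex inheritance: to support permission inheritance across organizational units (OUs), add OU vertices and PART_OF edges to the graph to build the organization tree. AQL's min..max OUTBOUND depth-range syntax then makes it easy to traverse the hierarchy to whatever depth is needed.
Dynamic role assignment: temporary grants or project-level special roles only require adding or removing has_role edges at runtime; no schema change is needed.
Performance: for most SaaS applications, the depth and breadth of the permission graph stay within what ArangoDB handles efficiently. A traversal of this kind can be considerably faster than deeply nested SQL JOINs, although the actual gain depends on data shape and indexing and should be measured. A persistent index on users.subjectId speeds up locating the start vertex, and a vertex-centric index on has_role covering _from and tenantId speeds up the first traversal step.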
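
To illustrate what a depth-bounded traversal over PART_OF edges computes, here is a small in-memory sketch in Java. It assumes, as one possible policy, that an OU inherits the roles granted to its ancestors and that each OU has at most one parent; OuInheritance, ancestors, and effectiveRoles are hypothetical names, not part of the article's code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: walks part_of edges upward, like a 1..maxDepth OUTBOUND traversal. */
final class OuInheritance {

    /** Ancestors of startOu within maxDepth hops; partOf maps each OU to its parent OU. */
    static List<String> ancestors(String startOu, Map<String, String> partOf, int maxDepth) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        seen.add(startOu);
        String current = startOu;
        for (int depth = 1; depth <= maxDepth; depth++) {
            String parent = partOf.get(current);
            if (parent == null || !seen.add(parent)) {
                break; // reached the root, or detected a cycle in the data
            }
            result.add(parent);
            current = parent;
        }
        return result;
    }

    /** Roles granted directly to the OU plus those inherited from ancestors within maxDepth. */
    static Set<String> effectiveRoles(String ou, Map<String, String> partOf,
                                      Map<String, Set<String>> rolesByOu, int maxDepth) {
        Set<String> roles = new TreeSet<>(rolesByOu.getOrDefault(ou, Set.of()));
        for (String ancestor : ancestors(ou, partOf, maxDepth)) {
            roles.addAll(rolesByOu.getOrDefault(ancestor, Set.of()));
        }
        return roles;
    }
}
```

With team part of dept and dept part of company, a depth limit of 1 picks up only the roles granted to dept, while a limit of 2 also includes those granted to company. The depth bound plays the same role as the upper limit of the AQL range: it caps the work done per check.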

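Limitations

Operational complexity: introducing ArangoDB adds a new and possibly unfamiliar data store to the stack. Resources must be invested in learning how to operate, back up, monitor, and tune it.
Data consistency: once permission decisions are cached to reduce database load, the system as a whole becomes eventually consistent. When a permission changes (for example, a has_role edge is deleted from the graph), it may take some time before the change takes effect on every application instance, depending on the caching strategy. Scenarios that need strong consistency and immediate revocation require an additional cache-invalidation mechanism.
AQL learning curve: AQL is powerful, but it is a new query language, and the team needs time to learn its syntax and best practices, especially for complex graph traversals and optimization.
"Hot" vertices: if a vertex in the graph (for example, a super-administrator role) has a very large number of edges, it can become a "supernode" or hot spot in queries and degrade traversal performance. This has to be avoided through careful data modeling, such as splitting roles or sharding permissions.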
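
As one way to picture the cache-invalidation mechanism mentioned under data consistency, the sketch below caches decisions per instance with a time-to-live and lets the code that changes a has_role edge evict the affected entries. All names are hypothetical, the key format assumes that IDs never contain the '|' character, and propagating the eviction to other application instances (for example via a message bus) is left out.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

/** Illustrative per-instance cache for permission decisions; not part of the article's code. */
final class PermissionDecisionCache {

    private record Entry(boolean allowed, Instant expiresAt) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Clock clock;

    PermissionDecisionCache(Duration ttl, Clock clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Returns a fresh cached decision, or calls the loader (e.g. the AQL check) and caches it. */
    boolean check(String subjectId, String tenantId, String permissionKey, BooleanSupplier loader) {
        String key = subjectId + '|' + tenantId + '|' + permissionKey;
        Instant now = clock.instant();
        Entry cached = entries.get(key);
        if (cached != null && now.isBefore(cached.expiresAt())) {
            return cached.allowed();
        }
        boolean allowed = loader.getAsBoolean();
        entries.put(key, new Entry(allowed, now.plus(ttl)));
        return allowed;
    }

    /** Call after a has_role edge for this user and tenant is added or removed. */
    void invalidateUserInTenant(String subjectId, String tenantId) {
        String prefix = subjectId + '|' + tenantId + '|';
        entries.keySet().removeIf(k -> k.startsWith(prefix));
    }
}
```

The TTL bounds how long a revoked permission can linger on an instance that missed the eviction, so it should be chosen according to how much staleness the application can tolerate.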

Tags: ArangoDB, OpenID Connect (OIDC), Spring Framework, Google Cloud (GCP), OAuth 2.0

Published 2023-10-27 in Distributed Architecture