Enable Pages access control
I’m opening this to track enabling Pages access control on .com. It seems we need to do some testing and checks before enabling this, as discussed in https://gitlab.slack.com/archives/C1BSEQ138/P1542369390013600.
It won’t be enabled by default (impossible to do so on HA installations). I guess we’ll need to do a production readiness review, etc., before turning it on.
Summary
GitLab Pages access control was introduced in https://gitlab.com/gitlab-org/gitlab-ce/issues/33422.
Here are the Omnibus changes, gitlab-org/omnibus-gitlab!2583 (merged), and the admin docs: https://docs.gitlab.com/ee/administration/pages/#access-control.
Related issues
- https://gitlab.com/gitlab-org/gitlab-ce/issues/59286: blocker, in review
- https://gitlab.com/gitlab-org/gitlab-ce/issues/56386: inconsistent settings UI; will cause users to accidentally make pages private while updating project settings, but they can easily turn it back
Deployment plan
…
- run rake gitlab:pages:make_all_public (starts a background migration which fixes https://gitlab.com/gitlab-org/gitlab-ce/issues/59286; should take about an hour to finish)
…
Architecture
- Add architecture diagrams to this issue of feature components and how they interact with existing GitLab components. Include internal dependencies, ports, security policies, etc.
Couldn’t find any, only these docs
- Describe each component of the new feature and enumerate what it does to support customer use cases.
- pages daemon
- checks if access_control is enabled for a particular project
- redirects users to gitlab.com for auth
- receives the redirect back from gitlab.com with a temporary token
- exchanges the temporary token for a permanent one and stores it in the user session
- on every request, calls "%s/api/v4/projects/%d/pages_access" to check user authorization
- gitlab rails app
- updates pages configs
- performs OAuth auth (redirects and token exchange)
- For each component and dependency, what is the blast radius of failures? Is there anything in the feature design that will reduce this risk?
- pages daemon
- in theory, can fail on start with an incorrect configuration, but since we restart daemons incrementally, that will just result in a reverted deploy
- if access control does not work properly, pages sites with access control enabled can become unavailable
- in theory, misconfiguration can result in a redirect loop between gitlab.com and gitlab.io for projects with access control enabled
- rails app
- in the worst-case scenario, pages configs can stop updating (but I’m being too paranoid; this is well covered with tests and many users use it)
- if we enable access control on a large number of pages projects, we can hit the API rate limit and pages sites would be temporarily unavailable
- If applicable, explain how this new feature will scale and any potential single points of failure in the design.
- see rate limits from the previous section
- Both pages-daemon and rails api are single points of failure
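The per-request authorization check described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code (the daemon is a separate service and this sketch is in Python for brevity). Only the URL format string comes from this issue; the function names, the Decision type, and the mapping of status codes to outcomes are assumptions made for the example.

```python
from enum import Enum

# Format string quoted in this issue; Python's % operator accepts the same verbs.
PAGES_ACCESS_FORMAT = "%s/api/v4/projects/%d/pages_access"


class Decision(Enum):
    SERVE = "serve"  # user may see the site
    DENY = "deny"    # user is not authorized
    ERROR = "error"  # render a 500, as described in this issue


def pages_access_url(gitlab_server: str, project_id: int) -> str:
    """Build the per-request authorization URL for one project."""
    return PAGES_ACCESS_FORMAT % (gitlab_server.rstrip("/"), project_id)


def decide(status_code: int) -> Decision:
    """Map an API response status to an outcome.

    Assumption: 200 grants access, 401/403/404 deny it, and anything
    else is treated as a failure of the API dependency.
    """
    if status_code == 200:
        return Decision.SERVE
    if status_code in (401, 403, 404):
        return Decision.DENY
    return Decision.ERROR
```

Because every page view on a protected site triggers one such call, the number of API requests grows with traffic to protected sites, which is where the rate-limit concern comes from.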
Operational Risk Assessment
- What are the potential scalability or performance issues that may result with this change?
- see rate limits above
- List the external and internal dependencies to the application (ex: redis, postgres, etc) for this feature and how it will be impacted by a failure of that dependency.
- rails api: pages with access control enabled will become unavailable
- Were there any features cut or compromises made to make the feature launch?
- none yet
- List the top three operational risks when this feature goes live.
- a spike in API requests if a lot of projects enable it, and we can’t just turn this feature off after a month, since that would make all sites public …
- What are a few operational concerns that will not be present at launch, but may be a concern later?
- Can the new product feature be safely rolled back once it is live, can it be disabled using a feature flag?
- After a while, no. But each project needs to enable it manually and can turn it off at any time.
- We may consider guarding the pages access level with a feature flag for a period of testing on gitlab.com
- Document every way the customer will interact with this new feature and how customers will be impacted by a failure of each interaction.
- set pages_access_level in project settings
- open pages site: the user will see a 500 if auth is misconfigured or if anything goes wrong
- As a thought experiment, think of worst-case failure scenarios for this product feature, how can the blast-radius of the failure be isolated?
Database
No changes affecting the database are required
Security
- Were the gitlab security development guidelines followed for this feature?
- If this feature requires new infrastructure, will it be updated regularly with OS updates?
- it does not require new infrastructure
- Has effort been made to obscure or elide sensitive customer data in logging?
- yes, no tokens are present in logs
- Is any potentially sensitive user-provided data persisted? If so is this data encrypted at rest?
- no user data used
Performance
- Explain what validation was done following GitLab’s performance guidelines; please explain or link to the results below
- (Query Performer)
- Sherlock
- Request Profiling
No check was done, but the rails part consists basically of one API request for auth
- Are there any potential performance impacts on the database when this feature is enabled at GitLab.com scale?
- I don’t see any
- Are there any throttling limits imposed by this feature? If so how are they managed?
- no throttling limits
- If there are throttling limits, what is the customer experience of hitting a limit?
- No throttling limits
- For all dependencies external and internal to the application, are there retry and back-off strategies for them?
- no; if OAuth fails or the API requests fail, we render a 500, but reloading the page should help
- Does the feature account for brief spikes in traffic, at least 2x above the expected TPS?
Backup and Restore
- Outside of existing backups, are there any other customer data that needs to be backed up for this product feature?
- (No)
- Is the service logging in JSON format and are logs forwarded to logstash?
- yes
- Is the service reporting metrics to Prometheus?
- currently NO; since failures are reported to logs, we can count error log messages per minute
- How is the end-to-end customer experience measured?
- There are no metrics I’m aware of
- Do we have a target SLA in place for this service?
- Do we know what the indicators (SLI) are that map to the target SLA?
- Do we have alerts that are triggered when the SLI’s (and thus the SLA) are not met?
- Do we have troubleshooting runbooks linked to these alerts?
- What are the thresholds for tweeting or issuing an official customer notification for an outage related to this feature?
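Until Prometheus metrics exist, the idea of counting error log messages per minute could look something like the sketch below. It assumes one JSON object per log line, with a "level" field and an ISO 8601 "time" field; those field names are assumptions for illustration and should be checked against the daemon's real log output. The function name errors_per_minute is likewise hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def errors_per_minute(lines: Iterable[str]) -> Counter:
    """Count error-level JSON log entries, bucketed by minute."""
    counts: Counter = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        # Assumed field names: "level" and "time".
        if entry.get("level") != "error":
            continue
        timestamp = entry.get("time")
        if not isinstance(timestamp, str) or len(timestamp) < 16:
            continue
        # "2019-07-12T10:15:30Z" -> minute bucket "2019-07-12T10:15"
        counts[timestamp[:16]] += 1
    return counts
```

A count like this could feed a simple alert threshold, though it only sees failures that the daemon actually logs.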
Responsibility
- Which individuals are the subject matter experts and know the most about this feature?
- Which team or set of individuals will take responsibility for the reliability of the feature once it is in production?
- Is someone from the team who built the feature on call for the launch? If not, why not?
Testing
- Describe the load test plan used for this feature. What breaking points were validated?
- For the component failures that were theorized for this feature, were they tested? If so include the results of these failure tests.
- Give a brief overview of what tests are run automatically in GitLab’s CI/CD pipeline for this feature?
- auth process is tested inside gitlab-pages daemon with mock api server
Edited Jul 12, 2019 by Vladimir Shushlin