GitHub Engineering Adopts New Architecture for MySQL High Availability

by Hrishikesh Barua on Jul 08, 2018. Estimated reading time: 3 minutes

GitHub.com uses MySQL as the backbone for many of its critical services, including the API, authentication and the GitHub.com website itself. GitHub's engineering team replaced its previous DNS and Virtual IP (VIP)-based setup with one based on Orchestrator, Consul and the GitHub Load Balancer (GLB) in order to avoid split-brain and DNS caching issues.

GitHub runs multiple MySQL clusters for different services and tasks, making it imperative to have them highly available (HA). GitHub's MySQL infrastructure is spread across multiple data centers and consists of around 15 clusters, close to 150 production servers and 15 TB of MySQL tables. Each MySQL cluster has a single master, which handles write requests, and multiple replicas, which serve read requests. The master node is a single point of failure: without it, writes fail completely. The HA requirements for this setup include auto-detection of failure, auto-promotion of a replica node to master and auto-advertisement of the new master node to client applications.

GitHub's engineering team has employed several HA strategies over the years, gradually moving towards uniformity across the organization. Because the effort is not restricted to MySQL, the requirements for an HA solution also include cross-datacenter availability and split-brain prevention. There are several possible approaches to MySQL master discovery. Previously, GitHub used DNS and VIPs to discover the MySQL master node. Client applications would connect to a fixed hostname, which DNS resolved to a VIP. A VIP is not tied to a single host, so traffic can be redirected by moving the address from one host to another. The VIP would always be owned by the current master node. However, the VIP acquire-and-release process could go wrong during failover events and lead to split-brain situations, in which two different hosts hold the same VIP and traffic can be routed to the wrong one. In addition, failing over to a master in a different data center required a DNS change, which could take time to propagate because of DNS caching at clients.
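The DNS caching problem can be illustrated with a small simulation. This is only a sketch: the hostname, IP addresses, and 60-second TTL are made-up values, and `CachingResolver` is a hypothetical stand-in for whatever caching a real client or resolver performs.

```python
class CachingResolver:
    """Client-side DNS cache: an answer is reused until its TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: hostname -> current IP (the "real" DNS)
        self.ttl_seconds = ttl_seconds
        self._cache = {}  # hostname -> (ip, expiry_time)

    def resolve(self, hostname, now):
        cached = self._cache.get(hostname)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within TTL: may be stale
        ip = self.authoritative[hostname]
        self._cache[hostname] = (ip, now + self.ttl_seconds)
        return ip


HOST = "mysql-writer.example.internal"  # hypothetical name
dns = {HOST: "10.0.1.10"}               # VIP in the old master's data center
resolver = CachingResolver(dns, ttl_seconds=60)

before = resolver.resolve(HOST, now=0)   # cached until t=60
dns[HOST] = "10.0.2.20"                  # failover to another data center
during = resolver.resolve(HOST, now=30)  # client still gets the old address
after = resolver.resolve(HOST, now=61)   # cache expired, new address is picked up
```

In this simulation the client keeps writing to the old address for the rest of the TTL window, which is the propagation delay described above.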

The latest setup at GitHub combines the Orchestrator toolkit, Consul for service discovery and the GitHub Load Balancer (GLB). In this architecture, a client application resolves the master’s name through DNS to an Anycast IP address. The advantage of Anycast is that the name resolves to the same IP address in every data center, while client traffic to that IP is routed to the nearest GLB instance, the one co-located in the client's own data center. GLB knows the currently active MySQL master backend and forwards the traffic to it.
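A minimal sketch of this routing model follows. It assumes that each data center runs its own load balancer pool announcing the same Anycast address, and that every pool forwards writes to the cluster's single current master. The data center names, host names, and the `GLBPool` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GLBPool:
    datacenter: str
    master_backend: str  # the backend this pool currently forwards writes to


ANYCAST_IP = "10.255.0.1"  # hypothetical; the same address is announced in every data center


def route_write(client_datacenter, pools):
    """Return (load balancer data center, MySQL host) that a write ends up at."""
    # Anycast: the network delivers the packet to the closest announcer,
    # modelled here as the pool in the client's own data center.
    pool = pools[client_datacenter]
    return pool.datacenter, pool.master_backend


pools = {
    "dc-east": GLBPool("dc-east", "mysql-1.dc-east"),
    "dc-west": GLBPool("dc-west", "mysql-1.dc-east"),  # master lives in dc-east
}

east_route = route_write("dc-east", pools)
west_route = route_write("dc-west", pools)
```

Clients in both data centers connect to the same address and talk to a local proxy, yet their writes reach the same master. On failover, only the pools' backend changes; the address the clients use does not.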

GitHub MySQL HA architecture
Image courtesy: https://githubengineering.com/mysql-high-availability-at-github/

Orchestrator, also an open source project from GitHub engineering, is responsible for master failure detection and the failover process. It draws on the collective knowledge of all MySQL nodes, including the replicas, to arrive at an informed decision about the master’s state. When a write master fails, the Orchestrator leader node detects the failure and starts the failover process to choose a new MySQL master. The rest of the Orchestrator cluster nodes notice this change and update their local Consul daemon with the new master details. Consul, a service discovery tool from HashiCorp, keeps track of the master nodes by storing them as key-value pairs. Consul can run in a distributed mode across data centers, but in GitHub's case each Consul cluster is independent at the data center level. On a failover event, the GLB is notified of the master change through Consul Template, which queries the Consul clusters and updates the GLB state; the GLB in turn routes traffic to the new master.
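The flow from detection to advertisement can be sketched in a few functions. This is an illustration under stated assumptions, not Orchestrator's or Consul's actual code: the rule "the master is failed only when the orchestrator and every replica have lost it", the lowest-lag promotion rule, the `mysql/master/<cluster>` key layout, and all function names are hypothetical, and a plain dict stands in for the Consul key-value store.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    host: str
    can_reach_master: bool  # is replication from the master still working?
    lag_seconds: float      # how far behind the master this replica was


def master_failed(orchestrator_can_reach_master, replicas):
    """Treat the master as failed only when the orchestrator and all replicas agree."""
    return not orchestrator_can_reach_master and all(
        not r.can_reach_master for r in replicas
    )


def choose_new_master(replicas):
    """Hypothetical promotion rule: prefer the most up-to-date replica."""
    return min(replicas, key=lambda r: r.lag_seconds)


def run_failover(cluster, orchestrator_can_reach_master, replicas, consul_kv):
    """Promote a replica and record it in the KV store; return the new host or None."""
    if not master_failed(orchestrator_can_reach_master, replicas):
        return None
    new_master = choose_new_master(replicas)
    consul_kv[f"mysql/master/{cluster}"] = new_master.host  # hypothetical key layout
    return new_master.host


def render_glb_backends(consul_kv):
    """Stand-in for the Consul Template step: derive load balancer state from the KV store."""
    prefix = "mysql/master/"
    return {
        key[len(prefix):]: host
        for key, host in consul_kv.items()
        if key.startswith(prefix)
    }


kv = {"mysql/master/cluster1": "db-a"}

# One replica still replicates from the master: no failover is triggered.
partial = [Replica("db-b", True, 0.4), Replica("db-c", False, 2.0)]
no_change = run_failover("cluster1", False, partial, kv)

# The orchestrator and every replica have lost the master: promote and advertise.
agreed = [Replica("db-b", False, 0.4), Replica("db-c", False, 2.0)]
promoted = run_failover("cluster1", False, agreed, kv)
backends = render_glb_backends(kv)
```

Requiring agreement from the replicas means that a network problem between the orchestrator and the master alone does not trigger a promotion.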

In the article, Shlomi Noach, senior infrastructure engineer at GitHub, mentions that although the new setup keeps the maximum outage time to "between 10 to 13 seconds" in most cases, some scenarios need more work, such as data center isolation leading to a split-brain, or a Consul outage at the time of failover. GitHub’s new setup is a move away from traditional techniques based on networking towards ones based on proxying and service discovery. It completely replaces the VIP-based setup, but there is debate about whether it would have been easier to adopt a different approach using the Border Gateway Protocol (BGP).

Welcome to 2018 by Robert Van Dell II

Leader/follower is the modern preferred terminology over master/slave (e.g. "leader election" from failure detection).

Re: Welcome to 2018 by Daniel Bryant

Thanks for your comment Robert, and I understand your concern (I too personally prefer the modern terminology).

I've looked in the MySQL docs (dev.mysql.com/doc/refman/8.0/en/replication.html) and they do still use the older terminology, and so I can understand why Hrish chose the original words for this news piece. However, the source article from GitHub used the terms "master-replica" and so I have updated this piece to reflect their choice.

Best wishes,

Daniel
InfoQ News Manager
