Node Hosting Type
Although the development team will gradually automate software installation and node monitoring, a node is still an Internet-facing machine, and problems that need manual handling are unavoidable. Every node therefore carries some operations and maintenance work.
Node operators differ in technical capability. Some already provide Internet services and have their own servers and technical staff; others have no technical background at all. To keep the HyperGraph network stable and give HyperGraph users a consistent service, however, every node must meet fairly demanding requirements for response speed, service quality, and standardization. The node hosting types and service levels are therefore defined as follows.
Node hosting type
1. Full hosting
Full hosting means that, after the node registers, the server purchase is negotiated and completed. Apart from hardware upgrades, which still require the node operator's involvement, the operator does not take part in software installation, maintenance, or service stability monitoring, and does not even need to log in to the server.
The node operator can still view the node's statistics through the node dashboard. Optionally, the operator can also obtain a cloud platform sub-account and use it in the cloud console to view the server's configuration and operating status. To avoid introducing instability or security risks, however, the operator cannot log in to or manage the server.
Under this hosting type, the node operator only needs to watch the node's statistics and business volume and does not need to care about server operation. No technical staff are required, and the node initiator does not need any technical knowledge.
Currently, the development team does not charge any additional technical support fee for full hosting.
2. Semi-hosted
Semi-hosted nodes are run by operators with some experience of using the cloud platform. They either purchase the server themselves in the designated region, or join through a unified purchase and receive access to the server through an assigned sub-account. Because the node exchanges a large amount of data with the RPC node, the server must be purchased in a location close to the RPC node; otherwise the node will not work. After the purchase, the development team takes over, installs the software, and handles operations and maintenance. Both parties can log in to and manage the node server, and the node operator has the access needed to expand capacity and to restart or stop the server.
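Before committing to a server location, the distance to the RPC node can be estimated roughly by timing TCP connections from a candidate server. The following is a minimal sketch, not the development team's tooling: the function names, and any host and port passed to them, are hypothetical, and the text does not state what latency counts as "close".

```python
import socket
import statistics
import time


def connect_latency_ms(host: str, port: int, timeout: float = 3.0) -> float:
    """Time a single TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the connection is closed as soon as it is established
    return (time.perf_counter() - start) * 1000.0


def median_latency_ms(host: str, port: int, samples: int = 5) -> float:
    """Median of several handshake timings, to smooth out one-off spikes."""
    return statistics.median(
        connect_latency_ms(host, port) for _ in range(samples)
    )
```

Running `median_latency_ms` from servers in different regions against the RPC endpoint gives a simple basis for comparing locations. It measures connection setup time only, not sustained throughput.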
The node operator views the node's statistics through the node dashboard and can view the server's configuration and operating status in the cloud console, either directly or through a cloud platform sub-account. To avoid destabilizing the service or introducing security risks, however, the operator must not change server settings arbitrarily or restart the server at will. Even capacity expansion must be carried out with the participation and support of the development team.
Under this hosting type, the node operator can follow not only the node's statistics and business volume but also the server's running status and service configuration. The node initiator therefore needs someone who has a working knowledge of Linux operations and of the cloud platform, who understands the importance of Internet service stability, and who can be reached by phone 24 hours a day to handle problems jointly with the development team. Automatic monitoring will also be in place, and under normal conditions no problems are expected.
3. Full self-service management
Full self-service management means that the node operator handles the node server itself, from procurement through to service provision. The development team only provides the installers and scripts, software upgrades, and the necessary monitoring services. The development team does not need an account to log in to the server or manage the programs; these operations are carried out by the node's technical staff in cooperation with the development team.
The node operator views the node's statistics through the node dashboard and tracks the server's configuration and running status on its own. To avoid destabilizing the service or introducing security risks, however, once the node is serving online traffic, do not change server settings casually, and do not restart the server at will. Even capacity expansion requires the participation and support of the development team.
Under this type, the development team has no permission to log in to or manage the node server, so the node operator is solely responsible for configuration upgrades, capacity expansion, and service management. The node's technical staff must therefore be proficient with Linux, the Postgres database, the Node.js toolchain, and related software; must understand cloud platform operation, configuration, and capacity expansion; and must understand the importance of Internet service stability. The node side must also have someone who can answer the phone at any hour to deal with problems.
To ensure stable service quality, a node should in principle not choose full self-service management unless its operator is experienced.
Service quality requirements
An Internet service that runs 24 hours a day, 7 days a week will inevitably run into problems. When a problem occurs, the service must recover quickly, the problem must go unnoticed by users, or its impact must be kept small. These are the principles for handling problems in Internet services. For the HyperGraph service, the measures include but are not limited to the following:
For all core server resources, including CPU, load, and disk space, an email alert is sent every 5 minutes while a threshold is exceeded.
Core service processes, including ipfs, graph-node, and geth, are checked every minute. If a process has died, it is restarted automatically and an email and SMS alert is sent.
For key users, a backup query deployment is recommended: deploy the subgraph on multiple nodes so that both the deployment and its data are redundant. If one node has a problem, switch to another immediately.
The network connectivity of all nodes, subgraph queries, and the health of the RPC service and the subgraphs are checked every minute, with alerts sent by email and SMS.
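The resource and process checks listed above can be sketched as follows. This is an illustrative outline, not the development team's monitoring code: the threshold values and all function names are assumptions, only the process names come from the text, and scheduling (every 1 or 5 minutes), restarting, and email or SMS delivery are left to the caller. CPU utilization is omitted to keep the sketch within the standard library; `collect_metrics` and `running_process_names` assume a Unix-like host.

```python
import os
import shutil
import subprocess

# Hypothetical thresholds; the source does not state the real values.
THRESHOLDS = {"load_per_cpu": 1.5, "disk_used_pct": 85.0}
# Core processes named in the text.
REQUIRED_PROCESSES = ("ipfs", "graph-node", "geth")


def collect_metrics(path: str = "/") -> dict:
    """Read the 1-minute load per CPU and disk usage for `path`."""
    load1, _, _ = os.getloadavg()
    usage = shutil.disk_usage(path)
    return {
        "load_per_cpu": load1 / (os.cpu_count() or 1),
        "disk_used_pct": 100.0 * usage.used / usage.total,
    }


def breached(metrics: dict, thresholds: dict) -> list:
    """Return the names of metrics that exceed their threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    )


def running_process_names() -> set:
    """List the command names of running processes using `ps`."""
    out = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return {os.path.basename(line.strip()) for line in out.splitlines() if line.strip()}


def missing_processes(required, running) -> list:
    """Return the required processes that are not currently running."""
    return [name for name in required if name not in running]
```

A scheduler such as cron could call `breached(collect_metrics(), THRESHOLDS)` every 5 minutes and `missing_processes(REQUIRED_PROCESSES, running_process_names())` every minute, then restart anything missing and send the alerts.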
The goal of these measures is to meet the following service requirement: first reach 99.9% reliability, then work toward 99.99%. What these two figures mean is explained below.
In Internet services, the term SLA stands for Service Level Agreement. It can be understood as the committed reliability (availability) of the service and is usually expressed as a percentage.
99.9% means that the service is available 99.9% of the time. Put concretely: a year has 365 days, so the service may be unavailable for at most 1/1000 of the year, which is 0.365 days, or about 8.76 hours per year. Measured per day, a day has 86,400 seconds, so only 86.4 seconds of downtime are allowed per day, roughly a minute and a half. The service must work normally the rest of the time. This is a low standard, yet it is not easy to achieve. 99.99% is a relatively high standard: it allows less than one hour of downtime per year (about 53 minutes).
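The downtime budget implied by an availability percentage can be computed directly. The short sketch below does this for the two targets discussed here; the function name is illustrative, and a 365-day year is assumed.

```python
def allowed_downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Maximum downtime in a period for a given availability percentage."""
    return period_seconds * (1.0 - availability_pct / 100.0)


DAY = 24 * 60 * 60   # 86,400 seconds
YEAR = 365 * DAY     # assumes a 365-day year

for pct in (99.9, 99.99):
    per_day = allowed_downtime_seconds(pct, DAY)
    per_year_hours = allowed_downtime_seconds(pct, YEAR) / 3600
    print(f"{pct}%: {per_day:.2f} s/day, {per_year_hours:.2f} h/year")

# Prints:
# 99.9%: 86.40 s/day, 8.76 h/year
# 99.99%: 8.64 s/day, 0.88 h/year
```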