413 Site Reliability and Production Engineering Resources & Tools

by John | Nov 1, 2021 | Computers and Internet, Engineering, Top Compilations

413 Free Site Reliability and Production Engineering Tools & Resources

What is Site Reliability Engineering (SRE)? Fundamentally, it’s what happens when you ask a software engineer to design an operations function. SRE is an engineering discipline focused on the reliability, availability, and performance of software systems, whether web applications or systems software. The term covers both the practice, a methodology for designing, building, and operating large distributed systems reliably, and the specialized role of the site reliability engineers who carry it out.

Site Reliability Engineering originated at Google in 2003 as the name for its internal operations model. The goal of a site reliability engineering team is to create and maintain a platform that can be deployed and updated easily and frequently without disrupting services or users. To achieve this goal, the SRE team usually works closely with other teams, such as developers and designers. On large sites, the SRE team also maintains an organizational structure that allows it to move quickly and coordinate projects.

This post is a curated list of awesome Site Reliability and Production Engineering resources. These resources include books, articles, blogs, and newsletters covering topics such as culture, reliability, monitoring, capacity planning, service level agreements, and many more.

Books

Culture

Team

Education

Hiring

Reliability

Monitoring & Observability & Alerting

On-Call

Post-Mortem

Capacity Planning

Service Level Agreement

Performance

Programming

Misc Articles

Blogs

Brendan Gregg’s Blog
Highly Technical Blog Posts About Systems Internals, Performance and SRE.
Everything Sysadmin
Blog Posts About SysAdmin/DevOps/SRE by Tom Limoncelli.
High Scalability
Technical Blog Posts About Systems Architecture.
rachelbythebay
Technical Blog Posts.
Susan J. Fowler
Various blog posts about SRE, Software Engineering and Microservices.
SysAdvent
One article for each day of December, ending on the 25th.
Stephen Thorne’s Blog
Blog Posts About SRE.
Increment
A digital magazine about how teams build and operate software systems at scale.
GopherSRE
Blog Posts about Go and SRE.
Cindy Sridharan
Blog posts about distributed systems and their management.
Blameless Blog
Blog posts about SRE culture and practices.
Resilience Roundup
Weekly analysis of Resilience Engineering and Human Factors research designed for software systems.
Squadcast Blog
Blog posts about SRE best practices, reliability, on-call and incident management.
FireHydrant Blog
Posts about complex systems, incident response, and SRE best practices.
Rootly Blog
Incident management best practices and guides.

Newsletters

DevOpsLinks
A weekly newsletter about SRE, SysAdmin and DevOps news, tools, tutorials and opinions.
KubeWeekly
The weekly newsletter for all things Kubernetes. KubeWeekly is curated by Bob Killen, Chris Short, Craig Box, Kim McMahon and Michael Hausenblas.
SRE Weekly
Weekly Site Reliability Newsletter.
O’Reilly Systems Engineering and Operations Newsletter
Weekly systems engineering and operations news and insights from industry insiders.
ChaosEngineering.news
Chaos Engineering newsletter. All things Chaos Engineering, directly to your inbox!

Conferences & Meetups

SREcon Conferences
The Official SRE Conference.
LISA Conferences
Prominent Conference About SysAdmin/DevOps/SRE.
SRE Tech Talks
SRE Talks Hosted by Google.
South Bay Site Reliability Engineering (Sunnyvale, CA) Meetup
A Group For Individuals Who Tackle Reliability Challenges For Web-Scale Systems.
San Francisco Reliability Engineering
A Group Of People Who Are Passionate About Reliable, Performant Software Systems.
Site Reliability Engineering Munich, Germany
SRE Meetup in the greater area of Oktoberfest city.
ADDO – All Day DevOps
A 24 hour conference that is completely online and free.
Site Reliability Engineering Paris, France
SRE Meetup in the city of light.
Site Reliability Engineering India
SRE Meetup in India.

Twitter

Google SRE Twitter Account
Google’s SRE Twitter Account.
SREBook
The Official Twitter Account of Site Reliability Engineering Book.
SREcon
SREcon’s Official Twitter Account.
SREWorkbook
The Official Twitter Account of Site Reliability Workbook.
The SRE Dev
SRE-related Posts from dev.to.
Twitter SRE
The Official Twitter Account of Twitter’s SRE team.
Twitter SRE Weekly
The Official Twitter Account of SRE Weekly Newsletter.
USENIX Association
The Official USENIX Twitter Account.
