Operability Book
@Operability
Team Guide to Software #Operability by @matthewpskelton, @smileandeliver, & @robtthatcher | published by @ConfluxBooks
Learn about Software #Operability with this free sample chapter from the book Team Guide to Software Operability by @matthewpskelton @smileandeliver and @robtthatcher via @leanpub leanpub.com/SoftwareOperab…
Superb thread on likely problems with modern software systems 👇 #sre #operability
I've seen a lot of people asking "why does everyone think Twitter is doomed?" As an SRE and sysadmin with 10+ years of industry experience, I wanted to write up a few scenarios that are real threats to the integrity of the bird site over the coming weeks.
STEP 0: instrument the code and use telemetry (logs, metrics, traces) to see how the code actually works in production.
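A minimal sketch of what "instrument the code" could look like in Python, assuming a decorator-based approach. The names `instrumented`, `call_counts`, `durations_ms` and `parse_order` are hypothetical, and the in-memory dictionaries stand in for a real metrics backend; the thread itself does not prescribe any particular tooling.

```python
import functools
import logging
import time
from collections import defaultdict

logger = logging.getLogger("telemetry")

# In-memory stand-ins for a metrics backend (assumption: a real system
# would export these values to a metrics service instead).
call_counts = defaultdict(int)
durations_ms = defaultdict(list)


def instrumented(func):
    """Record a call count, a duration, and a log line for every call."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            outcome = "error"
            raise  # telemetry must not swallow the failure
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            call_counts[(name, outcome)] += 1
            durations_ms[name].append(elapsed_ms)
            logger.info(
                "call=%s outcome=%s duration_ms=%.2f", name, outcome, elapsed_ms
            )

    return wrapper


@instrumented
def parse_order(raw):
    # Hypothetical business function used only for illustration.
    return int(raw)
```

Calling `parse_order("42")` and then `parse_order("x")` would leave one "ok" and one "error" entry in `call_counts`, which is the kind of evidence about real behaviour that is worth having before any refactoring starts.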
Sometimes you want to change the architecture of a submodule completely. 🧱 It is tempting just to rewrite a module from scratch. 🤷‍♂️ But with careful planning, you can perform any code transformation in small steps. 🧵 👇
Reliable systems, good alerting and monitoring, and luck. 3 of the ingredients to making on-call not suck quite so much. Plus being paid for it, of course! @heyitsols on ‘Being Paid To Sleep’ at @DevOpsDaysLDN
A product or system designed without ongoing operational concerns will *appear* cheaper but likely end up costing significantly more in the long term compared to a system co-designed with the people who will actually operate it. 2/
Incidents? Those are just failure modes leaving the system.
"Software operability still suffers because #Devs are no closer to actually running the software that they build, and the SREs still don't have time to engage with Devs to fix problems when they arise."
WARNING! You'll need PROPER observability. If you don't have proper introspection, the technical challenges will be almost impossible to overcome. In the mentioned example we were able to trace the annotated db queries through the complete code artifact.
Additionally we started to annotate all the database queries with the responsible team names in order to build a db table and column ownership map. Sometimes using an ORM gives you actual advantages, so we were able to pull this off without a huge performance overhead and effort.
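A rough sketch of the query-annotation idea, assuming the team name travels as a trailing SQL comment. The posts above describe doing this through an ORM and mapping columns as well as tables. This example uses plain strings, `sqlite3`, and a simple regular expression for table names only. The names `annotate`, `ownership_map` and the `team=` tag format are hypothetical.

```python
import re
import sqlite3
from collections import defaultdict

# Assumed tag format: /* team=<name> */ appended to each query.
TEAM_TAG = re.compile(r"/\*\s*team=([\w-]+)\s*\*/")
# Naive table detection; a real implementation would use a SQL parser.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_]\w*)", re.IGNORECASE
)


def annotate(sql, team):
    """Append the owning team as a SQL comment; the database ignores it."""
    return f"{sql} /* team={team} */"


def ownership_map(query_log):
    """Map each table name to the set of teams whose queries touch it."""
    owners = defaultdict(set)
    for sql in query_log:
        tag = TEAM_TAG.search(sql)
        if tag is None:
            continue  # unannotated query: no owner can be inferred
        for table in TABLE_REF.findall(sql):
            owners[table.lower()].add(tag.group(1))
    return dict(owners)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
query_log = [
    annotate("INSERT INTO orders VALUES (1, 9.5)", "checkout"),
    annotate("SELECT total FROM orders", "billing"),
]
for sql in query_log:
    conn.execute(sql)  # the trailing comment does not change execution

print(ownership_map(query_log))
```

Because the annotation is a comment, it shows up in query logs and traces without affecting results, which is what makes it usable as a source for an ownership map.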
Can we build #observable services without logs? @glenathan, Senior Software Engineer @Geckoboard shares at #QConPlus a story of how it went: bit.ly/3w48gFw 💡Missed this talk? You can still register & have on-demand access to all the talks for 3 months. #Observability
That said, what's more important than "project versus product" is that the funding and execution model for the software evolution keeps that software viable in the long term. Prefer a sustainable pace and attention to operability rather than a feature factory. 👍
This is so good to hear! 🙌 Weirdly, I was using (and wrote about) this kind of approach almost exactly 10 years ago: - Dynamic severity levels - Unique IDs - A focus on events #operability blog.matthewskelton.net/2012/12/05/tun…
blog.matthewskelton.net
Tune logging levels in Production without recompiling code
This article first appeared in Software Development Practice, Issue 1, published by IAP (ISSN 2050-1455) Abstract When raising log events in code it can be difficult to choose a severity level (su…
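One way the combination of unique event IDs and severity levels that can change at runtime might look in Python. This is an illustrative sketch, not the implementation from the linked article: the event IDs, the JSON override format, and the function names are all assumptions.

```python
import json
import logging

logger = logging.getLogger("app.events")

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}

# Hypothetical event IDs with the severity chosen at development time.
DEFAULT_LEVELS = {
    "CACHE_MISS": logging.DEBUG,
    "PAYMENT_RETRY": logging.WARNING,
}

# Overrides supplied at runtime, e.g. from a config file operators control.
_overrides = {}


def load_overrides(config_text):
    """Replace all overrides from JSON such as {"CACHE_MISS": "ERROR"}."""
    parsed = json.loads(config_text)
    _overrides.clear()
    for event_id, level_name in parsed.items():
        _overrides[event_id] = LEVELS[level_name.upper()]


def severity_for(event_id):
    """Runtime override first, then the coded default, then INFO."""
    return _overrides.get(event_id, DEFAULT_LEVELS.get(event_id, logging.INFO))


def log_event(event_id, message):
    """Log an event at whatever severity is currently configured for its ID."""
    level = severity_for(event_id)
    logger.log(level, "[%s] %s", event_id, message)
    return level
```

The calling code only ever names the event (`log_event("CACHE_MISS", ...)`), so raising or lowering its severity in production means reloading configuration rather than changing and redeploying code.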
"If we have proper visualisation and better metaphors, we set much better conditions for our operators to be comfortable in understanding and responding to variations in our systems." @yurynino #QConLondon
Our latest version of the Multi-Team Software Delivery Assessment deck by @matthewpskelton @ConfluxHQ now on sale! Including the additional themes of #security, #teamtopologies, on-call and SRE & Reliability, this is our most comprehensive tool! agilestationery.com/products/softw…
Nice example of a useful, in-app scheduled maintenance message from @MiroHQ 👏 "Upcoming scheduled maintenance: Saturday, March 26, 2022 at 5:30 AM (your time). Miro will be unavailable for 1 hour." The "your time" bit is good #UX #operability #reliability
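A small sketch of how a "your time" maintenance message could be produced, assuming the maintenance window is stored in UTC and the viewer's time zone is known. The function name, the UTC start time, and the fixed UTC-4 offset below are assumptions chosen to reproduce the quoted wording. Nothing here reflects how Miro actually builds its message.

```python
from datetime import datetime, timedelta, timezone


def maintenance_notice(start_utc, duration, user_tz, product):
    """Render a maintenance window in the viewer's local time."""
    local = start_utc.astimezone(user_tz)
    hour12 = local.hour % 12 or 12          # 0 -> 12 for a 12-hour clock
    am_pm = "AM" if local.hour < 12 else "PM"
    when = (
        f"{local.strftime('%A, %B')} {local.day}, {local.year} "
        f"at {hour12}:{local.minute:02d} {am_pm}"
    )
    minutes = int(duration.total_seconds() // 60)
    if minutes % 60 == 0:
        hours = minutes // 60
        length = f"{hours} hour" + ("" if hours == 1 else "s")
    else:
        length = f"{minutes} minutes"
    return (
        f"Upcoming scheduled maintenance: {when} (your time). "
        f"{product} will be unavailable for {length}."
    )


# Assumed values: a 09:30 UTC start seen by a user at UTC-4.
notice = maintenance_notice(
    start_utc=datetime(2022, 3, 26, 9, 30, tzinfo=timezone.utc),
    duration=timedelta(hours=1),
    user_tz=timezone(timedelta(hours=-4)),
    product="Miro",
)
print(notice)
```

A production version would more likely take an IANA zone name and use `zoneinfo.ZoneInfo`, so that daylight saving changes are handled; the fixed offset keeps this sketch free of time zone database dependencies.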
The challenge: anticipating how change creates new failure modes
Very happy to see the discussion around developer experience. Now how about operator experience?
LARGE SYSTEMS USUALLY OPERATE IN FAILURE MODE, via @dangolant Or like I used to say, your distributed system exists in a continuous state of partial degradation. There are bugs and flakes and failures all the way down, and hardly any of them ever matter. Until they do.
We at @ConfluxHQ have been doing a lot of work around reliability over the past 2 or 3 years. See reliabilitymodel.com for some ideas about exploring and measuring reliability. 👍
github.com
GitHub - telus/reliability-model
Contribute to telus/reliability-model development by creating an account on GitHub.
In 2021 and beyond, showing a generic "Oops. Something went wrong" 500 error page is not just user-hostile but exposes major flaws in the product engineering approach. ➡️ Design for the UX under error conditions. #operability #UX
Team Guides for Software leanpub.com/b/teamguidesfo… by Matthew Skelton, Rob Thatcher, Alex Moore, Chris Young, Mattia Battiston, Ash Winter, Rob Meaney, Manuel Pais and Chris O'Dell is the featured bundle on the Leanpub homepage! leanpub.com cc @matthewpskelton
I'm lucky to work with lots of great people here @weareglofox but it's so cool to see @clintonsweetnam 1. Design a system with #testability and #operability as a primary concern from the outset and then 2. Share how he and his team did it with the whole Engineering Department 😍