Agentless synthetic monitoring for Citrix Virtual Apps and Desktops, feeding performance data directly into Splunk

Discover in this webinar why 2 Steps is disrupting synthetic monitoring for Citrix

Transcript available below.

Transcript

2 STEPS, CITRIX AND SPLUNK LIVE WEBINAR ON SYNTHETIC MONITORING

Anil Kumar:

All right, good morning, good afternoon and good evening everyone, depending on where you're connecting from. Welcome to yet another exciting Citrix Ready technical webinar, where we showcase how our partners like 2 Steps and Splunk, who are here with us today, have integrated with Citrix products to deliver valuable solutions to common problems faced by our customers.

Anil Kumar:

I'm Anil Kumar, your host moderator for today. I'm a Technical Marketing Manager at Citrix, working in this role for about five years now, and have had the privilege to work with many technology partners like 2 Steps and Splunk, who are here with us today. Along with me, we have a good set of speakers with us today from 2 Steps and Splunk. So let's go around the table for a quick round of introductions. Simon, do you want to go first?

Simon Trilsbach:

Yeah. Thanks, Anil. Welcome everybody to the webinar. Thank you for joining. My name is Simon Trilsbach. I'm the Managing Director of 2 Steps. I've been with the business almost three years now and am essentially responsible for the product strategy and go-to-market for 2 Steps. It's been a pleasure working with our partners, Splunk and Citrix. Looking forward to sharing what the product does with the audience today.

Andy Bearsley:

Hi, everyone. So my name is Andy Bearsley. I'm a Splunker. I work with our user community who are using Splunk for things that aren't related to security. So anything to do with IT operations or application delivery or any number of things, really. I help customers to get some really good insights out of their machine data using Splunk.

Andrew Newlands:

Hi, everyone. My name is Andrew Newlands. I'm Head of Product at 2 Steps. My background is engineering, and I've done a number of things in that sphere. But for the last year or so, I've had the pleasure of building out the 2 Steps product, which we're going to show you today. I'm very excited to do so.

Anil Kumar:

All right, thanks and welcome, Simon, Andrew and Andy to the webinar. I know it's very early for you guys over there, but this is really exciting. I've seen the slides before, we discussed this earlier, and I had the privilege to validate and see your product demos. I'm very excited to see what's coming up next.

Anil Kumar:

A little bit about Citrix Ready and what we do here. So Citrix Ready is a technology partner program, where we recommend solutions that are trusted to enhance Citrix environments for digital workspace, networking and our cloud services. All the products featured in our Citrix Ready marketplace have completed thorough verification testing, thereby providing the extra confidence in joint solution compatibility which our customers look for. And with this online catalog and our brand new program, you can easily find and build a trusted infrastructure with the Citrix Ready products we have online.

Anil Kumar:

And with that said, and without any further ado, let me hand it over to our speakers today, to learn more about how we can use agentless synthetic monitoring. So over to you, Simon. Thank you.

Simon Trilsbach:

Wonderful, thanks, Anil. And again, thank you everyone for joining the webinar. Extremely early for us in Melbourne, Australia. It's 3:30 in the morning, but we've got a supply of coffee, so we're good to go.

Simon Trilsbach:

Today, we're going to be introducing you to an exciting new capability called 2 Steps. Just on the name for a second: it's called 2 Steps as a bit of a play on words. It's really all about being two steps ahead of the problem. As IT operations professionals or service delivery professionals, our work is challenging and difficult. And the job today is to demonstrate how 2 Steps can provide some value in a Citrix environment.

Simon Trilsbach:

We're going to talk about how we help monitor end-user experience and as Anil mentioned, push the performance metrics directly into Splunk. The challenge is really to demonstrate how we can reduce cost, reduce risk and reduce effort.

Simon Trilsbach:

And by the end of the webinar, hopefully, you're going to be able to see how 2 Steps enables you to be quicker, in regards to the implementation of synthetic tests. Smarter, by correlating end-user experience metrics with infrastructure metrics that are produced and pushed directly into Splunk. Safer, getting ahead of the issues that impact customer experience. And as we all know, customer experience is now a board-level conversation. And then easier, showing you not only do we produce performance metrics, but we also produce video replays of issues. So easier for you to communicate what's going on with other stakeholders within your business.

Simon Trilsbach:

Okay, so a little bit about 2 Steps. 2 Steps is a relatively new product in the market. But in some ways, it's been almost 15 years in the making. It is part of a business called Remasys, which has been in the space of synthetic monitoring for over 15 years now.

Simon Trilsbach:

What we've done is taken a lot of our legacy IP around agentless synthetic monitoring. So what you'll hear a lot about today is a very novel and interesting technique for implementing synthetic monitoring. Synthetic monitoring has been around for a while and the Selenium approach is very mature. But unfortunately, Selenium doesn't work for Citrix. It's really only used for browser-based tests.

Simon Trilsbach:

So what we've done is we've developed a technique to enable us to automate workflows in virtualised Citrix applications and Citrix desktops. So although the product is new, the IP has been around for a number of years, as has the development team. I'll just pass it over to Andy, and he will give you the overview of Splunk. I'm sure you're all aware of the organization, but as a courtesy, Andy, over to you.

Andy Bearsley:

Sure. Thank you, Simon. Splunk is really a machine data platform. I guess it's a place where we can take data from many different silos, bring it together in one place, and then provide insights on that for various teams. So the insights might be for people on the application support and application delivery side, or it might be from the owners of the business service, or it could be the security team. So we provide I guess different lenses on the data. It's really good for breaking down silos.

Andy Bearsley:

We can actually take that, I guess, to the next level, which is around providing some machine learning. That's really good for predicting when an incident might take place. And that early insight, 30 minutes in advance, is critical for application support teams.

Andy Bearsley:

And one of the things I love about the combination of 2 Steps and Splunk is that one of the lead indicators is end-user experience. And this has always been a blind spot for us, from a Citrix perspective. So this is a really valuable combination of forces joining up.

Simon Trilsbach:

Wonderful. Thanks, Andy. So to set things up, up until now, synthetic monitoring for virtualised applications has been limited. Current solutions that we've seen in the marketplace tend to stop at either the logon process to Citrix or the launching of an application.

Simon Trilsbach:

And the problem with that is when we're thinking about the end-user experience, we want to go past that, we want to understand what's happening when our users are actually in the application themselves. So we'll look at an example, or some examples of end-user experience actions in the next slide. But it's really important that we get visibility of how somebody is using the application and that's really been missing in the marketplace.

Simon Trilsbach:

In today's webinar, we'll explain how 2 Steps will unlock new ways to automate these workflows, or we call them user journeys within an application, how it's a quick and simple process with no coding, no embedded agents, no hooking into your APIs to get the automation running. And what we're going to do in terms of the presentation is break it down into two main headings.

Simon Trilsbach:

The first is stage one, which is how you can move to a more proactive posture, by getting ahead of problems before there's a major customer impact. That's all about implementing synthetic monitoring that simulates user behaviour, creating baselines, benchmarks and SLAs that in turn form KPIs. And then accelerating root cause analysis by correlating end-user experience data with infrastructure data that's powering the business service. So that's stage one and it's really about proactive.

Simon Trilsbach:

Stage two is how can we try and leverage this holy grail of being predictive? Is there a way that we can stop IT issues before they happen and impact the users? That's really about consolidating the performance data in Splunk, leveraging predictive or machine learning algorithms, and then understanding what's normal and what is not normal.

Andy Bearsley:

And from the Splunk side, for us, this is a combination of using Splunk as a big data platform for the IT operations data and the end-user experience data, and putting some machine learning on top of that to predict 30 minutes out when incidents might take place. Gartner actually describes that as AIOps: the combination of big data and machine learning to provide insights to the support teams that they wouldn't otherwise have.

Simon Trilsbach:

Wonderful. Thanks, Andy. End-user experience is what we refer to as an output metric. It's tied to the business action a user is trying to perform when accessing an application. So am I able to print, am I able to check out? How long does it take to run a database search, upload a file, et cetera? Now, the aim of end-user experience monitoring is to understand the performance of these mission-critical applications within our business when a user is trying to perform a business action.

Simon Trilsbach:

Additionally, it's critical to monitor what's happening at the infrastructure level or input metrics. And these are the different infrastructure components that power a business service such as a website, a CRM, or an ERP system.

Simon Trilsbach:

The challenge is, if you just have either the input or the output on its own, it can be limiting or misleading. What I mean by that is, if you have a piece of infrastructure that is behaving erroneously, the challenge is, what is the impact? Is it a P1 incident? Are my users being affected? Do I need to wake up the building? Having the end-user experience data along with the infrastructure data, you can start making those correlations and understand the priority of fixes.

Simon Trilsbach:

Conversely, if you're just looking at the output data, which is the end-user experience data and applications are performing slowly, then the challenge is where do I begin to start my investigation? How do I fix the underlying cause? And of course, this becomes even more complex when we're working in a Citrix environment. So it's only when you start to correlate input data, i.e. infrastructure data and output data, end-user experience data, that you can start to get a full picture.

Simon Trilsbach:

So before we get into the demo, what we're going to try and do is just set this up as a typical scenario that we see across a lot of the organisations that we talk to. That's a remote site scenario. So recently, we were talking to a state government department in Australia that had 20 remote sites. And the challenge was that the support desk would get hit with complaints that the applications were running slow.

Simon Trilsbach:

So not only were they being reactive, because they were waiting on the call to hit the help desk; when the call did come in, it was incredibly difficult for them to understand where the underlying cause was. Was it Citrix? Was it the application? Was it infrastructure at one of the remote sites? Were all the sites being affected? Was it one, was it a cluster? This is a common problem. So we'll talk about how we can go about helping with that scenario. Hopefully that sounds familiar and, even more hopefully, you'll be able to see how the solution may be able to help with it.

Simon Trilsbach:

The current challenges with Citrix virtual applications: as I mentioned, current monitoring solutions stop at login or application launch. And this doesn't provide real-time intelligence of performance within the application. Hence this idea that we need to automate the workflows or user journeys that our users follow within those applications.

Simon Trilsbach:

Without synthetic monitoring, you're exposed. You're exposed because you're not going to get that early warning sign or that regular heartbeat of how the application is performing. You may not even have baselines or KPIs set up, so you don't even know what good performance looks like.

Simon Trilsbach:

And then the final challenge with the current suite of solutions that we see out in the marketplace is that all of the performance metrics, even though they only cover application logon or application launch, are stored in a separate repository. So if you're trying to correlate that data with the infrastructure data that you've got in Splunk, it just becomes even more complex and difficult.

Andy Bearsley:

Yeah, that's actually an interesting point about silos. I think one of the biggest challenges that I personally see in big organisations is the problem of silos, where we might have one team that's interested in the end-user experience part that's critical for them, and then we have completely separate teams on the network side and on the firewall side, who might be more security-oriented. But they have data as well that's actually really valuable for understanding the health of the service.

Andy Bearsley:

And then there are the database teams, and the application teams, and the virtualisation teams. These are all silos. And when it comes down to an incident, time for us is critical. Not having all that information from these silos in one place is something that can be a real problem. I think breaking down those silos is a really critical thing to do.

Simon Trilsbach:

Great segue onto the next slide, which is really around time. So at the top of the presentation, we were talking about reducing cost, reducing risk and reducing effort. And here's the typical scenario without robust synthetic monitoring: you'll have an impacting fault at the bottom, and then a series of events that will start to alert the operations team, the control centre, and then you have the mean time to resolution. And there's a direct correlation: the more time it takes to fix the issue, the bigger the dollar impact on our organisation.

Simon Trilsbach:

A lot of the time, there may be a fix to the issue, but it's not a permanent fix. So you'll see the same problem cropping up time and time again. What we're really trying to do is gain time, and by gaining time, we reduce the cost within our business.

Simon Trilsbach:

So monitoring end-user experience, just to summarise why it's important: it's a fantastic way to produce performance benchmarks and SLAs, and a great method of developing application performance baselines or KPIs, because you're always running the test from a known state in a controlled environment.

Simon Trilsbach:

It's a regular heartbeat for application performance. So as soon as performance degrades, you will know; in other words, you are buying time. It provokes action before users are impacted. It's an excellent leading indicator of something not acting in a normal way, and a robust dataset that feeds into an AIOps framework. And again, we'll unpack this further in the presentation.

Simon Trilsbach:

I'm going to hand it over to Andrew Newlands now, who will walk you through the demo. What we're going to see is how we set up a synthetic test without any code, without any scripting, without any embedded agents. I'm just going to move on to the next slide. Andrew, it's over to you.

Andrew Newlands:

Thank you, Simon. So this is actual video footage of a synthetic test or transaction being recorded within 2 Steps. Now, what you can see is that we have the application or system under test displayed right here inside 2 Steps, almost as if you're running the Citrix virtual desktop or virtual app yourself.

Andrew Newlands:

And you can see to build a test, I click on the icons or type in the text boxes. 2 Steps on the right there is building up a script of what it thinks I want to do and testing each action as we go. In reality, what's happening is 2 Steps is acting as kind of a middleman. So the real application is running on a headless Linux server. And its user interface is being displayed through 2 Steps onto the front end.

Andrew Newlands:

And whenever I click on a button, say, or type in a box, 2 Steps is looking at the screen using computer vision techniques and trying to figure out what visual cues on the screen I'm reacting to. So you can see when I clicked on that text box there, a box appeared with a cross in the middle. The cross here represents the actual point where I clicked, and the box represents what 2 Steps has decided is a sufficient template to detect what I'm looking for.

Andrew Newlands:

I can give it a hint. So you'd have seen there that I stretched the box out to grab a whole block of text. That's useful both in creating a more intuitive, bigger template, and also in avoiding things like, say, a banner ad on a web page that will change over time and that you don't want to include in the script.
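For readers curious what that kind of computer-vision matching looks like in general, here is a minimal sketch using OpenCV template matching. This is not 2 Steps' actual implementation; the file names, threshold and click-point logic are illustrative assumptions only.

```python
# Minimal sketch of the general template-matching idea described above: locate a
# saved visual cue (template) on a screen capture so a recorded click can be
# replayed at the matched position. Not 2 Steps' implementation.
import cv2

def find_click_point(screenshot_path, template_path, threshold=0.9):
    """Return the (x, y) centre of the best template match, or None if below threshold."""
    screen = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    template = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
    result = cv2.matchTemplate(screen, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None  # visual cue not on screen yet, e.g. the page is still rendering
    h, w = template.shape
    return (max_loc[0] + w // 2, max_loc[1] + h // 2)

# Hypothetical usage: point = find_click_point("current_screen.png", "login_button.png")
```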

Andrew Newlands:

Now, what you can see here is that I've taken all the finalised tests, and I'm adding checkpoints. So rather than looking at individual steps, I'm grouping steps together into logical blocks, like logging into a storefront, opening an app, or running a functional test.

Andrew Newlands:

When we save and replay the test ad nauseam in scheduled operation, 2 Steps will gather statistics not just for the test as a whole, but also for each of these checkpoints. So we'll be able to track their performance and any failures over time as a time series of data. And this last step here is me just scheduling the test to run every five minutes, which leads us into the next slide, where we'll see the resulting time series in Splunk.

Andrew Newlands:

So as promised, this is a very simple visualisation of those five checkpoints, and how their performance varied over time.

Andrew Newlands:

This is a standard Splunk visualisation, so it's built on a regular query. Our data model isn't proprietary, so once the data is in Splunk, you can use the usual tools that you're used to, to chop it up, visualise it and combine our data in your dashboards as per normal.
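To illustrate that point, here is a hedged sketch of pulling checkpoint timings out of Splunk with an ordinary SPL search via the Splunk Python SDK (splunklib). The index, sourcetype and field names are hypothetical placeholders, not the product's actual data model.

```python
# Hypothetical example: charting 2 Steps checkpoint durations with a regular SPL
# search. Index, sourcetype and field names are assumed for illustration.
import splunklib.client as client
import splunklib.results as results

service = client.connect(host="splunk.example.com", port=8089,
                         username="admin", password="changeme")

query = ('search index=two_steps sourcetype="two_steps:checkpoint" earliest=-24h '
         '| timechart span=5m avg(duration_ms) by checkpoint_name')

# Run the search and print each 5-minute bucket of average checkpoint durations
for row in results.ResultsReader(service.jobs.oneshot(query)):
    print(row)
```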

Andy Bearsley:

I love this. This is Andy from Splunk here. Because for the last few years we've had the ability to do synthetic monitoring by running scripts that hit a web application, interact with the document object model in the browser and drive web activities on a synthetic basis.

Andy Bearsley:

But from the Splunk side, having insights into the Citrix user experience has always been a blind spot. I can see that this takes a very similar model, but applies it purely from a GUI-driving perspective. I think this is a really valuable thing.

Andy Bearsley:

I guess I've got a question for you, Andrew. So when these scripts are running, is it running from a single, central location?

Andrew Newlands:

That's a really good question, Andy. The answer is no, not necessarily. You can run multiple 2 Steps test nodes in different locations. So if you have, say, an internet-facing application, you might want to test it from inside your office and simultaneously from outside your data centre over the internet.

Andrew Newlands:

And you can do that: you can schedule a test to run simultaneously from multiple locations, and thereby control for the different networks involved. So you can connect over a mobile network, over the internet, over your LAN or WAN, and see how those networks, for example, affect the test. Absolutely, yes, you can run from multiple locations.

Andy Bearsley:

Pretty good.

Anil Kumar:

All right, Andrew, I have one from my end. When I was watching the demo, it looked like the administrators had to enter a few things as the tests were being recorded, and at the different checkpoints. So do the admins managing this need any kind of coding background, or any kind of training, to get this all going?

Andrew Newlands:

Thanks, Anil. No, not really. What you were seeing was not coding; that was naming the steps, the images, the templates. You can just click your way through it and it will work, but by default we use generic names like step one, image one, et cetera. To make the test more readable when I come back to it, I like to rename them to things like "this is me clicking the login button."

Andrew Newlands:

Learning to use 2 Steps is not much harder than using the application under test. In fact, it's probably more valuable for the person doing the test building to be an application expert or subject matter expert for the application or workflow, rather than a technical person. Training someone on 2 Steps is easy, whereas learning the ins and outs of an application or a business workflow is often a lot harder.

Anil Kumar:

Great, thank you for that, Andrew.

Andrew Newlands:

My pleasure. This last slide of the demo is another one of our dashboards from Splunk. And it's a feature we're quite proud of. It's the ability to watch a video replay of a test, right there inside Splunk, with indeed an overlay on the right showing you the checkpoints as they're passed.

Andrew Newlands:

This is a really great communication tool, because as Simon alluded to earlier, often when something breaks, it's very hard to get a concise description of what is broken. Sometimes Chinese whispers will come into play. Someone will phone the help desk and complain about something and that will be translated into maybe another issue that sounds the same, but isn't, and that will come to the technical team and they'll be like, "What does this really mean? We don't understand."

Andrew Newlands:

Whereas with 2 Steps, when something breaks, you can have a video recording of it breaking. And you can copy that link or copy that video and pass it around and say, "This is the problem. This is what we need to fix." And that really crystallises communications.

Anil Kumar:

Right. I think that's something which is very helpful. And it always helps for somebody who's troubleshooting to have the recording, because you may not get the end-user to reproduce the issue. So that definitely helps.

Andrew Newlands:

Yes, this is a feature that's driven in large part by my own professional experience working for a large financial institution, where we used to find that it was very, very hard to get a good description of a problem. A large part of that mean time to resolution was actually made up of defining exactly what was broken so we could start fixing it. And this is meant to eliminate that phase as much as possible.

Simon Trilsbach:

Wonderful. Thanks, Andrew. Yeah, so again, this concept of clawing back time, which is going to save the business money. So the benefits of 2 Steps plus Splunk for Citrix: flexibility, ease and integration.

Simon Trilsbach:

I think the feedback that we have had is that there isn't anything like this in the marketplace, in terms of 2 Steps' power in relation to automation of workflows within a Citrix environment. As Andy from Splunk mentioned earlier, it's reasonably easy to do this with Selenium for web applications, but anything outside of that has been challenging up until now. So it has unparalleled flexibility, in terms of the applications and platforms it can monitor.

Simon Trilsbach:

Now, today we're talking about Citrix virtualised applications. But 2 Steps also has the ability to implement synthetic monitoring across Android, Windows, Internet Explorer and those legacy CRMs that you see, such as Siebel. Things like two-factor authentication are also frameworks that we can work with. So: incredible flexibility, and ease of use.

Simon Trilsbach:

Andrew touched on this, you do not need an expensive development resource to set up the tests. In fact, we recommend that you have a business stakeholder that understands the application that you want to monitor, to set up the tests and it really is quite easy to do. I am not a developer, and I set up all my tests when I'm going to talk to prospects. So if I can do it, anybody can.

Simon Trilsbach:

And then the third component is this idea of everything being integrated and the consolidation of data. So what that is providing in terms of benefit is the ability to make correlations between the infrastructure components that are powering the business service and the end-user experience of the application.

Simon Trilsbach:

What that means in terms of our time diagram is that through 2 Steps and Splunk alerts, we're now getting ahead of the curve. We now have a regular heartbeat. We have set up benchmarks and baseline KPIs of what good performance looks like. And as soon as that deviates, as soon as that slows down to a state that we're unhappy with or it breaks completely, then we're going to know immediately. That gives us an opportunity to get in front of it before the majority of our users are impacted.

Simon Trilsbach:

Okay, so we're going to move on to stage two now, which is really this idea of being predictive. So is it possible for me to even predict IT issues before they happen? I'm going to hand it over to Andy from Splunk, who's going to give you a bit of an overview of the way Splunk sees machine learning and AIOps. Andy, over to you.

Andy Bearsley:

Thanks, Simon. Yeah, look, it's a really interesting thing when we've got people talking about AIOps. And as we mentioned earlier, the way that Gartner describes AIOps is that it's taking IT operations data, putting that into a big data platform, and then applying machine learning on top to look at patterns to make life easier for IT operations teams. Really, the aim is to actually prevent incidents from happening in the first place.

Andy Bearsley:

So think big data, and then machine learning. So Splunk natively, naturally is a big data platform. And when we apply machine learning on top of that, we can do some amazing things. And when we look at machine learning, there are three main categories that come into play.

Andy Bearsley:

So the first category of machine learning is around anomaly detection. And anomaly detection really tells us whether the application and the infrastructure are behaving normally or not. Is it normal or not? And understanding when something is not normal gives us the ability to jump in and see the very, very start of an issue, and to remediate it before it actually impacts our customers. So that's anomaly detection: normal versus not normal.
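To make "normal versus not normal" concrete, here is a tiny conceptual sketch that flags a KPI sample deviating strongly from its recent baseline. It only illustrates the idea; it is not the algorithm ITSI uses, and the window size and threshold are assumptions.

```python
# Conceptual "normal vs not normal" check: compare the latest KPI sample against a
# rolling baseline and flag it when it deviates by more than a few standard deviations.
import numpy as np

def is_anomalous(history, latest, window=288, z_threshold=3.0):
    """history: recent KPI samples (e.g. logon seconds every 5 minutes); latest: newest sample."""
    recent = np.asarray(history[-window:], dtype=float)
    mean, std = recent.mean(), recent.std()
    if std == 0:
        return False  # flat baseline, nothing to compare against
    return abs(latest - mean) / std > z_threshold

# Example: a 9-second logon against a ~4-second baseline gets flagged
baseline = list(np.random.normal(4.0, 0.3, 288))
print(is_anomalous(baseline, 9.0))  # True
```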

Andy Bearsley:

And then the next stage from there is to be able to predict the health of a service. And a service might be a combination of both technical end-user experience and business KPIs, key performance indicators. Looking at the pattern of behaviour and predicting future health is a really valuable thing to do. And we've found in Splunk that our sweet spot is predicting the health of a service about 30 minutes in advance.

Andy Bearsley:

But I guess we need some key ingredients to go into that predictive model. And one of those key ingredients is the end user experience, because that's a lead indicator. It's a really good way, it's a good metric that helps us to understand what the future behaviour might be like.

Andy Bearsley:

And predictive analytics is really good for issues that tend to repeat themselves. And it's surprising, in our experience, a lot of incidents tend to be from a root cause that just happens again and again. These are typically ones that are just diabolically hard to actually troubleshoot and reproduce. So being able to predict an incident based on what we're seeing right now is a really valuable thing for IT operations teams.
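As a rough illustration of predicting a health score about 30 minutes ahead from lead-indicator KPIs, here is a hedged sketch using a simple regression over lagged samples. The KPI names, the five-minute sample interval and the synthetic data are assumptions; ITSI's actual predictive models are more sophisticated.

```python
# Conceptual sketch only: learn to predict a 0-100 service health score 30 minutes
# ahead (6 x 5-minute samples) from current KPI values. Synthetic data for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "eue_logon_ms":   rng.normal(4000, 300, 500),  # end-user experience (lead indicator)
    "sessions_disco": rng.poisson(2, 500),          # sessions disconnected
    "ad_cpu_pct":     rng.normal(40, 5, 500),       # Active Directory CPU
    "health_score":   rng.normal(95, 2, 500),       # historical service health
})

horizon = 6  # 6 samples x 5 minutes = 30 minutes ahead
X = df[["eue_logon_ms", "sessions_disco", "ad_cpu_pct"]][:-horizon]
y = df["health_score"].shift(-horizon).dropna()

model = LinearRegression().fit(X, y)
print("Predicted health in 30 minutes:", round(model.predict(X.tail(1))[0], 1))
```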

Andy Bearsley:

The third, I guess, category of machine learning is around clustering. This is where we use our machine learning algorithms to look at the tens or hundreds or thousands of events, alerts and alarms taking place. We can cluster these together.

Andy Bearsley:

So we can say, well, we might have had 1,000 alerts, but actually they're related to one single episode, which might have a timeline. And doing the clustering is a really great way of reducing the amount of distraction that the operations and support teams have, so they can see a very, very clean signal of what's going on through what can otherwise be a very noisy environment. So those are the three categories of machine learning that we apply on top of our big data platform, really to provide AIOps capabilities to our customers.
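As a toy sketch of the clustering idea, here is a minimal grouping of a flood of alerts into a handful of episodes by time proximity. ITSI's event grouping is far richer; the ten-minute quiet gap used here is just an assumed heuristic.

```python
# Toy sketch: collapse many alerts into a few "episodes" by starting a new episode
# whenever there has been a quiet gap since the previous alert.
from datetime import datetime, timedelta

def group_into_episodes(alert_times, gap=timedelta(minutes=10)):
    """alert_times: datetimes of alerts; returns a list of episodes (lists of datetimes)."""
    episodes = []
    for t in sorted(alert_times):
        if episodes and t - episodes[-1][-1] <= gap:
            episodes[-1].append(t)      # still part of the current episode
        else:
            episodes.append([t])        # quiet gap exceeded, new episode begins
    return episodes

alerts = [datetime(2021, 6, 1, 9, 0) + timedelta(minutes=m) for m in (0, 1, 2, 45, 46)]
print(len(group_into_episodes(alerts)))  # 2 episodes from 5 alerts
```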

Andy Bearsley:

So in this next part, I'll show you what it looks like once we take the user experience from Citrix via 2 Steps and put it into Splunk. We'll show you how we can apply some machine learning to help customers predict and react very, very quickly to incidents.

Andy Bearsley:

And the module that I'm going to show you in Splunk is called IT Service Intelligence. A lot of our customers affectionately refer to it as ITSI. You can think of this as being, I guess, an easy button for machine learning for your IT operations data. So, welcome to ITSI.

Andy Bearsley:

This is a view of Splunk, and we are looking at the health of a service. You can see in this case, the service is really the remote end-user experience for our New York office. And if you look at that tree diagram, this is where in Splunk we are visualising the dependencies of the service and how they can affect each other. We can see that the health, the colour of our service, is orange right now.

Andy Bearsley:

And the question is, "Well, that's interesting, but why has it gone orange?" Because orange indicates that we need to apply some attention because something's going wrong. I can look at the dependencies and I see there are some key dependencies for our local area network, our virtual desktop delivery and the virtual app delivery.

Andy Bearsley:

And it's interesting, if I look down the tree, I can see that the virtual desktop and virtual app delivery, they both rely on the authorisation service. And that's interesting because I can see that the authorisation service is having an issue.

Andy Bearsley:

And what does the authorisation service rely on? Well, it relies on Active Directory, and IIS. IIS is actually behaving really well at the moment. I can remove that as a possibility for things that are going wrong.

Andy Bearsley:

And this really brings my attention down to the bottom left-hand corner down to the Active Directory technical service. This is really interesting for us. There are a number of key performance indicators that tell me the health of Active Directory right now. I can look at the metrics on the right-hand side of the screen and I can see that the storefront response time, it's really having a challenge at the moment. I can see very, very recently it's spiked up.

Andy Bearsley:

So this, very quickly, from a visual perspective, allows me to look at the overall service and really be able to understand what key component might be causing the root cause. You can imagine, if you're on the support desk and you're looking at a large number of services, it's a really great way to very quickly drill down and work out what the root cause might be. And then we can escalate it to the appropriate team.

Andy Bearsley:

So the question is, if the response time has recently spiked, what was the timeline, when did the issue actually start? That takes me into the drill down, where I can look at the timelines. And for me, the timelines are really valuable, because then we can work out, well, when did the system start to become not normal? And what went wrong first?

Andy Bearsley:

Because often, when you're in the middle of an incident, everything can be looking like it's going wrong. But understanding the thing that went wrong first is really critical. I can see that the service health score, we score it on a scale of zero to 100. So it was pretty steady for quite a few hours. And then it started to decrease, until we went from a status of green, which is normal, to yellow, which is not normal.

Andy Bearsley:

And if I look underneath the overall service, I can see in this case, the storefront logon and a lot of other people call it I guess a Citrix workspace logon. I can see that this was pretty good for a while and then it ramped up massively. So it went from green to orange to red, in a pretty short amount of time.

Andy Bearsley:

But that storefront logon, remember, relies on the authentication service. And the authentication service started to exhibit an issue much earlier on. So for about three hours, we were having issues with the authentication response. We could see it went from green to yellow, so not orange or red. But yellow is really valuable, because that tells us when something is not normal.

Andy Bearsley:

Remember, the machine learning behind the scenes says this is not normal for this time of day. And if I drill down even further, I can see, well, why did the authentication response slow down? Well, it looks like the disk IO, the read ops on that service started to show patterns of being not normal much earlier. And so you can see here, this gives us a visual timeline to be able to just drill down and understand what went wrong first.

Andy Bearsley:

And the amazing thing here is that we're looking at the metrics. However, our support teams require some real evidence; they really want to know what was happening on the actual server behind the scenes. And because this is Splunk, we can provide data from many different sources. So we've got those metrics, but we also have a direct view here of the alerts and alarms that are happening on the storefront server.

Andy Bearsley:

And so this actually shows us what was going on in the logs. This is really a unique view of looking at both a combination of metrics and logs and alerts all in a single place. So this is a really valuable way to work out what went wrong first, so we can jump in and address that.

Andy Bearsley:

Now remember, we talked about different types of machine learning? So understanding if something is normal or not normal is one category of machine learning. But the other thing we can do in Splunk is provide a predictive view of the health of the service. And as I said earlier, 30 minutes is, in our experience, our sweet spot: we can look at the service and, based on what we're seeing, the machine learning says, "Well, that's interesting. Based on what's happening right now, we predict that the health of the service will be X."

Andy Bearsley:

And in this case, we can see that right now, we might have a high score for our service, say 95 to 100. But we're predicting in the next 30 minutes, it's going to drop down to 50, which is a significant drop.

Andy Bearsley:

And the thing about prediction is that you need to be able to explain why we predict that it's going to drop, what are the root causes? And in this case, Splunk is telling us that the storefront logon time is one of those key metrics. And if we're seeing that dip, we predict, based on what we've seen before, that the health of the service is going to dip.

Andy Bearsley:

So the key KPIs here are the end-user experience, the sessions disconnected, the Active Directory CPU utilisation, and the disk IO read performance. Splunk is telling us that those are the key indicators warning us that we're going to have a problem 30 minutes from now. And 30 minutes, in the context of dealing with an incident, is just magic to have back up your sleeve.

Simon Trilsbach:

I think it's fair to say that the end user experience KPI that's being produced by 2 Steps is reasonably valuable in this view.

Andy Bearsley:

It's exceptionally valuable. End user experience is one of these magic lead indicators that's really important for us, for predicting when we're going to have issues, that will actually affect customers.

Andy Bearsley:

So what happens when we get that predictive analysis, that machine learning that's running? When it predicts that we're going to have an issue 30 minutes in advance, that can give us an alert, and we might feed that alert into our incident management system. So we might use this, for instance, to create a ticket; ServiceNow and Remedy are both good examples of that. And this gives us the ability to have a head start and get in front of the issue before it takes place.

Andy Bearsley:

So you can see down the bottom of the screen, we've got a critical episode that we've highlighted that might be happening in 30 minutes time. And so this is a great way of using machine learning to help the IT teams get in front of issues. And the end user experience from Citrix via 2 Steps is a critical ingredient in all of that process.

Anil Kumar:

I can tell you from my personal experience that when there's an issue on the authentication servers or on Active Directory, the impact is huge. And with Citrix being used by a lot of enterprises to enable remote access to their critical applications and desktops, an issue like this really stops users from logging in, causing huge downtime and impacting the business. So having this prediction 30 minutes earlier would really help the IT administrators, the Citrix administrators and everybody to proactively remediate the issue.

Anil Kumar:

I think this is something which a lot of the admins who have joined here today would really love and would like to know more about.

Simon Trilsbach:

Thanks, Anil. Couldn't agree more. So back to the timeline, the orange line in the middle was really that proactive concept. So getting ahead of the events that impact all your users, by having that regular heartbeat through 2 Steps and Splunk alerts.

Simon Trilsbach:

But now we're looking at the green line, which is 2 Steps and Splunk machine learning through ITSI. And the claim is that there's no time to resolution, because we're getting on top of the issue, before there is an issue, before there is an impact.

Simon Trilsbach:

I think one of the things to call out is if you are on the webinar and you don't have ITSI, then that's okay, because there is still the stage one of being proactive and gaining time and our counsel would be, that's where you start. Implementing synthetic monitoring on user workflows within your Citrix virtual applications is going to have significant benefit and you're going to gain time, you're going to reduce risk and cost and effort. And once you've cracked that, then it's moving to this predictive state and that's where ITSI can help.

Simon Trilsbach:

Okay, I'm just going to bring this home. We spoke about the... Excuse me, my battery is just running a little bit low. Okay, we spoke about this idea of reducing cost, reducing effort, reducing risk. What have we spoken about? Well, let's talk about saving time: it's quicker to build tests, and it's faster to get to the root cause of the problem. Simplifying: there's no code, there are no agents, and it's the same method of setting up synthetic monitoring tests across all platforms.

Simon Trilsbach:

In regards to organisation and integration, all of your end-user experience data and metrics are fed directly into Splunk, your source of truth, your single pane of glass. In regards to connecting, we're connecting all of your remote sites' data into one single repository. And in regards to informing, we're producing an empirical dataset that informs you of what is happening from an end-user experience perspective. And this is critical when you're moving to a predictive AIOps framework.

Simon Trilsbach:

When you put all of that together, what does it mean for the business? Well, it means that you're reducing risk, because there's less exposure to customer experience issues. You're reducing effort, because you now have a framework of implementing tests, which is way easier. You're reducing costs, because you're reducing the impact of IT issues.

Simon Trilsbach:

And that concludes the presentation. If anybody on the webinar is interested in learning more, here are my contact details. Please drop me an email. We would be delighted to jump on a call and unpack this capability further, talk about some of the use cases that we've come across, and how it can help your business. And with that, I'll hand it back to Anil; hopefully the presentation has provoked some questions.

Anil Kumar:

Right, thanks very much for the presentation, Simon, Andrew and Andy. I've been monitoring the questions panel, and we've got quite a few questions from the audience. So let me pick a few from here.

Anil Kumar:

So the very first question is, could you monitor VDIs or Citrix virtual desktops as well? I think the question came up because we showed how 2 Steps could build test cases on virtual apps. But could we do the same with virtual desktops as well?

Andrew Newlands:

Yeah, the answer is absolutely, we can do virtual desktops as well. In fact, 2 Steps can monitor any system or application where you can get the screen fed into it. It can handle Citrix virtual apps and desktops, VNC, Windows RDP, or any natively running Linux program, as well as Android. It's quite easy to adapt other systems to fit in there as well. But yes, straight out of the box, we can do both forms of Citrix: application and virtual desktop.

Anil Kumar:

And does it matter if Citrix is hosted on premises or if they are hybrid, using some resources on the public cloud? Does it really matter where the Citrix infrastructure is deployed?

Andrew Newlands:

Not even slightly. 2 Steps itself is simply wrapping the Citrix Workspace client. As long as the Workspace client can connect to the infrastructure, it will work out of the box. 2 Steps itself has no idea where your Citrix infrastructure is running.

Anil Kumar:

So by Workspace client, you mean the Citrix Workspace app or the Receiver which gets installed on all the end-user machines to launch Citrix?

Andrew Newlands:

Exactly that. Yes.

Anil Kumar:

All right. Andrew, when you were showing the demo, I could see you were clicking on each of the steps, the way an end user would launch the application, and at each step you could record and also highlight the key pointers there. But I could also see a Splunk dashboard in the background. So for somebody who wants to use 2 Steps, how do they get started? Is it an app within Splunk? If you could provide some background on the technical requirements.

Andrew Newlands:

Yes, absolutely. As you've astutely observed, the front end is a Splunk app, which is on Splunkbase. The back end runs on a Linux server, so it can be virtualised, in the cloud or physical.

Andrew Newlands:

We recommend a Red Hat or CentOS machine, on which you'll install a couple of standard Linux packages. There's some very lightweight configuration. But apart from that, it's really just a matter of putting the package onto a Linux box, pointing it to your Citrix server, installing the app into Splunk, and off you go.

Anil Kumar:

Right. So not a lot of time to get started, which is always good. I think I see one more question, where they're asking, does it matter if virtual apps and desktops are implemented in an FSLogix environment?

Andrew Newlands:

No, indeed, it doesn't.

Anil Kumar:

Right. So one more, something related to building the test steps, I think. One of the questions is, does it use PowerShell scripts to run the synthetic query?

Andrew Newlands:

No, it doesn't. What it's doing is on our back end server, it fires up a copy of a Citrix workspace application in a little container and it simulates keyboard and mouse input, going into the workspace application. So it literally moves the mouse pointer and clicks the button, as far as Citrix is concerned.

Andrew Newlands:

It is actually running on Linux in the back end. At a very technical level, it's a custom .NET Core assembly, but there's no PowerShell involved.
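Purely for illustration, here is a generic way to simulate the same kind of keyboard and mouse input on a desktop, using pyautogui rather than 2 Steps' custom .NET Core component. The coordinates and credentials are placeholders; the real product drives the Workspace client inside its own container as Andrew describes.

```python
# Generic illustration of synthetic keyboard/mouse input with pyautogui. This is a
# stand-in to show the concept only; it is not 2 Steps' .NET Core input driver.
import pyautogui

# Click where a computer-vision step located the logon field, then type credentials
pyautogui.click(512, 384)                                  # placeholder coordinates
pyautogui.write("monitor_user@example.com", interval=0.05)
pyautogui.press("tab")
pyautogui.write("********", interval=0.05)                 # placeholder password
pyautogui.press("enter")
```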

Anil Kumar:

Great. I remember initially, when Simon and you presented the demo during the Citrix Ready validation, you also mentioned that after the steps have been recorded, you could simulate the test based on the number of users who use the application. So maybe at the start of the day, 50 to 100 users access the application at the same time. You could really simulate that within 2 Steps and load the data into Splunk to see if any problem comes up. So that was something which was very cool.

Andrew Newlands:

Yeah, there are a number of other modules for 2 Steps which we didn't show you here. One of them, as you alluded to, is the bulk test module, which lets you spin up 100 or 1,000 individual users and have them run through the same job, or a similar job, in parallel, to hit the system and verify load. Obviously, that's not something you do every five minutes for a monitoring job, but it's good for testing other aspects of the system.

Andrew Newlands:

There are other modules; for example, there's one to do two-factor authentication via an external SMS provider, which unfortunately we didn't have time to go through here. It's not directly relevant to this sort of application. But the core product is what you see, the automation piece, and it can be delivered in various ways: running tests every five minutes to monitor something, or, as you said, firing off 1,000 users at once to test capacity.

Anil Kumar:

Andy, a question to you. I think the ITSI model and the ability to identify or predict a problem in the environment 30 minutes or so earlier is something very helpful for Citrix, for sure. But I personally had the opportunity to learn a little bit about Splunk during one of the trainings, and something I did not know was that you could load any kind of logs and data from your entire infrastructure and come out with very intuitive results.

Anil Kumar:

So, if you could brief us on some of the other use cases, it would be helpful for our audience.

Andy Bearsley:

Sure. I guess from a Splunk perspective, one of the things I really love is the fact that we can take data from many different data sources and have it in a single place and break down those silos. So for instance, for application teams, often they have blind spots to do with firewalls and the network environment. And the networking teams, they have a blind spot, in terms of, if they make a change, what impact it has on the application teams.

Andy Bearsley:

And then there are the infrastructure and the Citrix teams, and everyone historically operates in their own bubble. So being able to bring all that data into one place and break down those silos means that, with your area of expertise, you can look at the impact that you're having on the business. And when you understand the impact that other teams are having on you, that's a really powerful thing to help reduce the time to restore a service.

Andy Bearsley:

And so having that visibility across silos is one of the key things that people look for when they're implementing Splunk. What we often recommend is pick a critical business service that relies on Citrix, but has a number of moving parts to it. And pick a business service that has a problem that's worth solving.

Andy Bearsley:

And then once people start on that journey, they can Splunk that service and get some very quick results, and then use that to expand out to the rest of the organisation. That's a typical journey that we see a lot of our Splunk community going down.

Anil Kumar:

Thanks so much for that, Andy. That's about all the time we had for questions today. We have more questions which have come in, but we'll definitely reach out to each of you over email.

Anil Kumar:

I also see some of the attendees asking if the webinar is being recorded. To answer that, yes, we will be sharing the recording with all the registrants to their email addresses shortly.

Anil Kumar:

And with that said, we are about to end today's webinar. I want to take a moment to thank all our speakers for making this fantastic presentation and sharing great insights with us. Thanks, Simon, Andrew and Andy. And last but not least, I want to thank everyone who was able to attend today's webinar. This concludes our broadcast for today. Thank you.

Ready to get started?

Get in touch to create a trial account or book a demo
