
Setting Sights on Flash: An Interview with Josh De Jong of Leupold & Stevens

Flashstorage.com recently had the opportunity to speak with Senior Virtualization & Storage Engineer Josh De Jong about $0.40 coffee, flash storage, and modernizing IT within a company that deeply honors tradition.

Q: Would you mind telling us a little about yourself and Leupold & Stevens?

A: Leupold & Stevens is a more than 100-year-old manufacturing company based in Beaverton, Oregon. We are in the hunting and shooting sports industry—everything from rangefinders and scopes to binoculars and spotting scopes. And we’ve been doing it out of the same building for over sixty years. The company has a relatively small IT shop: a four-person systems administration team along with a three-person help desk. Almost everything is virtualized. We’re about 98% VMware, with only a few physical servers that we haven’t yet been able to virtualize due to their demanding workloads. But the hope is that we’ll get there soon enough.

I’ve been here at Leupold & Stevens for just over two years now. I’m a VMware Certified Professional, as well as a vExpert for 2015. I’m also one of the VMUG co-leaders for the Portland, Oregon area. Oh, and I also run a blog, 40centcoffee.wordpress.com. There, I document my experiences with technology—everything from VMware to storage, backups, and all that other stuff.

Q: 40centcoffee is a very interesting name. Where did it come from?

A: All the other jobs I ever worked at offered free coffee. But when I started at Leupold, I noticed they charge $0.40 for a cup of coffee. For some reason, that just stuck with me. I started my blog shortly after I started working here, so that’s the name I went with.

Q: Leupold & Stevens sounds like an organization steeped in tradition. How hard was it to convince your company’s executives to adopt a relatively new technology like flash storage?

A: You’re right, there’s a lot of tradition here. This place is well established. We’re not a company that is easily swayed by new trends and new technology. We’re not a startup. At other organizations, you typically change direction or focus every quarter or every year. Here, we have a plan and we stick to that plan—year in and year out. We’ve been through it all before. With over 100 years of experience, our company has seen pretty much every trend in the industry. We have to do our due diligence when evaluating any software or hardware before it goes into our facilities—whether it’s IT-related or part of the manufacturing process. And of course that includes flash storage.

Q: What was the main driver to consider flash storage?

A: Before I joined Leupold, the company had already made a large storage investment. We ran four or five different models with different purposes—some for production workloads like Oracle, Exchange, and SQL Server, and some for backups. We had purchased another filer right when I started here. This one was flash-enabled: spinning disk with flash acceleration for the read cache. And we had purchased it as part of a converged infrastructure solution—Cisco UCS blades replacing our standalone legacy Dell servers, running VMware on 10Gb infrastructure.

I quickly became concerned that this particular storage solution wasn’t going to deliver the performance the business was demanding. I mean, our performance did increase over what we were seeing on our older storage system. The problem was that it still wasn’t what we were expecting—especially given what we paid for it. It just wasn’t what we were hoping to see. And after about a year of running on this filer, we just kind of felt that the flash “bolt-on” approach wasn’t enough.

Keep in mind, we have a mixed workload; we’re running Exchange, Oracle, and SQL Server. These are all highly demanding applications, and they were all fighting for the same 300GB worth of flash for reads. It quickly became clear that it just wasn’t enough to keep up with demand.

We decided to conduct a database stress test to find out what kind of performance we were actually able to get. During that test, our DBA got up to about 6,500 IOPS and just over a gigabit of throughput, and according to Oracle, we saw 20 to 30 milliseconds of latency within the Oracle application itself. Then the filer just fell over. It just couldn’t keep up. Most of the VMs we were running at the time paused and were inaccessible for about 30 seconds—until we killed the stress test. After that, the filer went back to the way it had been running before.

I really didn’t expect our flash-enabled filer to peak at 6,500 IOPS. The storage was only a year old! We needed at least 30,000 to 35,000 IOPS, ideally at sub-10-millisecond latency. Oracle database performance was really the main driver more than anything else.
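[Editor’s note: The interview doesn’t name the stress-test tool Leupold’s DBA used. For readers who want a rough feel for this kind of measurement, below is a minimal Python sketch of a random-read test; the file path, block size, and run time are placeholder assumptions, and a real evaluation would use a purpose-built tool such as fio against the actual data files.]

```python
import os
import random
import time

# Minimal random-read stress test sketch. PATH, BLOCK_SIZE, and RUN_SECONDS
# are illustrative assumptions, not values from the interview. Caveat: unless
# the file is much larger than RAM (or opened with O_DIRECT), the OS page
# cache will inflate the results.
PATH = "/mnt/oradata/testfile.dat"  # hypothetical large file on the array
BLOCK_SIZE = 8192                   # 8K, a common database block size
RUN_SECONDS = 60

num_blocks = os.path.getsize(PATH) // BLOCK_SIZE
latencies = []

with open(PATH, "rb", buffering=0) as f:   # unbuffered binary reads
    deadline = time.monotonic() + RUN_SECONDS
    while time.monotonic() < deadline:
        f.seek(random.randrange(num_blocks) * BLOCK_SIZE)
        start = time.monotonic()
        f.read(BLOCK_SIZE)
        latencies.append(time.monotonic() - start)

print(f"~{len(latencies) / RUN_SECONDS:,.0f} IOPS, "
      f"{1000 * sum(latencies) / len(latencies):.2f} ms average latency")
```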

Q: Is Oracle your primary application?

A: About half of our infrastructure is dedicated to Oracle—from our test/dev and staging environments to our upgrade and production environments. We run four different Oracle environments. So the majority of what we have in our environment is Oracle or supporting Oracle. And we’re running Oracle in a virtual environment.

We have virtualized the Linux operating system. However, Oracle is configured with Direct NFS mounts from within the operating system directly to storage. So as opposed to presenting a VMDK to the Oracle virtual machine, the guest operating system itself has a direct mount to storage.
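[Editor’s note: With Direct NFS, the mapping Josh describes lives in Oracle’s oranfstab file inside the guest OS rather than in the hypervisor. A minimal sketch follows; the server name, addresses, and paths are placeholders rather than Leupold’s actual configuration, and the comments are annotations for this article. Direct NFS also has to be enabled in the Oracle home (on Linux, make -f ins_rdbms.mk dnfs_on run from $ORACLE_HOME/rdbms/lib), and multiple local/path pairs can be listed so Direct NFS can multipath across interfaces.]

```
# /etc/oranfstab (illustrative placeholder values)
server: flasharray01            # hypothetical storage array hostname
local: 192.168.10.5             # client-side interface for this path
path: 192.168.10.1              # array-side address Direct NFS connects to
export: /vol/oradata mount: /u02/oradata
```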

Q: What requirements did you need in your flash storage array?

A: We were looking for low latency and high IOPS for the Oracle applications. But in addition, we wanted the flexibility to run other workloads from the same storage system. We are primarily an NFS shop as a result of having Oracle. (In my experience, Oracle’s Direct NFS has better performance when we virtualize it. We use Direct NFS and we want to be able to keep it that way.) But we also still need iSCSI for Exchange and our SQL Server 2012 cluster. As a result, we needed to be able to do both NFS and iSCSI.

We were also looking for something that supported variable block sizes. With our previous solution, you could have any block size you wanted, as long as it was 4K. That just wasn’t going to work for us, since we were running multiple workloads.

We were also looking at flash arrays that offer inline compression and deduplication. The problem is, many traditional arrays don’t give you things like deduplication. And if they do, it’s post-process, and you aren’t able to dedupe your data across your entire storage pool; you’re only doing it at the individual volume level, which means you have to buy more flash to handle the writes. I once moved a 5.5TB VMDK to my old array and then had to dedupe it. It took 40 hours to dedupe the file and nearly pegged my CPU for all 40 of those hours.

Q: How has flash storage benefited your organization?

A: Our flash storage array gave us the performance, capacity, and flexibility we needed. We conducted a thorough analysis of the technology: we did a POC, migrated one of our Oracle environments over, and performed the same database stress test. We were able to get over 117,000 IOPS and almost 5 gigabits of throughput to the array, and we saw only 3 milliseconds of latency. Needless to say, our DBA was impressed.

One of the limitations before was the processing power it took to handle post-process deduplication and compression. With our new array, we are able to turn on compression on our database volumes and get a 65% data reduction rate. On our old array, we didn’t touch any of it—no dedupe, no compression. But now we have both turned on, and we still see sub-30% CPU usage during normal operations.

Beyond the database stress test, we also ran an ETL job. We typically run this nightly for Business Intelligence. It’s basically a full load of our information. Before, it took over 17 hours to run. With the new hybrid array, it only took 4 hours, which means we can run it on our production environment at the end of the day and we’ll actually have up-to-date numbers in the morning that the business can use.

And from a multi-protocol standpoint, we are running Oracle, Exchange, and SQL Server from a single array. Everything is running along without an issue. That being said, while we originally planned for just one box here in our Portland office, we ultimately decided to go ahead and buy a second hybrid array that we’ll put in our Denver facility for disaster recovery. So now we’re replicating all of the data that resides here in our Portland office to our Denver facility.

Q: Finally, what topics do you write about on 40centcoffee.wordpress.com?

A: For some of the newer technology, there isn’t a lot of documentation. You might be able to find a post here and there in the forums, but it’s often sparse. Part of my job here at Leupold is to document our processes and procedures so we have a standardized way of doing things. That’s why I put things out on my blog, 40centcoffee.wordpress.com. I really wanted to be able to share my experiences to help others. If you don’t have a POC box in house, or you don’t have the time to go through it yourself, you can still learn how to set up projects, create additional IP addresses, configure replication, or register your array within vCenter. Those types of things weren’t out there in the world, so that’s what I really wanted to share. My blog is a convenient place where I can put all that information.

Josh recently participated in a webinar with our friend W. Curtis Preston of Truth in IT and Tegile Systems, our sponsor. You can watch it here on-demand.

You can also follow Josh on Twitter at @EuroBrew.