LINUX JOURNAL | Since 1994: The Original Magazine of the Linux Community
NOVEMBER 2015 | ISSUE 259 | www.linuxjournal.com
SERVER HARDENING TIPS AND TRICKS
MANAGE LINUX SYSTEMS WITH PUPPET
Puppet Application Orchestration: Eliminate IT complexity
HOW-TO: Wi-Fi Network Installation
What's the Future for Big Data?
PERFORMANCE TESTING FOR WEB APPLICATIONS
FLASH ROMs WITH A RASPBERRY PI
WATCH: ISSUE OVERVIEW

GEEK GUIDES: Practical books for the most technical people on the planet. Download books for free with a simple one-time registration: http://geekguide.linuxjournal.com
Improve Business Processes with an Enterprise Job Scheduler, by Mike Diehl (Sponsor: Skybot)
Finding Your Way: Mapping Your Network to Improve Manageability, by Bill Childers (Sponsor: InterMapper)
DIY Commerce Site, by Reuven M. Lerner (Sponsor: GeoTrust)
Combating Infrastructure Sprawl, by Bill Childers (Sponsor: Puppet Labs)
Get in the Fast Lane with NVMe, by Mike Diehl (Sponsor: Silicon Mechanics & Intel)
Take Control of Growing Redis NoSQL Server Clusters, by Reuven M. Lerner (Sponsor: IBM)
Linux in the Time of Malware, by Federico Kereki (Sponsor: Bit9 + Carbon Black)
Apache Web Servers and SSL Encryption, by Reuven M. Lerner (Sponsor: GeoTrust)

CONTENTS | NOVEMBER 2015 | ISSUE 259
FEATURES: SYSTEM ADMINISTRATION
52 Managing Linux Using Puppet: Managing your servers doesn't have to be a chore with Puppet. David Barton
68 Server Hardening: A look at some essential steps to follow to mitigate threats. Greg Bledsoe
ON THE COVER
• Server Hardening Tips and Tricks, p. 68
• Manage Linux Systems with Puppet, p. 52
• Performance Testing for Web Applications, p. 22
• Flash ROMs with a Raspberry Pi, p. 34
• How-To: Wi-Fi Network Installation, p. 38
• What's the Future for Big Data?, p. 84
Cover Image: © Can Stock Photo Inc. / Anterovium
COLUMNS
22 Reuven M. Lerner's At the Forge: Performance Testing
28 Dave Taylor's Work the Shell: Words—We Can Make Lots of Words
34 Kyle Rankin's Hack and /: Flash ROMs with a Raspberry Pi
38 Shawn Powers' The Open-Source Classroom: Wi-Fi, Part II: the Installation
84 Doc Searls' EOF: How Will the Big Data Craze Play Out?
IN EVERY ISSUE
8 Current_Issue.tar.gz
10 UPFRONT
20 Editors' Choice
46 New Products
91 Advertisers Index
LINUX JOURNAL (ISSN 1075-3583) is published monthly by Belltown Media, Inc., PO Box 980985, Houston, TX 77098 USA. Subscription rate is $29.50/year. Subscriptions start with the next issue.
LINUX JOURNAL: Subscribe to Linux Journal Digital Edition for only $2.45 an issue.
ENJOY: Timely delivery Off-line reading Easy navigation LINUX JOURNAL Executive Editor Jill Franklin jill@linuxjournal.com Senior Editor Doc Searls doc@linuxjournal.com Associate Editor Shawn Powers shawn@linuxjournal.com Art Director Garrick Antikajian garrick@linuxjournal.com Products Editor James Gray newproducts@linuxjournal.com Editor Emeritus Don Marti dmarti@linuxjournal.com Technical Editor Michael Baxter mab@cruzio.com Senior Columnist Reuven Lerner reuven@lerner.co.il Security Editor Mick Bauer mick@visi.com Hack Editor Kyle Rankin lj@greenfly.net Virtual Editor Bill Childers bill.childers@linuxjournal.com Contributing Editors Ibrahim Haddad • Robert Love • Zack Brown • Dave Phillips • Marco Fioretti • Ludovic Marcotte Paul Barry • Paul McKenney • Dave Taylor • Dirk Elmendorf • Justin Ryan • Adam Monsen President Publisher Associate Publisher Director of Digital Experience Accountant Carlie Fairchild publisher@linuxjournal.com Mark Irgang mark@linuxjournal.com John Grogan john@linuxjournal.com Katherine Druckman webmistress@linuxjournal.com Candy Beauchamp acct@linuxjournal.com Linux Journal is published by, and is a registered trade name of, Belltown Media, Inc. PO Box 980985, Houston, TX 77098 USA Phrase search and highlighting Ability to save, clip and share articles Editorial Advisory Panel Nick Baronian Kalyana Krishna Chadalavada Brian Conner • Keir Davis Michael Eager • Victor Gregorio David A. Lane • Steve Marquez Dave McAllister • Thomas Quinlan Chris D. Stark • Patrick Swartz Embedded videos Android & iOS apps, desktop and e-Reader versions Advertising E-MAIL: ads@linuxjournal.com URL: www.linuxjournal.com/advertising PHONE: +1 713-344-1956 ext. 2 Subscriptions E-MAIL: subs@linuxjournal.com URL: www.linuxjournal.com/subscribe MAIL: PO Box 980985, Houston, TX 77098 USA LINUX is a registered trademark of Linus Torvalds. SUBSCRIBE TODAY! Puppet Application Orchestration Application Delivery Made Simple Model complex, distributed applications as Puppet code so you can quickly and reliably roll out new infrastructure and applications. Learn more at puppetlabs.com AP U PR£ Current_lssue.tar.gz Get Smart SHAWN POWERS W anna get smart? Use Linux. (Mic drop.) I hope you all rolled your eyes a bit, because although there's a kernel of truth there, everyone knows it takes a lot more than using Linux to be successful in IT. It takes hard work, planning, strategizing, maintaining and a thousand other things system administrators, developers and other tech folks do on a daily basis. Thankfully, Linux makes that work a little easier and a lot more fun I Reuven M. Lerner starts off this issue continuing his pseudo¬ series on Web performance enhancements. The past few months he has described how to deal with bottlenecks on your systems. Here, he looks at some ways to help suss out those hard-to-find problems before they become showstoppers. Whether you're trying to test a product proactively or trying to pressure a troublesome system into VIDEO: Shawn Powers runs through the latest issue. showing its underlying problems, Reuven's column will be very helpful. Dave Taylor continues his theme on making words, and this month, he shifts the focus from wooden building blocks to tinier wooden blocks—namely, Scrabble tiles. If you're stuck for a word and don't feel like a horrible cheating liar for using a script to help you, Dave's column likely will appeal to you. I'm pretty sure my Aunt Linda has been using Dave's script for years, because I just can't seem to beat her at Words With Friends. 
Although he's normally the geekiest in the bunch, Kyle Rankin goes to a new level of awesome this month when he revisits Libreboot. This time, his new laptop can't be flashed using software, so instead he actually uses a second computer to flash the chip on the motherboard with wires! I'm not sure how I can get to his level of nerdery in my column, other than maybe announcing my upcoming Raspberry-Pi- powered moon rover. Seriously though, Kyle's column is a must-read. I finish up my Wi-Fi series in this 8 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM CURRENT ISSUE.TAR.GZ issue with an article about hardware. Understanding theory, channel width and frequency penetration is all well and good, but if you put your access points in the wrong place, your performance will still suffer. Knowledge and execution go together like peanut butter and chocolate, so using last month's theory to build this month's network infrastructure should be delicious. Even if you already have a decent Wi-Fi setup in your home or office, my article might help you tweak a little more performance out of your existing network. David Barton helps teach us to be smarter IT professionals by giving us a detailed look at Puppet. DevOps is all the rage for a very good reason. Tools like Puppet can turn a regular system administrator into a system superhero and transform developers into solution-delivering pros. David shows how to manage your Linux servers in a way that is scalable, repeatable and far less complicated than you might think. Managing multiple servers is great, but if those servers aren't secure, you're just scaling up a disaster waiting to happen. Greg Bledsoe walks through the process of server hardening. It's a stressful topic, because making sure your servers are secure is the hallmark of what it means to be a successful administrator. Unfortunately, it's also a moving target that can keep you up at night worrying. In his article, Greg explores some best practices along with some specific things you can do to make your already awesome Linux servers more secure and reliable. Whether you manage a simple Web server or a farm of cloud instances delivering apps, server hardening is vital. I think Spiderman said it best: "With great power comes great responsibility." That's true in life, but also true in computing. It's easy to take Linux for granted and assume that it's so secure out of the box, you needn't worry about it, or assume that since Linux is free, there's no cost when your infrastructure grows. By being smart about how you manage computers, you can take advantage of all the awesomeness Linux has to offer without falling victim to being overwhelmed or overconfident. Want to get smart? Do smart things. That's really the only waylH Shawn Powers is the Associate Editor for Linux Journal. He’s also the Gadget Guy for LinuxJournal.com. and he has an interesting collection of vintage Garfield coffee mugs. Don’t let his silly hairdo fool you. he’s a pretty ordinary guy and can be reached via e-mail at shawn@linuxjournal.com Or. swing by the #linuxjournal IRC channel on Freenode.net. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 9 FRONT NEWS + FUN diff -u WHAT’S NEW IN KERNEL DEVELOPMENT The NMI (non-masking interrupt) system in Linux has been a notorious patchwork for a long time, and Andy Lutomirski recently decided to try to clean it up. NMIs occur when something's wrong with the hardware underlying a running system. 
Typically in those cases, the NMI attempts to preserve user data and get the system into as orderly a state as possible, before an inevitable crash. Andy felt that in the current NMI code, there were various corner cases and security holes that needed to be straightened out, but the way to go about doing so was not obvious. For example, sometimes an NMI could legitimately be triggered within another NMI, in which case the interrupt code would need to know that it had been called from "NMI context" rather than from regular kernel space. But, the best way to detect NMI context was not so easy to determine. Also, Andy saw no way around a significant speed cost, if his goal were to account for all possible corner cases. On the other hand, allowing some relatively acceptable level of incorrectness would let the kernel blaze along at a fast clip. Should he focus on maximizing speed or guaranteeing correctness? He submitted some patches, favoring the more correct approach, but this was actually shot down by Linus Torvalds. Linus wanted to favor speed over correctness if at all possible, which meant analyzing the specific problems that a less correct approach would introduce. Would any of them lead to real problems, or would the issues be largely ignorable? As Linus put it, for example, there was one case where it was theoretically possible for bad code to loop over infinitely recursing NMIs, causing the stack to grow without bound. But, the code to do that would have no use whatsoever, so any code that did it would be buggy anyway. So, Linus saw no need for Andy's patches to guard 10 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM [UPFRONT i against that possibility. Going further, Linus said the simplest approach would be to disallow nested NMIs—this would save the trouble of having to guess whether code was in NMI context, and it would save all the other usual trouble associated with nesting call stacks. Problem solved! Except, not really. Andy and others proved reluctant to go along with Linus' idea. Not because it would cause any problems within the kernel, but because it would require discarding certain breakpoints that might be encountered in the code. If the kernel discarded breakpoints needed by the GDB debugger, it would make GDB useless for debugging the kernel. Andy dug a bit deeper into the code in an effort to come up with a way to avoid NMI recursion, while simultaneously avoiding disabling just those breakpoints needed by GDB. Finally, he came up with a solution that was acceptable to Linus: only in-kernel breakpoints would be discarded. User breakpoints, such as those set by the GDB user program, still could be kept. The NMI code has been super thorny and messed up. But in general, it seems like more and more of the super-messed-up stuff is being addressed by kernel developers. The NMI code is a case in point. After years of fragility and inconsistency, it's on the verge of becoming much cleaner and more predictable.— zackbrown They Said It If a problem has no solution, it may not be a problem, but a fact—not to be solved, but to be coped with over time. —Shimon Peres Happiness lies not in the mere possession of money. It lies in the joy of achievement, in the thrill of creative effort. —Franklin D. Roosevelt Do not be too moral. You may cheat yourself out of much life. Aim above morality. Be not simply good; be good for something. —Henry David Thoreau If you have accomplished all that you planned for yourself, you have not planned enough. 
—Edward Everett Hale The bitterest tears shed over graves are for words left unsaid and deeds left undone. —Harriet Beecher Stowe WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 11 [UPFRONT i Android Candy: If You’re Not Using This, Then Do That The "If This Then That" site has been around for a long time, but if you haven't checked it out in a while, you owe it to yourself to do so. The Android app (which had a recent name change to simply "IF") makes it easy to manipulate on the fly, and you're still able to interact with your account on its Web site. The beauty of IFTTT is its ability to work without any user interaction. I have recipes set up that notify me when someone adds a file into a shared Dropbox folder, which is far more convenient than constantly checking manually. I also manage all my social network postings with IFTTT, so if I post a photo via Instagram or want to send a text update to Facebook and Twitter, all my social networking channels are updated. In fact, IFTTT even allows you to cross-post Instagram photos to Twitter and have them show up as native Twitter images. If you're not using IFTTT to automate your life, you need to head over to http://ifttt.com and start now. If you're already using it, you should download the Android app. ■1G7 (Image via Google Play Store) which has an incredible interface to the already awesome IFTTT back end. Get it at the Play Store today; just search for "IF" or "IFTTT"—either will find the app. —SHAWN POWERS 12 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Install Windows? Yeah, Open Source Can Do That. For my day job, I occasionally have to demonstrate concepts in a Windows environment. The most time-consuming part of the process is almost always the installation. Don't get me wrong; Linux takes a long time to install, but in order to set up a multi-system lab of Windows computers, it can take days! Thankfully, the folks over at https://automatedlab.codeplex.com have created an open-source program that automatically will set up an entire lab of servers, including domain controllers, user accounts, trust relationships and all the other Windows things I tend to forget after going through the process manually. Because it's script-based, there are lots of pre-configured lab options ready to click and install. Whether you need a simple two-server lab or a complex farm with redundant domain controllers, Automated Lab can do the heavy lifting. Although the tool is open source, the Microsoft licenses are not. You need to have the installation keys and ISO files in place before you can build the labs. Still, the amount of time and headaches you can save with Automated Lab makes it well worth the download and configuration, especially if you need to build test labs on a regular basis. —SHAWN POWERS LINUX JOURNAL Fit Your Service SUBSCRIPTIONS: Linux Journal is available in a variety of digital formats, including PDF, .epub, .mobi and an on-line digital edition, as well as apps for iOS and Android devices. Renewing your subscription, changing your e-mail address for issue delivery, paying your invoice, viewing your account details or other subscription inquiries can be done instantly on-line: http://www.linuxjournal.com/subs. E-mail us at subs@linuxjournal.com or reach us via postal mail at Linux Journal, PO Box 980985, Houston, TX 77098 USA. Please remember to include your complete name and address when contacting us. ACCESSING THE DIGITAL ARCHIVE: Your monthly download notifications will have links to the various formats and to the digital archive. 
To access the digital archive at any time, log in at http://www.linuxjournal.com/digital. LETTERS TO THE EDITOR: We welcome your letters and encourage you to submit them at http://www.linuxjournal.com/contact or mail them to Linux Journal, PO Box 980985, Houston, TX 77098 USA. Letters may be edited for space and clarity. WRITING FOR US: We always are looking for contributed articles, tutorials and real-world stories for the magazine. An author's guide, a list of topics and due dates can be found on-line: http://www.linuxjournal.com/author. FREE e-NEWSLETTERS: Linux Journal editors publish newsletters on both a weekly and monthly basis. Receive late-breaking news, technical tips and tricks, an inside look at upcoming issues and links to in-depth stories featured on http://www.linuxjournal.com. Subscribe for free today: http://www.linuxjournal.com/ enewsletters. ADVERTISING: Linux Journal is a great resource for readers and advertisers alike. Request a media kit, view our current editorial calendar and advertising due dates, or learn more about other advertising and marketing opportunities by visiting us on-line: http://ww.linuxjournal.com/ advertising. Contact us directly for further information: ads@linuxjournal.com or + 1 713-344-1956 ext. 2. r WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 13 [UPFRONT i Recipy for Science More and more journals are demanding that the science being published be reproducible. Ideally, if you publish your code, that should be enough for someone else to reproduce the results you are claiming. But, anyone who has done any actual computational science knows that this is not true. The number of times you twiddle bits of your code to test different hypotheses, or the specific bits of data you use to test your code and then to do your actual analysis, grows exponentially as you are going through your research program. It becomes very difficult to keep track of all of those changes and variations over time. Because more and more scientific work is being done in Python, a new tool is available to help automate the recording of your research program. Recipy is a new Python module that you can use within your code development to manage the history of said code development. Recipy exists in the Python module repository, so installation can be as easy as: pip install recipy The code resides in a GitHub repository, so you always can get the latest and greatest version by cloning the repository and installing it manually. If you do decide to install manually, you also can install the requirements with the following using the file from the recipy source code: pip install -r requirements.txt Once you have it installed, using it is extremely easy. You can alter your scripts by adding this line to the top of the file: import recipy It needs to be the very first line of Python executed in order to capture everything else that happens within your program. If you don't even want to alter your files that much, you can run your code through Recipy with the command: python -m recipy my_script.py All of the reporting data is stored within a TinyDB database, in a file named test.npy. 14 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM [UPFRONT i Once you have collected the details of your code, you now can start to play around with the results stored in the test.npy file. To explore this module, let's use the sample code from the recipy documentation. 
A short example is the following, saved in the file my_script.py:

import recipy
import numpy

arr = numpy.arange(10)
arr = arr + 500
numpy.save('test.npy', arr)

The recipy module includes a script called recipy that can process the stored data. As a first look, you can use the following command, which will pull up details about the run:

recipy search test.npy

On my Cygwin machine (the power tool for Linux users forced to use a Windows machine), the results look like this:

Run ID: eb4de53f-d90c-4451-8e35-d765cb82d4f9
Created by berna_000 on 2015-09-07T02:18:17
Ran /cygdrive/c/Users/berna_000/Dropbox/writing/lj/science/recipy/my_script.py using /usr/bin/python
Git: commit 1149a58066ee6d2b6baa88ba00fd9effcf434689, in repo /cygdrive/c/Users/berna_000/Dropbox/writing, with origin https://github.com/joeybernard/writing.git
Environment: CYGWIN_NT-10.0-2.2.0-0.289-5-3-x86_64-64bit, python 2.7.10 (default, Jun 1 2015, 18:05:38)
Inputs: none
Outputs: /cygdrive/c/Users/berna_000/Dropbox/writing/lj/science/recipy/test.npy

Every time you run your program, a new entry is added to the database. When you run the search command again, you will get a message like the following to let you know:

** Previous runs creating this output have been found. Run with --all to show. **

If using a text interface isn't your cup of tea, there is a GUI available with the following command, which gives you a potentially nicer interface (Figure 1):

recipy gui

This GUI is actually Web-based, so once you are done running this command, you can open it in the browser of your choice.

Figure 1. Recipy includes a GUI that provides a more intuitive way to work with your run data.

Recipy stores its configuration and the database files within the directory ~/.recipy. The configuration is stored in the recipyrc file in this folder. The database files also are located here by default. But, you can change that by using the configuration option:

[database]
path = /path/to/file.json

This way, you can store these database files in a place where they will be backed up and potentially versioned. You can change the amount of information being logged with a few different configuration options. In the [general] section, you can use the debug option to include debugging messages or quiet to not print any messages. By default, all of the metadata around git commands is included within the recorded information. You can ignore some of this metadata selectively with the configuration section [ignored metadata]. If you use the diff option, the output from a git diff command won't be stored.
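Putting the options mentioned so far together, a minimal recipyrc might look like the following sketch. The path is the same placeholder used above, and which options you actually want depends on how much detail you need to record:

[general]
debug

[database]
path = /path/to/file.json

[ignored metadata]
diff

With a file like this in ~/.recipy, every run prints debugging messages, the run database lands somewhere you can back up and version, and git metadata is still recorded without storing full diff output.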
If instead you wanted to ignore everything, you could use the git option to skip everything related to git commands. You can ignore specific modules on either the recorded inputs or the outputs by using the configuration sections [ignored inputs] and [ignored outputs], respectively. For example, if you want to skip recording any outputs from the numpy module, you could use:

[ignored outputs]
numpy

If you want to skip everything, you could use the special all option for either section. If these options are stored in the main configuration file mentioned above, they will apply to all of your recipy runs. If you want to use different options for different projects, you can use a file named .recipyrc within the current directory with the specific options for the project.

The way that recipy works is that it ties into the Python system for importing modules. It does this by using wrapping classes around the modules that you want to record. Currently, the supported modules are numpy, scikit-learn, pandas, scikit-image, matplotlib, pillow, GDAL and nibabel. The wrapper function is extremely simple, however, so it is an easy matter to add wrappers for your favorite scientific module. All you need to do is implement the PatchSimple interface and add lists of the input and output functions that you want logged.

After reading this article, you never should lose track of how you reached your results. You can configure recipy to record the details you find most important and be able to redo any calculation you did in the past. Techniques for reproducible research are going to be more important in the future, so this is definitely one method to add to your toolbox. Seeing as it is only at version 0.1.0, it will be well worth following this project to see how it matures and what new functionality is added to it in the future.—Joey Bernard

LINUX JOURNAL on your e-Reader: customized Kindle and Nook editions now available.

Simple Photo Editing, Linux Edition!

(Image from http://www.pinta-project.com)

A while back I wrote about the awesome open-source image editing program Paint.NET, which is available only for Windows. Although I'm thrilled there is an open-source option for Windows users, Paint.NET is one of those apps that is so cool, I wish it worked in Linux! Thankfully, there's another app in town with similar features, and it's cross-platform! Pinta isn't exactly a Paint.NET clone, but it looks and functions very much like the Windows-only image editor. It has simple controls, but they're powerful enough to do most of the simple image editing you need to do on a day-to-day basis. Whether you want to apply artistic filters, autocorrect color levels or just crop a former friend out of a group photo, Pinta has you covered. There certainly are more robust image editing options available for Linux, but often programs like GIMP are overkill for simple editing. Pinta is designed with the "less is more" mentality. It's available for Linux, OS X, Windows and even BSD, so there's no reason to avoid trying Pinta today. Check it out at http://www.pinta-project.com. —SHAWN POWERS

USENIX LISA15: More craft. Less cruft.
The LISA conference is where IT operations professionals, site reliability engineers, system administrators, architects, software engineers, and researchers come together, discuss, and gain real-world knowledge about designing, building, and maintaining the critical systems of our interconnected world. LISA15 will feature talks and training from: 1 Mikey Dickerson, United States Digital Service 1 Nick Feamster, Princeton University 1 Matt Harrison, Python/Data Science Trainer, Metasnake 1 Elizabeth Joseph, Hewlett-Packard 1 Tom Limoncelli, SRE, Stack Exchange, Inc 1 Dinah McNutt, Google, Inc 1 James Mickens, Harvard University 1 Chris Soghoian, American Civil Liberties Union 1 John Willis, Docker Register Today! Sponsored by USENIX in cooperation with LOPSA Nov. 8 - 13, 2015 Washington, D.C. usenix.org/lisa15 [EDITORS’ CHOICE] Tiny Makers If you've ever dropped Mentos in a bottle of Coke with kids or grown your own rock candy in a jar with string, you know how excited children get when doing science. For some of us, that fascination never goes away, which is why things like Maker Faire exist. If you want your children (or someone else's children) to grow into awesome nerds, one of the best things you can do is get them involved with projects at http://www.makershed.com. Although it's true that many of the kits you can purchase are a bit too advanced for kindergartners, there are plenty that are perfect for any age. You can head over to http://www.makershed.com/ collections/beginner to see a bunch 20 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM of pre-selected projects designed for beginners of all ages. All it takes is a dancing brush-bot or a handful of LED throwies to make kids fall in love with making things. Even if you don't purchase the kits from Maker Shed, I urge you to inspire the youngsters in your life into creating awesome things. If you guide them, they'll be less likely to do the sorts of things I did in my youth, like make a stun gun from an automobile ignition coil and take it to school to show my friends. Trust me, principals are far more impressed with an Altoid-tin phone charger for show and tell than with a duct-tape-mounted taser gun. You can buy pre-made kits at http://www.makershed.com or visit sites like http://instructables.com for homemade ideas you can make yourself. In fact, doing cool projects with kids is such an awesome thing to do, it gets this month's Editors' Choice award. Giving an idea the award might seem like an odd thing to do, but who doesn't love science projects? 
We sure do!— shawnpowers Powerful: Rhino Rhino M4800/M6800 • Dell Precision M6800 w/ Core i7 Quad (8 core) • 15.6"-17.3" QHD+ LED w/ X@3200xl800 • NVidia Quadro K5100M • 750 GB - 1 TB hard drive •Up to 32 GB RAM (1866 MHz) • DVD±RW or Blu-ray • 802.11a/b/g/n •Starts at $1375 • E6230, E6330, E6440, E6540 also available • High performance NVidia 3-D on an QHD+ RGB/LED • High performance Core i7 Quad CPUs, 32 GB RAM • Ultimate configurability — choose your laptop's features • One year Linux tech support — phone and email • Three year manufacturer's on-site warranty • Choice of pre-installed Linux distribution: Tablet: Raven Raven X240 • ThinkPad X240 by Lenovo • 12.5" FHD LED w/ X@1920xl080 •2.6-2.9 GHz Core i7 •Up to 16 GB RAM • 180-256 GBSSD •Starts at $1910 • W540, T440, T540 also available { Rugged: Tarantula Tarantula CF-31 • Panasonic Toughbook CF-31 • Fully rugged MIL-SPEC-810G tested: drops, dust, moisture & more • 13.1" XGA TouchScreen •2.4-2.8 GHz Core i5 •Up to 16 GB RAM • 320-750 GB hard drive / 512 GB SSD • CF-19, CF-52, CF-H2, FZ-G1 available EmperorLinux 0 www.EmperorLinux.com ri ...where Linux 8i laptops converge 1-888-651-6686 A Model specifications and availability may vary. COLUMNS AT THE FORGE Performance Testing REUVEN M. LERNER A look at tools that push your server to its limits, testing loads before your users do. In my last few articles. I've considered Web application performance in a number of different ways. What are the different parts of a Web application? How might each be slow? What are the different types of slowness for which you can (and should) check? How much load can a given server (or collection of servers) handle? So in this article, I survey several open-source tools you can use to better identify how slow your Web applications might be running, in a number of different ways. I should add that as the Web has grown in size and scope, the number and types of ways you can check your apps' speed also have become highly diverse, such that talking about "load testing" or "performance testing" should beg the question, "Which kind of testing are you talking about?" I also should note that although I have tried to cover a number of the most popular and best-known tools, there are dozens (and perhaps hundreds) of additional tools that undoubtedly are useful. If I've neglected an excellent tool that you think will help others, please feel free to send me an e-mail or a Tweet; if readers suggest enough such tools, I'll be happy to follow up with an additional column on the subject. In my next article. I'll conclude this series by looking at tools and techniques you can use to identify and solve client-side problems. Logfiles One of the problems with load testing is that it often fails to catch the problems you experience in the wild. For this reason, some of the best tools that you have at your disposal are the logfiles on your Web server and in your database. I'm a bit crazy about logfiles, in that I enjoy having more information than I'll really need written in there, just in case. Does that tend to make my applications 22 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 COLUMNS AT THE FORGE perform a bit worse and use up more disk space? Absolutely—but I've often found that when users have problems, I'm able to understand what happened better, and why it happened, thanks to the logfiles. This is true in the case of application performance as well. 
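For example, if your Web server's access log includes response times (nginx can log $request_time, and Apache's mod_log_config can add %D, the time taken to serve the request), even a quick shell pipeline will surface the slowest URLs. This is only a sketch: it assumes the duration was appended as the last field of each log line, which is a local logging choice rather than a default, and that the request path is the seventh whitespace-separated field, as it is in the common and combined formats:

# Ten slowest requests: print duration and URL, sort numerically, take the top
awk '{print $NF, $7}' /var/log/nginx/access.log | sort -rn | head -10

# Average duration per URL, using the same assumed log layout
awk '{sum[$7] += $NF; hits[$7]++}
     END {for (u in hits) printf "%10.3f %s\n", sum[u]/hits[u], u}' \
    /var/log/nginx/access.log | sort -rn | head -10

None of this replaces a real profiler, but it quickly tells you which handful of URLs deserve closer attention.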
Regarding Ruby on Rails, for example, the logfile will tell you how long each HTTP request took to be served, breaking that down further into how much time was spent in the database and creating the HTML output ("view"). This doesn't mean you can avoid digging deeper in many cases, but it does allow you to look through the logfile and get a basic sense of how long different queries are taking and understand where you should focus your efforts.

In the case of databases, logfiles are also worth a huge amount. In particular, you'll want to turn on your database's system that logs queries that take longer than a certain threshold. MySQL has the "slow query log", and PostgreSQL has the log_min_duration_statement configuration option. In the case of PostgreSQL, you can set log_min_duration_statement to be any number of ms you like, enabling you to see, in the database's log, any query that takes longer than (for example) 500 ms. I often set this number to be 200 or 300 ms when I first work on an application, and then reduce it as I optimize the database, allowing me to find only those queries that are truly taking a long time.

It's true that logfiles aren't quite part of load testing, but they are an invaluable part of any analysis you might perform, in production or even in your load tests. Indeed, when you run the load tests, you'll need to understand and determine where the problems and bottlenecks are. Being able to look at (and understand) the logs will give you an edge in such analysis.

Apachebench

Once you've set up your logfiles, you are ready to begin some basic load testing. Apachebench (ab) is one of the oldest load-testing programs, coming with the source code for Apache httpd. It's not the smartest or the most flexible, but ab is so easy to use that it's almost certainly worth trying it for some basic tests. ab takes a number of different options, but the most useful ones are as follows:

■ -n: the total number of requests to send.

■ -c: the number of requests to make concurrently.

■ -i: use a HEAD request instead of GET.
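To round out the logfile side before running any of these tests, here is a minimal sketch of the slow-query logging just described. The file locations and service commands assume a Debian-style PostgreSQL 9.x and MySQL installation, and the thresholds are only starting points:

# PostgreSQL: log any statement that takes longer than 200 ms
echo "log_min_duration_statement = 200" | sudo tee -a /etc/postgresql/9.4/main/postgresql.conf
sudo service postgresql reload

# MySQL: enable the slow query log for anything over half a second
printf '[mysqld]\nslow_query_log = 1\nlong_query_time = 0.5\n' | sudo tee /etc/mysql/conf.d/slow-query.cnf
sudo service mysql restart

With that in place, every load test you run leaves a trail of the exact queries that crossed the threshold, which makes the results far easier to interpret.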
You also can imagine that I need to check it to see why going to the home page of this site takes so long. Perhaps the database hasn't been configured or optimized, or perhaps the home page contains a huge amount of server-side code that could be optimized away. Now, it's tempting to raise the concurrency level (-c option) to something really large, but if you're running a standard Linux box, you'll find that your system quickly runs out of file descriptors. In such cases, you either can reconfigure your system or you can use Bees with Machine Guns, described below. So, what's wrong with ab? Nothing in particular, other than the fact that you're dealing with a simple HTTP request. True, using ab's various options, you can pass an HTTP authentication string (user name and password), set cookies (names and values), and even send POST and PUT requests whose inputs come from specified files. But if you're looking to check the timing and performance 24 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 COLUMNS AT THE FORGE of a set of user actions, rather than a single URL request, ab isn't going to be enough for you. That said, given that the Web is stateless, and that you're likely to be focusing on a few particular URLs that might be causing problems, ab still might be sufficient for your needs, assuming that you can set the authentication and cookies appropriately. The above also fails to take into account how users perceive the speed of your Web site, ab measured only the time it took to do all of the server-side processing. Assuming that network latency is zero and that JavaScript executes infinitely fast, you don't need to worry about such things. But of course, this is the real world, which means that client-side operations are no less important, as you'll see in my next article. Bees with Machine Guns (BWMG) If there's an award for best open-source project name, I think that it must go to Bees with Machine Guns. Just saying this project's name is almost guaranteed to get me to laugh out loud. And yet, it does something very serious, in a very clever way. It allows you to orchestrate a distributed denial-of-service (DDOS) attack against your own servers. The documentation for BWMG states this, but I'll add to the warnings. This tool has the potential to be used for evil, in that you can very easily set up a DDOS attack against any site you wish on the Internet. I have to imagine that you'll get caught pretty quickly if you do so, given that BWMG uses Amazon's EC2 cloud servers, which ties the servers you use to your name and credit card. But even if you won't get caught, you really shouldn't do this to a site that's not your own. In any event, Bees assumes that you have an account with Amazon. It's written in Python, and as such, it can be installed with the pip command: pip install beeswithmachineguns The basic idea of Bees is that it fires up a (user-configurable) number of EC2 machines. It then makes a number of HTTP requests, similar to ab, from each of those machines. You then power down the EC2 machines and get your results. In order for this to work, you'll need at least one AWS keypair (.pern file), which Bees will look for (by default) in your personal ~/.ssh directory. You can, of course, put it elsewhere. Bees relies on Boto, a Python package that allows for automated work with AWS, so you'll also need to define a ~/.boto file containing your AWS key and secret (that is, user name and password). 
WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 25 COLUMNS AT THE FORGE Once you have the keypair and .boto files in place, you then can set up your Bees test. I strongly suggest that you put this in a shell script, thus ensuring that everything runs. You really don't want to fire up a bunch of EC2 machines with the bees up command, only to discover the following month that you forgot to turn it off. Bees uses the bees command for everything, so every line of your script will start with the word bees. Some of the commands you can issue include the following: ■ bees up: start up one or more EC2 servers. You can specify the -s option to indicate the number of servers, the -g option to indicate the security group, and -k to tell Bees where to look for your EC2 keypair file. ■ bees attack: much like ab, you'll use the -n option to indicate the number of requests you want to make and the -c option to indicate the level of concurrency. ■ bees down: shut down all of the EC2 servers you started in this session. So, if you want to do the same thing as before (that is, 1,000 requests), but now divided across ten different servers, you would say: bees up -s 10 -g beesgroup -k beespair bees attack -n 100 -c 10 -u http://myserver.example.com/ bees down When you run Bees, the fun really begins. You get a verbose printout indicating that bees are joining the swarm, that they're attacking (bang bang!) and that they're done ("offensive complete"). The report at the conclusion of this attack, similar to ab, will indicate whether all of the HTTP requests were completed successfully, how many requests the server could handle per second, and how long it took to respond to various proportions of bees attacking. Bees is a fantastic tool and can be used in at least two different ways. First, you can use it to double¬ check that your server will handle a particular load. For example, if you know that you're likely to get 100,000 concurrent requests across your server farm, you can use Bees to load that up on 1,000 different EC2 machines. But another way to use Bees, or any load-testing tool, is to probe the limits of your system—that is, to overwhelm your server intentionally, to find out how many simultaneous requests it can take before failing over. This simply might be to understand the limits of the application's current architecture and implementation, or it might provide 26 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 COLUMNS AT THE FORGE you with insights into which parts of the application will fail first, so that you can address those issues. Regardless, in this scenario, you run your load-testing tool at repeatedly higher levels of concurrency until the system breaks—at which point you try to identify what broke, improve it and then overwhelm your server once again. A possible alternative to Bees with Machine Guns, which I have played with but never used in production, is Locust. Locust can run on a single machine (like ab) or on multiple machines, in a distributed fashion (like Bees). It's configured using Python and provides a Web-based monitoring interface that allows you to see the current progress and state of the requests. Locust uses Python objects, and it allows you to write Python functions that execute HTTP requests and then chain them together for complex interactions with a site. Conclusion If you're interested in testing your servers, there are several high-quality, open-source tools at your disposal. 
Here, I looked at several systems for exploring your server's limits, and also how you can configure your database to log when it has problems. You're likely going to want to use multiple tools to test your system, since each exposes a different set of potential problems. In my next article, I'll look at a variety of tools that let you identify problems and slowness within the client side of your Web application. ■ Reuven M. Lerner trains companies around the world in Python. PostgreSQL. Git and Ruby. His ebook. “Practice Makes Python”, contains 50 of his favorite exercises to sharpen your Python skills. Reuven blogs regularly at http://blog.lerner.co.il and tweets as @reuvenmlerner. Reuven has a PhD in Learning Sciences from Northwestern University, and he lives in Modi’in. Israel, with his wife and three children. Resources Apachebench is part of the HTTP server project at the Apache Software Foundation. That server is hosted at https://httpd.apache.org. ab is part of the source code package for Apache httpd. Bees with Machine Guns is hosted on GitHub at https://github.com/newsapps/ beeswithmachineguns. That page contains a README with basic information about how to use the program. It assumes familiarity with Amazon’s EC2 service and a working set of keys. Locust is hosted at http://locust.io, where there also is extensive documentation and examples. You will need to know Python, including the creation of functions and classes, in order to use Locust. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 27 COLUMNS WORK THE SHELL Words-We Can Make Lots of Words DAVE TAYLOR In this article, Dave Taylor shows complicated script code to complete the findwords script. Now you’ll be ready to crush everyone in Scrabble and Words with Friends . It was a dark and stormy night when I started this series here in Linux Journal —at least two months ago, and in Internet terms, that's quite a while. And just wait until our robot overlords are running the show, because then two months will be 10-20 generations of robot evolution and quite frankly, the T-2000 probably could have solved this problem already anyway. Puny humans! But, we haven't yet reached the singularity—at least, I don't think so. I asked Siri, and she said we hadn't, so that's good enough, right? Let's dive back in to this programming project because the end is nigh! Well, for this topic at least. The challenge started out as trying to make words from a combination of letter blocks. You know, the wooden blocks that babies play with (or, alternatively, hurl at you if you're within 20 feet of them). Those give you six letters per space, but I simplified the problem down to the Scrabble tiles example: you have a set of letters on your rack; what words can you make with them? I've talked about algorithms for the last few months, so this time, let's really dig in to the code for findwords, the resultant script. After discarding various solutions, the one I've implemented has two phases: ■ Identify a list of all words that are composed only of the letters started with (so "axe" wouldn't match the starting letters abcdefg). ■ For each word that matches, check 28 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS WORK THE SHELL that the number of letters needed to spell the word match up with the occurrences of letters in the starting pattern (so "frogger" can't be made from forger—but almost). 
Let's have a look at the code blocks, because it turns out that this is non-trivial to implement, but we have learned to bend The Force to do our bidding (in other words, we used regular expressions). First we step through the dictionary to identify n-letter words that don't contain letters excluded from the set, with the additional limitation that the word is between (length-3) and (length) letters long:

unique="$(echo $1 | sed 's/./&\
/g' | tr '[[:upper:]]' '[[:lower:]]' | sort | uniq | \
  fmt | tr -C -d '[[:alpha:]]')"

while [ $minlength -lt $length ]
do
  regex="^[$unique]{$minlength}$"
  if [ $verbose ] ; then
    echo "Raw word list of length $minlength for letterset $unique:"
    grep -E $regex "$dictionary" | tee -a $testwords
  else
    grep -E $regex "$dictionary" >> $testwords
  fi
  minlength="$(( $minlength + 1 ))"
done

I explained how this block works in my column in the last issue (October 2015), if you want to flip back and read it, but really, the hard work involves the very first line, creating the variable $unique, which is a sorted, de-duped list of letters from the original pattern. Given "messages", for example, $unique would be "aegms".

Indeed, given "messages", here are the words that are identified as possibilities by findwords:

Raw word list of length 6 for letterset aegms:
assess
mammas
masses
messes
sesame
Raw word list of length 7 for letterset aegms:
amasses
massage
message
Raw word list of length 8 for letterset aegms:
assesses
massages
messages

Clearly there's more work to do, because it's not possible to make the word "massages" from the starting pattern "messages", since there aren't enough occurrences of the letter "a". That's the job of the second part of the code, so I'm just going to show you the whole thing, and then I'll explain specific sections:

pattern="$(echo $1 | sed 's/./&\
/g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')"

for word in $( cat $testwords )
do
  simplified="$(echo $word | sed 's/./&\
/g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')"

  ## PART THREE: do all letters of the word appear
  #  in the pattern once and exactly once? Easy way:
  #  loop through and remove each letter as used,
  #  then compare end states

  indx=1; failed=0
  before=$pattern

  while [ $indx -lt ${#simplified} ]
  do
    ltr=${simplified:$indx:1}
    after=$(echo $before | sed "s/$ltr/-/")
    if [ $before = $after ] ; then
      failed=1
    else
      before=$after
    fi
    indx=$(( $indx + 1 ))
  done

  if [ $failed -eq 0 ] ; then
    echo "SUCCESS: You can make the word $word"
  fi
done

The first rather gnarly expression to create $pattern from the specified starting argument ($1) normalizes the pattern to all lowercase, sorts the letters alphabetically, then reassembles it. In this case, "messages" would become "aeegmsss". Why? Because we can do that to each of the possible words too, and then the comparison test becomes easy. The list of possible words was created in part one and is stored in the temporary file $testwords, so the "for" loop steps us through. For each word, $simplified becomes a similarly normalized pattern to check. For each letter in the proposed word, we replace that letter with a dash in the pattern, using two variables, $before and $after, to stage the change so we can ensure that something really did change for each letter.
That's what's done here: after=$(echo $before | sed "s/$ltr/-/") If $before = Safter, then the needed letter from the proposed word wasn't found in the pattern, and the word can't be assembled from the pattern. On the other hand, if there are extra letters in the pattern after we're done analyzing the word, that's fine. That's the situation where we can make, for example, "games" from "messages", and that's perfectly valid, 30 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS WORK THE SHELL even with the leftover letters. I've added some debugging statements so you can get a sense of what's going on in this example invocation: $ sh findwords.sh messages Raw word list of length 5 for letterset aegms: amass asses eases games gamma gases geese mamma sages seams seems Raw word list of length 6 for letterset aegms assess mammas masses messes sesame Raw word list of length 7 for letterset aegms amasses massage message Raw word list of length 8 for letterset aegms assesses LINUX JOURNAL on your Android device Download the app now from the Google Play Store. RASPBERRY PI Virtual Private Cloud www.linuxjournal.com/android For more information about advertising opportunities within Linux Journal iPhone, iPad and Android apps, contact John Grogan at +1-713-344-1956 x2 or ads@linuxjournal.com. COLUMNS WORK THE SHELL i massages messages created pattern aeegmsss SUCCESS: You can make the word asses SUCCESS: You can make the word eases SUCCESS: You can make the word games SUCCESS: You can make the word gases SUCCESS: You can make the word sages SUCCESS: You can make the word seams SUCCESS: You can make the word seems SUCCESS: You can make the word masses SUCCESS: You can make the word messes SUCCESS: You can make the word sesame SUCCESS: You can make the word message SUCCESS: You can make the word messages So, we can make a dozen different words out of the word "messages", including the word messages itself. What about the original pattern we were using in previous columns: "chicken"? For this one, let's skip the potential words and just look at the solution: SUCCESS: You can make the word chic SUCCESS: You can make the word chi n SUCCESS: You can make the word heck SUCCESS: You can make the word hick SUCCESS: You can make the word hike SUCCESS: You can make the word i nch SUCCESS: You can make the word neck SUCCESS: You can make the word nice SUCCESS: You can make the word nick SUCCESS: You can make the word check SUCCESS: You can make the word chick SUCCESS: You can make the word chink SUCCESS: You can make the word niche SUCCESS: You can make the word chicken Impressive! To make this work a bit better, I've added some error checking, included an -f flag so we can have the script also output failures, not just successes, and left in some additional debugging output if $verbose is set to true. See Listing 1 for the complete code. It's also available at http://www.linuxjournal.com/ extra/findwords. That's it. Now we have a nice tool that can help us figure out what to play the next time we're stuck on Scrabble, Words with Friends, or even looking at a big stack of letter blocks. Next month. I'll turn my attention to a different scripting challenge. Do you have an idea? Send it to ljeditor@linuxjournal.com.B Dave Taylor has been hacking shell scripts since the dawn of the computer era. Well, not really, but still. 30 years is a long time! 
He’s the author of the popular Wicked Cool Shell Scripts (10th anniversary update coming very soon from O’Reilly and NoStarch Press) and can be found on Twitter as @DaveTaylor and more generally at his tech site http://www.AskDaveTaylor.com. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. 32 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS WORK THE SHELL Listing 1. findwords.sh #!/bin/sh # Findwords -- given a set of letters, try to find all the words you can # spell dictionary="/Users/taylor/Documents/Linux Journal/dictionary.txt" testwords=$(mktemp /tmp/findwords.XXXXXX) || exit 1 if [ -z "$1" ] ; then echo "Usage: findwords [sequence of letters]" exit 0 fi if [ "$1" = "-f" ] ; then showfaiIs—1 shift fi ## PART ONE: make the regular expression length="$(echo "$1" | wc -c)" minlength=$(( Slength - 4 )) # we can ignore a max of 2 letters if [ $minlength -It 3 ] ; then echo "Error: sequence must be at least 5 letters long" exit 0 fi unique="$(echo $1 | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | uniq | fmt | \ tr -C -d '[[:alpha:]]')" while [ Sminlength -It Slength ] do regex=" A [$unique]{$minlength}$" if [ Sverbose ] ; then echo "Raw word list of length Sminlength for letterset Sunique:" grep -E Sregex "Sdictionary" | tee -a Stestwords else grep -E Sregex "Sdictionary" >> Stestwords fi minlength="$(( Sminlength + 1 ))" done ## PART TWO: sort letters for validity filter pattern="$(echo $1 | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')" for word in $( cat Stestwords ) do # echo "checking Sword for validity" simplified="$(echo Sword | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')" ## PART THREE: do all letters of the word appear in the pattern # once and exactly once? Easy way: loop through and # remove each letter as used, then compare end states indx=l failed=0 before=$pattern while [ Sindx -It ${#simplified} ] do ltr=${simplified:$indx:l} after=$(echo Sbefore | sed "s/$ltr/-/") if [ Sbefore = Safter ] ; then # nothing changed, so we don't have that # letter available any more if [ Sshowfails ] ; then echo "FAILURE: came close, but can't make Sword" fi failed=l else before=$after fi indx=$(( Sindx + 1 )) done if [ $failed -eq 0 ] ; then echo "SUCCESS: You can make the word Sword" fi done /bin/rm -f Stestwords exit 0 WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 33 COLUMNS Flash ROMs with a Raspberry Pi KYLE RANKIN It’s always so weird seeing a bunch of wires between your laptop and a Raspberry Pi. Earlier this year, I wrote a series of columns about my experience flashing a ThinkPad X60 laptop with Libreboot. Since then, the Libreboot project has expanded its hardware support to include the newer ThinkPad X200 series, so I decided to upgrade. The main challenge with switching over to the X200 was that unlike the X60, you can't perform the initial Libreboot flash with software. Instead, you actually need to disassemble the laptop to expose the BIOS chip, clip a special clip called a Pomona clip to it that's wired to some device that can flash chips, cross your fingers and flash. I'm not generally a hardware hacker, so I didn't have any of the special- purpose hardware-flashing tools that you typically would use to do this right. 
I did, however, have a Raspberry Pi (well, many Raspberry Pis if I'm being honest), and it turns out that both it and the Beaglebone Black are platforms that have been used with flashrom successfully. So in this article, I describe the steps I performed to turn a regular Raspberry Pi running Raspbian into a BIOS-flashing machine.

The Hardware

To hardware-flash a BIOS chip, you need two main pieces of hardware: a Raspberry Pi and the appropriate Pomona clip for your chip. The Pomona clip actually clips over the top of your chip and has little teeth that make connections with each of the chip's pins. You then can wire up the other end of the clip to your hardware-flashing device, and it allows you to reprogram the chip without having to remove it. In my case, my BIOS chip had 16 pins (although some X200s use 8-pin BIOS chips), so I ordered a 16-pin Pomona clip on-line at almost the same price as a Raspberry Pi!

There is actually a really good guide on-line for flashing a number of different ThinkPads using a Raspberry Pi and the NOOBS distribution; see Resources if you want more details. Unfortunately, that guide didn't exist when I first wanted to do this, so instead I had to piece together what to do (specifically which GPIO pins to connect to which pins on the clip) by combining a general-purpose article on using flashrom on a Raspberry Pi with an article on flashing an X200 with a Beaglebone Black. So although the guide I link to at the end of this article goes into more depth and looks correct, I can't directly vouch for it since I haven't followed its steps. The steps I list here are what worked for me.

Pomona Clip Pinouts

The guide I link to in the Resources section has a great graphic that goes into detail about the various pinouts you may need to use for various chips. Not all pins on the clip actually need to be connected for the X200. In my case, the simplified form is shown in Table 1 for my 16-pin Pomona clip.

Table 1. Pomona Clip Pinouts

SPI Pin Name    Pomona Clip Pin #    Raspberry Pi GPIO Pin #
3.3V            2                    1 (17*)
CS#             7                    24
SO/SIO1         8                    21
GND             10                   25
SI/SIO0         15                   19
SCLK            16                   23

So when I wired things up, I connected pin 2 of the Pomona clip to GPIO pin 17, but in other guides, they use GPIO pin 1 for 3.3V. I list both because pin 17 worked for me (and I imagine any 3.3V power source might work), but in case you want an alternative pin, there it is.

Build Flashrom

There are two main ways to build flashrom. If you intend to build and flash a Libreboot image from source, you can use the version of flashrom that comes with the Libreboot source. You also can just build flashrom directly from its git repository. Either way, you first will need to pull down all the build dependencies:

$ sudo apt-get install build-essential pciutils usbutils libpci-dev \
    libusb-dev libftdi1 libftdi-dev zlib1g-dev subversion

If you want to build flashrom directly from its source, do this:

$ svn co svn://flashrom.org/flashrom/trunk flashrom
$ cd flashrom
$ make

Otherwise, if you want to build from the flashrom source included with Libreboot, do this:

$ git clone http://libreboot.org/libreboot.git
$ cd libreboot
$ ./download flashrom
$ ./build module flashrom

In either circumstance, at the end of the process, you should have a flashrom binary compiled for the Raspberry Pi ready to use.

Enable SPI

The next step is to load two SPI modules so you can use the GPIO pins to flash.
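Before editing anything, it's worth a quick look at whether your particular image already has SPI turned on. This is only a rough sketch; module and device names can vary between Raspbian releases, so treat a missing entry as a hint rather than a verdict:

ls /dev/spidev*                       # a spidev0.0 entry means SPI is already usable
lsmod | grep -i spi                   # shows spi_bcm2708/spidev if the modules are loaded
grep '^dtparam=spi' /boot/config.txt  # the boot-time switch discussed next

If none of those turn anything up, the next two steps—enabling the device tree parameter and loading the modules—are exactly what you need.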
In my case, the Raspbian image I used did not default to enabling that device at boot, so I had to edit /boot/config.txt as root and make sure that the file contained dtparam=spi=on and then reboot. Once I rebooted, I then could load the two spi modules: $ sudo modprobe spi_bcm2708 $ sudo modprobe spidev Now that the modules loaded successfully, I was ready to power down the Raspberry Pi and wire everything up. Wire Everything Up To wire everything up, I opened up my X200 (unplugged and with the battery removed, of course), found the BIOS chip (it is right under the front wrist rest) and attached the clip. If you attach the clip while the Raspberry Pi is still on, note that it will reboot. It's better to make all of the connections while everything is turned off. Once I was done, it looked like what you see in Figure 1. Then I booted the Raspberry Pi, loaded the two SPI modules and was able to use flashrom to read off a copy of my existing BIOS: sudo ./flashrom -p linux_spi:dev=/dev/spidev0.0 factoryl.rom Now, the thing about using these clips to flash hardware is that sometimes the connections aren't perfect, and I've found that in some instances, I had to perform a flash many times before it succeeded. In the above case, I'd recommend that once it succeeds, you perform it a few more times and save a couple different copies of your existing BIOS (at least three), and then use a tool like sha256sum to compare them all. You may find that one or more of your copies don't match the 36 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS Figure 1. Laptop Surgery rest. Once you get a few consistent copies that agree, you can be assured that you got a good copy. After you have a good backup copy of your existing BIOS, you can attempt a flash. It turns out that quite a bit has changed with the Libreboot-flashing process since the last time I wrote about it, so in a future column, I will revisit the topic with the more up-to-date method to flash Libreboot.a Kyle Rankin is a Sr. Systems Administrator in the San Francisco Bay Area and the author of a number of books, including The Official Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. He is currently the president of the North Bay Linux Users’ Group. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. Resources Hardware Flashing with Raspberry Pi: https://github.com/bibanon/Coreboot-ThinkPads/wiki/Hardware-Flashing-with-Raspberry-Pi WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 37 COLUMNS THE OPEN-SOURCE CLASSROOM Wi-Fi, Part II: the Installation Moving from theoretical Wi-Fi to blinky lights! SHAWN POWERS Researching my last article, I learned more about Wi-Fi than most people learn in a lifetime. Although that knowledge is incredibly helpful when it comes to a real-world implementation, there still are a few caveats that are important as you take the theoretical to the physical. One of the most frustrating parts of a new installation is that you're required to put the cart before the horse. What do I mean by that? Well, when I set up my first Wi-Fi network in a school district, I paid a company to send technicians into the buildings with their fancy (and expensive) set of tools in order to give me a survey of the buildings so I'd know how many access points I'd need to cover things. What they failed to mention is that in order to determine how many access points I'd have to add, they tested my existing coverage and showed me dead spots. 
Since this was a brand- new installation, and I didn't have any access points to begin with, the survey result was "you need access points everywhere". Needless to say, I was less than impressed. So in order to set up a proper wireless network, the easiest thing to do is guess how many access points you'll need and put that many in place. Then you can do a site survey and figure out how well you guessed. Thankfully, your guesses can be educated guesses. In fact, if you understand how Wi-Fi antennas work, you can improve your coverage area drastically just by knowing how to position the access points. Antenna Signal Shape It would be simple if Wi-Fi signals came out of the access points in a big sphere, like a giant beach ball of signal. Unfortunately, that's not how it actually happens. Whether you have internal antennas or external positionable antennas, the signal is "shaped" like a donut with its hole over the antenna (Figure 1). While it 38 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS k THE OPEN-SOURCE CLASSROOM Antenna Pattern i I Wi-Fi Product Figure 1. Knowing what the signal looks like (image from http://ampedwireless.com). still partially resembles a sphere, it's important to note where the signal isn't. Namely, there's a dead zone directly at the end of the antenna. If you've ever considered pointing the antenna at your distant laptop, trying to shoot the signal out the end of the antenna like a magic wand, you can see why people should leave magic wands to Harry Potter. I also want to mention long-range access points. When you purchase a long-range AP, it sounds like you're getting a more powerful unit. It's a little like a vacuum cleaner with two speeds—why would anyone ever want to use the low setting? With long-range access points, however, you're not getting any increased power. The trick is with how the antenna radiates its signal. Rather helps with placement than a big round donut shape, LR access points squish the donut so that it has the same general shape, but is more like a pancake. It reaches farther out to the sides, but sacrifices how "tall" the signal pattern reaches. So if you have a two-story house, changing to a long-range access point might get signal to your backyard, but the folks upstairs won't be able to check their e-mail. One last important aspect of antenna placement to consider is polarity. Wi-Fi antennas talk most efficiently when they have similar polarity. That means their "donuts" are on the same plane. So if you have your access point's antennas positioned horizontally (perhaps you have a very tall, very skinny building), any client antennas pointing vertically WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 39 COLUMNS THE OPEN-SOURCE CLASSROOM will have a different polarity from your access point. They'll still be able to talk, but it will be less efficient. It's sort of like if you turned this article sideways. You still could read it, but it would be slower and a bit awkward. Since it's far better to have mismatched polarity than no signal at all, understanding the antenna pattern on your access points means you can position them for maximum coverage. If you have multiple antennas, you should consider where you want coverage as you position them vertically, horizontally or even at 45-degree angles (remember, a 45-degree angle will mess up polarity, but it might mean that distant upstairs bedroom has coverage it might not get otherwise). 
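One quick way to see the effect of repositioning an antenna is to watch signal strength from a Linux laptop while you move things around. This is just a rough sketch, not a proper survey tool; it assumes your wireless interface is named wlan0 (yours may be wlp3s0 or similar) and that the iw utility is installed:

$ watch -n 1 'iw dev wlan0 link | grep -E "SSID|freq|signal"'

Walk the laptop to the spot you care about, tilt or rotate the access point's antennas, and give the signal figure a few seconds to settle. It's reported in dBm, so -50 is a stronger signal than -70. The same loop works for comparing a 2.4GHz and a 5GHz connection from the same spot.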
If your access point doesn't have external antennas, it's most likely designed to have the "donut" stretch out to the sides, as if the antenna were pointing straight up. For units that can mount on the ceiling or wall, keep that in mind as you consider their positions, and realize coverage will be very different if you change from ceiling mount to wall mount. The Big Guessing Game Armed with an understanding of how Wi-Fi signal radiates out from the access points, the next step is to make your best guess on where you should place them. I usually start with a single access point in the middle of a house (or hallway in the case of a school), and see how far the signal penetrates. Unfortunately, 2.4GHz and 5GHz don't penetrate walls the same. You'll likely find that 2.4GHz will go through more obstacles before the signal degrades. If you have access points with both 2.4GHz and 5GHz, be sure to test both frequencies so you can estimate what you might need to cover your entire area. Thankfully, testing coverage is easy. Some access points, like my UniFi system, have planning apps built in (Figure 2), but they are just planning and don't actually test anything. There are programs for Windows, OS X and Android that will allow you to load up your floor plan, and then you can walk around the building marking your location to create an actual "heat map" of coverage. Those programs are really nice for creating a visual representation of your coverage, but honestly, they're not required if you just want to get the job done. Assuming you know the floor plan, you can walk from room to room using an Android phone or tablet with WiFi Analyzer and see the signal strength in any 40 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS THE OPEN-SOURCE CLASSROOM u 1 f Figure 2. Since this was a fairly new house, the UniFi planning tool did a nice job of accurately predicting coverage area. given location. Just make sure the app you choose supports 2.4GHz and 5GHz, and that your phone or tablet has both as well! If you do want the heat map solution, Windows users will like HeatMapper from http://www.ekahau.com, and OS X users should try NetSpot from http://www.netspotapp.com. Android users should just search the Google Play store for "Wi-Fi heat map" or "Wi-Fi mapping". I don't know of a Linux-native heat map app that works from a laptop, but if anyone knows of a good one, please write in, and I'll try to include it in a future Letters section. Some Tough Purchase Decisions Here's where installing Wi-Fi starts to get ugly. If you read my last article (in the October 2015 issue), you'll know that with 2.4GHz, there are only three channels you should be using. If you live in close proximity to other people (apartments, subdivisions and so on), your channel availability might be even worse. When you add the variable coverage distance between 2.4GHz and 5GHz, it means placing access points is really a game of compromise. There are a couple ways to handle the problem, but none are perfect. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 41 COLUMNS THE OPEN-SOURCE CLASSROOM In a home where two or three access points is going to be enough, you generally can place them in the best locations (after testing, of course) and crank the power up to full blast on the 2.4GHz and 5GHz radios. You'll likely have plenty of available channels in the 5GHz range, so you probably won't have to worry about interfering with other access points in your house or even your neighbor's. 
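You don't need a dedicated heat-map application just to see which channels your neighbors are camped on. A rough sketch from any Linux laptop running NetworkManager (the nmcli field names below may vary slightly between versions, and wlan0 is again just an example interface name):

$ nmcli -f CHAN,SIGNAL,SSID dev wifi list | sort -n

Everything on channels 1-13 is 2.4GHz, and everything from channel 36 up is 5GHz, so a quick glance tells you which of channels 1, 6 and 11 is least crowded where you're standing. If you prefer the lower-level view, sudo iw dev wlan0 scan | grep -E 'freq|signal|SSID' pulls the same information straight from the driver.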
If you're in a big house, or an office complex, or in an old house that has stubborn walls (like me), you might have to plan very carefully where you place your access points so that the available 2.4GHz channels don't overlap. If you're using channel 1 in the front room, channel 6 in the basement and channel 11 in the kitchen at the back of the house, you might decide to use channel 6 for the upstairs. You need to make sure that when you actually are upstairs, however, that you can't see channel 6 from the basement, or you'll have a mess with channel conflicts. Thankfully, most access points allow you to decrease the radio transmit and receive power to avoid channels interfering with each other. It might seem counter-productive to decrease the power, but it's often a really great way to improve connectivity. Think of it like having a conversation. If two people are having a quiet conversation in one room, and another couple is talking in the next room, they can talk quite nicely without interfering. If everyone in the house is screaming at the top of their lungs, however, it means everyone can hear other conversations, making it confusing and awkward. It's also possible that you'll find you've worked out the perfect coverage area with the 2.4GHz frequency, but even with the radios cranked full blast, there are a few dead spots in the 5GHz range. In that case, you either can live with dead 5GHz zones or add another access point with only the 5GHz radio turned on. That will mean older client devices won't be able to connect to the additional access point, but if you already have 2.4GHz coverage everywhere, there's no need to pollute the spectrum with another unnecessary 2.4GHz radio. Configuring Clients Let's assume you've covered your entire house or office with a blanket of 2.4GHz and 5GHz signals, and you want your clients to connect to the best possible signal to which 42 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS k THE OPEN-SOURCE CLASSROOM they're capable of connecting. Ideally, you'd set all your access points to use the same SSID and have clients select which access point and which frequency they want to associate with automatically. Using a single SSID also means roaming around the house from access point to access point should be seamless. Client computers are designed to switch from channel to channel on the same SSID without disconnecting from the network at all. Unfortunately, in practice, not all client devices are smart enough to use 5GHz when they can. So although you might have a wonderful 5GHz signal sharing the same SSID with your 2.4GHz network, some of your compatible devices never will take advantage of the cleaner, faster network! (Sometimes they do, but I assure you, not always.) I've found the best solution, at least for me, is to have one SSID for the 2.4GHz spectrum and one SSID for the 5GHz spectrum. In my house, that means there's a "Powers" SSID in the 2.4GHz range and a "Super Powers" in the 5GHz range. If a device is capable of connecting to 5GHz networks, I connect to that SSID and force it to use the better network. You might be able to get away with a single SSID and have your clients all do the right thing. but again. I've never had much luck with that. Repeaters Versus Access Points I'm a hard-core networking nerd, and I know it. Even with our new-to-us 63-year-old house, I decided to run Ethernet cables to every access point location. 
(I just draped long cables around the house while testing; please don't drill holes into your house until you know where those holes should go!) For some people, running cables isn't possible. In those instances, it's possible to extend a single access point using a wireless repeater or extender (they're the same thing, basically). I urge you to avoid such devices if possible, but in a pinch, they're better than no coverage at all. How an extender works is by becoming both a client device and an access point in one. They connect to your central access point like any other client, and then using another antenna, they act as access points themselves. The problem is speed. If you connect to a repeater, you can get only half the speed of a connection to a wired access point. That's because the wireless transfer speed is split between your laptop and the repeater communicating with the distant access point. It's a little more complicated than that in practice (it has to do with WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 43 COLUMNS THE OPEN-SOURCE CLASSROOM i transmission duplexing and so on), but the end result is any connection via repeater is only half as fast as to a wired access point. If you're talking about a 5GHz, wide¬ band connection, a repeated signal might be more than adequate for Web browsing from a distant bedroom. The ability to extend a network wirelessly is really awesome, but it's important to realize that awesomeness comes at a cost. You also need to understand that if you're in a room with a weak signal, placing a repeater in that room won't help. You need to place the repeater in a location between the central access point and remote client device, so it can act as a middle man relaying signals both ways. A repeater doesn't have any stronger of an antenna than a client device, so make sure if you do place a repeater, it's in a location with decent signal strength, or you'll just be repeating a horrible signal! Use Your Noodle, and Plan Well! In my last article, I talked about the actual wireless technologies involved with Wi-Fi signals. In this article, I discussed implementation and how to get the best coverage for your particular installation. Don't forget all the stuff I covered regarding MIMO, channel width and so on. Understanding how a Wi-Fi network works means you not only can get good coverage, but you can get awesome performance as well. I'll leave you with one last note: if you're planning a wireless install for a situation that has a large number of users, be sure to include bandwidth considerations in your planning. If you have a 54Mbps 802.1 1g connection shared between 26 people, that means the maximum theoretical bandwidth each person can use is 2Mbps, which is painfully slow in most instances. You actually might need to lower the radio power and add multiple access points in order to split the load across multiple access points. Planning and installing Wi-Fi networks can be incredibly challenging, but it is also incredibly fun. Hopefully this two-part primer will help you deploy the best wireless experience possible. ■ Shawn Powers is the Associate Editor for Linux Journal. He’s also the Gadget Guy for LinuxJournal.com, and he has an interesting collection of vintage Garfield coffee mugs. Don’t let his silly hairdo fool you. he’s a pretty ordinary guy and can be reached via e-mail at shawn@linuxjournal.com. Or. swing by the #linuxjournal IRC channel on Freenode.net. 
Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. 44 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM FREE AND OPEN SOURCE SOFTWARE EXPO FOSSCTCOnl AND TECHNOLOGY CONFERENCE 2 0 15 Come out and participate in the Second Annual Fossetcon 2015 Florida's Only Free and Open Source Conference. With in 2 minutes of Downtown Disney and other great entertainment DAYO BSD Jmulk DAY 1 FOOD, TRAINING, WORKSHOPS AND CERTIFICATIONS FOOD, KEYNOTES, EXPO HALL, SPEAKER TRACKS DAY 2 FOOD, KEYNOTES, EXPO HALL, SPEAKER TRACKS FREE FOOD, -TRAINING. * CERTIFICATIONS > AND GIVEAWAYS!!! M ^ t r f 1 < NOV 19 - NOV 21 Hilton Lake Buena Vista Orlando, FL Fossetcon 2015: The Gateway To The Open Source Community More info at www.fossetcon.org NEW PRODUCTS EXIN Specialist Certificate in OpenStack Software Neutron Building on its successful foundational certificate in OpenStack software, the independent certification institute EXIN recently released its first specialist exam in the series, dubbed EXIN Specialist Certificate in OpenStack Software Neutron. Neutron is a cloud-networking controller within the OpenStack cloud computing initiative that delivers networking as a service. This new advanced exam is aimed at experienced users of OpenStack technology who design or build infrastructure. The vendor-neutral content, which was developed in close cooperation with Hewlett-Packard, covers architecture, plug-ins and extensions, managing networks, and troubleshooting methodology and tools. EXIN's mission with the new exam on Neutron is to enable experienced professionals to advance their careers by demonstrating their specialist skills and knowledge related to OpenStack software. In 2016, EXIN expects to launch certifications for OpenStack Software Swift and Cinder. http://www.exin.com Tpamflnpct TeamQuest ’ s IQQ IIIUUwL Performance Software Carrying the simple moniker Performance Software, the latest innovation in predictive analytics from TeamQuest is a powerful application that enables organizations to assess intuitively the health and potential risks in their IT infrastructure. The secret to Performance Software's ability to warn IT management of problems before they occur stems from the deployment of lightning-fast and accurate predictive algorithms, coupled with the most popular IT data sources, including Amazon, Tivoli and HP. Customers also can perform data collection, analysis, predictive analytics and capacity planning for Ubuntu. TeamQuest calls itself the first organization that allows the existing infrastructure to remain entirely intact and augments the existing environment's operations with the industry-leading accurate risk assessment software. The firm also asserts that while competitors base their predictive and proactive capabilities on simplistic approximations of how IT infrastructure scales, only TeamQuest utilizes advanced queuing theory to predict what really matters—throughput and response time—not just resource utilization. http://www.teamquest.com 46 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 NEW PRODUCTS Linaro ■■■■■■ Linaro Ltd.’s Secure Media Solutions for ARM-Based SoCs The embedded developer community is the target audience for Linaro Ltd.'s new open- source secure media solution for consumption of premium content on ARM-powered devices. 
In this solution, with support from Microsoft and the OpenCDM project, Linaro has successfully integrated several security features required by premium content service providers with the Microsoft PlayReady Digital Rights Management (DRM). Linaro's new solution enables application developers, silicon partners, OEMs, operators and content owners to use open-source technology to build feature-rich, secure products for the pay TV market. By bringing together all of the essential secure hardware and software elements into an open-source design, OEMs can reduce their time to market and provide new opportunities for service providers to deliver premium content across more consumer devices built on ARM-based SoCs. Essential security features include the World Wide Web Consortium's Encrypted Media Extensions, which enable premium-content service providers to write their electronic programming guide applications using standard HTML5 one time and run it on myriad devices. Linaro asserts that its new solution is "a key milestone that showcases how Microsoft PlayReady DRM works cross-platform in a standard way". http://www.linaro.org iWedia’s Teatro-3.0 By integrating AllConnect streaming technology from Tuxera, iWedia's Teatro-3.0 set-top box (STB) software solution lets users take full control of the connected home and share music, photos, videos, movies and TV content to any screen. Teatro-3.0 is Linux-based with a Ul built with HTML/ CSS and specific JavaScript APIs allowing access to digital TV features. The solution features DLNA (player and renderer), access to "walled garden" Web and OTT video services (CE- HTML portals, HbbTV applications), as well as DVR and Time Shift Buffer. The streaming functionality occurs when Tuxera's AllConnect App discovers and dialogs with the DLNA Digital Media Renderer embedded in Teatro-3.0. The app then streams any content chosen by the user to the Teatro-3.0 media player. iWedia states that its STB easily can integrate into any hardware or software and is "the only solution to the market compatible with all smart TVs and STBs", including Apple TV, Android TV, Fire TV and Roku. htt p ://www. i wed i a. co m WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 47 NEW PRODUCTS Mike Barlow’s Learning to Love Data Science (O’Reilly Media) The title of Mike Barlow's new O'Reilly book, Learning to Love Data Science, implies an inherent drudgery in the material. Bah! Most Linux enthusiasts will find magnetic the material in Barlow's tome, which is subtitled Explorations of Emerging Technologies and Platforms for Predictive Analytics, Machine Learning, Digital Manufacturing and Supply Chain Optimization. Covering data for social good to data for predictive maintenance, the book's format is an anthology of reports that offer a broad overview of the data space, including the applications that have arisen in enterprise companies, non-profits and everywhere in between. Barlow discusses—for both developers and suits—the culture that creates a data-driven organization and dives deeply into some of the business, social and technological advances brought about by our ability to handle and process massive amounts of data at scale. Readers also will understand how to promote and use data science in an organization, gain insights into the role of the CIO and explore the tension between securing data and encouraging rapid innovation, among other topics. http://www.oreilly.com Learning to Love Data Science Exploring Predictive Analytics. Machme Learning. 
Digital Manufacturing, and Supply Cham Optimization 1 , . • •• • ••• < v •::: A'" Scott Stawski’s Inflection Point (Pearson FT Press) If you can't beat megatrends, join 'em. Such is the advice from Scott Stawski, author of the new book Inflection Point: How the Convergence of Cloud, Mobility, Apps, and Data Will Shape the Future of Business. As the executive lead for HP's largest and most strategic global accounts, Stawski enjoys an enviable perch from which to appraise the most influential trends in IT. Today a hurricane is forming, says Stawski, and businesses are headed straight into it. As the full title implies, the enormous disrupters in IT—in cloud, mobility, apps and data—are going to disrupt, and those who can harness the fierce winds of change will have them at their back and cruise toward greater competitiveness and customer value. Stawski illuminates how to go beyond inadequate incremental improvements to reduce IT spending dramatically and virtually eliminate IT capital expenditures. One meaningful step at a time, readers learn how to transform Operational IT into both a utility and a true business enabler, bringing new speed, flexibility and focus to what really matters: true core competencies. http://www. i nf orm it.com 48 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Take your Android development skills to the next level! AnDevCon The Android Developer Conference Dec. 1-3,2015 Hyatt Regency Santa Clara Get the best Android developer training anywhere! • Choose from more than 75 classes and in-depth tutorials • Meet Google and Google Development Experts • Network with speakers and other Android developers • Check out more than 50 third-party vendors • Women in Android Luncheon • Panels and keynotes • Receptions, ice cream, prizes and more (plus lots of coffee!) Whether you're an enterprise developer, work for a commercial software company, or are driving your own startup, if you want to build Android apps, you need to attend AnDevCon! AnDevCon™ is a trademark of BZ Media LLC. Android™ is a trademark of Google Inc. Google’s Android Robot is used under terms of the Creative Commons 3.0 Attribution License. ABZ Media Event noiidD #AnDevCon NEW PRODUCTS r Introversion Software’s Prison Architect In one of its Alpha videos, the lead developer of the game Prison Architect quipped: "since this is Introversion Software that we're talking about, we're likely to be in Alpha for quite some time." That's no exaggeration. Since 2012, Linux Journal received 36 monthly Alpha updates to the multi-platform game. In its 36th Alpha video. Introversion Software at last officially announced the full release of Prison Architect, a sim game in which users build and manage a maximum-security penitentiary facility. In the game, mere mortals must confront real-world challenges, such as guards under attack, prison breaks, fires in the mess hall, chaplain management and much more. Introversion takes pride in its independence from other game developers and promises a better game experience as a result. In addition to downloading Prison Architect for Linux, Windows or Mac OS, one also can become immortalized in the game as a prisoner. Sadly, the options to digital-immorto-criminalize your face or design one of the wardens are both sold out. http://www.prison-architect.com Sensoray’s Model 2224 HD/SD-SDI Audio/Video Encoder Video capturing and processing is what Sensoray's new Model 2224 HD/SD-SDI Audio/Video H.264 Encoder was built to do. 
The encoder's single SDI input supports a wide range of video resolutions—that is, 1080p, 1080i, 720p and NTSC/PAL. The Model 2224, featuring a USB 2.0 connection to its host CPU, offers excellent quality encoding in a convenient small form factor, says Sensoray. The Model 2224 encoder outputs H.264 High Profile Level 4 for HD and Main Profile Level 3 for SD, multiplexed in MPEG-TS (transport stream) format. The board's versatile overlay generators, integral HD/SD raw frame grabber and live preview stream make it ideally suited for a wide range of video processing applications, including High Profile DVRs, NVRs and stream servers. Furthermore, the encoder is Blu-Ray-compatible and allows for full-screen 16-bit color text/graphics overlay with transparency. The board can send an uncompressed, down-scaled video stream over USB, offering users low-latency live video previewing on the host computer with minimal CPU usage, h tt p ://www. se n so r ay. co m r i Please send information about releases of Linux-related products to newproducts@linuxjournal.com or New Products c/o Linux Journal, PO Box 980985, Houston, TX 77098. Submissions are edited for length and content. L._ 50 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Puppet Application Orchestration Automate Your Entire Infrastructure Current Balanci $ 2015 03 ACCOUNT MANAGEMENT Spending Report: ONLINE BILL-PAY VIEW TRANSACTION HISTORY Reduce the complexity of managing applications - on premise, in the cloud, on bare metal or in containers. • Model distributed application infrastructure • Coordinate ordered deployment of configurations • Control the state of your machines all in one place Learn more at puppetlabs.com AP U PR& FEATURE Managing Linux Using Puppet Managing Linux Using Puppet Manage a fleet of servers in a way that’s documented, scalable and fun with Puppet. DAVID BARTON 52 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM A t some point, you probably have installed or configured a piece of software on a server or desktop PC. Since you read Linux Journal, you've probably done a lot of this, as well as developed a range of glue shell scripts, Perl snippets and cron jobs. Unless you are more disciplined than I was, every server has a unique, hand-crafted version of those config files and scripts. It might be as simple as a backup monitor script, but each still needs to be managed and installed. Installing a new server usually involves copying over config files and glue scripts from another server until things "work". Subtle problems may persist if a particular condition appears infrequently. Any improvement is usually made on an ad hoc basis to a specific machine, and there is no way to apply improvements to all servers or desktops easily. Finally, in typical scenarios, all the learning and knowledge invested in these scripts and configuration files are scattered throughout the filesystem on each Linux system. This means there is no easy way to know how any piece of software has been customized. If you have installed a server and come back to it three years later wondering what you did, or manage a group of desktops or a private cloud of virtual machines, configuration management and Puppet can help simplify your life. Enter Configuration Management Configuration management is a solution to this problem. A complete solution provides a centralized repository that defines and documents how things are done that can be applied to any system easily and reproducibly. Improvements simply can be rolled out to systems as required. 
The result is that a large number of servers can be managed by one administrator with ease. Puppet Many different configuration management tools for Linux (and other platforms) exist. Puppet is one of the most popular and the one I cover in this article. Similar tools include Chef, Ansible and Salt as well as many others. Although they differ in the specifics, the general objectives are the same. Puppet's underlying philosophy is that you tell it what you want as an end result (required state), not how you want it done (the procedure), using Puppet's programming language. For example, you might WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 53 FEATURE Managing Linux Using Puppet say "I want ssh key XYZ to be able to log in to user account too." You wouldn't say "cat this string to /home/foo/.ssh/authorized_keys." In fact, the simple procedure I defined isn't even close to being reliable or correct, as the .ssh directory may not exist, the permissions could be wrong and many other things. You declare your requirements using Puppet's language in files called manifests with the suffix .pp. Your manifest states the requirements for a machine (virtual or real) using Puppet's built-in modules or your own custom modules, which also are stored in manifest files. Puppet is driven from this collection of manifests much like a program is built from code. When the puppet apply command is run, Puppet will compile the program, determine the difference in the machine's state from the desired state, and then make any changes necessary to bring the machine in line with the requirements. This approach means that if you run puppet apply on a machine that is up to date with the current manifests, nothing should happen, as there are no changes to make. Overview of the Approach Puppet is a tool (actually a whole suite of tools) that includes the Puppet execution program, the Puppet master, the Puppet database and the Puppet system information utility. There are many different ways to use it that suit different environments. In this article, I explain the basics of Puppet and the way we use it to manage our servers and desktops, in a simplified form. I use the term "machine" to refer to desktops, virtual machines and hypervisor hosts. The approach I outline here works well for 1-100 machines that are fairly similar but differ in various ways. If you are managing a cloud of 1,000 virtual servers that are identical or differ in very predictable ways, this approach is not optimized for that case (and you should write an article for the next issue of Linux Journal). This approach is based around the ideas outlined in the excellent book Puppet 3 Beginners Guide by John Arundel. The basic idea is this: ■ Store your Puppet manifests in git. This provides a great way to manage, track and distribute changes. We also use it as the way servers get their manifests (we don't use a Puppet master). You easily could use Subversion, Mercurial or any other SCM. ■ Use a separate git branch for 54 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM each machine so that machines are stable. ■ Each machine then periodically polls the git repository and runs puppet apply if there are any changes. ■ There is a manifest file for each machine that defines the desired state. Setting Up the Machine For the purposes of this article. I'm using the example of configuring developers' desktops. The example desktop machine is a clean Ubuntu 12.04 with the hostname puppet-test; however, any version of Linux should work with almost no differences. 
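The next step needs an empty git repository on a server you control. If you don't already have one, a bare repository on any box you can reach over SSH is enough. Here's a minimal sketch, using the same host alias (gitserver) and repository name that appear later in Listing 3—substitute your own:

# on the git server
git init --bare ~/Puppet-Linuxjournal.git

# on your desktop
git clone git@gitserver:Puppet-Linuxjournal.git
cd Puppet-Linuxjournal
mkdir manifests modules        # the layout used throughout this article
git commit --allow-empty -m "Empty Puppet repository"
git push origin master

Git only tracks files, so the two directories stay local until you commit your first manifest into them.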
I will be working using an empty git repository on a private git server. If you are going to use GitHub for this, do not put any sensitive information in there, in particular keys or passwords.

Puppet is installed on the target machine using the commands shown in Listing 1. The install simply sets up the Puppet Labs repository and installs git and Puppet. Notice that I have used specific versions of puppet-common and the puppetlabs/apt module. Unfortunately, I have found Puppet tends to break previously valid code and its own modules even with minor upgrades. For this reason, all my machines are locked to specific versions, and upgrades are done in a controlled way. Now Puppet is installed, so let's do something with it.

Listing 1. Installing Puppet

wget https://apt.puppetlabs.com/puppetlabs-release-precise.deb
dpkg -i puppetlabs-release-precise.deb
apt-get update
apt-get install -y man git puppet-common=3.7.3-1puppetlabs1
puppet module install puppetlabs/apt --version 1.8.0

Getting Started

I usually edit the manifests on my desktop and then commit them to git and push to the origin repository. I have uploaded my repository to GitHub as an easy reference at https://github.com/davidbartonau/linuxjournal-puppet, which you may wish to copy, fork and so on.

In your git repository, create the file manifests/puppet-test.pp, as shown in Listing 2. This file illustrates a few points:

■ The name of the file matches the hostname. This is not a requirement; it just helps to organize your manifests.

■ It imports the apt package, which is a module that allows you to manipulate installed software.

■ The top-level item is "node", which means it defines the state of a server(s).

■ The node name is "puppet-test", which matches the server name. This is how Puppet determines to apply this specific node.

■ The manifest declares that it wants the vim package installed and the emacs package absent. Let the flame wars commence!

Listing 2. manifests/puppet-test.pp

include apt

node 'puppet-test' {
  package { 'vim': ensure => 'present' }
  package { 'emacs': ensure => 'absent' }
}

Now you can use this Puppet configuration on the machine itself. If you ssh in to the machine (you may need ssh -A agent forwarding so you can authenticate to git), you can run the commands from Listing 3, replacing gitserver with your own. This code clones the git repository into /etc/puppet/linuxjournal and then runs puppet apply using the custom manifests directory.

Listing 3. Cloning and Running the Repository

git clone git@gitserver:Puppet-Linuxjournal.git /etc/puppet/linuxjournal
puppet apply /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/

The puppet apply command looks for a node with a matching name and then attempts to make the machine's state match what has been specified in that node. In this case, that means installing vim, if it isn't already, and removing emacs.
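Before letting a new manifest loose on a machine you care about, it's worth knowing that you can check and dry-run it first. A quick sketch using the same paths as Listing 3 (both options are standard Puppet features, nothing specific to this setup):

puppet parser validate /etc/puppet/linuxjournal/manifests/puppet-test.pp
puppet apply --noop /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/

The first command only checks the manifest for syntax errors; the second compiles the catalog and reports what it would have changed without actually installing or removing anything.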
Creating Users

It would be nice to create the developer user, so you can set up that configuration. Listing 4 shows an updated puppet-test.pp that creates a user as per the developer variable (this is not a good way to do it, but it's done like this for the sake of this example). Note how the variable is preceded by $. Also, the variable is substituted into strings quoted with double quotes but not single quotes, in the same way as bash.

Listing 4. /manifests/puppet-test.pp

include apt

node 'puppet-test' {
  $developer = 'david'

  package { 'vim': ensure => 'present' }
  package { 'emacs': ensure => 'absent' }

  user { "$developer":
    ensure     => present,
    comment    => "Developer $developer",
    shell      => '/bin/bash',
    managehome => true,
  }
}

Let's apply the new change on the desktop by pulling the changes and re-running puppet apply as per Listing 5. You now should have a new user created.

Listing 5. Re-running Puppet

cd /etc/puppet/linuxjournal
git pull
puppet apply /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/

Creating Modules

Putting all this code inside the node isn't very reusable. Let's move the user into a developer_pc module and call that from your node. To do this, create the file modules/developer_pc/manifests/init.pp in the git repository as per Listing 6. This creates a new module called developer_pc that accepts a parameter called developer and uses it to define the user.

Listing 6. /modules/developer_pc/manifests/init.pp

class developer_pc ($developer) {
  user { "$developer":
    ensure     => present,
    comment    => "Developer $developer",
    shell      => '/bin/bash',
    managehome => true,
  }
}

You then can use the module in your node as demonstrated in Listing 7. Note how you pass the developer parameter, which is then accessible inside the module. Apply the changes again, and there shouldn't be any change. All you have done is refactored the code.

Listing 7. /manifests/puppet-test.pp

node 'puppet-test' {
  package { 'vim': ensure => 'present' }
  package { 'emacs': ensure => 'absent' }

  class { 'developer_pc': developer => 'david' }
}

Creating Static Files

Say you would like to standardize your vim config for all the developers and stop word wrapping by setting up their .vimrc file. To do this in Puppet, you create the file you want to use in /modules/developer_pc/files/vimrc as per Listing 8, and then add a file resource in /modules/developer_pc/manifests/init.pp as per Listing 9. The file resource can be placed immediately below the user resource.

Listing 8. /modules/developer_pc/files/vimrc

# Managed by puppet in developer_pc
set nowrap

Listing 9. /modules/developer_pc/manifests/init.pp

file { "/home/$developer/.vimrc":
  source  => "puppet:///modules/developer_pc/vimrc",
  owner   => "$developer",
  group   => "$developer",
  require => [ User["$developer"] ]
}

The file resource defines a file /home/$developer/.vimrc, which will be set from the vimrc file you created just before. You also set the owner and group on the file, since Puppet typically is run as root. The require clause on the file takes an array of resources and states that those resources must be processed before this file is processed (note the uppercase first letter; this is how Puppet refers to resources rather than declaring them). This dependency allows you to stop Puppet from trying to create the .vimrc file before the user has been created. When resources are adjacent, like the user and the file, they also can be "chained" using the -> operator. Apply the changes again, and you now can expect to see your custom .vimrc set up.
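A quick way to confirm that a run actually converged is to ask Puppet for detailed exit codes. This is a small sketch around the same command used in Listing 5; --detailed-exitcodes is a standard puppet apply option:

puppet apply --detailed-exitcodes /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/
echo $?

An exit status of 2 means changes were made, 0 means the machine already matched the manifests, and 4 (or 6) means something failed. Running it twice in a row should give you a 2 followed by a 0; if the second run keeps reporting changes, one of your resources isn't idempotent and deserves a closer look.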
If you run puppet apply later, if the source vimrc file hasn't changed, the .vimrc file won't change either, including the modification date. If one of the developers changes .vimrc, the next time puppet apply is run, it will be reverted to the version in Puppet. A little later, say one of the developers asks if they can ignore case as well in vim when searching. You easily can roll this out to all the desktops. Simply change the vimrc file to include set ignorecase, commit and run puppet apply on each machine. Creating Dynamically Generated Files Often you will want to create files where the content is dynamic. Puppet has support for .erb templates, which are templates containing snippets of Ruby code similar to jsp or php files. The code has access to all of the variables in Puppet, with a slightly different syntax. As an example, our build process uses $ HO M E/Projects/override. properties, which is a file that contains the name of the build root. This is typically just the user's home directory. You can set this up in Puppet using an .erb template as shown in Listing 10. The erb template is very similar to the static file, except it needs to be in the template folder, and it uses <%= %> for expressions, <% %> for code, and variables are referred to with the @ prefix. Listing 10. /modules/developer_pc/templates/override.properties.erb # Managed by Puppet dir.home=/home/<%= @developer %>/ 60 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Listing 11. /modules/developer_pc/manifests/init.pp file { "/home/$developer/Projects": ensure => 'directory', owner => "$developer", group => "$developer", require => [ User["$developer"] ] } file { "/home/$developer/Proiects/override.properties": content => template('developer owner => "$developer", group => "$developer", } You use the .erb template by adding the rules shown in Listing 11. First, you have to ensure that there is a Projects directory, and then you require the override.properties file itself. The -> operator is used to ensure that you create the directory first and then the file. Running Puppet Automatically Running Puppet each time you want to make a change doesn't work well beyond a handful of machines. To solve this, you can have each machine automatically check git for changes and then run puppet apply (you can pc/override.properties.erb’), do this only if git has changed, but that is an optional). Next, you will define a file called puppetApply.sh that does what you want and then set up a cron job to call it every ten minutes. This is done in a new module called puppet_apply in three steps: ■ Create your puppetApply.sh template in modules/puppet_apply/files/ puppetApply.sh as per Listing 12. ■ Create the puppetApply.sh file and set up the crontab entry as shown in Listing 13. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 61 FEATURE Managing Linux Using Puppet Listing 12. /modules/puppet_apply/files/puppetApply.sh # Managed by Puppet cd /etc/puppet/linuxjournal git pull puppet apply /etc/puppet/linuxjournal/manifests **•- - module path=/etc/ puppet/1 i nuxj ournal/modules/ *+■: / etc/puppet/modules/ Listing 13. /modules/puppet_apply/manifests/init.pp class puppet_apply () { file { "/usr/local/bin/puppetApply.sh": source => "puppet:///modules/puppet_apply/puppetApply.sh", mode => 'u=wrx,g=r,o=r' } - > cron { "run-puppetApply": ensure => 'present' , command => "/usr/local/bin/puppetApply.sh > Wtmp/puppetApply. log 2>&1", mi nute => ' *710 ' , } } ■ Use your puppet_apply module from your node in puppet-test.pp as per Listing 14. 
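Before wiring a template into a manifest, it can be handy to preview what it renders. Here's a rough sketch, run from the top of your repository; Ruby is already present on the machine because Puppet itself is written in it, and the @developer value here is just an example:

ruby -rerb -e '@developer = "david";
  puts ERB.new(File.read("modules/developer_pc/templates/override.properties.erb")).result(binding)'

You should see the finished override.properties with /home/david/ substituted in, which is exactly what Puppet will write out once the file resources shown next are in place.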
You will need to ensure that the server has read access to the git repository. You can do this using 62 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Listing 14. /manifests/puppet-test.pp class { 'puppet_apply': ; } an SSH key distributed via Puppet and an IdentityFile entry in /root/.ssh/config. If you apply changes now, you should see that there is an entry in root's crontab, and every ten minutes puppetApply.sh should run. Now you simply can commit your changes to git, and within ten minutes, they will be rolled out. Modifying Config Files Many times you don't want to replace a config file, but rather ensure that certain options are set to certain values. For example, I may want to change the SSH port from the default of 22 to 2022 and disallow password logins. Rather than manage the entire config file with Puppet, I can use the augeas resource to set multiple configuration options. Refer to Listing 1 5 for some code that can be added to the Listing 15. /modules/developer_pc/manifests/init.pp package { 1 openssh-server ’ : ensure => 'present' } service { 'ssh ' : ensure => running, require => [ Package["openssh-server"] ] } augeas { 'change-sshd ' : context => '/files/etc/ssh/sshd_config', changes => ['set Port 2022', 'set PasswordAuthentication no’], notify => Service[’ssh 1 ] , require => [ Package["openssh-server"] ] } WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 63 FEATURE Managing Linux Using Puppet When defining rules in Puppet, it is important to keep in mind that removing a rule for a resource is not the same as a rule that removes that resource. developer_pc class you created earlier. The code does three things: ■ Installs openssh-server (not really required, but there for completeness). ■ Ensures that SSH is running as a service. ■ Sets Port 2022 and PasswordAuthentication no in /etc/ssh/sshd_config. ■ If the file changes, the notify clause causes SSH to reload the configuration. Once puppetApply.sh automatically runs, any subsequent SSH sessions will need to connect on port 2022, and you no longer will be able to use a password. Removing Rules When defining rules in Puppet, it is important to keep in mind that removing a rule for a resource is not the same as a rule that removes that resource. For example, suppose you have a rule that creates an authorized SSH key for "developerA". Later, "developerA" leaves, so you remove the rule defining the key. Unfortunately, this does not remove the entry from author i zed_keys. In most cases, the state defined in Puppet resources is not considered definitive; changes outside Puppet are allowed. So once the rule for developerA's key has been removed, there is no way to know if it simply was added manually or if Puppet should remove it. In this case, you can use the ensure => 'absent 1 rule to ensure packages, files, directories, users and so on are deleted. The original Listing 1 showed an example of this to remove the emacs package. There is a definite difference between ensuring that emacs is absent versus no rule declaration. At our office, when a developer or administrator leaves, we replace their SSH key with an invalid key, which then immediately updates every entry for that developer. 64 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Existing Modules Many modules are listed on Puppet Forge covering almost every imaginable problem. Some are really good, and others are less so. It's always worth searching to see if there is something good and then making a decision as to whether it's better to define your own module or reuse an existing one. 
Managing Git We don't keep all of our machines sitting on the master branch. We use a modified gitflow approach to manage our repository. Each server has its own branch, and most of them point at master. A few are on the bleeding edge of the develop branch. Periodically, we roll a new release from develop into master and then move each machine's branch forward from the old release to the new one. Keeping separate branches for each server gives flexibility to hold specific servers back and ensures that changes aren't rolled out to servers in an ad hoc fashion. We use scripts to manage all our branches and fast-forward them to new releases. With roughly 100 machines, it works for us. On a larger scale, separate branches for each server probably is impractical. Using a single repository shared with all servers isn't ideal. Storing sensitive information encrypted in Hiera is a good idea. There was an excellent Linux Journal article covering this: "Using Hiera with Puppet" by Scott Lackey in the March 2015 issue. As your number of machines grows, using a single git repository could become a problem. The main problem for us is there is a lot of "commit noise" between reusable modules versus machine-specific configurations. Second, you may not want all your admins to be able LINUX JOURNAL for iPad and iPhone BUILD | Vehicle onitoring and • jntrol System REATE Safe to tore Your ;nsitive Data COOL PROJECTS Understanding Linux Permissions id SMS locations four art Watch Working with Django Models and Migrations mm COOL PROJECTS HOW TO: ^>9 Home Automation with Raspberry Pi Available on the App Store http://www.linuxjournal.com/ios WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 65 FEATURE Managing Linux Using Puppet to edit all the modules or machine manifests, or you may not want all manifests rolled out to each machine. Our solution is to use multiple repositories, one for generic modules, one for machine-/customer- specific configuration and one for global information. This keeps our core modules separated and under proper release management while also allowing us to release critical global changes easily. Scaling Up/Trade-offs The approach outlined in this article works well for us. I hope it works for you as well; however, you may want to consider some additional points. As our servers differ in ways that are not consistent, using Facter or metadata to drive configuration isn't suitable for us. However, if you have 100 Web servers, using the hostname of nginx-prod-099 to determine the install requirements would save a lot of time. A lot of people use the Puppet master to roll out and push changes, and this is the general approach referred to in a lot of tutorials on-line. You can combine this with PuppetDB to share information from one machine to another machine—for example, the public key of one server can be shared to another server. Conclusion This article has barely scratched the surface of what can be done using Puppet. Virtually everything about your machines can be managed using the various Puppet built-in resources or modules. After using it for a short while, you'll experience the ease of building a second server with a few commands or of rolling out a change to many servers in minutes. Once you can make changes across servers so easily, it becomes much more rewarding to build things as well as possible. 
For example, monitoring your cron jobs and backups can take a lot more work than the actual task itself, but with configuration management, you can build a reusable module and then use it for everything. For me. Puppet has transformed system administration from a chore into a rewarding activity because of the huge leverage you get. Give it a go; once you do, you'll never go back!* David Barton is the Managing Director of OnelT, a company specializing in custom business software development. David has been using Linux since 1998 and managing the company’s Linux servers for more than ten years. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. 66 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Where every interaction matters. break down your innovation barriers When you’re presented with new opportunities, you want to focus on turning them into successes, not whether your IT solution can support them. Peer 1 Hosting powers your business with our wholly owned FastFiber Network™, global footprint, and offers professionally managed public and private cloud solutions that are secure, scalable, and customized for your business. Unsurpassed performance and reliability help build your business foundation to be rock-solid, ready for high growth, and deliver the fast user experience your customers expect. Want more on cloud? Call: 844.855.6655 | go.peerl.com/linux | Vew Cloud Webinar: Public and Private Cloud I Managed Hosting | Dedicated Hosting | Colocation Image: © Can Stock Photo Inc. / bigbro S erver hardening. The very words conjure up images of tempering soft steel into an unbreakable blade, or taking soft clay and firing it in a kiln, producing a hardened vessel that will last many years. Indeed, server hardening is very much like that. Putting an unprotected server out on the Internet is like putting chum in the ocean water you are swimming in—it won't be long and you'll have a lot of excited sharks circling you, and the outcome is unlikely to be good. Everyone knows it, but sometimes under the pressure of deadlines, not to mention the inevitable push from the business interests to prioritize those things with more immediate visibility and that add to the bottom line, it can be difficult to keep up with even what threats you need to mitigate, much less the best techniques to use to do so. This is how corners get cut—corners that increase our risk of catastrophe. This isn't entirely inexcusable. A sysadmin must necessarily be a jack of all trades, and security is only one responsibility that must be considered, and not the one most likely to cause immediate pain. Even in organizations that have dedicated security staff, those parts of the organization dedicated to it often spend their time keeping up with the nitty gritty of the latest exploits and can't know the stack they are protecting as well as those who are knee deep in maintaining it. The more specialized and diversified the separate organizations, the more isolated each group becomes from the big picture. Without the big picture, sensible trade-offs between security and functionality are harder to make. Since a deep and thorough knowledge of the technology stack along with the business it serves is necessary to do a thorough job with security, it sometimes seems nearly hopeless. A truly comprehensive work on server hardening would be beyond the scope not only of a single article, but a single (very large) book, yet all is not lost. 
It is true that there can be no "one true hardening procedure", because environments, technologies and the purposes to which they are put vary so widely. But it is also true that you can develop a methodology for governing those technologies, and the processes that put them to use, that will guide you toward a sane setup. You can boil down the essentials to a few principles that you then can apply across the board. In this article, I explore some examples of application.

I also should say that server hardening, in itself, is almost a useless endeavor if you are going to undercut yourself with lazy choices like passwords of "abc123" or lack a holistic approach to security in the environment. Insecure coding practices can mean that the one hole you open is gaping, and users e-mailing passwords can negate all your hard work. The human element is key, and that means fostering security consciousness at every step of the process. Security that is bolted on instead of baked in never will be as complete or as easy to maintain, but when you don't have executive support for organizational standards, bolting it on may be the best you can do. You can sleep well, though, knowing that at least the Linux server for which you are responsible is in fact properly, if not exhaustively, secured.

The single most important principle of server hardening is this: minimize your attack surface. The reason is simple and intuitive: a smaller target is harder to hit. Applying this principle across all facets of the server is essential. It begins with installing only the specific packages and software that are exactly necessary for the business purpose of the server, plus the minimal set of management and maintenance packages. Everything present must be vetted, trusted and maintained. Every line of code that can be run is another potential exploit on your system, and what is not installed cannot be used against you. Every distribution and service of which I am aware has an option for a minimal install, and this is always where you should begin.

The second most important principle is like it: secure that which must be exposed. This likewise spans the environment, from physical access to the hardware, to encrypting everything that you can everywhere—at rest on the disk, on the network and everywhere in between. For the physical location of the server, locks, biometrics, access logs—all the tools you can bring to bear to controlling and recording who gains physical access to your server are good things, because physical access, an accessible BIOS and a bootable USB drive are just one combination that can mean your server might as well have grown legs and walked away with all your data on it. Rogue, hidden wireless SSIDs broadcast from a USB device can exist for some time before being stumbled upon.

For the purposes of this article though, I'm going to make a few assumptions that will shrink the topics to cover a bit. Let's assume you are putting a new Linux-based server on a cloud service like AWS or Rackspace. What do you need to do first? Since this is in someone else's data center, and you already have vetted the physical security practices of the provider (right?), you begin with your distribution of choice and a minimal install—just enough to boot and start SSH so you can access your shiny new server.
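Once that minimal image boots, it's worth checking just how minimal it really is before you go further. The commands below are a sketch of that audit on a Debian- or Ubuntu-style system and are not from the original article; the package names are only examples, and RPM-based systems would use rpm, yum or dnf equivalents.

    # How much is actually installed on this "minimal" image?
    dpkg-query -W -f='${Package}\n' | wc -l

    # Which services are set to start at boot?
    systemctl list-unit-files --type=service | grep enabled

    # Purge legacy or unneeded services if the image shipped with them
    # (examples only -- verify what each package is before removing it).
    apt-get --purge remove telnetd rsh-server xinetd
    apt-get --purge autoremove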
Within the parameters of this example scenario, there are levels of concern that differ depending on the purpose of the server, ranging from "this is a toy I'm playing with, and I don't care what happens to it" all the way to "governments will topple and masses of people will die if this information is leaked", and although a different level of paranoia and effort needs to be applied in each case, the principles remain the same. Even if you don't care what ultimately happens to the server, you still don't want it joining a botnet and contributing to Internet mayhem. If you don't care, you are bad and you should feel bad. If you are setting up a server for the latter purpose, you are probably more expert than I am and have no reason to be reading this article, so let's split the difference and assume that should your server be cracked, embarrassment, brand damage and loss of revenue (along with your job) will ensue.

In any of these cases, the very first thing to do is tighten your network access. If the hosting provider provides a mechanism for this, like Amazon's "Zones", use it, but don't stop there. Underneath securing what must be exposed is another principle: layers within layers containing hurdle after hurdle. Increase the effort required to reach the final destination, and you reduce the number of attackers willing and able to reach it. Zones, or network firewalls, can fail due to bugs, mistakes and who knows what factors that could come into play. Maximizing redundancy and backup systems in the case of failure is a good in itself. All of the most celebrated data thefts have happened when not just some but all of the advice contained in this article was ignored, and if only one hurdle had required some effort to surmount, it is likely that those responsible would have moved on to someone else with lower-hanging fruit. Don't be the lower-hanging fruit. You don't always have to outrun the bear.

The first principle, that which is not present (installed or running) cannot be used against you, requires that you ensure you've both closed down and turned off all unnecessary services and ports in all runlevels and made them inaccessible via your server's firewall, in addition to whatever other firewalling you are doing on the network. This can be done via your distribution's tools or simply by editing filenames in the /etc/rcX.d directories. If you aren't sure whether you need something, turn it off, reboot, and see what breaks. But, before doing the above, make sure you have an emergency console back door first! This won't be the last time you need it. When just beginning to tinker with securing a server, it is likely you will lock yourself out more than once. If your provider doesn't provide a console that works when the network is inaccessible, the next best thing is to take an image and roll back if the server goes dark.

I suggest first doing two things: running ps -ef and making sure you understand what all the running processes are doing, and running lsof -ni | grep LISTEN to make sure you understand why all the listening ports are open, and that the process you expect has opened them. For instance, on one of my servers running WordPress, the results are these:

    # ps -ef | grep -v \] | wc -l
    39

I won't list out all of my process names, but after pulling out all the kernel processes, I have 39 other processes running, and I know exactly what all of them are and why they are running.
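If any of those processes or listening ports is a mystery, it helps to tie each listening PID back to the binary it runs and the package that owns that binary. The loop below is a sketch and not part of the original article; it assumes a dpkg-based system (substitute rpm -qf on RPM-based distributions) and should be run as root.

    #!/bin/bash
    # For every TCP listener, print its PID, executable path and owning package.
    lsof -nP -iTCP -sTCP:LISTEN -Fp | tr -d 'p' | sort -u | while read -r pid; do
        exe=$(readlink -f "/proc/$pid/exe")
        pkg=$(dpkg -S "$exe" 2>/dev/null | cut -d: -f1)
        printf '%s\t%s\t%s\n' "$pid" "$exe" "${pkg:-unknown}"
    done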
Next, I examine:

    # lsof -ni | grep LISTEN
    mysqld   1638 mysql     10u  IPv4  10579  0t0  TCP 127.0.0.1:mysql (LISTEN)
    sshd     1952 root       3u  IPv4  11571  0t0  TCP *:ssh (LISTEN)
    sshd     1952 root       4u  IPv6  11573  0t0  TCP *:ssh (LISTEN)
    nginx    2319 root       7u  IPv4  12400  0t0  TCP *:http (LISTEN)
    nginx    2319 root       8u  IPv4  12401  0t0  TCP *:https (LISTEN)
    nginx    2319 root       9u  IPv6  12402  0t0  TCP *:http (LISTEN)
    nginx    2320 www-data   7u  IPv4  12400  0t0  TCP *:http (LISTEN)
    nginx    2320 www-data   8u  IPv4  12401  0t0  TCP *:https (LISTEN)
    nginx    2320 www-data   9u  IPv6  12402  0t0  TCP *:http (LISTEN)

This is exactly what I expect, and it's the minimal set of ports necessary for the purpose of the server (to run WordPress).

Now, to make sure only the necessary ports are open, you need to tune your firewall. Most hosting providers, if you use one of their templates, will by default have all rules set to "accept". This is bad. It defies the second principle: whatever must be exposed must be secured. If, by some accident of nature, some software opened a port you did not expect, you need to make sure it will be inaccessible. Every distribution has its tools for managing a firewall, and others are available in most package managers. I don't bother with them, as iptables (once you gain some familiarity with it) is fairly easy to understand and use, and it is the same on all systems. Like vi, you can expect its presence everywhere, so it pays to be able to use it. A basic firewall looks something like this:

    # make sure forwarding is off and clear everything
    # also turn off IPv6: if you don't need it, turn it off
    sysctl net.ipv6.conf.all.disable_ipv6=1
    sysctl net.ipv4.ip_forward=0
    iptables --flush
    iptables -t nat --flush
    iptables -t mangle --flush
    iptables --delete-chain
    iptables -t nat --delete-chain
    iptables -t mangle --delete-chain

    # make the default policy drop everything
    iptables --policy INPUT DROP
    iptables --policy OUTPUT ACCEPT
    iptables --policy FORWARD DROP

    # allow everything on loopback
    iptables -A INPUT -i lo -j ACCEPT

    # allow established/related traffic
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

    # allow ssh
    iptables -A INPUT -m tcp -p tcp --dport 22 -j ACCEPT

You can get fancy, wrap this in a script, drop a file in /etc/rc.d, link it to the runlevels in /etc/rcX.d, and have it start right after networking, or it might be sufficient for your purposes to run it straight out of /etc/rc.local. Then you modify this file as requirements change. For instance, to allow ssh, http and https traffic, you can switch the last line above to this one:

    iptables -A INPUT -p tcp -m state --state NEW -m multiport --dports ssh,http,https -j ACCEPT

More specific rules are better. Let's say what you've built is an intranet server, and you know where your traffic will be coming from and on what interface. You instead could add something like this to the bottom of your iptables script:

    iptables -A INPUT -i eth0 -s 192.168.1.0/24 -p tcp -m state --state NEW -m multiport --dports http,https -j ACCEPT

There are a couple things to consider in this example that you might need to tweak. For one, this allows all outbound traffic initiated from the server. Depending on your needs and paranoia level, you may not wish to allow that. Setting outbound traffic to default deny will significantly complicate maintenance for things like security updates, so weigh that complication against your level of concern about rootkits communicating outbound to phone home.
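If you do opt for default deny on outbound traffic, the additions look much like the inbound rules. The lines below are a minimal sketch of one way to do it and are not part of the original script; the ports opened here (DNS, NTP and HTTP/HTTPS for package updates) are assumptions to adjust for your environment.

    # default-deny outbound, then open only what the server itself needs
    iptables --policy OUTPUT DROP
    iptables -A OUTPUT -o lo -j ACCEPT
    iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # DNS and NTP
    iptables -A OUTPUT -p udp -m multiport --dports 53,123 -j ACCEPT
    # DNS over TCP, plus HTTP/HTTPS for package mirrors
    iptables -A OUTPUT -p tcp -m state --state NEW -m multiport --dports 53,80,443 -j ACCEPT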
Should you go with default deny for outbound, iptables is an extremely powerful and flexible tool—you can control outbound communications based on parameters like process name and owning user ID, rate-limit connections—almost anything you can think of—so if you have the time to experiment, you can control your network traffic with a very high degree of granularity.

Second, I'm setting the default to DROP instead of REJECT. DROP is a bit of security by obscurity. It can discourage a script kiddie if his port scan takes too long, but since you have commonly scanned ports open, it will not deter a determined attacker, and it might complicate your own troubleshooting, because you have to wait for the client-side timeout whenever you've blocked a port in iptables, either on purpose or by accident. Also, as I've detailed in a previous Linux Journal article (http://www.linuxjournal.com/content/back-dead-simple-bash-complex-ddos), TCP-level rejects are very useful in high-traffic situations to clear out the resources used to track connections statefully on the server and on network gear farther out. Your mileage may vary.

Finally, your distribution's minimal install might not have sysctl installed or turned on by default. You'll need that, so make sure it is on and works. It makes inspecting and changing system values much easier, as most versions support tab auto-completion. You also might need to include full paths to the binaries (usually /sbin/iptables and /sbin/sysctl), depending on the base path variable of your particular system.

All of the above probably should be finished within a few minutes of bringing up the server. I recommend not opening the ports for your application until after you've installed and configured the applications you are running on the server. So at the point when you have a new minimal server with only SSH open, you should apply all updates using your distribution's method. You can decide now whether you want to do this manually on a schedule or set them to automatic, which your distribution probably has a mechanism to do. If not, a script dropped in cron.daily will do the trick. Sometimes updates break things, so evaluate carefully. Whether you do automatic updates or not, critical flaws that sometimes require manual configuration changes are being uncovered frequently right now, so you need to monitor the appropriate lists and sites for critical security updates to your stack and apply them manually as necessary.

Once you've dealt with updates, you can move on and continue to evaluate your server against the two security principles of 1) minimal attack surface and 2) secure everything that must be exposed. At this point, you are pretty solid on point one. On point two, there is more you can yet do.

The concept of hurdles requires that you not allow root to log in remotely. Gaining root should be at least a two-part process. This is easy enough; you simply set this line in /etc/ssh/sshd_config:

    PermitRootLogin no

For that matter, root should not be able to log in directly at all. The account should have no password and should be accessible only via sudo—another hurdle to clear. If a user doesn't need remote login, don't allow it, or better said, allow only the users that you know need remote access. This satisfies both principles. Use the AllowUsers and AllowGroups settings in /etc/ssh/sshd_config to make sure you are allowing only the necessary users.
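Pulled together, the relevant sshd_config lines might look like the sketch below. The user and group names are placeholders rather than a recommendation; check the result with sshd -t and keep an existing session open while you reload sshd, in case you lock yourself out.

    # /etc/ssh/sshd_config (excerpt) -- example names only
    PermitRootLogin no
    # allow only the accounts that genuinely need a remote shell
    AllowUsers alice bob
    AllowGroups sshusers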
You can set a password policy on your server to require a complex password for any and all users, but I believe it is generally a better idea to bypass crackable passwords altogether and use key-only login, and have the key require a complex passphrase. This raises the bar for cracking into your system, as it is virtually impossible to brute-force an RSA key. The key could be physically stolen from your client system, which is why you need the complex passphrase. Without getting into a discussion of length or strength of key or passphrase, one way to create it is like this:

    ssh-keygen -t rsa

Then, when prompted, enter and re-enter the desired passphrase. Copy the public portion (id_rsa.pub or similar) into a file in the user's home directory called ~/.ssh/authorized_keys, and then in a new terminal window, try logging in, and troubleshoot as necessary. I store the key and the passphrase in a secure data vault provided by Personal, Inc. (https://personal.com), and this will allow me, even if away from home and away from my normal systems, to install the key and have the passphrase to unlock it, in case an emergency arises. (Disclaimer: Personal is the startup I work with currently.)

Once it works, change this line in /etc/ssh/sshd_config:

    PasswordAuthentication no

Now you can log in only with the key. I still recommend keeping a complex password for the users, so that when you sudo, you have that layer of protection as well. Now, to take complete control of your server, an attacker needs your private key, your passphrase and your password on the server—hurdle after hurdle. In fact, in my company, we also use multi-factor authentication in addition to these other methods, so you must have the key, the passphrase, the pre-secured device that will receive the notification of the login request and the user's password. That is a pretty steep hill to climb.

Encryption is a big part of keeping your server secure—encrypt everything that matters to you. Always be aware of how data, particularly authentication data, is stored and transmitted. Needless to say, you never should allow login or connections over an unencrypted channel like FTP, Telnet, rsh or other legacy protocols. These are huge no-nos that completely undo all the hard work you've put into securing your server. Anyone who can gain access to a nearby switch and perform ARP poisoning to mirror your traffic will own your servers. Always use sftp or scp for file transfers and ssh for secure shell access. Use https for logins to your applications, and never store passwords, only hashes.

Even with strong encryption in use, in the recent past, many flaws have been found in widely used programs and protocols—get used to turning ciphers on and off in both OpenSSH and OpenSSL. I'm not covering Web servers here, but the lines of interest you would put in your /etc/ssh/sshd_config file would look something like this:

    Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128
    MACs hmac-sha1,umac-64@openssh.com,hmac-ripemd160

Then you can add or remove ciphers and MACs as necessary. See man sshd_config for all the details.
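Before editing those lines, it helps to see which algorithms your particular OpenSSH build actually supports and what the running configuration resolves to. The commands below are a sketch and assume a reasonably recent OpenSSH; the -Q query flag is not present in very old releases, and sshd -T must be run as root.

    ssh -Q cipher                          # ciphers this OpenSSH build supports
    ssh -Q mac                             # available MAC algorithms
    sshd -T | grep -Ei '^(ciphers|macs)'   # what the running sshd configuration resolves to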
Depending on your level of paranoia and the purpose of your server, you might be tempted to stop here. I wouldn't. Get used to installing, using and tuning a few more security essentials, because these last few steps will make you exponentially more secure.

I'm well into principle two now (secure everything that must be exposed), and I'm bordering on the third principle: assume that every measure will be defeated. There is definitely a point of diminishing returns with the third principle, where the change to the risk does not justify the additional time and effort, but where that point falls is something you and your organization have to decide. The fact of the matter is that even though you've locked down your authentication, there still exists the chance, however small, that a configuration mistake or an update will change or break your config, that by blind luck an attacker could find a way into your system, or even that the system came with a backdoor. There are a few things you can do that will further protect you from those risks.

Speaking of backdoors, everything from phones to the firmware of hard drives has backdoors pre-installed. Lenovo has been caught no less than three times pre-installing rootkits, and Sony rooted customer systems in a misguided attempt at DRM. A programming mistake in OpenSSL left a hole open that the NSA has been exploiting to defeat encryption for at least a decade without informing the community, and this was apparently only one of several. In the late 2000s, someone anonymously attempted to insert a two-line programming error into the Linux kernel that would cause a remote root exploit under certain conditions. So suffice it to say, I personally do not trust anything sourced from the NSA, and I turn SELinux off because I'm a fan of warrants and the Fourth Amendment. The instructions are generally available, but usually all you need to do is make this change to /etc/selinux/config:

    #SELINUX=enforcing    # comment out the old setting
    SELINUX=disabled      # turn it off, then restart the system

In the spirit of turning off and blocking what isn't needed: since most of the malicious traffic on the Internet comes from just a few sources, why give them a shot at cracking your servers? I run a short script that collects various blacklists of exploited servers in botnets, Chinese and Russian CIDR ranges and so on, and creates a blocklist from them, updating once a day. Back in the day, you couldn't do this, as iptables gets bogged down matching more than a few thousand lines, so having a rule for every malicious IP out there just wasn't feasible. With the maturity of the ipset project, now it is. ipset uses a binary search algorithm that adds only one pass to the search each time the list doubles, so an arbitrarily large list can be searched efficiently for a match, although I believe there is a limit of 65k entries in the ipset table. To make use of it, add this at the bottom of your iptables script:

    # create the ipset hash and an iptables rule that checks it
    ipset create blocklist hash:net
    iptables -I INPUT 1 -m set --match-set blocklist src -j DROP

Then put this somewhere executable and run it out of cron once a day:

    #!/bin/bash
    PATH=$PATH:/sbin
    WD=`pwd`
    TMP_DIR=$WD/tmp
    IP_TMP=$TMP_DIR/ip.temp
    IP_BLOCKLIST=$WD/ip-blocklist.conf
    IP_BLOCKLIST_TMP=$TMP_DIR/ip-blocklist.temp
    list="chinese nigerian russian lacnic exploited-servers"
    BLOCKLISTS=(
    "http://www.projecthoneypot.org/list_of_ips.php?t=d&rss=1" # Project Honey Pot Directory of Dictionary Attacker IPs
    "http://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=1.1.1.1" # TOR Exit Nodes
    "http://www.maxmind.com/en/anonymous_proxies" # MaxMind GeoIP Anonymous Proxies
    "http://danger.rulez.sk/projects/bruteforceblocker/blist.php" # BruteForceBlocker IP List
    "http://rules.emergingthreats.net/blockrules/rbn-ips.txt" # Emerging Threats - Russian Business Networks List
    "http://www.spamhaus.org/drop/drop.lasso" # Spamhaus Don't Route Or Peer List (DROP)
    "http://cinsscore.com/list/ci-badguys.txt" # C.I. Army Malicious IP List
    "http://www.openbl.org/lists/base.txt" # OpenBLOCK.org 30 day List
    "http://www.autoshun.org/files/shunlist.csv" # Autoshun Shun List
    "http://lists.blocklist.de/lists/all.txt" # blocklist.de attackers
    )
    cd $TMP_DIR

    # This gets the various lists
    for i in "${BLOCKLISTS[@]}"
    do
        curl "$i" > $IP_TMP
        grep -Po '(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?' $IP_TMP >> $IP_BLOCKLIST_TMP
    done

    # This section gets the wizcrafts lists
    for i in `echo $list`; do
        wget --quiet http://www.wizcrafts.net/$i-iptables-blocklist.html
        # Grep out all but ip blocks
        cat $i-iptables-blocklist.html | grep -v \< | grep -v \: | grep -v \; | grep -v \# | grep [0-9] > $i.txt
        # Consolidate blocks into master list
        cat $i.txt >> $IP_BLOCKLIST_TMP
    done

    sort $IP_BLOCKLIST_TMP -n | uniq > $IP_BLOCKLIST
    rm $IP_BLOCKLIST_TMP
    wc -l $IP_BLOCKLIST

    ipset flush blocklist
    egrep -v "^#|^$" $IP_BLOCKLIST | while IFS= read -r ip
    do
        ipset add blocklist $ip
    done

    # cleanup
    rm -fR $TMP_DIR/*
    exit 0

It's possible you don't want all of these blocked. I usually leave Tor exit nodes open to enable anonymity, and if you do business in China, you certainly can't block every IP range coming from there. Remove unwanted items from the URLs to be downloaded. When I turned this on, within 24 hours, the number of banned IPs triggered by brute-force crack attempts on SSH dropped from hundreds to less than ten.

Although there are many more areas to be hardened, since according to principle three we assume all measures will be defeated, I will have to leave things like locking down cron and bash, as well as automating standard security configurations across environments, for another day. There are a few more packages I consider security musts, including multiple methods to check for intrusion (I run both chkrootkit and rkhunter to update signatures and scan my systems at least daily).

I want to conclude with one last must-use tool: Fail2ban. Fail2ban is available in virtually every distribution's repositories now, and it has become my go-to. Not only is it an extensible Swiss-army knife of brute-force authentication prevention, it comes with an additional bevy of filters to detect other attempts to do bad things to your system. If you do nothing but install it, run it, keep it updated and turn on its filters for any services you run, especially SSH, you will be far better off than you were otherwise. As for me, I have other higher-level software like WordPress log to auth.log for filtering and banning of malefactors with Fail2ban. You can custom-configure how long to ban based on how many filter matches (like failed login attempts of various kinds) and specify longer bans for "recidivist" abusers that keep coming back.
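In jail.local terms, that tuning might look something like the sketch below. This is not the author's configuration: the jail names assume the stock sshd and recidive filters shipped with recent Fail2ban packages (older versions name the SSH jail differently), and the thresholds are arbitrary examples to adjust.

    # /etc/fail2ban/jail.local (excerpt) -- example values only
    [sshd]
    enabled  = true
    # five failures within ten minutes earns a one-hour ban
    maxretry = 5
    findtime = 600
    bantime  = 3600

    [recidive]
    # repeat offenders, spotted in Fail2ban's own log, get a one-week ban
    enabled  = true
    logpath  = /var/log/fail2ban.log
    findtime = 86400
    bantime  = 604800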
Here's one example of the extensibility of the tool. During log review (another important component of a holistic security approach), I noticed many thousands of the following kinds of probes, coming especially from China:

    sshd[***]: Received disconnect from **.**.**.**: 11: Bye Bye [preauth]
    sshd[***]: Received disconnect from **.**.**.**: 11: Bye Bye [preauth]
    sshd[***]: Received disconnect from **.**.**.**: 11: Bye Bye [preauth]

There were two forms of this, and I could not find any explanation of a known exploit that matched this pattern, but there had to be a reason I was getting so many so quickly. It wasn't enough to be a denial of service, but it was a steady flow. Either it was a zero-day exploit, or some algorithm was sending malformed requests of various kinds hoping to trigger a memory problem in hopes of uncovering an exploit—in any case, there was no reason to allow them to continue. I added this line to the failregex = section of /etc/fail2ban/filter.d/sshd.local:

    ^%(__prefix_line)sReceived disconnect from <HOST>: 11: (Bye Bye)? \[preauth\]$

Within minutes, I had banned 20 new IP addresses, and my logs were almost completely clear of these lines going forward.

By now, you've seen my three primary principles of server hardening in action enough to know that systematically applying them to your systems will have you churning out reasonably hardened systems in no time. But, just to reiterate one more time:

1. Minimize attack surface.

2. Secure whatever remains and must be exposed.

3. Assume all security measures will be defeated.

Feel free to give me a shout and let me know what you thought about the article. Let me know your thoughts on what I decided to include, any major omissions I cut for the sake of space that you think should have been included, and things you'd like to see in the future!

    [root@localhost:~] # whoami
    uid=0

Greg Bledsoe, VP of Operations, Personal, Inc., CEH, CPT, lj@bledsoehome.net, @geek_king, https://www.linkedin.com/in/gregbledsoe. 20 years of making things work good, work again when they stop, and not stop working anymore.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.
EOF

How Will the Big Data Craze Play Out?

DOC SEARLS

And, how does it compare to what we've already experienced with Linux and open source?

I was in the buzz-making business long before I learned how it was done. That happened here, at Linux Journal. Some of it I learned by watching kernel developers make Linux so useful that it became irresponsible for anybody doing serious development not to consider it—and, eventually, not to use it. Some I learned just by doing my job here. But most of it I learned by watching the term "open source" get adopted by the world, and participating as a journalist in the process.

For a view of how quickly "open source" became popular, see Figure 1 for a look at what Google's Ngram Viewer shows. Ngram plots how often a term appears in books. It goes only to 2008, but the picture is clear enough.

Figure 1. Google Ngram Viewer: "open source"

I suspect that curve's hockey stick began to angle toward the vertical on February 8, 1998. That was when Eric S. Raymond (aka ESR) published an open letter titled "Goodbye, 'free software'; hello, 'open source'" and made sure it got plenty of coverage. The letter leveraged Netscape's announcement two weeks earlier that it would release the source code to what would become the Mozilla browser, later called Firefox. Eric wrote:

It's crunch time, people. The Netscape announcement changes everything. We've broken out of the little corner we've been in for twenty years. We're in a whole new game now, a bigger and more exciting one—and one I think we can win.

Which we did. How? Well, official bodies, such as the Open Source Initiative (OSI), were founded. (See Resources for a link to more history of the OSI.) O'Reilly published books and convened conferences. We wrote a lot about it at the time and haven't stopped (this piece being one example of that). But the prime mover was Eric himself, whom Christopher Locke describes as "a rhetorician of the first water".

To put this in historic context, the dot-com mania was at high ebb in 1998 and 1999, and both Linux and open source played huge roles in that. Every LinuxWorld Expo was lavishly funded and filled by optimistic start-ups with booths of all sizes and geeks with fun new jobs. At one of those, more than 10,000 attended an SRO talk by Linus. At the Expos and other gatherings, ESR held packed rooms in rapt attention, for hours, while he held forth on Linux, the hacker ethos and much more.
But his main emphasis was on open source, and the need for hackers and their employers to adopt its code and methods—which they did, in droves. (Let's also remember that two of the biggest IPOs in history were Red Hat's and VA Linux's, in August and December 1999.)

Ever since witnessing those success stories, I have been alert to memes and how they spread in the technical world. Especially "Big Data" (see Figure 2).

Figure 2. Google Trends: "big data"

Figure 3. Google Trends: "IBM big data", "McKinsey big data"

What happened in 2011? Did Big Data spontaneously combust? Was there a campaign of some kind? A coordinated set of campaigns? Though I can't prove it (at least not in the time I have), I believe the main cause was "Big data: The next frontier for innovation, competition, and productivity", published by McKinsey in May 2011, to much fanfare. That report, and following ones by McKinsey, drove publicity in Forbes, The Economist, various O'Reilly pubs, Financial Times and many others—while providing ample sales fodder for every big vendor selling Big Data products and services.

Figure 4. Google Trends: "IBM big data", "SAP big data", "HP big data", "Oracle big data", "Microsoft big data"

Among those big vendors, none did a better job of leveraging and generating buzz than IBM. See Resources for the results of a Google search for IBM + "Big Data" for the calendar years 2010-2011. Note that the first publication listed in that search, "Bringing big data to the Enterprise", is dated May 16, 2011, the same month as the McKinsey report. The next, "IBM Big Data - Where do I start?", is dated November 23, 2011.

Figure 3 shows a Google Trends graph for McKinsey, IBM and "big data". See that bump for IBM in late 2010 in Figure 3? That was due to a lot of push on IBM's part, which you can see in a search for IBM and big data just in 2010—and a search just for big data. So there was clearly something in the water already. But searches, as we see, didn't pick up until 2011. That's when the craze hit the marketplace, as we see in a search for IBM and four other big data vendors (Figure 4).

So, although we may not have a clear enough answer for the cause, we do have clear evidence of the effects. Next question: to whom do those companies sell their Big Data stuff? At the very least, it's the CMO, or Chief Marketing Officer—a title that didn't come into common use until the dot-com boom and got huge after that, as marketing's share of corporate overhead went up and up. On February 12, 2012, for example, Forbes ran a story titled "Five Years From Now, CMOs Will Spend More on IT Than CIOs Do". It begins:

Marketing is now a fundamental driver of IT purchasing, and that trend shows no signs of stopping—or even slowing down—any time soon. In fact, Gartner analyst Laura McLellan recently predicted that by 2017, CMOs will spend more on IT than their counterpart CIOs. At first, that prediction may sound a bit over the top. (In just five years from now, CMOs are going to be spending more on IT than CIOs do?)
But, consider this: 1) as we all know, marketing is becoming increasingly technology-based; 2) harnessing and mastering Big Data is now key to achieving competitive advantage; and 3) many marketing budgets already are larger—and faster growing—than IT budgets.

In June 2012, IBM's index page was headlined, "Meet the new Chief Executive Customer. That's who's driving the new science of marketing." The copy was directly addressed to the CMO. In response, I wrote "Yes, please meet the Chief Executive Customer", which challenged some of IBM's pitch at the time. (I'm glad I quoted what I did in that post, because all but one of the links now go nowhere. The one that works redirects from the original page to "Emerging trends, tools and tech guidance for the data-driven CMO".)

According to Wikibon, IBM was the top Big Data vendor by 2013, raking in $1.368 billion in revenue. In February of this year (2015), Reuters reported that IBM "is targeting $40 billion in annual revenue from the cloud, big data, security and other growth areas by 2018", and that this "would represent about 44 percent of $90 billion in total revenue that analysts expect from IBM in 2018". So I'm sure all the publicity works.

I am also sure there is a mania to it, especially around the wanton harvesting of personal data by all means possible, for marketing purposes. Take a look at "The Big Datastillery", co-published by IBM and Aberdeen, which depicts this system at work (see Resources). I wrote about it in my September 2013 EOF, titled "Linux vs. Bullshit". The "datastillery" depicts human beings as beakers on a conveyor belt being fed marketing goop and releasing gases for the "datastillery" to process into more marketing goop. The degree to which it demeans and insults our humanity is a measure of how insane marketing mania, drunk on a diet of Big Data, has become.

T.Rob Wyatt, an alpha geek and IBM veteran, doesn't challenge what I say about the timing of the Big Data buzz rise or the manias around its use as a term. But he does point out that Big Data is truly different in kind from its predecessor buzzterms (such as Data Processing) and deserves some respect:

The term Big Data in its original sense represented a complete reversal of the prevailing approach to data. Big Data specifically refers to the moment in time when the value of keeping the data exceeded the cost and the prevailing strategy changed from purging data to retaining it.

He adds:

CPU cycles, storage and bandwidth are now so cheap that the cost of selecting which data to omit exceeds the cost of storing it all and mining it for value later. It doesn't even have to be valuable today; we can just store data away on speculation, knowing that only a small portion of it eventually needs to return value in order to realize a profit. Whereas we used to ruthlessly discard data, today we relentlessly hoard it, even if we don't know what the hell to do with it. We just know that whatever data element we discard today will be the one we really need tomorrow when the new crop of algorithms comes out.

Which gets me to the story of Bill Binney, a former analyst with the NSA.
His specialty with the agency was getting maximum results from minimum data, by recognizing patterns in the data. One example of that approach was ThinThread, a system he and his colleagues developed at the NSA for identifying patterns indicating likely terrorist activity. ThinThread, Binney believes, would have identified the 9/11 hijackers, had the program not been discontinued three weeks before the attacks. Instead, the NSA favored more expensive programs based on gathering and hoarding the largest possible sums of data from everywhere, which makes it all the harder to analyze. His point: you don't find better needles in bigger haystacks.

Binney resigned from the NSA after ThinThread was canceled and has had a contentious relationship with the agency ever since. I've had the privilege of spending some time with him, and I believe he is A Good American—the title of an upcoming documentary about him. I've seen a pre-release version, and I recommend seeing it when it hits the theaters.

Meanwhile, I'm wondering when and how the Big Data craze will run out—or if it ever will. My bet is that it will, for three reasons.

First, a huge percentage of Big Data work is devoted to marketing, and people in the marketplace are getting tired of being both the sources of Big Data and the targets of marketing aimed by it. They're rebelling by blocking ads and tracking at growing rates. Given the size of this appetite, other prophylactic technologies are sure to follow. For example, Apple is adding "Content Blocking" capabilities to its mobile Safari browser. This lets developers provide ways for users to block ads and tracking on their iOS devices, and to do it at a deeper level than the current add-ons. Naturally, all of this is freaking out the surveillance-driven marketing business known as "adtech" (as a search for adtech + adblock reveals).

Second, other corporate functions must be getting tired of marketing hogging so much budget while earning customer hate in the marketplace. After years of winning budget fights among CXOs, expect CMOs to start losing a few—or more.

Third, marketing is already looking to pull in the biggest possible data cache of all, from the Internet of Things. Here's T.Rob again:

IoT device vendors will sell their data to shadowy aggregators who live in the background ("...we may share with our affiliates..."). These are companies that provide just enough service so the customer-facing vendor can say the aggregator is a necessary part of their business, hence an affiliate or partner. The aggregators will do something resembling "big data" but generally are more interested in state than trends (I'm guessing at that based on current architecture) and will work on very specialized data sets of actual behavior, seeking not merely to predict but rather to manipulate behavior in the immediate short-term future (minutes to days). Since the algorithms and data sets differ greatly from those in the past, the name will change. The pivot will be the development of
new specialist roles in gathering, aggregating, correlating, and analyzing the datasets. This is only possible because our current regulatory regime allows all new data tech by default. If we can, then we should. There is no accountability of where the data goes after it leaves the customer-facing vendor's hands. There is no accountability of data gathered about people who are not account holders or members of a service.

I'm betting that both customers and non-marketing parts of companies are going to fight that.

Figure 5. Google Trends: "open source", "big data"

Finally, I'm concerned about what I see in Figure 5. If things go the way Google Trends expects, next year open source and big data will attract roughly equal interest from those using search engines. This might be meaningless, or it might be meaningful. I dunno. What do you think?

Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.

Resources

Eric S. Raymond: http://www.catb.org/esr

"Goodbye, 'free software'; hello, 'open source'", by Eric S. Raymond: http://www.catb.org/esr/open-source.html

"Netscape Announces Plans to Make Next-Generation Communicator Source Code Available Free on the Net": http://web.archive.org/web/20021001071727/wp.netscape.com/newsref/pr/newsrelease558.html

Open Source Initiative: http://opensource.org/about

History of the OSI: http://opensource.org/history

O'Reilly Books on Open Source: http://search.oreilly.com/?q=open+source

O'Reilly's OSCON: http://www.oscon.com/open-source-eu-2015

Red Hat History (Wikipedia): https://en.wikipedia.org/wiki/Red_Hat#History
"VA Linux Registers A 698% Price Pop", by Terzah Ewing, Lee Gomes and Charles Gasparino (The Wall Street Journal): http://www.wsj.com/articles/SB944749135343802895

Google Trends "big data": https://www.google.com/trends/explore#q=big%20data

"Big data: The next frontier for innovation, competition, and productivity", by McKinsey: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

Google Search Results for IBM + "Big Data", 2010-2011: https://www.google.com/search?q=%2BIBM+%22Big+Data%22&newwindow=1&safe=off&biw=1267&bih=710&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2011&tbm=

"Bringing big data to the Enterprise": http://www-01.ibm.com/software/au/data/bigdata

"IBM Big Data - Where do I start?": https://www.ibm.com/developerworks/community/blogs/ibm-big-data/entry/ibm_big_data_where_do_i_start?lang=en

Google Trends: "IBM big data", "McKinsey big data": https://www.google.com/trends/explore#q=IBM%20big%20data,%20McKinsey%20big%20data&cmpt=q&tz=Etc/GMT%2B4

Google Search Results for "IBM big data" in 2010: https://www.google.com/search?q=ibm+big+data&newwindow=1&safe=off&biw=1095&bih=979&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2010

Google Search Results for Just "big data": https://www.google.com/search?q=ibm+big+data&newwindow=1&safe=off&biw=1095&bih=979&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2010#newwindow=1&safe=off&tbs=cdr:1%2Ccd_min:1%2F1%2F2010%2Ccd_max:12%2F31%2F2010&q=big+data

Google Trends for "IBM big data", "SAP big data", "HP big data", "Oracle big data", "Microsoft big data": https://www.google.com/search?q=ibm+big+data&newwindow=1&safe=off&biw=1095&bih=979&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2010#newwindow=1&safe=off&tbs=cdr:1%2Ccd_min:1%2F1%2F2010%2Ccd_max:12%2F31%2F2010&q=big+data

Google Books Ngram Viewer Results for "chief marketing officer" between 1900 and 2008: https://books.google.com/ngrams/graph?content=chief+marketing+officer&year_start=1900&year_end=2008&corpus=0&smoothing=3&share=&direct_url=t1%3B%2Cchief%20marketing%20officer%3B%2Cc0

Forbes, "Five Years From Now, CMOs Will Spend More on IT Than CIOs Do", by Lisa Arthur: http://www.forbes.com/sites/lisaarthur/2012/02/08/five-years-from-now-cmos-will-spend-more-on-it-than-cios-do

"By 2017 the CMO will Spend More on IT Than the CIO", hosted by Gartner Analyst Laura McLellan (Webinar): http://my.gartner.com/portal/server.pt?open=512&objID=202&mode=2&PageID=5553&resId=1871515&ref=Webinar-Calendar

"Yes, please meet the Chief Executive Customer", by Doc Searls: https://blogs.law.harvard.edu/doc/2012/06/19/yes-please-meet-the-chief-executive-customer

Emerging trends, tools and tech guidance for the data-driven CMO: http://www-935.ibm.com/services/c-suite/cmo

Big Data Vendor Revenue and Market Forecast 2013-2017 (Wikibon): http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017

"IBM targets $40 billion in cloud, other growth areas by 2018" (Reuters): http://www.reuters.com/article/2015/02/27/us-ibm-investors-idUSKBN0LU1LC20150227

"The Big Datastillery: Strategies to Accelerate the Return on Digital Data": http://www.ibmbigdatahub.com/blog/big-datastillery-strategies-accelerate-return-digital-data
Bullshit”, by Doc Searls, Linux Journal, September 2013: http://www.linuxjournal.com/content/linux-vs-bullshit T.Rob Wyatt: https://tdotrob.wordpress.com William Binney (U.S. intelligence official): https://en.wikipedia.org/ wiki/William_Binney_%28U.S._intelligence_official%29 ThinThread: https://en.wikipedia.org/wiki/ThinThread A Good American: http://www.imdb.com/title/tt4065414 Safari 9.0 Secure Extension Distribution (“Content Blocking”): https://developer.apple.com/library/prerelease/ios/releasenotes/ General/WhatsNewlnSafari/Articles/Safari_9.html Google Search Results for adtech adblock: https://www.google.com/search?q=adtech+adblock&gws_rd=ssl Google Trends results for “open source”, “big data”: https://www.google.com/trends/explore#q=open%20source,%20 big%20data&cmpt=q&tz=Etc/GMT%2B4 WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 93