LINUX JOURNAL | Since 1994: The Original Magazine of the Linux Community
NOVEMBER 2015 | ISSUE 259 | www.linuxjournal.com
SERVER HARDENING TIPS AND TRICKS
MANAGE LINUX SYSTEMS WITH PUPPET
Puppet Application Orchestration: Eliminate IT complexity
HOW-TO: Wi-Fi Network Installation
What's the Future for Big Data?
PERFORMANCE TESTING FOR WEB APPLICATIONS
FLASH ROMs WITH A RASPBERRY PI
WATCH: ISSUE OVERVIEW

GEEK GUIDES: Practical books for the most technical people on the planet. Download books for free with a simple one-time registration: http://geekguide.linuxjournal.com
Improve Business Processes with an Enterprise Job Scheduler, by Mike Diehl (Sponsor: Skybot)
Finding Your Way: Mapping Your Network to Improve Manageability, by Bill Childers (Sponsor: InterMapper)
DIY Commerce Site, by Reuven M. Lerner (Sponsor: GeoTrust)
Combating Infrastructure Sprawl, by Bill Childers (Sponsor: Puppet Labs)
Get in the Fast Lane with NVMe, by Mike Diehl (Sponsor: Silicon Mechanics & Intel)
Take Control of Growing Redis NoSQL Server Clusters, by Reuven M. Lerner (Sponsor: IBM)
Linux in the Time of Malware, by Federico Kereki (Sponsor: Bit9 + Carbon Black)
Apache Web Servers and SSL Encryption, by Reuven M. Lerner (Sponsor: GeoTrust)

CONTENTS | NOVEMBER 2015 | ISSUE 259
FEATURES: SYSTEM ADMINISTRATION
52 Managing Linux Using Puppet: Managing your servers doesn't have to be a chore with Puppet. David Barton
68 Server Hardening: A look at some essential steps to follow to mitigate threats. Greg Bledsoe
ON THE COVER
• Server Hardening Tips and Tricks, p. 68
• Manage Linux Systems with Puppet, p. 52
• Performance Testing for Web Applications, p. 22
• Flash ROMs with a Raspberry Pi, p. 34
• How-To: Wi-Fi Network Installation, p. 38
• What's the Future for Big Data?, p. 84
Cover Image: © Can Stock Photo Inc. / Anterovium
COLUMNS
22 Reuven M. Lerner's At the Forge: Performance Testing
28 Dave Taylor's Work the Shell: Words—We Can Make Lots of Words
34 Kyle Rankin's Hack and /: Flash ROMs with a Raspberry Pi
38 Shawn Powers' The Open-Source Classroom: Wi-Fi, Part II: the Installation
84 Doc Searls' EOF: How Will the Big Data Craze Play Out?
IN EVERY ISSUE
8 Current_Issue.tar.gz
10 UPFRONT
20 Editors' Choice
46 New Products
91 Advertisers Index
LINUX JOURNAL (ISSN 1075-3583) is published monthly by Belltown Media, Inc., PO Box 980985, Houston, TX 77098 USA. Subscription rate is $29.50/year. Subscriptions start with the next issue.
LINUX JOURNAL: Subscribe to Linux Journal Digital Edition for only $2.45 an issue.
ENJOY: Timely delivery Off-line reading Easy navigation LINUX JOURNAL Executive Editor Jill Franklin jill@linuxjournal.com Senior Editor Doc Searls doc@linuxjournal.com Associate Editor Shawn Powers shawn@linuxjournal.com Art Director Garrick Antikajian garrick@linuxjournal.com Products Editor James Gray newproducts@linuxjournal.com Editor Emeritus Don Marti dmarti@linuxjournal.com Technical Editor Michael Baxter mab@cruzio.com Senior Columnist Reuven Lerner reuven@lerner.co.il Security Editor Mick Bauer mick@visi.com Hack Editor Kyle Rankin lj@greenfly.net Virtual Editor Bill Childers bill.childers@linuxjournal.com Contributing Editors Ibrahim Haddad • Robert Love • Zack Brown • Dave Phillips • Marco Fioretti • Ludovic Marcotte Paul Barry • Paul McKenney • Dave Taylor • Dirk Elmendorf • Justin Ryan • Adam Monsen President Publisher Associate Publisher Director of Digital Experience Accountant Carlie Fairchild publisher@linuxjournal.com Mark Irgang mark@linuxjournal.com John Grogan john@linuxjournal.com Katherine Druckman webmistress@linuxjournal.com Candy Beauchamp acct@linuxjournal.com Linux Journal is published by, and is a registered trade name of, Belltown Media, Inc. PO Box 980985, Houston, TX 77098 USA Phrase search and highlighting Ability to save, clip and share articles Editorial Advisory Panel Nick Baronian Kalyana Krishna Chadalavada Brian Conner • Keir Davis Michael Eager • Victor Gregorio David A. Lane • Steve Marquez Dave McAllister • Thomas Quinlan Chris D. Stark • Patrick Swartz Embedded videos Android & iOS apps, desktop and e-Reader versions Advertising E-MAIL: ads@linuxjournal.com URL: www.linuxjournal.com/advertising PHONE: +1 713-344-1956 ext. 2 Subscriptions E-MAIL: subs@linuxjournal.com URL: www.linuxjournal.com/subscribe MAIL: PO Box 980985, Houston, TX 77098 USA LINUX is a registered trademark of Linus Torvalds. SUBSCRIBE TODAY! Puppet Application Orchestration Application Delivery Made Simple Model complex, distributed applications as Puppet code so you can quickly and reliably roll out new infrastructure and applications. Learn more at puppetlabs.com AP U PR£ Current_lssue.tar.gz Get Smart SHAWN POWERS W anna get smart? Use Linux. (Mic drop.) I hope you all rolled your eyes a bit, because although there's a kernel of truth there, everyone knows it takes a lot more than using Linux to be successful in IT. It takes hard work, planning, strategizing, maintaining and a thousand other things system administrators, developers and other tech folks do on a daily basis. Thankfully, Linux makes that work a little easier and a lot more fun I Reuven M. Lerner starts off this issue continuing his pseudo¬ series on Web performance enhancements. The past few months he has described how to deal with bottlenecks on your systems. Here, he looks at some ways to help suss out those hard-to-find problems before they become showstoppers. Whether you're trying to test a product proactively or trying to pressure a troublesome system into VIDEO: Shawn Powers runs through the latest issue. showing its underlying problems, Reuven's column will be very helpful. Dave Taylor continues his theme on making words, and this month, he shifts the focus from wooden building blocks to tinier wooden blocks—namely, Scrabble tiles. If you're stuck for a word and don't feel like a horrible cheating liar for using a script to help you, Dave's column likely will appeal to you. I'm pretty sure my Aunt Linda has been using Dave's script for years, because I just can't seem to beat her at Words With Friends. 
Although he's normally the geekiest in the bunch, Kyle Rankin goes to a new level of awesome this month when he revisits Libreboot. This time, his new laptop can't be flashed using software, so instead he actually uses a second computer to flash the chip on the motherboard with wires! I'm not sure how I can get to his level of nerdery in my column, other than maybe announcing my upcoming Raspberry-Pi- powered moon rover. Seriously though, Kyle's column is a must-read. I finish up my Wi-Fi series in this 8 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM CURRENT ISSUE.TAR.GZ issue with an article about hardware. Understanding theory, channel width and frequency penetration is all well and good, but if you put your access points in the wrong place, your performance will still suffer. Knowledge and execution go together like peanut butter and chocolate, so using last month's theory to build this month's network infrastructure should be delicious. Even if you already have a decent Wi-Fi setup in your home or office, my article might help you tweak a little more performance out of your existing network. David Barton helps teach us to be smarter IT professionals by giving us a detailed look at Puppet. DevOps is all the rage for a very good reason. Tools like Puppet can turn a regular system administrator into a system superhero and transform developers into solution-delivering pros. David shows how to manage your Linux servers in a way that is scalable, repeatable and far less complicated than you might think. Managing multiple servers is great, but if those servers aren't secure, you're just scaling up a disaster waiting to happen. Greg Bledsoe walks through the process of server hardening. It's a stressful topic, because making sure your servers are secure is the hallmark of what it means to be a successful administrator. Unfortunately, it's also a moving target that can keep you up at night worrying. In his article, Greg explores some best practices along with some specific things you can do to make your already awesome Linux servers more secure and reliable. Whether you manage a simple Web server or a farm of cloud instances delivering apps, server hardening is vital. I think Spiderman said it best: "With great power comes great responsibility." That's true in life, but also true in computing. It's easy to take Linux for granted and assume that it's so secure out of the box, you needn't worry about it, or assume that since Linux is free, there's no cost when your infrastructure grows. By being smart about how you manage computers, you can take advantage of all the awesomeness Linux has to offer without falling victim to being overwhelmed or overconfident. Want to get smart? Do smart things. That's really the only waylH Shawn Powers is the Associate Editor for Linux Journal. He’s also the Gadget Guy for LinuxJournal.com. and he has an interesting collection of vintage Garfield coffee mugs. Don’t let his silly hairdo fool you. he’s a pretty ordinary guy and can be reached via e-mail at shawn@linuxjournal.com Or. swing by the #linuxjournal IRC channel on Freenode.net. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 9 FRONT NEWS + FUN diff -u WHAT’S NEW IN KERNEL DEVELOPMENT The NMI (non-masking interrupt) system in Linux has been a notorious patchwork for a long time, and Andy Lutomirski recently decided to try to clean it up. NMIs occur when something's wrong with the hardware underlying a running system. 
Typically in those cases, the NMI attempts to preserve user data and get the system into as orderly a state as possible, before an inevitable crash. Andy felt that in the current NMI code, there were various corner cases and security holes that needed to be straightened out, but the way to go about doing so was not obvious. For example, sometimes an NMI could legitimately be triggered within another NMI, in which case the interrupt code would need to know that it had been called from "NMI context" rather than from regular kernel space. But, the best way to detect NMI context was not so easy to determine. Also, Andy saw no way around a significant speed cost, if his goal were to account for all possible corner cases. On the other hand, allowing some relatively acceptable level of incorrectness would let the kernel blaze along at a fast clip. Should he focus on maximizing speed or guaranteeing correctness? He submitted some patches, favoring the more correct approach, but this was actually shot down by Linus Torvalds. Linus wanted to favor speed over correctness if at all possible, which meant analyzing the specific problems that a less correct approach would introduce. Would any of them lead to real problems, or would the issues be largely ignorable? As Linus put it, for example, there was one case where it was theoretically possible for bad code to loop over infinitely recursing NMIs, causing the stack to grow without bound. But, the code to do that would have no use whatsoever, so any code that did it would be buggy anyway. So, Linus saw no need for Andy's patches to guard 10 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM [UPFRONT i against that possibility. Going further, Linus said the simplest approach would be to disallow nested NMIs—this would save the trouble of having to guess whether code was in NMI context, and it would save all the other usual trouble associated with nesting call stacks. Problem solved! Except, not really. Andy and others proved reluctant to go along with Linus' idea. Not because it would cause any problems within the kernel, but because it would require discarding certain breakpoints that might be encountered in the code. If the kernel discarded breakpoints needed by the GDB debugger, it would make GDB useless for debugging the kernel. Andy dug a bit deeper into the code in an effort to come up with a way to avoid NMI recursion, while simultaneously avoiding disabling just those breakpoints needed by GDB. Finally, he came up with a solution that was acceptable to Linus: only in-kernel breakpoints would be discarded. User breakpoints, such as those set by the GDB user program, still could be kept. The NMI code has been super thorny and messed up. But in general, it seems like more and more of the super-messed-up stuff is being addressed by kernel developers. The NMI code is a case in point. After years of fragility and inconsistency, it's on the verge of becoming much cleaner and more predictable.— zackbrown They Said It If a problem has no solution, it may not be a problem, but a fact—not to be solved, but to be coped with over time. —Shimon Peres Happiness lies not in the mere possession of money. It lies in the joy of achievement, in the thrill of creative effort. —Franklin D. Roosevelt Do not be too moral. You may cheat yourself out of much life. Aim above morality. Be not simply good; be good for something. —Henry David Thoreau If you have accomplished all that you planned for yourself, you have not planned enough. 
—Edward Everett Hale The bitterest tears shed over graves are for words left unsaid and deeds left undone. —Harriet Beecher Stowe WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 11 [UPFRONT i Android Candy: If You’re Not Using This, Then Do That The "If This Then That" site has been around for a long time, but if you haven't checked it out in a while, you owe it to yourself to do so. The Android app (which had a recent name change to simply "IF") makes it easy to manipulate on the fly, and you're still able to interact with your account on its Web site. The beauty of IFTTT is its ability to work without any user interaction. I have recipes set up that notify me when someone adds a file into a shared Dropbox folder, which is far more convenient than constantly checking manually. I also manage all my social network postings with IFTTT, so if I post a photo via Instagram or want to send a text update to Facebook and Twitter, all my social networking channels are updated. In fact, IFTTT even allows you to cross-post Instagram photos to Twitter and have them show up as native Twitter images. If you're not using IFTTT to automate your life, you need to head over to http://ifttt.com and start now. If you're already using it, you should download the Android app. ■1G7 (Image via Google Play Store) which has an incredible interface to the already awesome IFTTT back end. Get it at the Play Store today; just search for "IF" or "IFTTT"—either will find the app. —SHAWN POWERS 12 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Install Windows? Yeah, Open Source Can Do That. For my day job, I occasionally have to demonstrate concepts in a Windows environment. The most time-consuming part of the process is almost always the installation. Don't get me wrong; Linux takes a long time to install, but in order to set up a multi-system lab of Windows computers, it can take days! Thankfully, the folks over at https://automatedlab.codeplex.com have created an open-source program that automatically will set up an entire lab of servers, including domain controllers, user accounts, trust relationships and all the other Windows things I tend to forget after going through the process manually. Because it's script-based, there are lots of pre-configured lab options ready to click and install. Whether you need a simple two-server lab or a complex farm with redundant domain controllers, Automated Lab can do the heavy lifting. Although the tool is open source, the Microsoft licenses are not. You need to have the installation keys and ISO files in place before you can build the labs. Still, the amount of time and headaches you can save with Automated Lab makes it well worth the download and configuration, especially if you need to build test labs on a regular basis. —SHAWN POWERS LINUX JOURNAL Fit Your Service SUBSCRIPTIONS: Linux Journal is available in a variety of digital formats, including PDF, .epub, .mobi and an on-line digital edition, as well as apps for iOS and Android devices. Renewing your subscription, changing your e-mail address for issue delivery, paying your invoice, viewing your account details or other subscription inquiries can be done instantly on-line: http://www.linuxjournal.com/subs. E-mail us at subs@linuxjournal.com or reach us via postal mail at Linux Journal, PO Box 980985, Houston, TX 77098 USA. Please remember to include your complete name and address when contacting us. ACCESSING THE DIGITAL ARCHIVE: Your monthly download notifications will have links to the various formats and to the digital archive. 
To access the digital archive at any time, log in at http://www.linuxjournal.com/digital. LETTERS TO THE EDITOR: We welcome your letters and encourage you to submit them at http://www.linuxjournal.com/contact or mail them to Linux Journal, PO Box 980985, Houston, TX 77098 USA. Letters may be edited for space and clarity. WRITING FOR US: We always are looking for contributed articles, tutorials and real-world stories for the magazine. An author's guide, a list of topics and due dates can be found on-line: http://www.linuxjournal.com/author. FREE e-NEWSLETTERS: Linux Journal editors publish newsletters on both a weekly and monthly basis. Receive late-breaking news, technical tips and tricks, an inside look at upcoming issues and links to in-depth stories featured on http://www.linuxjournal.com. Subscribe for free today: http://www.linuxjournal.com/ enewsletters. ADVERTISING: Linux Journal is a great resource for readers and advertisers alike. Request a media kit, view our current editorial calendar and advertising due dates, or learn more about other advertising and marketing opportunities by visiting us on-line: http://ww.linuxjournal.com/ advertising. Contact us directly for further information: ads@linuxjournal.com or + 1 713-344-1956 ext. 2. r WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 13 [UPFRONT i Recipy for Science More and more journals are demanding that the science being published be reproducible. Ideally, if you publish your code, that should be enough for someone else to reproduce the results you are claiming. But, anyone who has done any actual computational science knows that this is not true. The number of times you twiddle bits of your code to test different hypotheses, or the specific bits of data you use to test your code and then to do your actual analysis, grows exponentially as you are going through your research program. It becomes very difficult to keep track of all of those changes and variations over time. Because more and more scientific work is being done in Python, a new tool is available to help automate the recording of your research program. Recipy is a new Python module that you can use within your code development to manage the history of said code development. Recipy exists in the Python module repository, so installation can be as easy as: pip install recipy The code resides in a GitHub repository, so you always can get the latest and greatest version by cloning the repository and installing it manually. If you do decide to install manually, you also can install the requirements with the following using the file from the recipy source code: pip install -r requirements.txt Once you have it installed, using it is extremely easy. You can alter your scripts by adding this line to the top of the file: import recipy It needs to be the very first line of Python executed in order to capture everything else that happens within your program. If you don't even want to alter your files that much, you can run your code through Recipy with the command: python -m recipy my_script.py All of the reporting data is stored within a TinyDB database, in a file named test.npy. 14 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM [UPFRONT i Once you have collected the details of your code, you now can start to play around with the results stored in the test.npy file. To explore this module, let's use the sample code from the recipy documentation. 
A short example is the following, saved in the file my_script.py:

import recipy
import numpy

arr = numpy.arange(10)
arr = arr + 500
numpy.save('test.npy', arr)

The recipy module includes a script called recipy that can process the stored data. As a first look, you can use the following command, which will pull up details about the run:

recipy search test.npy

On my Cygwin machine (the power tool for Linux users forced to use a Windows machine), the results look like this:

Run ID: eb4de53f-d90c-4451-8e35-d765cb82d4f9
Created by berna_000 on 2015-09-07T02:18:17
Ran /cygdrive/c/Users/berna_000/Dropbox/writing/lj/science/recipy/my_script.py using /usr/bin/python
Git: commit 1149a58066ee6d2b6baa88ba00fd9effcf434689, in repo /cygdrive/c/Users/berna_000/Dropbox/writing, with origin https://github.com/joeybernard/writing.git
Environment: CYGWIN_NT-10.0-2.2.0-0.289-5-3-x86_64-64bit, python 2.7.10 (default, Jun 1 2015, 18:05:38)
Inputs: none
Outputs: /cygdrive/c/Users/berna_000/Dropbox/writing/lj/science/recipy/test.npy

Every time you run your program, a new entry is added to the database. When you run the search command again, you will get a message like the following to let you know:

** Previous runs creating this output have been found. Run with --all to show. **

If using a text interface isn't your cup of tea, there is a GUI available with the following command, which gives you a potentially nicer interface (Figure 1):

recipy gui

This GUI is actually Web-based, so once you are done running this command, you can open it in the browser of your choice.

Figure 1. Recipy includes a GUI that provides a more intuitive way to work with your run data.

Recipy stores its configuration and the database files within the directory ~/.recipy. The configuration is stored in the recipyrc file in this folder. The database files also are located here by default. But, you can change that by using the configuration option:

[database]
path = /path/to/file.json

This way, you can store these database files in a place where they will be backed up and potentially versioned. You can change the amount of information being logged with a few different configuration options. In the [general] section, you can use the debug option to include debugging messages or quiet to not print any messages. By default, all of the metadata around git commands is included within the recorded information. You can ignore some of this metadata selectively with the configuration section [ignored metadata]. If you use the diff option, the output from a git diff command won't be stored.
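Putting the options mentioned so far together, a minimal recipyrc might look like the following sketch. The path is the same placeholder used above, and which options you actually want depends on how much detail you need to record:

[general]
debug

[database]
path = /path/to/file.json

[ignored metadata]
diff

With a file like this in ~/.recipy, every run prints debugging messages, the run database lands somewhere you can back up and version, and git metadata is still recorded without storing full diff output.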
If instead you wanted to ignore everything, you could use the git option to skip everything related to git commands. You can ignore specific modules on either the recorded inputs or the outputs by using the configuration sections [ignored inputs] and [ignored outputs], respectively. For example, if you want to skip recording any outputs from the numpy module, you could use:

[ignored outputs]
numpy

If you want to skip everything, you could use the special all option for either section. If these options are stored in the main configuration file mentioned above, they will apply to all of your recipy runs. If you want to use different options for different projects, you can use a file named .recipyrc within the current directory with the specific options for the project.

The way that recipy works is that it ties into the Python system for importing modules. It does this by using wrapping classes around the modules that you want to record. Currently, the supported modules are numpy, scikit-learn, pandas, scikit-image, matplotlib, pillow, GDAL and nibabel. The wrapper function is extremely simple, however, so it is an easy matter to add wrappers for your favorite scientific module. All you need to do is implement the PatchSimple interface and add lists of the input and output functions that you want logged.

After reading this article, you never should lose track of how you reached your results. You can configure recipy to record the details you find most important and be able to redo any calculation you did in the past. Techniques for reproducible research are going to be more important in the future, so this is definitely one method to add to your toolbox. Seeing as it is only at version 0.1.0, it will be well worth following this project to see how it matures and what new functionality is added to it in the future.—Joey Bernard

LINUX JOURNAL on your e-Reader: customized Kindle and Nook editions now available.

Simple Photo Editing, Linux Edition!

(Image from http://www.pinta-project.com)

A while back I wrote about the awesome open-source image editing program Paint.NET, which is available only for Windows. Although I'm thrilled there is an open-source option for Windows users, Paint.NET is one of those apps that is so cool, I wish it worked in Linux! Thankfully, there's another app in town with similar features, and it's cross-platform! Pinta isn't exactly a Paint.NET clone, but it looks and functions very much like the Windows-only image editor. It has simple controls, but they're powerful enough to do most of the simple image editing you need to do on a day-to-day basis. Whether you want to apply artistic filters, autocorrect color levels or just crop a former friend out of a group photo, Pinta has you covered. There certainly are more robust image editing options available for Linux, but often programs like GIMP are overkill for simple editing. Pinta is designed with the "less is more" mentality. It's available for Linux, OS X, Windows and even BSD, so there's no reason to avoid trying Pinta today. Check it out at http://www.pinta-project.com. —SHAWN POWERS

USENIX LISA15: More craft. Less cruft.
The LISA conference is where IT operations professionals, site reliability engineers, system administrators, architects, software engineers, and researchers come together, discuss, and gain real-world knowledge about designing, building, and maintaining the critical systems of our interconnected world. LISA15 will feature talks and training from: 1 Mikey Dickerson, United States Digital Service 1 Nick Feamster, Princeton University 1 Matt Harrison, Python/Data Science Trainer, Metasnake 1 Elizabeth Joseph, Hewlett-Packard 1 Tom Limoncelli, SRE, Stack Exchange, Inc 1 Dinah McNutt, Google, Inc 1 James Mickens, Harvard University 1 Chris Soghoian, American Civil Liberties Union 1 John Willis, Docker Register Today! Sponsored by USENIX in cooperation with LOPSA Nov. 8 - 13, 2015 Washington, D.C. usenix.org/lisa15 [EDITORS’ CHOICE] Tiny Makers If you've ever dropped Mentos in a bottle of Coke with kids or grown your own rock candy in a jar with string, you know how excited children get when doing science. For some of us, that fascination never goes away, which is why things like Maker Faire exist. If you want your children (or someone else's children) to grow into awesome nerds, one of the best things you can do is get them involved with projects at http://www.makershed.com. Although it's true that many of the kits you can purchase are a bit too advanced for kindergartners, there are plenty that are perfect for any age. You can head over to http://www.makershed.com/ collections/beginner to see a bunch 20 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM of pre-selected projects designed for beginners of all ages. All it takes is a dancing brush-bot or a handful of LED throwies to make kids fall in love with making things. Even if you don't purchase the kits from Maker Shed, I urge you to inspire the youngsters in your life into creating awesome things. If you guide them, they'll be less likely to do the sorts of things I did in my youth, like make a stun gun from an automobile ignition coil and take it to school to show my friends. Trust me, principals are far more impressed with an Altoid-tin phone charger for show and tell than with a duct-tape-mounted taser gun. You can buy pre-made kits at http://www.makershed.com or visit sites like http://instructables.com for homemade ideas you can make yourself. In fact, doing cool projects with kids is such an awesome thing to do, it gets this month's Editors' Choice award. Giving an idea the award might seem like an odd thing to do, but who doesn't love science projects? 
We sure do!— shawnpowers Powerful: Rhino Rhino M4800/M6800 • Dell Precision M6800 w/ Core i7 Quad (8 core) • 15.6"-17.3" QHD+ LED w/ X@3200xl800 • NVidia Quadro K5100M • 750 GB - 1 TB hard drive •Up to 32 GB RAM (1866 MHz) • DVD±RW or Blu-ray • 802.11a/b/g/n •Starts at $1375 • E6230, E6330, E6440, E6540 also available • High performance NVidia 3-D on an QHD+ RGB/LED • High performance Core i7 Quad CPUs, 32 GB RAM • Ultimate configurability — choose your laptop's features • One year Linux tech support — phone and email • Three year manufacturer's on-site warranty • Choice of pre-installed Linux distribution: Tablet: Raven Raven X240 • ThinkPad X240 by Lenovo • 12.5" FHD LED w/ X@1920xl080 •2.6-2.9 GHz Core i7 •Up to 16 GB RAM • 180-256 GBSSD •Starts at $1910 • W540, T440, T540 also available { Rugged: Tarantula Tarantula CF-31 • Panasonic Toughbook CF-31 • Fully rugged MIL-SPEC-810G tested: drops, dust, moisture & more • 13.1" XGA TouchScreen •2.4-2.8 GHz Core i5 •Up to 16 GB RAM • 320-750 GB hard drive / 512 GB SSD • CF-19, CF-52, CF-H2, FZ-G1 available EmperorLinux 0 www.EmperorLinux.com ri ...where Linux 8i laptops converge 1-888-651-6686 A Model specifications and availability may vary. COLUMNS AT THE FORGE Performance Testing REUVEN M. LERNER A look at tools that push your server to its limits, testing loads before your users do. In my last few articles. I've considered Web application performance in a number of different ways. What are the different parts of a Web application? How might each be slow? What are the different types of slowness for which you can (and should) check? How much load can a given server (or collection of servers) handle? So in this article, I survey several open-source tools you can use to better identify how slow your Web applications might be running, in a number of different ways. I should add that as the Web has grown in size and scope, the number and types of ways you can check your apps' speed also have become highly diverse, such that talking about "load testing" or "performance testing" should beg the question, "Which kind of testing are you talking about?" I also should note that although I have tried to cover a number of the most popular and best-known tools, there are dozens (and perhaps hundreds) of additional tools that undoubtedly are useful. If I've neglected an excellent tool that you think will help others, please feel free to send me an e-mail or a Tweet; if readers suggest enough such tools, I'll be happy to follow up with an additional column on the subject. In my next article. I'll conclude this series by looking at tools and techniques you can use to identify and solve client-side problems. Logfiles One of the problems with load testing is that it often fails to catch the problems you experience in the wild. For this reason, some of the best tools that you have at your disposal are the logfiles on your Web server and in your database. I'm a bit crazy about logfiles, in that I enjoy having more information than I'll really need written in there, just in case. Does that tend to make my applications 22 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 COLUMNS AT THE FORGE perform a bit worse and use up more disk space? Absolutely—but I've often found that when users have problems, I'm able to understand what happened better, and why it happened, thanks to the logfiles. This is true in the case of application performance as well. 
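For example, if your Web server's access log includes response times (nginx can log $request_time, and Apache's mod_log_config can add %D, the time taken to serve the request), even a quick shell pipeline will surface the slowest URLs. This is only a sketch: it assumes the duration was appended as the last field of each log line, which is a local logging choice rather than a default, and that the request path is the seventh whitespace-separated field, as it is in the common and combined formats:

# Ten slowest requests: print duration and URL, sort numerically, take the top
awk '{print $NF, $7}' /var/log/nginx/access.log | sort -rn | head -10

# Average duration per URL, using the same assumed log layout
awk '{sum[$7] += $NF; hits[$7]++}
     END {for (u in hits) printf "%10.3f %s\n", sum[u]/hits[u], u}' \
    /var/log/nginx/access.log | sort -rn | head -10

None of this replaces a real profiler, but it quickly tells you which handful of URLs deserve closer attention.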
Regarding Ruby on Rails, for example, the logfile will tell you how long each HTTP request took to be served, breaking that down further into how much time was spent in the database and creating the HTML output ("view"). This doesn't mean you can avoid digging deeper in many cases, but it does allow you to look through the logfile and get a basic sense of how long different queries are taking and understand where you should focus your efforts.

In the case of databases, logfiles are also worth a huge amount. In particular, you'll want to turn on your database's system that logs queries that take longer than a certain threshold. MySQL has the "slow query log", and PostgreSQL has the log_min_duration_statement configuration option. In the case of PostgreSQL, you can set log_min_duration_statement to be any number of ms you like, enabling you to see, in the database's log, any query that takes longer than (for example) 500 ms. I often set this number to be 200 or 300 ms when I first work on an application, and then reduce it as I optimize the database, allowing me to find only those queries that are truly taking a long time.

It's true that logfiles aren't quite part of load testing, but they are an invaluable part of any analysis you might perform, in production or even in your load tests. Indeed, when you run the load tests, you'll need to understand and determine where the problems and bottlenecks are. Being able to look at (and understand) the logs will give you an edge in such analysis.

Apachebench

Once you've set up your logfiles, you are ready to begin some basic load testing. Apachebench (ab) is one of the oldest load-testing programs, coming with the source code for Apache httpd. It's not the smartest or the most flexible, but ab is so easy to use that it's almost certainly worth trying it for some basic tests. ab takes a number of different options, but the most useful ones are as follows:

■ -n: the total number of requests to send.

■ -c: the number of requests to make concurrently.

■ -i: use a HEAD request instead of GET.
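To round out the logfile side before running any of these tests, here is a minimal sketch of the slow-query logging just described. The file locations and service commands assume a Debian-style PostgreSQL 9.x and MySQL installation, and the thresholds are only starting points:

# PostgreSQL: log any statement that takes longer than 200 ms
echo "log_min_duration_statement = 200" | sudo tee -a /etc/postgresql/9.4/main/postgresql.conf
sudo service postgresql reload

# MySQL: enable the slow query log for anything over half a second
printf '[mysqld]\nslow_query_log = 1\nlong_query_time = 0.5\n' | sudo tee /etc/mysql/conf.d/slow-query.cnf
sudo service mysql restart

With that in place, every load test you run leaves a trail of the exact queries that crossed the threshold, which makes the results far easier to interpret.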
You also can imagine that I need to check it to see why going to the home page of this site takes so long. Perhaps the database hasn't been configured or optimized, or perhaps the home page contains a huge amount of server-side code that could be optimized away. Now, it's tempting to raise the concurrency level (-c option) to something really large, but if you're running a standard Linux box, you'll find that your system quickly runs out of file descriptors. In such cases, you either can reconfigure your system or you can use Bees with Machine Guns, described below. So, what's wrong with ab? Nothing in particular, other than the fact that you're dealing with a simple HTTP request. True, using ab's various options, you can pass an HTTP authentication string (user name and password), set cookies (names and values), and even send POST and PUT requests whose inputs come from specified files. But if you're looking to check the timing and performance 24 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 COLUMNS AT THE FORGE of a set of user actions, rather than a single URL request, ab isn't going to be enough for you. That said, given that the Web is stateless, and that you're likely to be focusing on a few particular URLs that might be causing problems, ab still might be sufficient for your needs, assuming that you can set the authentication and cookies appropriately. The above also fails to take into account how users perceive the speed of your Web site, ab measured only the time it took to do all of the server-side processing. Assuming that network latency is zero and that JavaScript executes infinitely fast, you don't need to worry about such things. But of course, this is the real world, which means that client-side operations are no less important, as you'll see in my next article. Bees with Machine Guns (BWMG) If there's an award for best open-source project name, I think that it must go to Bees with Machine Guns. Just saying this project's name is almost guaranteed to get me to laugh out loud. And yet, it does something very serious, in a very clever way. It allows you to orchestrate a distributed denial-of-service (DDOS) attack against your own servers. The documentation for BWMG states this, but I'll add to the warnings. This tool has the potential to be used for evil, in that you can very easily set up a DDOS attack against any site you wish on the Internet. I have to imagine that you'll get caught pretty quickly if you do so, given that BWMG uses Amazon's EC2 cloud servers, which ties the servers you use to your name and credit card. But even if you won't get caught, you really shouldn't do this to a site that's not your own. In any event, Bees assumes that you have an account with Amazon. It's written in Python, and as such, it can be installed with the pip command: pip install beeswithmachineguns The basic idea of Bees is that it fires up a (user-configurable) number of EC2 machines. It then makes a number of HTTP requests, similar to ab, from each of those machines. You then power down the EC2 machines and get your results. In order for this to work, you'll need at least one AWS keypair (.pern file), which Bees will look for (by default) in your personal ~/.ssh directory. You can, of course, put it elsewhere. Bees relies on Boto, a Python package that allows for automated work with AWS, so you'll also need to define a ~/.boto file containing your AWS key and secret (that is, user name and password). 
WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 25 COLUMNS AT THE FORGE Once you have the keypair and .boto files in place, you then can set up your Bees test. I strongly suggest that you put this in a shell script, thus ensuring that everything runs. You really don't want to fire up a bunch of EC2 machines with the bees up command, only to discover the following month that you forgot to turn it off. Bees uses the bees command for everything, so every line of your script will start with the word bees. Some of the commands you can issue include the following: ■ bees up: start up one or more EC2 servers. You can specify the -s option to indicate the number of servers, the -g option to indicate the security group, and -k to tell Bees where to look for your EC2 keypair file. ■ bees attack: much like ab, you'll use the -n option to indicate the number of requests you want to make and the -c option to indicate the level of concurrency. ■ bees down: shut down all of the EC2 servers you started in this session. So, if you want to do the same thing as before (that is, 1,000 requests), but now divided across ten different servers, you would say: bees up -s 10 -g beesgroup -k beespair bees attack -n 100 -c 10 -u http://myserver.example.com/ bees down When you run Bees, the fun really begins. You get a verbose printout indicating that bees are joining the swarm, that they're attacking (bang bang!) and that they're done ("offensive complete"). The report at the conclusion of this attack, similar to ab, will indicate whether all of the HTTP requests were completed successfully, how many requests the server could handle per second, and how long it took to respond to various proportions of bees attacking. Bees is a fantastic tool and can be used in at least two different ways. First, you can use it to double¬ check that your server will handle a particular load. For example, if you know that you're likely to get 100,000 concurrent requests across your server farm, you can use Bees to load that up on 1,000 different EC2 machines. But another way to use Bees, or any load-testing tool, is to probe the limits of your system—that is, to overwhelm your server intentionally, to find out how many simultaneous requests it can take before failing over. This simply might be to understand the limits of the application's current architecture and implementation, or it might provide 26 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 COLUMNS AT THE FORGE you with insights into which parts of the application will fail first, so that you can address those issues. Regardless, in this scenario, you run your load-testing tool at repeatedly higher levels of concurrency until the system breaks—at which point you try to identify what broke, improve it and then overwhelm your server once again. A possible alternative to Bees with Machine Guns, which I have played with but never used in production, is Locust. Locust can run on a single machine (like ab) or on multiple machines, in a distributed fashion (like Bees). It's configured using Python and provides a Web-based monitoring interface that allows you to see the current progress and state of the requests. Locust uses Python objects, and it allows you to write Python functions that execute HTTP requests and then chain them together for complex interactions with a site. Conclusion If you're interested in testing your servers, there are several high-quality, open-source tools at your disposal. 
Here, I looked at several systems for exploring your server's limits, and also how you can configure your database to log when it has problems. You're likely going to want to use multiple tools to test your system, since each exposes a different set of potential problems. In my next article, I'll look at a variety of tools that let you identify problems and slowness within the client side of your Web application. ■ Reuven M. Lerner trains companies around the world in Python. PostgreSQL. Git and Ruby. His ebook. “Practice Makes Python”, contains 50 of his favorite exercises to sharpen your Python skills. Reuven blogs regularly at http://blog.lerner.co.il and tweets as @reuvenmlerner. Reuven has a PhD in Learning Sciences from Northwestern University, and he lives in Modi’in. Israel, with his wife and three children. Resources Apachebench is part of the HTTP server project at the Apache Software Foundation. That server is hosted at https://httpd.apache.org. ab is part of the source code package for Apache httpd. Bees with Machine Guns is hosted on GitHub at https://github.com/newsapps/ beeswithmachineguns. That page contains a README with basic information about how to use the program. It assumes familiarity with Amazon’s EC2 service and a working set of keys. Locust is hosted at http://locust.io, where there also is extensive documentation and examples. You will need to know Python, including the creation of functions and classes, in order to use Locust. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 27 COLUMNS WORK THE SHELL Words-We Can Make Lots of Words DAVE TAYLOR In this article, Dave Taylor shows complicated script code to complete the findwords script. Now you’ll be ready to crush everyone in Scrabble and Words with Friends . It was a dark and stormy night when I started this series here in Linux Journal —at least two months ago, and in Internet terms, that's quite a while. And just wait until our robot overlords are running the show, because then two months will be 10-20 generations of robot evolution and quite frankly, the T-2000 probably could have solved this problem already anyway. Puny humans! But, we haven't yet reached the singularity—at least, I don't think so. I asked Siri, and she said we hadn't, so that's good enough, right? Let's dive back in to this programming project because the end is nigh! Well, for this topic at least. The challenge started out as trying to make words from a combination of letter blocks. You know, the wooden blocks that babies play with (or, alternatively, hurl at you if you're within 20 feet of them). Those give you six letters per space, but I simplified the problem down to the Scrabble tiles example: you have a set of letters on your rack; what words can you make with them? I've talked about algorithms for the last few months, so this time, let's really dig in to the code for findwords, the resultant script. After discarding various solutions, the one I've implemented has two phases: ■ Identify a list of all words that are composed only of the letters started with (so "axe" wouldn't match the starting letters abcdefg). ■ For each word that matches, check 28 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS WORK THE SHELL that the number of letters needed to spell the word match up with the occurrences of letters in the starting pattern (so "frogger" can't be made from forger—but almost). 
Let's have a look at the code blocks, because it turns out that this is non-trivial to implement, but we have learned to bend The Force to do our bidding (in other words, we used regular expressions). First we step through the dictionary to identify n-letter words that don't contain letters excluded from the set, with the additional limitation that the word is between (length-3) and (length) letters long:

unique="$(echo $1 | sed 's/./&\
/g' | tr '[[:upper:]]' '[[:lower:]]' | sort | uniq | \
  fmt | tr -C -d '[[:alpha:]]')"

while [ $minlength -lt $length ]
do
  regex="^[$unique]{$minlength}$"
  if [ $verbose ] ; then
    echo "Raw word list of length $minlength for letterset $unique:"
    grep -E $regex "$dictionary" | tee -a $testwords
  else
    grep -E $regex "$dictionary" >> $testwords
  fi
  minlength="$(( $minlength + 1 ))"
done

I explained how this block works in my column in the last issue (October 2015), if you want to flip back and read it, but really, the hard work involves the very first line, creating the variable $unique, which is a sorted, de-duped list of letters from the original pattern. Given "messages", for example, $unique would be "aegms".

Indeed, given "messages", here are the words that are identified as possibilities by findwords:

Raw word list of length 6 for letterset aegms:
assess
mammas
masses
messes
sesame
Raw word list of length 7 for letterset aegms:
amasses
massage
message
Raw word list of length 8 for letterset aegms:
assesses
massages
messages

Clearly there's more work to do, because it's not possible to make the word "massages" from the starting pattern "messages", since there aren't enough occurrences of the letter "a". That's the job of the second part of the code, so I'm just going to show you the whole thing, and then I'll explain specific sections:

pattern="$(echo $1 | sed 's/./&\
/g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')"

for word in $( cat $testwords )
do
  simplified="$(echo $word | sed 's/./&\
/g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')"

  ## PART THREE: do all letters of the word appear
  #  in the pattern once and exactly once? Easy way:
  #  loop through and remove each letter as used,
  #  then compare end states

  indx=1; failed=0
  before=$pattern

  while [ $indx -lt ${#simplified} ]
  do
    ltr=${simplified:$indx:1}
    after=$(echo $before | sed "s/$ltr/-/")
    if [ $before = $after ] ; then
      failed=1
    else
      before=$after
    fi
    indx=$(( $indx + 1 ))
  done

  if [ $failed -eq 0 ] ; then
    echo "SUCCESS: You can make the word $word"
  fi
done

The first rather gnarly expression to create $pattern from the specified starting argument ($1) normalizes the pattern to all lowercase, sorts the letters alphabetically, then reassembles it. In this case, "messages" would become "aeegmsss". Why? Because we can do that to each of the possible words too, and then the comparison test becomes easy. The list of possible words was created in part one and is stored in the temporary file $testwords, so the "for" loop steps us through. For each word, $simplified becomes a similarly normalized pattern to check. For each letter in the proposed word, we replace that letter with a dash in the pattern, using two variables, $before and $after, to stage the change so we can ensure that something really did change for each letter.
That's what's done here: after=$(echo $before | sed "s/$ltr/-/") If $before = Safter, then the needed letter from the proposed word wasn't found in the pattern, and the word can't be assembled from the pattern. On the other hand, if there are extra letters in the pattern after we're done analyzing the word, that's fine. That's the situation where we can make, for example, "games" from "messages", and that's perfectly valid, 30 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS WORK THE SHELL even with the leftover letters. I've added some debugging statements so you can get a sense of what's going on in this example invocation: $ sh findwords.sh messages Raw word list of length 5 for letterset aegms: amass asses eases games gamma gases geese mamma sages seams seems Raw word list of length 6 for letterset aegms assess mammas masses messes sesame Raw word list of length 7 for letterset aegms amasses massage message Raw word list of length 8 for letterset aegms assesses LINUX JOURNAL on your Android device Download the app now from the Google Play Store. RASPBERRY PI Virtual Private Cloud www.linuxjournal.com/android For more information about advertising opportunities within Linux Journal iPhone, iPad and Android apps, contact John Grogan at +1-713-344-1956 x2 or ads@linuxjournal.com. COLUMNS WORK THE SHELL i massages messages created pattern aeegmsss SUCCESS: You can make the word asses SUCCESS: You can make the word eases SUCCESS: You can make the word games SUCCESS: You can make the word gases SUCCESS: You can make the word sages SUCCESS: You can make the word seams SUCCESS: You can make the word seems SUCCESS: You can make the word masses SUCCESS: You can make the word messes SUCCESS: You can make the word sesame SUCCESS: You can make the word message SUCCESS: You can make the word messages So, we can make a dozen different words out of the word "messages", including the word messages itself. What about the original pattern we were using in previous columns: "chicken"? For this one, let's skip the potential words and just look at the solution: SUCCESS: You can make the word chic SUCCESS: You can make the word chi n SUCCESS: You can make the word heck SUCCESS: You can make the word hick SUCCESS: You can make the word hike SUCCESS: You can make the word i nch SUCCESS: You can make the word neck SUCCESS: You can make the word nice SUCCESS: You can make the word nick SUCCESS: You can make the word check SUCCESS: You can make the word chick SUCCESS: You can make the word chink SUCCESS: You can make the word niche SUCCESS: You can make the word chicken Impressive! To make this work a bit better, I've added some error checking, included an -f flag so we can have the script also output failures, not just successes, and left in some additional debugging output if $verbose is set to true. See Listing 1 for the complete code. It's also available at http://www.linuxjournal.com/ extra/findwords. That's it. Now we have a nice tool that can help us figure out what to play the next time we're stuck on Scrabble, Words with Friends, or even looking at a big stack of letter blocks. Next month. I'll turn my attention to a different scripting challenge. Do you have an idea? Send it to ljeditor@linuxjournal.com.B Dave Taylor has been hacking shell scripts since the dawn of the computer era. Well, not really, but still. 30 years is a long time! 
He’s the author of the popular Wicked Cool Shell Scripts (10th anniversary update coming very soon from O’Reilly and NoStarch Press) and can be found on Twitter as @DaveTaylor and more generally at his tech site http://www.AskDaveTaylor.com. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. 32 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS WORK THE SHELL Listing 1. findwords.sh #!/bin/sh # Findwords -- given a set of letters, try to find all the words you can # spell dictionary="/Users/taylor/Documents/Linux Journal/dictionary.txt" testwords=$(mktemp /tmp/findwords.XXXXXX) || exit 1 if [ -z "$1" ] ; then echo "Usage: findwords [sequence of letters]" exit 0 fi if [ "$1" = "-f" ] ; then showfaiIs—1 shift fi ## PART ONE: make the regular expression length="$(echo "$1" | wc -c)" minlength=$(( Slength - 4 )) # we can ignore a max of 2 letters if [ $minlength -It 3 ] ; then echo "Error: sequence must be at least 5 letters long" exit 0 fi unique="$(echo $1 | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | uniq | fmt | \ tr -C -d '[[:alpha:]]')" while [ Sminlength -It Slength ] do regex=" A [$unique]{$minlength}$" if [ Sverbose ] ; then echo "Raw word list of length Sminlength for letterset Sunique:" grep -E Sregex "Sdictionary" | tee -a Stestwords else grep -E Sregex "Sdictionary" >> Stestwords fi minlength="$(( Sminlength + 1 ))" done ## PART TWO: sort letters for validity filter pattern="$(echo $1 | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')" for word in $( cat Stestwords ) do # echo "checking Sword for validity" simplified="$(echo Sword | sed 's/./&\ /g' | tr '[[:upper:]]' '[[:lower:]]' | sort | fmt | sed 's/ //g')" ## PART THREE: do all letters of the word appear in the pattern # once and exactly once? Easy way: loop through and # remove each letter as used, then compare end states indx=l failed=0 before=$pattern while [ Sindx -It ${#simplified} ] do ltr=${simplified:$indx:l} after=$(echo Sbefore | sed "s/$ltr/-/") if [ Sbefore = Safter ] ; then # nothing changed, so we don't have that # letter available any more if [ Sshowfails ] ; then echo "FAILURE: came close, but can't make Sword" fi failed=l else before=$after fi indx=$(( Sindx + 1 )) done if [ $failed -eq 0 ] ; then echo "SUCCESS: You can make the word Sword" fi done /bin/rm -f Stestwords exit 0 WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 33 COLUMNS Flash ROMs with a Raspberry Pi KYLE RANKIN It’s always so weird seeing a bunch of wires between your laptop and a Raspberry Pi. Earlier this year, I wrote a series of columns about my experience flashing a ThinkPad X60 laptop with Libreboot. Since then, the Libreboot project has expanded its hardware support to include the newer ThinkPad X200 series, so I decided to upgrade. The main challenge with switching over to the X200 was that unlike the X60, you can't perform the initial Libreboot flash with software. Instead, you actually need to disassemble the laptop to expose the BIOS chip, clip a special clip called a Pomona clip to it that's wired to some device that can flash chips, cross your fingers and flash. I'm not generally a hardware hacker, so I didn't have any of the special- purpose hardware-flashing tools that you typically would use to do this right. 
I did, however, have a Raspberry Pi (well, many Raspberry Pis if I'm being honest), and it turns out that both it and the Beaglebone Black are platforms that have been used with flashrom successfully. So in this article, I describe the steps I performed to turn a regular Raspberry Pi running Raspbian into a BIOS-flashing machine.

The Hardware

To hardware-flash a BIOS chip, you need two main pieces of hardware: a Raspberry Pi and the appropriate Pomona clip for your chip. The Pomona clip actually clips over the top of your chip and has little teeth that make connections with each of the chip's pins. You then can wire up the other end of the clip to your hardware-flashing device, and it allows you to reprogram the chip without having to remove it. In my case, my BIOS chip had 16 pins (although some X200s use 8-pin BIOS chips), so I ordered a 16-pin Pomona clip on-line at almost the same price as a Raspberry Pi!

There is actually a really good guide on-line for flashing a number of different ThinkPads using a Raspberry Pi and the NOOBS distribution; see Resources if you want more details. Unfortunately, that guide didn't exist when I first wanted to do this, so instead I had to piece together what to do (specifically which GPIO pins to connect to which pins on the clip) by combining a general-purpose article on using flashrom on a Raspberry Pi with an article on flashing an X200 with a Beaglebone Black. So although the guide I link to at the end of this article goes into more depth and looks correct, I can't directly vouch for it since I haven't followed its steps. The steps I list here are what worked for me.

Pomona Clip Pinouts

The guide I link to in the Resources section has a great graphic that goes into detail about the various pinouts you may need to use for various chips. Not all pins on the clip actually need to be connected for the X200. In my case, the simplified form is shown in Table 1 for my 16-pin Pomona clip.

Table 1. Pomona Clip Pinouts

SPI Pin Name    Pomona Clip Pin #    Raspberry Pi GPIO Pin #
3.3V            2                    1 (17*)
CS#             7                    24
SO/SIO1         8                    21
GND             10                   25
SI/SIO0         15                   19
SCLK            16                   23

So when I wired things up, I connected pin 2 of the Pomona clip to GPIO pin 17, but in other guides, they use GPIO pin 1 for 3.3V. I list both because pin 17 worked for me (and I imagine any 3.3V power source might work), but in case you want an alternative pin, there it is.

Build Flashrom

There are two main ways to build flashrom. If you intend to build and flash a Libreboot image from source, you can use the version of flashrom that comes with the Libreboot source. You also can just build flashrom directly from its git repository. Either way, you first will need to pull down all the build dependencies:

$ sudo apt-get install build-essential pciutils usbutils libpci-dev \
    libusb-dev libftdi1 libftdi-dev zlib1g-dev subversion

If you want to build flashrom directly from its source, do this:

$ svn co svn://flashrom.org/flashrom/trunk flashrom
$ cd flashrom
$ make

Otherwise, if you want to build from the flashrom source included with Libreboot, do this:

$ git clone http://libreboot.org/libreboot.git
$ cd libreboot
$ ./download flashrom
$ ./build module flashrom

In either circumstance, at the end of the process, you should have a flashrom binary compiled for the Raspberry Pi ready to use.

Enable SPI

The next step is to load two SPI modules so you can use the GPIO pins to flash.
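Before editing anything, it's worth a quick look at whether your particular image already has SPI turned on. This is only a rough sketch; module and device names can vary between Raspbian releases, so treat a missing entry as a hint rather than a verdict:

ls /dev/spidev*                       # a spidev0.0 entry means SPI is already usable
lsmod | grep -i spi                   # shows spi_bcm2708/spidev if the modules are loaded
grep '^dtparam=spi' /boot/config.txt  # the boot-time switch discussed next

If none of those turn anything up, the next two steps—enabling the device tree parameter and loading the modules—are exactly what you need.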
In my case, the Raspbian image I used did not default to enabling that device at boot, so I had to edit /boot/config.txt as root and make sure that the file contained dtparam=spi=on and then reboot. Once I rebooted, I then could load the two spi modules: $ sudo modprobe spi_bcm2708 $ sudo modprobe spidev Now that the modules loaded successfully, I was ready to power down the Raspberry Pi and wire everything up. Wire Everything Up To wire everything up, I opened up my X200 (unplugged and with the battery removed, of course), found the BIOS chip (it is right under the front wrist rest) and attached the clip. If you attach the clip while the Raspberry Pi is still on, note that it will reboot. It's better to make all of the connections while everything is turned off. Once I was done, it looked like what you see in Figure 1. Then I booted the Raspberry Pi, loaded the two SPI modules and was able to use flashrom to read off a copy of my existing BIOS: sudo ./flashrom -p linux_spi:dev=/dev/spidev0.0 factoryl.rom Now, the thing about using these clips to flash hardware is that sometimes the connections aren't perfect, and I've found that in some instances, I had to perform a flash many times before it succeeded. In the above case, I'd recommend that once it succeeds, you perform it a few more times and save a couple different copies of your existing BIOS (at least three), and then use a tool like sha256sum to compare them all. You may find that one or more of your copies don't match the 36 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS Figure 1. Laptop Surgery rest. Once you get a few consistent copies that agree, you can be assured that you got a good copy. After you have a good backup copy of your existing BIOS, you can attempt a flash. It turns out that quite a bit has changed with the Libreboot-flashing process since the last time I wrote about it, so in a future column, I will revisit the topic with the more up-to-date method to flash Libreboot.a Kyle Rankin is a Sr. Systems Administrator in the San Francisco Bay Area and the author of a number of books, including The Official Ubuntu Server Book, Knoppix Hacks and Ubuntu Hacks. He is currently the president of the North Bay Linux Users’ Group. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. Resources Hardware Flashing with Raspberry Pi: https://github.com/bibanon/Coreboot-ThinkPads/wiki/Hardware-Flashing-with-Raspberry-Pi WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 37 COLUMNS THE OPEN-SOURCE CLASSROOM Wi-Fi, Part II: the Installation Moving from theoretical Wi-Fi to blinky lights! SHAWN POWERS Researching my last article, I learned more about Wi-Fi than most people learn in a lifetime. Although that knowledge is incredibly helpful when it comes to a real-world implementation, there still are a few caveats that are important as you take the theoretical to the physical. One of the most frustrating parts of a new installation is that you're required to put the cart before the horse. What do I mean by that? Well, when I set up my first Wi-Fi network in a school district, I paid a company to send technicians into the buildings with their fancy (and expensive) set of tools in order to give me a survey of the buildings so I'd know how many access points I'd need to cover things. What they failed to mention is that in order to determine how many access points I'd have to add, they tested my existing coverage and showed me dead spots. 
Since this was a brand- new installation, and I didn't have any access points to begin with, the survey result was "you need access points everywhere". Needless to say, I was less than impressed. So in order to set up a proper wireless network, the easiest thing to do is guess how many access points you'll need and put that many in place. Then you can do a site survey and figure out how well you guessed. Thankfully, your guesses can be educated guesses. In fact, if you understand how Wi-Fi antennas work, you can improve your coverage area drastically just by knowing how to position the access points. Antenna Signal Shape It would be simple if Wi-Fi signals came out of the access points in a big sphere, like a giant beach ball of signal. Unfortunately, that's not how it actually happens. Whether you have internal antennas or external positionable antennas, the signal is "shaped" like a donut with its hole over the antenna (Figure 1). While it 38 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS k THE OPEN-SOURCE CLASSROOM Antenna Pattern i I Wi-Fi Product Figure 1. Knowing what the signal looks like (image from http://ampedwireless.com). still partially resembles a sphere, it's important to note where the signal isn't. Namely, there's a dead zone directly at the end of the antenna. If you've ever considered pointing the antenna at your distant laptop, trying to shoot the signal out the end of the antenna like a magic wand, you can see why people should leave magic wands to Harry Potter. I also want to mention long-range access points. When you purchase a long-range AP, it sounds like you're getting a more powerful unit. It's a little like a vacuum cleaner with two speeds—why would anyone ever want to use the low setting? With long-range access points, however, you're not getting any increased power. The trick is with how the antenna radiates its signal. Rather helps with placement than a big round donut shape, LR access points squish the donut so that it has the same general shape, but is more like a pancake. It reaches farther out to the sides, but sacrifices how "tall" the signal pattern reaches. So if you have a two-story house, changing to a long-range access point might get signal to your backyard, but the folks upstairs won't be able to check their e-mail. One last important aspect of antenna placement to consider is polarity. Wi-Fi antennas talk most efficiently when they have similar polarity. That means their "donuts" are on the same plane. So if you have your access point's antennas positioned horizontally (perhaps you have a very tall, very skinny building), any client antennas pointing vertically WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 39 COLUMNS THE OPEN-SOURCE CLASSROOM will have a different polarity from your access point. They'll still be able to talk, but it will be less efficient. It's sort of like if you turned this article sideways. You still could read it, but it would be slower and a bit awkward. Since it's far better to have mismatched polarity than no signal at all, understanding the antenna pattern on your access points means you can position them for maximum coverage. If you have multiple antennas, you should consider where you want coverage as you position them vertically, horizontally or even at 45-degree angles (remember, a 45-degree angle will mess up polarity, but it might mean that distant upstairs bedroom has coverage it might not get otherwise). 
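One quick way to see the effect of repositioning an antenna is to watch signal strength from a Linux laptop while you move things around. This is just a rough sketch, not a proper survey tool; it assumes your wireless interface is named wlan0 (yours may be wlp3s0 or similar) and that the iw utility is installed:

$ watch -n 1 'iw dev wlan0 link | grep -E "SSID|freq|signal"'

Walk the laptop to the spot you care about, tilt or rotate the access point's antennas, and give the signal figure a few seconds to settle. It's reported in dBm, so -50 is a stronger signal than -70. The same loop works for comparing a 2.4GHz and a 5GHz connection from the same spot.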
If your access point doesn't have external antennas, it's most likely designed to have the "donut" stretch out to the sides, as if the antenna were pointing straight up. For units that can mount on the ceiling or wall, keep that in mind as you consider their positions, and realize coverage will be very different if you change from ceiling mount to wall mount. The Big Guessing Game Armed with an understanding of how Wi-Fi signal radiates out from the access points, the next step is to make your best guess on where you should place them. I usually start with a single access point in the middle of a house (or hallway in the case of a school), and see how far the signal penetrates. Unfortunately, 2.4GHz and 5GHz don't penetrate walls the same. You'll likely find that 2.4GHz will go through more obstacles before the signal degrades. If you have access points with both 2.4GHz and 5GHz, be sure to test both frequencies so you can estimate what you might need to cover your entire area. Thankfully, testing coverage is easy. Some access points, like my UniFi system, have planning apps built in (Figure 2), but they are just planning and don't actually test anything. There are programs for Windows, OS X and Android that will allow you to load up your floor plan, and then you can walk around the building marking your location to create an actual "heat map" of coverage. Those programs are really nice for creating a visual representation of your coverage, but honestly, they're not required if you just want to get the job done. Assuming you know the floor plan, you can walk from room to room using an Android phone or tablet with WiFi Analyzer and see the signal strength in any 40 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS THE OPEN-SOURCE CLASSROOM u 1 f Figure 2. Since this was a fairly new house, the UniFi planning tool did a nice job of accurately predicting coverage area. given location. Just make sure the app you choose supports 2.4GHz and 5GHz, and that your phone or tablet has both as well! If you do want the heat map solution, Windows users will like HeatMapper from http://www.ekahau.com, and OS X users should try NetSpot from http://www.netspotapp.com. Android users should just search the Google Play store for "Wi-Fi heat map" or "Wi-Fi mapping". I don't know of a Linux-native heat map app that works from a laptop, but if anyone knows of a good one, please write in, and I'll try to include it in a future Letters section. Some Tough Purchase Decisions Here's where installing Wi-Fi starts to get ugly. If you read my last article (in the October 2015 issue), you'll know that with 2.4GHz, there are only three channels you should be using. If you live in close proximity to other people (apartments, subdivisions and so on), your channel availability might be even worse. When you add the variable coverage distance between 2.4GHz and 5GHz, it means placing access points is really a game of compromise. There are a couple ways to handle the problem, but none are perfect. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 41 COLUMNS THE OPEN-SOURCE CLASSROOM In a home where two or three access points is going to be enough, you generally can place them in the best locations (after testing, of course) and crank the power up to full blast on the 2.4GHz and 5GHz radios. You'll likely have plenty of available channels in the 5GHz range, so you probably won't have to worry about interfering with other access points in your house or even your neighbor's. 
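You don't need a dedicated heat-map application just to see which channels your neighbors are camped on. A rough sketch from any Linux laptop running NetworkManager (the nmcli field names below may vary slightly between versions, and wlan0 is again just an example interface name):

$ nmcli -f CHAN,SIGNAL,SSID dev wifi list | sort -n

Everything on channels 1-13 is 2.4GHz, and everything from channel 36 up is 5GHz, so a quick glance tells you which of channels 1, 6 and 11 is least crowded where you're standing. If you prefer the lower-level view, sudo iw dev wlan0 scan | grep -E 'freq|signal|SSID' pulls the same information straight from the driver.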
If you're in a big house, or an office complex, or in an old house that has stubborn walls (like me), you might have to plan very carefully where you place your access points so that the available 2.4GHz channels don't overlap. If you're using channel 1 in the front room, channel 6 in the basement and channel 11 in the kitchen at the back of the house, you might decide to use channel 6 for the upstairs. You need to make sure that when you actually are upstairs, however, that you can't see channel 6 from the basement, or you'll have a mess with channel conflicts. Thankfully, most access points allow you to decrease the radio transmit and receive power to avoid channels interfering with each other. It might seem counter-productive to decrease the power, but it's often a really great way to improve connectivity. Think of it like having a conversation. If two people are having a quiet conversation in one room, and another couple is talking in the next room, they can talk quite nicely without interfering. If everyone in the house is screaming at the top of their lungs, however, it means everyone can hear other conversations, making it confusing and awkward. It's also possible that you'll find you've worked out the perfect coverage area with the 2.4GHz frequency, but even with the radios cranked full blast, there are a few dead spots in the 5GHz range. In that case, you either can live with dead 5GHz zones or add another access point with only the 5GHz radio turned on. That will mean older client devices won't be able to connect to the additional access point, but if you already have 2.4GHz coverage everywhere, there's no need to pollute the spectrum with another unnecessary 2.4GHz radio. Configuring Clients Let's assume you've covered your entire house or office with a blanket of 2.4GHz and 5GHz signals, and you want your clients to connect to the best possible signal to which 42 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM COLUMNS k THE OPEN-SOURCE CLASSROOM they're capable of connecting. Ideally, you'd set all your access points to use the same SSID and have clients select which access point and which frequency they want to associate with automatically. Using a single SSID also means roaming around the house from access point to access point should be seamless. Client computers are designed to switch from channel to channel on the same SSID without disconnecting from the network at all. Unfortunately, in practice, not all client devices are smart enough to use 5GHz when they can. So although you might have a wonderful 5GHz signal sharing the same SSID with your 2.4GHz network, some of your compatible devices never will take advantage of the cleaner, faster network! (Sometimes they do, but I assure you, not always.) I've found the best solution, at least for me, is to have one SSID for the 2.4GHz spectrum and one SSID for the 5GHz spectrum. In my house, that means there's a "Powers" SSID in the 2.4GHz range and a "Super Powers" in the 5GHz range. If a device is capable of connecting to 5GHz networks, I connect to that SSID and force it to use the better network. You might be able to get away with a single SSID and have your clients all do the right thing. but again. I've never had much luck with that. Repeaters Versus Access Points I'm a hard-core networking nerd, and I know it. Even with our new-to-us 63-year-old house, I decided to run Ethernet cables to every access point location. 
(I just draped long cables around the house while testing; please don't drill holes into your house until you know where those holes should go!) For some people, running cables isn't possible. In those instances, it's possible to extend a single access point using a wireless repeater or extender (they're the same thing, basically). I urge you to avoid such devices if possible, but in a pinch, they're better than no coverage at all. How an extender works is by becoming both a client device and an access point in one. They connect to your central access point like any other client, and then using another antenna, they act as access points themselves. The problem is speed. If you connect to a repeater, you can get only half the speed of a connection to a wired access point. That's because the wireless transfer speed is split between your laptop and the repeater communicating with the distant access point. It's a little more complicated than that in practice (it has to do with WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 43 COLUMNS THE OPEN-SOURCE CLASSROOM i transmission duplexing and so on), but the end result is any connection via repeater is only half as fast as to a wired access point. If you're talking about a 5GHz, wide¬ band connection, a repeated signal might be more than adequate for Web browsing from a distant bedroom. The ability to extend a network wirelessly is really awesome, but it's important to realize that awesomeness comes at a cost. You also need to understand that if you're in a room with a weak signal, placing a repeater in that room won't help. You need to place the repeater in a location between the central access point and remote client device, so it can act as a middle man relaying signals both ways. A repeater doesn't have any stronger of an antenna than a client device, so make sure if you do place a repeater, it's in a location with decent signal strength, or you'll just be repeating a horrible signal! Use Your Noodle, and Plan Well! In my last article, I talked about the actual wireless technologies involved with Wi-Fi signals. In this article, I discussed implementation and how to get the best coverage for your particular installation. Don't forget all the stuff I covered regarding MIMO, channel width and so on. Understanding how a Wi-Fi network works means you not only can get good coverage, but you can get awesome performance as well. I'll leave you with one last note: if you're planning a wireless install for a situation that has a large number of users, be sure to include bandwidth considerations in your planning. If you have a 54Mbps 802.1 1g connection shared between 26 people, that means the maximum theoretical bandwidth each person can use is 2Mbps, which is painfully slow in most instances. You actually might need to lower the radio power and add multiple access points in order to split the load across multiple access points. Planning and installing Wi-Fi networks can be incredibly challenging, but it is also incredibly fun. Hopefully this two-part primer will help you deploy the best wireless experience possible. ■ Shawn Powers is the Associate Editor for Linux Journal. He’s also the Gadget Guy for LinuxJournal.com, and he has an interesting collection of vintage Garfield coffee mugs. Don’t let his silly hairdo fool you. he’s a pretty ordinary guy and can be reached via e-mail at shawn@linuxjournal.com. Or. swing by the #linuxjournal IRC channel on Freenode.net. 
Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. 44 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM FREE AND OPEN SOURCE SOFTWARE EXPO FOSSCTCOnl AND TECHNOLOGY CONFERENCE 2 0 15 Come out and participate in the Second Annual Fossetcon 2015 Florida's Only Free and Open Source Conference. With in 2 minutes of Downtown Disney and other great entertainment DAYO BSD Jmulk DAY 1 FOOD, TRAINING, WORKSHOPS AND CERTIFICATIONS FOOD, KEYNOTES, EXPO HALL, SPEAKER TRACKS DAY 2 FOOD, KEYNOTES, EXPO HALL, SPEAKER TRACKS FREE FOOD, -TRAINING. * CERTIFICATIONS > AND GIVEAWAYS!!! M ^ t r f 1 < NOV 19 - NOV 21 Hilton Lake Buena Vista Orlando, FL Fossetcon 2015: The Gateway To The Open Source Community More info at www.fossetcon.org NEW PRODUCTS EXIN Specialist Certificate in OpenStack Software Neutron Building on its successful foundational certificate in OpenStack software, the independent certification institute EXIN recently released its first specialist exam in the series, dubbed EXIN Specialist Certificate in OpenStack Software Neutron. Neutron is a cloud-networking controller within the OpenStack cloud computing initiative that delivers networking as a service. This new advanced exam is aimed at experienced users of OpenStack technology who design or build infrastructure. The vendor-neutral content, which was developed in close cooperation with Hewlett-Packard, covers architecture, plug-ins and extensions, managing networks, and troubleshooting methodology and tools. EXIN's mission with the new exam on Neutron is to enable experienced professionals to advance their careers by demonstrating their specialist skills and knowledge related to OpenStack software. In 2016, EXIN expects to launch certifications for OpenStack Software Swift and Cinder. http://www.exin.com Tpamflnpct TeamQuest ’ s IQQ IIIUUwL Performance Software Carrying the simple moniker Performance Software, the latest innovation in predictive analytics from TeamQuest is a powerful application that enables organizations to assess intuitively the health and potential risks in their IT infrastructure. The secret to Performance Software's ability to warn IT management of problems before they occur stems from the deployment of lightning-fast and accurate predictive algorithms, coupled with the most popular IT data sources, including Amazon, Tivoli and HP. Customers also can perform data collection, analysis, predictive analytics and capacity planning for Ubuntu. TeamQuest calls itself the first organization that allows the existing infrastructure to remain entirely intact and augments the existing environment's operations with the industry-leading accurate risk assessment software. The firm also asserts that while competitors base their predictive and proactive capabilities on simplistic approximations of how IT infrastructure scales, only TeamQuest utilizes advanced queuing theory to predict what really matters—throughput and response time—not just resource utilization. http://www.teamquest.com 46 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM 1 NEW PRODUCTS Linaro ■■■■■■ Linaro Ltd.’s Secure Media Solutions for ARM-Based SoCs The embedded developer community is the target audience for Linaro Ltd.'s new open- source secure media solution for consumption of premium content on ARM-powered devices. 
In this solution, with support from Microsoft and the OpenCDM project, Linaro has successfully integrated several security features required by premium content service providers with the Microsoft PlayReady Digital Rights Management (DRM). Linaro's new solution enables application developers, silicon partners, OEMs, operators and content owners to use open-source technology to build feature-rich, secure products for the pay TV market. By bringing together all of the essential secure hardware and software elements into an open-source design, OEMs can reduce their time to market and provide new opportunities for service providers to deliver premium content across more consumer devices built on ARM-based SoCs. Essential security features include the World Wide Web Consortium's Encrypted Media Extensions, which enable premium-content service providers to write their electronic programming guide applications using standard HTML5 one time and run it on myriad devices. Linaro asserts that its new solution is "a key milestone that showcases how Microsoft PlayReady DRM works cross-platform in a standard way". http://www.linaro.org iWedia’s Teatro-3.0 By integrating AllConnect streaming technology from Tuxera, iWedia's Teatro-3.0 set-top box (STB) software solution lets users take full control of the connected home and share music, photos, videos, movies and TV content to any screen. Teatro-3.0 is Linux-based with a Ul built with HTML/ CSS and specific JavaScript APIs allowing access to digital TV features. The solution features DLNA (player and renderer), access to "walled garden" Web and OTT video services (CE- HTML portals, HbbTV applications), as well as DVR and Time Shift Buffer. The streaming functionality occurs when Tuxera's AllConnect App discovers and dialogs with the DLNA Digital Media Renderer embedded in Teatro-3.0. The app then streams any content chosen by the user to the Teatro-3.0 media player. iWedia states that its STB easily can integrate into any hardware or software and is "the only solution to the market compatible with all smart TVs and STBs", including Apple TV, Android TV, Fire TV and Roku. htt p ://www. i wed i a. co m WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 47 NEW PRODUCTS Mike Barlow’s Learning to Love Data Science (O’Reilly Media) The title of Mike Barlow's new O'Reilly book, Learning to Love Data Science, implies an inherent drudgery in the material. Bah! Most Linux enthusiasts will find magnetic the material in Barlow's tome, which is subtitled Explorations of Emerging Technologies and Platforms for Predictive Analytics, Machine Learning, Digital Manufacturing and Supply Chain Optimization. Covering data for social good to data for predictive maintenance, the book's format is an anthology of reports that offer a broad overview of the data space, including the applications that have arisen in enterprise companies, non-profits and everywhere in between. Barlow discusses—for both developers and suits—the culture that creates a data-driven organization and dives deeply into some of the business, social and technological advances brought about by our ability to handle and process massive amounts of data at scale. Readers also will understand how to promote and use data science in an organization, gain insights into the role of the CIO and explore the tension between securing data and encouraging rapid innovation, among other topics. http://www.oreilly.com Learning to Love Data Science Exploring Predictive Analytics. Machme Learning. 
Digital Manufacturing, and Supply Cham Optimization 1 , . • •• • ••• < v •::: A'" Scott Stawski’s Inflection Point (Pearson FT Press) If you can't beat megatrends, join 'em. Such is the advice from Scott Stawski, author of the new book Inflection Point: How the Convergence of Cloud, Mobility, Apps, and Data Will Shape the Future of Business. As the executive lead for HP's largest and most strategic global accounts, Stawski enjoys an enviable perch from which to appraise the most influential trends in IT. Today a hurricane is forming, says Stawski, and businesses are headed straight into it. As the full title implies, the enormous disrupters in IT—in cloud, mobility, apps and data—are going to disrupt, and those who can harness the fierce winds of change will have them at their back and cruise toward greater competitiveness and customer value. Stawski illuminates how to go beyond inadequate incremental improvements to reduce IT spending dramatically and virtually eliminate IT capital expenditures. One meaningful step at a time, readers learn how to transform Operational IT into both a utility and a true business enabler, bringing new speed, flexibility and focus to what really matters: true core competencies. http://www. i nf orm it.com 48 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Take your Android development skills to the next level! AnDevCon The Android Developer Conference Dec. 1-3,2015 Hyatt Regency Santa Clara Get the best Android developer training anywhere! • Choose from more than 75 classes and in-depth tutorials • Meet Google and Google Development Experts • Network with speakers and other Android developers • Check out more than 50 third-party vendors • Women in Android Luncheon • Panels and keynotes • Receptions, ice cream, prizes and more (plus lots of coffee!) Whether you're an enterprise developer, work for a commercial software company, or are driving your own startup, if you want to build Android apps, you need to attend AnDevCon! AnDevCon™ is a trademark of BZ Media LLC. Android™ is a trademark of Google Inc. Google’s Android Robot is used under terms of the Creative Commons 3.0 Attribution License. ABZ Media Event noiidD #AnDevCon NEW PRODUCTS r Introversion Software’s Prison Architect In one of its Alpha videos, the lead developer of the game Prison Architect quipped: "since this is Introversion Software that we're talking about, we're likely to be in Alpha for quite some time." That's no exaggeration. Since 2012, Linux Journal received 36 monthly Alpha updates to the multi-platform game. In its 36th Alpha video. Introversion Software at last officially announced the full release of Prison Architect, a sim game in which users build and manage a maximum-security penitentiary facility. In the game, mere mortals must confront real-world challenges, such as guards under attack, prison breaks, fires in the mess hall, chaplain management and much more. Introversion takes pride in its independence from other game developers and promises a better game experience as a result. In addition to downloading Prison Architect for Linux, Windows or Mac OS, one also can become immortalized in the game as a prisoner. Sadly, the options to digital-immorto-criminalize your face or design one of the wardens are both sold out. http://www.prison-architect.com Sensoray’s Model 2224 HD/SD-SDI Audio/Video Encoder Video capturing and processing is what Sensoray's new Model 2224 HD/SD-SDI Audio/Video H.264 Encoder was built to do. 
The encoder's single SDI input supports a wide range of video resolutions—that is, 1080p, 1080i, 720p and NTSC/PAL. The Model 2224, featuring a USB 2.0 connection to its host CPU, offers excellent quality encoding in a convenient small form factor, says Sensoray. The Model 2224 encoder outputs H.264 High Profile Level 4 for HD and Main Profile Level 3 for SD, multiplexed in MPEG-TS (transport stream) format. The board's versatile overlay generators, integral HD/SD raw frame grabber and live preview stream make it ideally suited for a wide range of video processing applications, including High Profile DVRs, NVRs and stream servers. Furthermore, the encoder is Blu-Ray-compatible and allows for full-screen 16-bit color text/graphics overlay with transparency. The board can send an uncompressed, down-scaled video stream over USB, offering users low-latency live video previewing on the host computer with minimal CPU usage, h tt p ://www. se n so r ay. co m r i Please send information about releases of Linux-related products to newproducts@linuxjournal.com or New Products c/o Linux Journal, PO Box 980985, Houston, TX 77098. Submissions are edited for length and content. L._ 50 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Puppet Application Orchestration Automate Your Entire Infrastructure Current Balanci $ 2015 03 ACCOUNT MANAGEMENT Spending Report: ONLINE BILL-PAY VIEW TRANSACTION HISTORY Reduce the complexity of managing applications - on premise, in the cloud, on bare metal or in containers. • Model distributed application infrastructure • Coordinate ordered deployment of configurations • Control the state of your machines all in one place Learn more at puppetlabs.com AP U PR& FEATURE Managing Linux Using Puppet Managing Linux Using Puppet Manage a fleet of servers in a way that’s documented, scalable and fun with Puppet. DAVID BARTON 52 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM A t some point, you probably have installed or configured a piece of software on a server or desktop PC. Since you read Linux Journal, you've probably done a lot of this, as well as developed a range of glue shell scripts, Perl snippets and cron jobs. Unless you are more disciplined than I was, every server has a unique, hand-crafted version of those config files and scripts. It might be as simple as a backup monitor script, but each still needs to be managed and installed. Installing a new server usually involves copying over config files and glue scripts from another server until things "work". Subtle problems may persist if a particular condition appears infrequently. Any improvement is usually made on an ad hoc basis to a specific machine, and there is no way to apply improvements to all servers or desktops easily. Finally, in typical scenarios, all the learning and knowledge invested in these scripts and configuration files are scattered throughout the filesystem on each Linux system. This means there is no easy way to know how any piece of software has been customized. If you have installed a server and come back to it three years later wondering what you did, or manage a group of desktops or a private cloud of virtual machines, configuration management and Puppet can help simplify your life. Enter Configuration Management Configuration management is a solution to this problem. A complete solution provides a centralized repository that defines and documents how things are done that can be applied to any system easily and reproducibly. Improvements simply can be rolled out to systems as required. 
The result is that a large number of servers can be managed by one administrator with ease. Puppet Many different configuration management tools for Linux (and other platforms) exist. Puppet is one of the most popular and the one I cover in this article. Similar tools include Chef, Ansible and Salt as well as many others. Although they differ in the specifics, the general objectives are the same. Puppet's underlying philosophy is that you tell it what you want as an end result (required state), not how you want it done (the procedure), using Puppet's programming language. For example, you might WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 53 FEATURE Managing Linux Using Puppet say "I want ssh key XYZ to be able to log in to user account too." You wouldn't say "cat this string to /home/foo/.ssh/authorized_keys." In fact, the simple procedure I defined isn't even close to being reliable or correct, as the .ssh directory may not exist, the permissions could be wrong and many other things. You declare your requirements using Puppet's language in files called manifests with the suffix .pp. Your manifest states the requirements for a machine (virtual or real) using Puppet's built-in modules or your own custom modules, which also are stored in manifest files. Puppet is driven from this collection of manifests much like a program is built from code. When the puppet apply command is run, Puppet will compile the program, determine the difference in the machine's state from the desired state, and then make any changes necessary to bring the machine in line with the requirements. This approach means that if you run puppet apply on a machine that is up to date with the current manifests, nothing should happen, as there are no changes to make. Overview of the Approach Puppet is a tool (actually a whole suite of tools) that includes the Puppet execution program, the Puppet master, the Puppet database and the Puppet system information utility. There are many different ways to use it that suit different environments. In this article, I explain the basics of Puppet and the way we use it to manage our servers and desktops, in a simplified form. I use the term "machine" to refer to desktops, virtual machines and hypervisor hosts. The approach I outline here works well for 1-100 machines that are fairly similar but differ in various ways. If you are managing a cloud of 1,000 virtual servers that are identical or differ in very predictable ways, this approach is not optimized for that case (and you should write an article for the next issue of Linux Journal). This approach is based around the ideas outlined in the excellent book Puppet 3 Beginners Guide by John Arundel. The basic idea is this: ■ Store your Puppet manifests in git. This provides a great way to manage, track and distribute changes. We also use it as the way servers get their manifests (we don't use a Puppet master). You easily could use Subversion, Mercurial or any other SCM. ■ Use a separate git branch for 54 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM each machine so that machines are stable. ■ Each machine then periodically polls the git repository and runs puppet apply if there are any changes. ■ There is a manifest file for each machine that defines the desired state. Setting Up the Machine For the purposes of this article. I'm using the example of configuring developers' desktops. The example desktop machine is a clean Ubuntu 12.04 with the hostname puppet-test; however, any version of Linux should work with almost no differences. 
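The next step needs an empty git repository on a server you control. If you don't already have one, a bare repository on any box you can reach over SSH is enough. Here's a minimal sketch, using the same host alias (gitserver) and repository name that appear later in Listing 3—substitute your own:

# on the git server
git init --bare ~/Puppet-Linuxjournal.git

# on your desktop
git clone git@gitserver:Puppet-Linuxjournal.git
cd Puppet-Linuxjournal
mkdir manifests modules        # the layout used throughout this article
git commit --allow-empty -m "Empty Puppet repository"
git push origin master

Git only tracks files, so the two directories stay local until you commit your first manifest into them.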
I will be working using an empty git repository on a private git server. If you are going to use GitHub for this, do not put any sensitive information in there, in particular keys or passwords.

Puppet is installed on the target machine using the commands shown in Listing 1. The install simply sets up the Puppet Labs repository and installs git and Puppet. Notice that I have used specific versions of puppet-common and the puppetlabs/apt module. Unfortunately, I have found Puppet tends to break previously valid code and its own modules even with minor upgrades. For this reason, all my machines are locked to specific versions, and upgrades are done in a controlled way. Now Puppet is installed, so let's do something with it.

Listing 1. Installing Puppet

wget https://apt.puppetlabs.com/puppetlabs-release-precise.deb
dpkg -i puppetlabs-release-precise.deb
apt-get update
apt-get install -y man git puppet-common=3.7.3-1puppetlabs1
puppet module install puppetlabs/apt --version 1.8.0

Getting Started

I usually edit the manifests on my desktop and then commit them to git and push to the origin repository. I have uploaded my repository to GitHub as an easy reference at https://github.com/davidbartonau/linuxjournal-puppet, which you may wish to copy, fork and so on.

In your git repository, create the file manifests/puppet-test.pp, as shown in Listing 2. This file illustrates a few points:

■ The name of the file matches the hostname. This is not a requirement; it just helps to organize your manifests.

■ It imports the apt package, which is a module that allows you to manipulate installed software.

■ The top-level item is "node", which means it defines the state of a server(s).

■ The node name is "puppet-test", which matches the server name. This is how Puppet determines to apply this specific node.

■ The manifest declares that it wants the vim package installed and the emacs package absent. Let the flame wars commence!

Listing 2. manifests/puppet-test.pp

include apt

node 'puppet-test' {
  package { 'vim': ensure => 'present' }
  package { 'emacs': ensure => 'absent' }
}

Now you can use this Puppet configuration on the machine itself. If you ssh in to the machine (you may need ssh -A agent forwarding so you can authenticate to git), you can run the commands from Listing 3, replacing gitserver with your own. This code clones the git repository into /etc/puppet/linuxjournal and then runs puppet apply using the custom manifests directory.

Listing 3. Cloning and Running the Repository

git clone git@gitserver:Puppet-Linuxjournal.git /etc/puppet/linuxjournal
puppet apply /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/

The puppet apply command looks for a node with a matching name and then attempts to make the machine's state match what has been specified in that node. In this case, that means installing vim, if it isn't already, and removing emacs.
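Before letting a new manifest loose on a machine you care about, it's worth knowing that you can check and dry-run it first. A quick sketch using the same paths as Listing 3 (both options are standard Puppet features, nothing specific to this setup):

puppet parser validate /etc/puppet/linuxjournal/manifests/puppet-test.pp
puppet apply --noop /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/

The first command only checks the manifest for syntax errors; the second compiles the catalog and reports what it would have changed without actually installing or removing anything.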
Creating Users

It would be nice to create the developer user, so you can set up that configuration. Listing 4 shows an updated puppet-test.pp that creates a user as per the developer variable (this is not a good way to do it, but it's done like this for the sake of this example). Note how the variable is preceded by $. Also, the variable is substituted into strings quoted with double quotes but not single quotes, in the same way as bash.

Listing 4. /manifests/puppet-test.pp

include apt

node 'puppet-test' {
  $developer = 'david'

  package { 'vim': ensure => 'present' }
  package { 'emacs': ensure => 'absent' }

  user { "$developer":
    ensure     => present,
    comment    => "Developer $developer",
    shell      => '/bin/bash',
    managehome => true,
  }
}

Let's apply the new change on the desktop by pulling the changes and re-running puppet apply as per Listing 5. You now should have a new user created.

Listing 5. Re-running Puppet

cd /etc/puppet/linuxjournal
git pull
puppet apply /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/

Creating Modules

Putting all this code inside the node isn't very reusable. Let's move the user into a developer_pc module and call that from your node. To do this, create the file modules/developer_pc/manifests/init.pp in the git repository as per Listing 6. This creates a new module called developer_pc that accepts a parameter called developer and uses it to define the user.

Listing 6. /modules/developer_pc/manifests/init.pp

class developer_pc ($developer) {
  user { "$developer":
    ensure     => present,
    comment    => "Developer $developer",
    shell      => '/bin/bash',
    managehome => true,
  }
}

You then can use the module in your node as demonstrated in Listing 7. Note how you pass the developer parameter, which is then accessible inside the module. Apply the changes again, and there shouldn't be any change. All you have done is refactored the code.

Listing 7. /manifests/puppet-test.pp

node 'puppet-test' {
  package { 'vim': ensure => 'present' }
  package { 'emacs': ensure => 'absent' }

  class { 'developer_pc': developer => 'david' }
}

Creating Static Files

Say you would like to standardize your vim config for all the developers and stop word wrapping by setting up their .vimrc file. To do this in Puppet, you create the file you want to use in /modules/developer_pc/files/vimrc as per Listing 8, and then add a file resource in /modules/developer_pc/manifests/init.pp as per Listing 9. The file resource can be placed immediately below the user resource.

Listing 8. /modules/developer_pc/files/vimrc

# Managed by puppet in developer_pc
set nowrap

Listing 9. /modules/developer_pc/manifests/init.pp

file { "/home/$developer/.vimrc":
  source  => "puppet:///modules/developer_pc/vimrc",
  owner   => "$developer",
  group   => "$developer",
  require => [ User["$developer"] ]
}

The file resource defines a file /home/$developer/.vimrc, which will be set from the vimrc file you created just before. You also set the owner and group on the file, since Puppet typically is run as root. The require clause on the file takes an array of resources and states that those resources must be processed before this file is processed (note the uppercase first letter; this is how Puppet refers to resources rather than declaring them). This dependency allows you to stop Puppet from trying to create the .vimrc file before the user has been created. When resources are adjacent, like the user and the file, they also can be "chained" using the -> operator. Apply the changes again, and you now can expect to see your custom .vimrc set up.
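A quick way to confirm that a run actually converged is to ask Puppet for detailed exit codes. This is a small sketch around the same command used in Listing 5; --detailed-exitcodes is a standard puppet apply option:

puppet apply --detailed-exitcodes /etc/puppet/linuxjournal/manifests \
    --modulepath=/etc/puppet/linuxjournal/modules/:/etc/puppet/modules/
echo $?

An exit status of 2 means changes were made, 0 means the machine already matched the manifests, and 4 (or 6) means something failed. Running it twice in a row should give you a 2 followed by a 0; if the second run keeps reporting changes, one of your resources isn't idempotent and deserves a closer look.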
If you run puppet apply later, if the source vimrc file hasn't changed, the .vimrc file won't change either, including the modification date. If one of the developers changes .vimrc, the next time puppet apply is run, it will be reverted to the version in Puppet. A little later, say one of the developers asks if they can ignore case as well in vim when searching. You easily can roll this out to all the desktops. Simply change the vimrc file to include set ignorecase, commit and run puppet apply on each machine. Creating Dynamically Generated Files Often you will want to create files where the content is dynamic. Puppet has support for .erb templates, which are templates containing snippets of Ruby code similar to jsp or php files. The code has access to all of the variables in Puppet, with a slightly different syntax. As an example, our build process uses $ HO M E/Projects/override. properties, which is a file that contains the name of the build root. This is typically just the user's home directory. You can set this up in Puppet using an .erb template as shown in Listing 10. The erb template is very similar to the static file, except it needs to be in the template folder, and it uses <%= %> for expressions, <% %> for code, and variables are referred to with the @ prefix. Listing 10. /modules/developer_pc/templates/override.properties.erb # Managed by Puppet dir.home=/home/<%= @developer %>/ 60 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Listing 11. /modules/developer_pc/manifests/init.pp file { "/home/$developer/Projects": ensure => 'directory', owner => "$developer", group => "$developer", require => [ User["$developer"] ] } file { "/home/$developer/Proiects/override.properties": content => template('developer owner => "$developer", group => "$developer", } You use the .erb template by adding the rules shown in Listing 11. First, you have to ensure that there is a Projects directory, and then you require the override.properties file itself. The -> operator is used to ensure that you create the directory first and then the file. Running Puppet Automatically Running Puppet each time you want to make a change doesn't work well beyond a handful of machines. To solve this, you can have each machine automatically check git for changes and then run puppet apply (you can pc/override.properties.erb’), do this only if git has changed, but that is an optional). Next, you will define a file called puppetApply.sh that does what you want and then set up a cron job to call it every ten minutes. This is done in a new module called puppet_apply in three steps: ■ Create your puppetApply.sh template in modules/puppet_apply/files/ puppetApply.sh as per Listing 12. ■ Create the puppetApply.sh file and set up the crontab entry as shown in Listing 13. WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 61 FEATURE Managing Linux Using Puppet Listing 12. /modules/puppet_apply/files/puppetApply.sh # Managed by Puppet cd /etc/puppet/linuxjournal git pull puppet apply /etc/puppet/linuxjournal/manifests **•- - module path=/etc/ puppet/1 i nuxj ournal/modules/ *+■: / etc/puppet/modules/ Listing 13. /modules/puppet_apply/manifests/init.pp class puppet_apply () { file { "/usr/local/bin/puppetApply.sh": source => "puppet:///modules/puppet_apply/puppetApply.sh", mode => 'u=wrx,g=r,o=r' } - > cron { "run-puppetApply": ensure => 'present' , command => "/usr/local/bin/puppetApply.sh > Wtmp/puppetApply. log 2>&1", mi nute => ' *710 ' , } } ■ Use your puppet_apply module from your node in puppet-test.pp as per Listing 14. 
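Before wiring a template into a manifest, it can be handy to preview what it renders. Here's a rough sketch, run from the top of your repository; Ruby is already present on the machine because Puppet itself is written in it, and the @developer value here is just an example:

ruby -rerb -e '@developer = "david";
  puts ERB.new(File.read("modules/developer_pc/templates/override.properties.erb")).result(binding)'

You should see the finished override.properties with /home/david/ substituted in, which is exactly what Puppet will write out once the file resources shown next are in place.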
You will need to ensure that the server has read access to the git repository. You can do this using 62 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Listing 14. /manifests/puppet-test.pp class { 'puppet_apply': ; } an SSH key distributed via Puppet and an IdentityFile entry in /root/.ssh/config. If you apply changes now, you should see that there is an entry in root's crontab, and every ten minutes puppetApply.sh should run. Now you simply can commit your changes to git, and within ten minutes, they will be rolled out. Modifying Config Files Many times you don't want to replace a config file, but rather ensure that certain options are set to certain values. For example, I may want to change the SSH port from the default of 22 to 2022 and disallow password logins. Rather than manage the entire config file with Puppet, I can use the augeas resource to set multiple configuration options. Refer to Listing 1 5 for some code that can be added to the Listing 15. /modules/developer_pc/manifests/init.pp package { 1 openssh-server ’ : ensure => 'present' } service { 'ssh ' : ensure => running, require => [ Package["openssh-server"] ] } augeas { 'change-sshd ' : context => '/files/etc/ssh/sshd_config', changes => ['set Port 2022', 'set PasswordAuthentication no’], notify => Service[’ssh 1 ] , require => [ Package["openssh-server"] ] } WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 63 FEATURE Managing Linux Using Puppet When defining rules in Puppet, it is important to keep in mind that removing a rule for a resource is not the same as a rule that removes that resource. developer_pc class you created earlier. The code does three things: ■ Installs openssh-server (not really required, but there for completeness). ■ Ensures that SSH is running as a service. ■ Sets Port 2022 and PasswordAuthentication no in /etc/ssh/sshd_config. ■ If the file changes, the notify clause causes SSH to reload the configuration. Once puppetApply.sh automatically runs, any subsequent SSH sessions will need to connect on port 2022, and you no longer will be able to use a password. Removing Rules When defining rules in Puppet, it is important to keep in mind that removing a rule for a resource is not the same as a rule that removes that resource. For example, suppose you have a rule that creates an authorized SSH key for "developerA". Later, "developerA" leaves, so you remove the rule defining the key. Unfortunately, this does not remove the entry from author i zed_keys. In most cases, the state defined in Puppet resources is not considered definitive; changes outside Puppet are allowed. So once the rule for developerA's key has been removed, there is no way to know if it simply was added manually or if Puppet should remove it. In this case, you can use the ensure => 'absent 1 rule to ensure packages, files, directories, users and so on are deleted. The original Listing 1 showed an example of this to remove the emacs package. There is a definite difference between ensuring that emacs is absent versus no rule declaration. At our office, when a developer or administrator leaves, we replace their SSH key with an invalid key, which then immediately updates every entry for that developer. 64 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Existing Modules Many modules are listed on Puppet Forge covering almost every imaginable problem. Some are really good, and others are less so. It's always worth searching to see if there is something good and then making a decision as to whether it's better to define your own module or reuse an existing one. 
Managing Git We don't keep all of our machines sitting on the master branch. We use a modified gitflow approach to manage our repository. Each server has its own branch, and most of them point at master. A few are on the bleeding edge of the develop branch. Periodically, we roll a new release from develop into master and then move each machine's branch forward from the old release to the new one. Keeping separate branches for each server gives flexibility to hold specific servers back and ensures that changes aren't rolled out to servers in an ad hoc fashion. We use scripts to manage all our branches and fast-forward them to new releases. With roughly 100 machines, it works for us. On a larger scale, separate branches for each server probably is impractical. Using a single repository shared with all servers isn't ideal. Storing sensitive information encrypted in Hiera is a good idea. There was an excellent Linux Journal article covering this: "Using Hiera with Puppet" by Scott Lackey in the March 2015 issue. As your number of machines grows, using a single git repository could become a problem. The main problem for us is there is a lot of "commit noise" between reusable modules versus machine-specific configurations. Second, you may not want all your admins to be able LINUX JOURNAL for iPad and iPhone BUILD | Vehicle onitoring and • jntrol System REATE Safe to tore Your ;nsitive Data COOL PROJECTS Understanding Linux Permissions id SMS locations four art Watch Working with Django Models and Migrations mm COOL PROJECTS HOW TO: ^>9 Home Automation with Raspberry Pi Available on the App Store http://www.linuxjournal.com/ios WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 65 FEATURE Managing Linux Using Puppet to edit all the modules or machine manifests, or you may not want all manifests rolled out to each machine. Our solution is to use multiple repositories, one for generic modules, one for machine-/customer- specific configuration and one for global information. This keeps our core modules separated and under proper release management while also allowing us to release critical global changes easily. Scaling Up/Trade-offs The approach outlined in this article works well for us. I hope it works for you as well; however, you may want to consider some additional points. As our servers differ in ways that are not consistent, using Facter or metadata to drive configuration isn't suitable for us. However, if you have 100 Web servers, using the hostname of nginx-prod-099 to determine the install requirements would save a lot of time. A lot of people use the Puppet master to roll out and push changes, and this is the general approach referred to in a lot of tutorials on-line. You can combine this with PuppetDB to share information from one machine to another machine—for example, the public key of one server can be shared to another server. Conclusion This article has barely scratched the surface of what can be done using Puppet. Virtually everything about your machines can be managed using the various Puppet built-in resources or modules. After using it for a short while, you'll experience the ease of building a second server with a few commands or of rolling out a change to many servers in minutes. Once you can make changes across servers so easily, it becomes much more rewarding to build things as well as possible. 
For example, monitoring your cron jobs and backups can take a lot more work than the actual task itself, but with configuration management, you can build a reusable module and then use it for everything. For me. Puppet has transformed system administration from a chore into a rewarding activity because of the huge leverage you get. Give it a go; once you do, you'll never go back!* David Barton is the Managing Director of OnelT, a company specializing in custom business software development. David has been using Linux since 1998 and managing the company’s Linux servers for more than ten years. Illlllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com. 66 / NOVEMBER 2015 / WWW.LINUXJOURNAL.COM Where every interaction matters. break down your innovation barriers When you’re presented with new opportunities, you want to focus on turning them into successes, not whether your IT solution can support them. Peer 1 Hosting powers your business with our wholly owned FastFiber Network™, global footprint, and offers professionally managed public and private cloud solutions that are secure, scalable, and customized for your business. Unsurpassed performance and reliability help build your business foundation to be rock-solid, ready for high growth, and deliver the fast user experience your customers expect. Want more on cloud? Call: 844.855.6655 | go.peerl.com/linux | Vew Cloud Webinar: Public and Private Cloud I Managed Hosting | Dedicated Hosting | Colocation Image: © Can Stock Photo Inc. / bigbro S erver hardening. The very words conjure up images of tempering soft steel into an unbreakable blade, or taking soft clay and firing it in a kiln, producing a hardened vessel that will last many years. Indeed, server hardening is very much like that. Putting an unprotected server out on the Internet is like putting chum in the ocean water you are swimming in—it won't be long and you'll have a lot of excited sharks circling you, and the outcome is unlikely to be good. Everyone knows it, but sometimes under the pressure of deadlines, not to mention the inevitable push from the business interests to prioritize those things with more immediate visibility and that add to the bottom line, it can be difficult to keep up with even what threats you need to mitigate, much less the best techniques to use to do so. This is how corners get cut—corners that increase our risk of catastrophe. This isn't entirely inexcusable. A sysadmin must necessarily be a jack of all trades, and security is only one responsibility that must be considered, and not the one most likely to cause immediate pain. Even in organizations that have dedicated security staff, those parts of the organization dedicated to it often spend their time keeping up with the nitty gritty of the latest exploits and can't know the stack they are protecting as well as those who are knee deep in maintaining it. The more specialized and diversified the separate organizations, the more isolated each group becomes from the big picture. Without the big picture, sensible trade-offs between security and functionality are harder to make. Since a deep and thorough knowledge of the technology stack along with the business it serves is necessary to do a thorough job with security, it sometimes seems nearly hopeless. A truly comprehensive work on server hardening would be beyond the scope not only of a single article, but a single (very large) book, yet all is not lost. 
It is true that there can be no "one true hardening procedure", because environments, technologies and the purposes to which they are put vary so widely. But it is also true that you can develop a methodology for governing those technologies, and the processes that put them to use, that will guide you toward a sane setup. You can boil down the essentials to a few principles that you then can apply across the board. In this article, I explore some examples of application.

I also should say that server hardening, in itself, is almost a useless endeavor if you are going to undercut yourself with lazy choices like passwords of "abc123" or lack a holistic approach to security in the environment. Insecure coding practices can mean that the one hole you open is gaping, and users e-mailing passwords can negate all your hard work. The human element is key, and that means fostering security consciousness at every step of the process. Security that is bolted on instead of baked in never will be as complete or as easy to maintain, but when you don't have executive support for organizational standards, bolting it on may be the best you can do. You can sleep well, though, knowing that at least the Linux server for which you are responsible is in fact properly, if not exhaustively, secured.

The single most important principle of server hardening is this: minimize your attack surface. The reason is simple and intuitive: a smaller target is harder to hit. Applying this principle across all facets of the server is essential. It begins with installing only the specific packages and software that are exactly necessary for the business purpose of the server, plus the minimal set of management and maintenance packages. Everything present must be vetted, trusted and maintained. Every line of code that can be run is another potential exploit on your system, and what is not installed cannot be used against you. Every distribution and service of which I am aware has an option for a minimal install, and this is always where you should begin.

The second most important principle is like it: secure that which must be exposed. This likewise spans the environment, from physical access to the hardware, to encrypting everything that you can everywhere—at rest on the disk, on the network and everywhere in between. For the physical location of the server, locks, biometrics, access logs—all the tools you can bring to bear to controlling and recording who gains physical access to your server are good things, because physical access, an accessible BIOS and a bootable USB drive are just one combination that can mean your server might as well have grown legs and walked away with all your data on it. Rogue, hidden wireless SSIDs broadcast from a USB device can exist for some time before being stumbled upon.

For the purposes of this article though, I'm going to make a few assumptions that will shrink the topics to cover a bit. Let's assume you are putting a new Linux-based server on a cloud service like AWS or Rackspace. What do you need to do first? Since this is in someone else's data center, and you already have vetted the physical security practices of the provider (right?), you begin with your distribution of choice and a minimal install—just enough to boot and start SSH so you can access your shiny new server.
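Once that minimal image boots, it's worth checking just how minimal it really is before you go further. The commands below are a sketch of that audit on a Debian- or Ubuntu-style system and are not from the original article; the package names are only examples, and RPM-based systems would use rpm, yum or dnf equivalents.

    # How much is actually installed on this "minimal" image?
    dpkg-query -W -f='${Package}\n' | wc -l

    # Which services are set to start at boot?
    systemctl list-unit-files --type=service | grep enabled

    # Purge legacy or unneeded services if the image shipped with them
    # (examples only -- verify what each package is before removing it).
    apt-get --purge remove telnetd rsh-server xinetd
    apt-get --purge autoremove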
Within the parameters of this example scenario, there are levels of concern that differ depending on the purpose of the server, ranging from "this is a toy I'm playing with, and I don't care what happens to it" all the way to "governments will topple and masses of people will die if this information is leaked", and although a different level of paranoia and effort needs to be applied in each case, the principles remain the same. Even if you don't care what ultimately happens to the server, you still don't want it joining a botnet and contributing to Internet mayhem. If you don't care, you are bad and you should feel bad. If you are setting up a server for the latter purpose, you are probably more expert than I am and have no reason to be reading this article, so let's split the difference and assume that should your server be cracked, embarrassment, brand damage and loss of revenue (along with your job) will ensue.

In any of these cases, the very first thing to do is tighten your network access. If the hosting provider provides a mechanism for this, like Amazon's "Zones", use it, but don't stop there. Underneath securing what must be exposed is another principle: layers within layers containing hurdle after hurdle. Increase the effort required to reach the final destination, and you reduce the number of attackers willing and able to reach it. Zones, or network firewalls, can fail due to bugs, mistakes and who knows what factors that could come into play. Maximizing redundancy and backup systems in the case of failure is a good in itself. All of the most celebrated data thefts have happened when not just some but all of the advice contained in this article was ignored, and if only one hurdle had required some effort to surmount, it is likely that those responsible would have moved on to someone else with lower-hanging fruit. Don't be the lower-hanging fruit. You don't always have to outrun the bear.

The first principle, that which is not present (installed or running) cannot be used against you, requires that you ensure you've both closed down and turned off all unnecessary services and ports in all runlevels and made them inaccessible via your server's firewall, in addition to whatever other firewalling you are doing on the network. This can be done via your distribution's tools or simply by editing filenames in the /etc/rcX.d directories. If you aren't sure whether you need something, turn it off, reboot, and see what breaks. But, before doing the above, make sure you have an emergency console back door first! This won't be the last time you need it. When just beginning to tinker with securing a server, it is likely you will lock yourself out more than once. If your provider doesn't provide a console that works when the network is inaccessible, the next best thing is to take an image and roll back if the server goes dark.

I suggest first doing two things: running ps -ef and making sure you understand what all the running processes are doing, and running lsof -ni | grep LISTEN to make sure you understand why all the listening ports are open, and that the process you expect has opened them. For instance, on one of my servers running WordPress, the results are these:

    # ps -ef | grep -v \] | wc -l
    39

I won't list out all of my process names, but after pulling out all the kernel processes, I have 39 other processes running, and I know exactly what all of them are and why they are running.
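If any of those processes or listening ports is a mystery, it helps to tie each listening PID back to the binary it runs and the package that owns that binary. The loop below is a sketch and not part of the original article; it assumes a dpkg-based system (substitute rpm -qf on RPM-based distributions) and should be run as root.

    #!/bin/bash
    # For every TCP listener, print its PID, executable path and owning package.
    lsof -nP -iTCP -sTCP:LISTEN -Fp | tr -d 'p' | sort -u | while read -r pid; do
        exe=$(readlink -f "/proc/$pid/exe")
        pkg=$(dpkg -S "$exe" 2>/dev/null | cut -d: -f1)
        printf '%s\t%s\t%s\n' "$pid" "$exe" "${pkg:-unknown}"
    done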
Next, I examine:

    # lsof -ni | grep LISTEN
    mysqld   1638 mysql     10u  IPv4  10579  0t0  TCP 127.0.0.1:mysql (LISTEN)
    sshd     1952 root       3u  IPv4  11571  0t0  TCP *:ssh (LISTEN)
    sshd     1952 root       4u  IPv6  11573  0t0  TCP *:ssh (LISTEN)
    nginx    2319 root       7u  IPv4  12400  0t0  TCP *:http (LISTEN)
    nginx    2319 root       8u  IPv4  12401  0t0  TCP *:https (LISTEN)
    nginx    2319 root       9u  IPv6  12402  0t0  TCP *:http (LISTEN)
    nginx    2320 www-data   7u  IPv4  12400  0t0  TCP *:http (LISTEN)
    nginx    2320 www-data   8u  IPv4  12401  0t0  TCP *:https (LISTEN)
    nginx    2320 www-data   9u  IPv6  12402  0t0  TCP *:http (LISTEN)

This is exactly what I expect, and it's the minimal set of ports necessary for the purpose of the server (to run WordPress).

Now, to make sure only the necessary ports are open, you need to tune your firewall. Most hosting providers, if you use one of their templates, will by default have all rules set to "accept". This is bad. It defies the second principle: whatever must be exposed must be secured. If, by some accident of nature, some software opened a port you did not expect, you need to make sure it will be inaccessible. Every distribution has its tools for managing a firewall, and others are available in most package managers. I don't bother with them, as iptables (once you gain some familiarity with it) is fairly easy to understand and use, and it is the same on all systems. Like vi, you can expect its presence everywhere, so it pays to be able to use it. A basic firewall looks something like this:

    # make sure forwarding is off and clear everything
    # also turn off IPv6: if you don't need it, turn it off
    sysctl net.ipv6.conf.all.disable_ipv6=1
    sysctl net.ipv4.ip_forward=0
    iptables --flush
    iptables -t nat --flush
    iptables -t mangle --flush
    iptables --delete-chain
    iptables -t nat --delete-chain
    iptables -t mangle --delete-chain

    # make the default policy drop everything
    iptables --policy INPUT DROP
    iptables --policy OUTPUT ACCEPT
    iptables --policy FORWARD DROP

    # allow everything on loopback
    iptables -A INPUT -i lo -j ACCEPT

    # allow established/related traffic
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

    # allow ssh
    iptables -A INPUT -m tcp -p tcp --dport 22 -j ACCEPT

You can get fancy, wrap this in a script, drop a file in /etc/rc.d, link it to the runlevels in /etc/rcX.d, and have it start right after networking, or it might be sufficient for your purposes to run it straight out of /etc/rc.local. Then you modify this file as requirements change. For instance, to allow ssh, http and https traffic, you can switch the last line above to this one:

    iptables -A INPUT -p tcp -m state --state NEW -m multiport --dports ssh,http,https -j ACCEPT

More specific rules are better. Let's say what you've built is an intranet server, and you know where your traffic will be coming from and on what interface. You instead could add something like this to the bottom of your iptables script:

    iptables -A INPUT -i eth0 -s 192.168.1.0/24 -p tcp -m state --state NEW -m multiport --dports http,https -j ACCEPT

There are a couple things to consider in this example that you might need to tweak. For one, this allows all outbound traffic initiated from the server. Depending on your needs and paranoia level, you may not wish to allow that. Setting outbound traffic to default deny will significantly complicate maintenance for things like security updates, so weigh that complication against your level of concern about rootkits communicating outbound to phone home.
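If you do opt for default deny on outbound traffic, the additions look much like the inbound rules. The lines below are a minimal sketch of one way to do it and are not part of the original script; the ports opened here (DNS, NTP and HTTP/HTTPS for package updates) are assumptions to adjust for your environment.

    # default-deny outbound, then open only what the server itself needs
    iptables --policy OUTPUT DROP
    iptables -A OUTPUT -o lo -j ACCEPT
    iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    # DNS and NTP
    iptables -A OUTPUT -p udp -m multiport --dports 53,123 -j ACCEPT
    # DNS over TCP, plus HTTP/HTTPS for package mirrors
    iptables -A OUTPUT -p tcp -m state --state NEW -m multiport --dports 53,80,443 -j ACCEPT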
Should you go with default deny for outbound, iptables is an extremely powerful and flexible tool—you can control outbound communications based on parameters like process name and owning user ID, rate-limit connections—almost anything you can think of—so if you have the time to experiment, you can control your network traffic with a very high degree of granularity.

Second, I'm setting the default to DROP instead of REJECT. DROP is a bit of security by obscurity. It can discourage a script kiddie if his port scan takes too long, but since you have commonly scanned ports open, it will not deter a determined attacker, and it might complicate your own troubleshooting, because you have to wait for the client-side timeout whenever you've blocked a port in iptables, either on purpose or by accident. Also, as I've detailed in a previous Linux Journal article (http://www.linuxjournal.com/content/back-dead-simple-bash-complex-ddos), TCP-level rejects are very useful in high-traffic situations to clear out the resources used to track connections statefully on the server and on network gear farther out. Your mileage may vary.

Finally, your distribution's minimal install might not have sysctl installed or turned on by default. You'll need that, so make sure it is on and works. It makes inspecting and changing system values much easier, as most versions support tab auto-completion. You also might need to include full paths to the binaries (usually /sbin/iptables and /sbin/sysctl), depending on the base path variable of your particular system.

All of the above probably should be finished within a few minutes of bringing up the server. I recommend not opening the ports for your application until after you've installed and configured the applications you are running on the server. So at the point when you have a new minimal server with only SSH open, you should apply all updates using your distribution's method. You can decide now whether you want to do this manually on a schedule or set them to automatic, which your distribution probably has a mechanism to do. If not, a script dropped in cron.daily will do the trick. Sometimes updates break things, so evaluate carefully. Whether you do automatic updates or not, critical flaws that sometimes require manual configuration changes are being uncovered frequently right now, so you need to monitor the appropriate lists and sites for critical security updates to your stack and apply them manually as necessary.

Once you've dealt with updates, you can move on and continue to evaluate your server against the two security principles of 1) minimal attack surface and 2) secure everything that must be exposed. At this point, you are pretty solid on point one. On point two, there is more you can yet do.

The concept of hurdles requires that you not allow root to log in remotely. Gaining root should be at least a two-part process. This is easy enough; you simply set this line in /etc/ssh/sshd_config:

    PermitRootLogin no

For that matter, root should not be able to log in directly at all. The account should have no password and should be accessible only via sudo—another hurdle to clear. If a user doesn't need remote login, don't allow it, or better said, allow only the users that you know need remote access. This satisfies both principles. Use the AllowUsers and AllowGroups settings in /etc/ssh/sshd_config to make sure you are allowing only the necessary users.
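Pulled together, the relevant sshd_config lines might look like the sketch below. The user and group names are placeholders rather than a recommendation; check the result with sshd -t and keep an existing session open while you reload sshd, in case you lock yourself out.

    # /etc/ssh/sshd_config (excerpt) -- example names only
    PermitRootLogin no
    # allow only the accounts that genuinely need a remote shell
    AllowUsers alice bob
    AllowGroups sshusers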
You can set a password policy on your server to require a complex password for any and all users, but I believe it is generally a better idea to bypass crackable passwords altogether and use key-only login, and have the key require a complex passphrase. This raises the bar for cracking into your system, as it is virtually impossible to brute-force an RSA key. The key could be physically stolen from your client system, which is why you need the complex passphrase. Without getting into a discussion of length or strength of key or passphrase, one way to create it is like this:

    ssh-keygen -t rsa

Then, when prompted, enter and re-enter the desired passphrase. Copy the public portion (id_rsa.pub or similar) into a file in the user's home directory called ~/.ssh/authorized_keys, and then in a new terminal window, try logging in, and troubleshoot as necessary. I store the key and the passphrase in a secure data vault provided by Personal, Inc. (https://personal.com), and this will allow me, even if away from home and away from my normal systems, to install the key and have the passphrase to unlock it, in case an emergency arises. (Disclaimer: Personal is the startup I work with currently.)

Once it works, change this line in /etc/ssh/sshd_config:

    PasswordAuthentication no

Now you can log in only with the key. I still recommend keeping a complex password for the users, so that when you sudo, you have that layer of protection as well. Now, to take complete control of your server, an attacker needs your private key, your passphrase and your password on the server—hurdle after hurdle. In fact, in my company, we also use multi-factor authentication in addition to these other methods, so you must have the key, the passphrase, the pre-secured device that will receive the notification of the login request and the user's password. That is a pretty steep hill to climb.

Encryption is a big part of keeping your server secure—encrypt everything that matters to you. Always be aware of how data, particularly authentication data, is stored and transmitted. Needless to say, you never should allow login or connections over an unencrypted channel like FTP, Telnet, rsh or other legacy protocols. These are huge no-nos that completely undo all the hard work you've put into securing your server. Anyone who can gain access to a nearby switch and perform ARP poisoning to mirror your traffic will own your servers. Always use sftp or scp for file transfers and ssh for secure shell access. Use https for logins to your applications, and never store passwords, only hashes.

Even with strong encryption in use, in the recent past, many flaws have been found in widely used programs and protocols—get used to turning ciphers on and off in both OpenSSH and OpenSSL. I'm not covering Web servers here, but the lines of interest you would put in your /etc/ssh/sshd_config file would look something like this:

    Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128
    MACs hmac-sha1,umac-64@openssh.com,hmac-ripemd160

Then you can add or remove ciphers and MACs as necessary. See man sshd_config for all the details.
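Before editing those lines, it helps to see which algorithms your particular OpenSSH build actually supports and what the running configuration resolves to. The commands below are a sketch and assume a reasonably recent OpenSSH; the -Q query flag is not present in very old releases, and sshd -T must be run as root.

    ssh -Q cipher                          # ciphers this OpenSSH build supports
    ssh -Q mac                             # available MAC algorithms
    sshd -T | grep -Ei '^(ciphers|macs)'   # what the running sshd configuration resolves to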
Depending on your level of paranoia and the purpose of your server, you might be tempted to stop here. I wouldn't. Get used to installing, using and tuning a few more security essentials, because these last few steps will make you exponentially more secure.

I'm well into principle two now (secure everything that must be exposed), and I'm bordering on the third principle: assume that every measure will be defeated. There is definitely a point of diminishing returns with the third principle, where the change to the risk does not justify the additional time and effort, but where that point falls is something you and your organization have to decide. The fact of the matter is that even though you've locked down your authentication, there still exists the chance, however small, that a configuration mistake or an update will change or break your config, that by blind luck an attacker could find a way into your system, or even that the system came with a backdoor. There are a few things you can do that will further protect you from those risks.

Speaking of backdoors, everything from phones to the firmware of hard drives has backdoors pre-installed. Lenovo has been caught no less than three times pre-installing rootkits, and Sony rooted customer systems in a misguided attempt at DRM. A programming mistake in OpenSSL left a hole open that the NSA has been exploiting to defeat encryption for at least a decade without informing the community, and this was apparently only one of several. In the late 2000s, someone anonymously attempted to insert a two-line programming error into the Linux kernel that would cause a remote root exploit under certain conditions. So suffice it to say, I personally do not trust anything sourced from the NSA, and I turn SELinux off because I'm a fan of warrants and the Fourth Amendment. The instructions are generally available, but usually all you need to do is make this change to /etc/selinux/config:

    #SELINUX=enforcing    # comment out the old setting
    SELINUX=disabled      # turn it off, then restart the system

In the spirit of turning off and blocking what isn't needed: since most of the malicious traffic on the Internet comes from just a few sources, why give them a shot at cracking your servers? I run a short script that collects various blacklists of exploited servers in botnets, Chinese and Russian CIDR ranges and so on, and creates a blocklist from them, updating once a day. Back in the day, you couldn't do this, as iptables gets bogged down matching more than a few thousand lines, so having a rule for every malicious IP out there just wasn't feasible. With the maturity of the ipset project, now it is. ipset uses a binary search algorithm that adds only one pass to the search each time the list doubles, so an arbitrarily large list can be searched efficiently for a match, although I believe there is a limit of 65k entries in the ipset table. To make use of it, add this at the bottom of your iptables script:

    # create the ipset hash and an iptables rule that checks it
    ipset create blocklist hash:net
    iptables -I INPUT 1 -m set --match-set blocklist src -j DROP

Then put this somewhere executable and run it out of cron once a day:

    #!/bin/bash
    PATH=$PATH:/sbin
    WD=`pwd`
    TMP_DIR=$WD/tmp
    IP_TMP=$TMP_DIR/ip.temp
    IP_BLOCKLIST=$WD/ip-blocklist.conf
    IP_BLOCKLIST_TMP=$TMP_DIR/ip-blocklist.temp
    list="chinese nigerian russian lacnic exploited-servers"
    BLOCKLISTS=(
    "http://www.projecthoneypot.org/list_of_ips.php?t=d&rss=1" # Project Honey Pot Directory of Dictionary Attacker IPs
    "http://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=1.1.1.1" # TOR Exit Nodes
    "http://www.maxmind.com/en/anonymous_proxies" # MaxMind GeoIP Anonymous Proxies
    "http://danger.rulez.sk/projects/bruteforceblocker/blist.php" # BruteForceBlocker IP List
    "http://rules.emergingthreats.net/blockrules/rbn-ips.txt" # Emerging Threats - Russian Business Networks List
    "http://www.spamhaus.org/drop/drop.lasso" # Spamhaus Don't Route Or Peer List (DROP)
    "http://cinsscore.com/list/ci-badguys.txt" # C.I. Army Malicious IP List
    "http://www.openbl.org/lists/base.txt" # OpenBLOCK.org 30 day List
    "http://www.autoshun.org/files/shunlist.csv" # Autoshun Shun List
    "http://lists.blocklist.de/lists/all.txt" # blocklist.de attackers
    )
    cd $TMP_DIR

    # This gets the various lists
    for i in "${BLOCKLISTS[@]}"
    do
        curl "$i" > $IP_TMP
        grep -Po '(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?' $IP_TMP >> $IP_BLOCKLIST_TMP
    done

    # This section gets the wizcrafts lists
    for i in `echo $list`; do
        wget --quiet http://www.wizcrafts.net/$i-iptables-blocklist.html
        # Grep out all but ip blocks
        cat $i-iptables-blocklist.html | grep -v \< | grep -v \: | grep -v \; | grep -v \# | grep [0-9] > $i.txt
        # Consolidate blocks into master list
        cat $i.txt >> $IP_BLOCKLIST_TMP
    done

    sort $IP_BLOCKLIST_TMP -n | uniq > $IP_BLOCKLIST
    rm $IP_BLOCKLIST_TMP
    wc -l $IP_BLOCKLIST

    ipset flush blocklist
    egrep -v "^#|^$" $IP_BLOCKLIST | while IFS= read -r ip
    do
        ipset add blocklist $ip
    done

    # cleanup
    rm -fR $TMP_DIR/*
    exit 0

It's possible you don't want all of these blocked. I usually leave Tor exit nodes open to enable anonymity, and if you do business in China, you certainly can't block every IP range coming from there. Remove unwanted items from the URLs to be downloaded. When I turned this on, within 24 hours, the number of banned IPs triggered by brute-force crack attempts on SSH dropped from hundreds to less than ten.

Although there are many more areas to be hardened, since according to principle three we assume all measures will be defeated, I will have to leave things like locking down cron and bash, as well as automating standard security configurations across environments, for another day. There are a few more packages I consider security musts, including multiple methods to check for intrusion (I run both chkrootkit and rkhunter to update signatures and scan my systems at least daily).

I want to conclude with one last must-use tool: Fail2ban. Fail2ban is available in virtually every distribution's repositories now, and it has become my go-to. Not only is it an extensible Swiss-army knife of brute-force authentication prevention, it comes with an additional bevy of filters to detect other attempts to do bad things to your system. If you do nothing but install it, run it, keep it updated and turn on its filters for any services you run, especially SSH, you will be far better off than you were otherwise. As for me, I have other higher-level software like WordPress log to auth.log for filtering and banning of malefactors with Fail2ban. You can custom-configure how long to ban based on how many filter matches (like failed login attempts of various kinds) and specify longer bans for "recidivist" abusers that keep coming back.
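In jail.local terms, that tuning might look something like the sketch below. This is not the author's configuration: the jail names assume the stock sshd and recidive filters shipped with recent Fail2ban packages (older versions name the SSH jail differently), and the thresholds are arbitrary examples to adjust.

    # /etc/fail2ban/jail.local (excerpt) -- example values only
    [sshd]
    enabled  = true
    # five failures within ten minutes earns a one-hour ban
    maxretry = 5
    findtime = 600
    bantime  = 3600

    [recidive]
    # repeat offenders, spotted in Fail2ban's own log, get a one-week ban
    enabled  = true
    logpath  = /var/log/fail2ban.log
    findtime = 86400
    bantime  = 604800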
Here's one example of the extensibility of the tool. During log review (another important component of a holistic security approach), I noticed many thousands of the following kinds of probes, coming especially from China:

    sshd[***]: Received disconnect from **.**.**.**: 11: Bye Bye [preauth]
    sshd[***]: Received disconnect from **.**.**.**: 11: Bye Bye [preauth]
    sshd[***]: Received disconnect from **.**.**.**: 11: Bye Bye [preauth]

There were two forms of this, and I could not find any explanation of a known exploit that matched this pattern, but there had to be a reason I was getting so many so quickly. It wasn't enough to be a denial of service, but it was a steady flow. Either it was a zero-day exploit, or some algorithm was sending malformed requests of various kinds hoping to trigger a memory problem in hopes of uncovering an exploit—in any case, there was no reason to allow them to continue. I added this line to the failregex = section of /etc/fail2ban/filter.d/sshd.local:

    ^%(__prefix_line)sReceived disconnect from <HOST>: 11: (Bye Bye)? \[preauth\]$

Within minutes, I had banned 20 new IP addresses, and my logs were almost completely clear of these lines going forward.

By now, you've seen my three primary principles of server hardening in action enough to know that systematically applying them to your systems will have you churning out reasonably hardened systems in no time. But, just to reiterate one more time:

1. Minimize attack surface.

2. Secure whatever remains and must be exposed.

3. Assume all security measures will be defeated.

Feel free to give me a shout and let me know what you thought about the article. Let me know your thoughts on what I decided to include, any major omissions I cut for the sake of space that you think should have been included, and things you'd like to see in the future!

    [root@localhost:~] # whoami
    uid=0

Greg Bledsoe, VP of Operations, Personal, Inc., CEH, CPT, lj@bledsoehome.net, @geek_king, https://www.linkedin.com/in/gregbledsoe. 20 years of making things work good, work again when they stop, and not stop working anymore.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.
EOF

How Will the Big Data Craze Play Out?

DOC SEARLS

And, how does it compare to what we've already experienced with Linux and open source?

I was in the buzz-making business long before I learned how it was done. That happened here, at Linux Journal. Some of it I learned by watching kernel developers make Linux so useful that it became irresponsible for anybody doing serious development not to consider it—and, eventually, not to use it. Some I learned just by doing my job here. But most of it I learned by watching the term "open source" get adopted by the world, and participating as a journalist in the process.

For a view of how quickly "open source" became popular, see Figure 1 for a look at what Google's Ngram Viewer shows. Ngram plots how often a term appears in books. It goes only to 2008, but the picture is clear enough.

Figure 1. Google Ngram Viewer: "open source"

I suspect that curve's hockey stick began to angle toward the vertical on February 8, 1998. That was when Eric S. Raymond (aka ESR) published an open letter titled "Goodbye, 'free software'; hello, 'open source'" and made sure it got plenty of coverage. The letter leveraged Netscape's announcement two weeks earlier that it would release the source code to what would become the Mozilla browser, later called Firefox. Eric wrote:

It's crunch time, people. The Netscape announcement changes everything. We've broken out of the little corner we've been in for twenty years. We're in a whole new game now, a bigger and more exciting one—and one I think we can win.

Which we did. How? Well, official bodies, such as the Open Source Initiative (OSI), were founded. (See Resources for a link to more history of the OSI.) O'Reilly published books and convened conferences. We wrote a lot about it at the time and haven't stopped (this piece being one example of that). But the prime mover was Eric himself, whom Christopher Locke describes as "a rhetorician of the first water".

To put this in historic context, the dot-com mania was at high ebb in 1998 and 1999, and both Linux and open source played huge roles in that. Every LinuxWorld Expo was lavishly funded and filled by optimistic start-ups with booths of all sizes and geeks with fun new jobs. At one of those, more than 10,000 attended an SRO talk by Linus. At the Expos and other gatherings, ESR held packed rooms in rapt attention, for hours, while he held forth on Linux, the hacker ethos and much more.
But his main emphasis was on open source, and the need for hackers and their employers to adopt its code and methods—which they did, in droves. (Let's also remember that two of the biggest IPOs in history were Red Hat's and VA Linux's, in August and December 1999.)

Ever since witnessing those success stories, I have been alert to memes and how they spread in the technical world. Especially "Big Data" (see Figure 2).

Figure 2. Google Trends: "big data"

Figure 3. Google Trends: "IBM big data", "McKinsey big data"

What happened in 2011? Did Big Data spontaneously combust? Was there a campaign of some kind? A coordinated set of campaigns? Though I can't prove it (at least not in the time I have), I believe the main cause was "Big data: The next frontier for innovation, competition, and productivity", published by McKinsey in May 2011, to much fanfare. That report, and following ones by McKinsey, drove publicity in Forbes, The Economist, various O'Reilly pubs, Financial Times and many others—while providing ample sales fodder for every big vendor selling Big Data products and services.

Figure 4. Google Trends: "IBM big data", "SAP big data", "HP big data", "Oracle big data", "Microsoft big data"

Among those big vendors, none did a better job of leveraging and generating buzz than IBM. See Resources for the results of a Google search for IBM + "Big Data" for the calendar years 2010-2011. Note that the first publication listed in that search, "Bringing big data to the Enterprise", is dated May 16, 2011, the same month as the McKinsey report. The next, "IBM Big Data - Where do I start?", is dated November 23, 2011.

Figure 3 shows a Google Trends graph for McKinsey, IBM and "big data". See that bump for IBM in late 2010 in Figure 3? That was due to a lot of push on IBM's part, which you can see in a search for IBM and big data just in 2010—and a search just for big data. So there was clearly something in the water already. But searches, as we see, didn't pick up until 2011. That's when the craze hit the marketplace, as we see in a search for IBM and four other big data vendors (Figure 4).

So, although we may not have a clear enough answer for the cause, we do have clear evidence of the effects. Next question: to whom do those companies sell their Big Data stuff? At the very least, it's the CMO, or Chief Marketing Officer—a title that didn't come into common use until the dot-com boom and got huge after that, as marketing's share of corporate overhead went up and up. On February 12, 2012, for example, Forbes ran a story titled "Five Years From Now, CMOs Will Spend More on IT Than CIOs Do". It begins:

Marketing is now a fundamental driver of IT purchasing, and that trend shows no signs of stopping—or even slowing down—any time soon. In fact, Gartner analyst Laura McLellan recently predicted that by 2017, CMOs will spend more on IT than their counterpart CIOs. At first, that prediction may sound a bit over the top. (In just five years from now, CMOs are going to be spending more on IT than CIOs do?)
But, consider this: 1) as we all know, marketing is becoming increasingly technology-based; 2) harnessing and mastering Big Data is now key to achieving competitive advantage; and 3) many marketing budgets already are larger—and faster growing—than IT budgets.

In June 2012, IBM's index page was headlined, "Meet the new Chief Executive Customer. That's who's driving the new science of marketing." The copy was directly addressed to the CMO. In response, I wrote "Yes, please meet the Chief Executive Customer", which challenged some of IBM's pitch at the time. (I'm glad I quoted what I did in that post, because all but one of the links now go nowhere. The one that works redirects from the original page to "Emerging trends, tools and tech guidance for the data-driven CMO".)

According to Wikibon, IBM was the top Big Data vendor by 2013, raking in $1.368 billion in revenue. In February of this year (2015), Reuters reported that IBM "is targeting $40 billion in annual revenue from the cloud, big data, security and other growth areas by 2018", and that this "would represent about 44 percent of $90 billion in total revenue that analysts expect from IBM in 2018". So I'm sure all the publicity works.

I am also sure there is a mania to it, especially around the wanton harvesting of personal data by all means possible, for marketing purposes. Take a look at "The Big Datastillery", co-published by IBM and Aberdeen, which depicts this system at work (see Resources). I wrote about it in my September 2013 EOF, titled "Linux vs. Bullshit". The "datastillery" depicts human beings as beakers on a conveyor belt being fed marketing goop and releasing gases for the "datastillery" to process into more marketing goop. The degree to which it demeans and insults our humanity is a measure of how insane marketing mania, drunk on a diet of Big Data, has become.

T.Rob Wyatt, an alpha geek and IBM veteran, doesn't challenge what I say about the timing of the Big Data buzz rise or the manias around its use as a term. But he does point out that Big Data is truly different in kind from its predecessor buzzterms (such as Data Processing) and deserves some respect:

The term Big Data in its original sense represented a complete reversal of the prevailing approach to data. Big Data specifically refers to the moment in time when the value of keeping the data exceeded the cost and the prevailing strategy changed from purging data to retaining it.

He adds:

CPU cycles, storage and bandwidth are now so cheap that the cost of selecting which data to omit exceeds the cost of storing it all and mining it for value later. It doesn't even have to be valuable today; we can just store data away on speculation, knowing that only a small portion of it eventually needs to return value in order to realize a profit. Whereas we used to ruthlessly discard data, today we relentlessly hoard it, even if we don't know what the hell to do with it. We just know that whatever data element we discard today will be the one we really need tomorrow when the new crop of algorithms comes out.

Which gets me to the story of Bill Binney, a former analyst with the NSA.
His specialty with the agency was getting maximum results from minimum data, by recognizing patterns in the data. One example of that approach was ThinThread, a system he and his colleagues developed at the NSA for identifying patterns indicating likely terrorist activity. ThinThread, Binney believes, would have identified the 9/11 hijackers, had the program not been discontinued three weeks before the attacks. Instead, the NSA favored more expensive programs based on gathering and hoarding the largest possible sums of data from everywhere, which makes it all the harder to analyze. His point: you don't find better needles in bigger haystacks.

Binney resigned from the NSA after ThinThread was canceled and has had a contentious relationship with the agency ever since. I've had the privilege of spending some time with him, and I believe he is A Good American—the title of an upcoming documentary about him. I've seen a pre-release version, and I recommend seeing it when it hits the theaters.

Meanwhile, I'm wondering when and how the Big Data craze will run out—or if it ever will. My bet is that it will, for three reasons.

First, a huge percentage of Big Data work is devoted to marketing, and people in the marketplace are getting tired of being both the sources of Big Data and the targets of marketing aimed by it. They're rebelling by blocking ads and tracking at growing rates. Given the size of this appetite, other prophylactic technologies are sure to follow. For example, Apple is adding "Content Blocking" capabilities to its mobile Safari browser. This lets developers provide ways for users to block ads and tracking on their iOS devices, and to do it at a deeper level than the current add-ons. Naturally, all of this is freaking out the surveillance-driven marketing business known as "adtech" (as a search for adtech + adblock reveals).

Second, other corporate functions must be getting tired of marketing hogging so much budget while earning customer hate in the marketplace. After years of winning budget fights among CXOs, expect CMOs to start losing a few—or more.

Third, marketing is already looking to pull in the biggest possible data cache of all, from the Internet of Things. Here's T.Rob again:

IoT device vendors will sell their data to shadowy aggregators who live in the background ("...we may share with our affiliates..."). These are companies that provide just enough service so the customer-facing vendor can say the aggregator is a necessary part of their business, hence an affiliate or partner. The aggregators will do something resembling "big data" but generally are more interested in state than trends (I'm guessing at that based on current architecture) and will work on very specialized data sets of actual behavior, seeking not merely to predict but rather to manipulate behavior in the immediate short-term future (minutes to days). Since the algorithms and data sets differ greatly from those in the past, the name will change. The pivot will be the development of
new specialist roles in gathering, aggregating, correlating, and analyzing the datasets. This is only possible because our current regulatory regime allows all new data tech by default. If we can, then we should. There is no accountability of where the data goes after it leaves the customer-facing vendor's hands. There is no accountability of data gathered about people who are not account holders or members of a service.

I'm betting that both customers and non-marketing parts of companies are going to fight that.

Figure 5. Google Trends: "open source", "big data"

Finally, I'm concerned about what I see in Figure 5. If things go the way Google Trends expects, next year open source and big data will attract roughly equal interest from those using search engines. This might be meaningless, or it might be meaningful. I dunno. What do you think?

Doc Searls is Senior Editor of Linux Journal. He is also a fellow with the Berkman Center for Internet and Society at Harvard University and the Center for Information Technology and Society at UC Santa Barbara.

Send comments or feedback via http://www.linuxjournal.com/contact or to ljeditor@linuxjournal.com.

Resources

Eric S. Raymond: http://www.catb.org/esr

"Goodbye, 'free software'; hello, 'open source'", by Eric S. Raymond: http://www.catb.org/esr/open-source.html

"Netscape Announces Plans to Make Next-Generation Communicator Source Code Available Free on the Net": http://web.archive.org/web/20021001071727/wp.netscape.com/newsref/pr/newsrelease558.html

Open Source Initiative: http://opensource.org/about

History of the OSI: http://opensource.org/history

O'Reilly Books on Open Source: http://search.oreilly.com/?q=open+source

O'Reilly's OSCON: http://www.oscon.com/open-source-eu-2015

Red Hat History (Wikipedia): https://en.wikipedia.org/wiki/Red_Hat#History
"VA Linux Registers A 698% Price Pop", by Terzah Ewing, Lee Gomes and Charles Gasparino (The Wall Street Journal): http://www.wsj.com/articles/SB944749135343802895

Google Trends "big data": https://www.google.com/trends/explore#q=big%20data

"Big data: The next frontier for innovation, competition, and productivity", by McKinsey: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

Google Search Results for IBM + "Big Data", 2010-2011: https://www.google.com/search?q=%2BIBM+%22Big+Data%22&newwindow=1&safe=off&biw=1267&bih=710&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2011&tbm=

"Bringing big data to the Enterprise": http://www-01.ibm.com/software/au/data/bigdata

"IBM Big Data - Where do I start?": https://www.ibm.com/developerworks/community/blogs/ibm-big-data/entry/ibm_big_data_where_do_i_start?lang=en

Google Trends: "IBM big data", "McKinsey big data": https://www.google.com/trends/explore#q=IBM%20big%20data,%20McKinsey%20big%20data&cmpt=q&tz=Etc/GMT%2B4

Google Search Results for "IBM big data" in 2010: https://www.google.com/search?q=ibm+big+data&newwindow=1&safe=off&biw=1095&bih=979&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2010

Google Search Results for Just "big data": https://www.google.com/search?q=ibm+big+data&newwindow=1&safe=off&biw=1095&bih=979&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2010#newwindow=1&safe=off&tbs=cdr:1%2Ccd_min:1%2F1%2F2010%2Ccd_max:12%2F31%2F2010&q=big+data

Google Trends for "IBM big data", "SAP big data", "HP big data", "Oracle big data", "Microsoft big data": https://www.google.com/search?q=ibm+big+data&newwindow=1&safe=off&biw=1095&bih=979&source=lnt&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2010#newwindow=1&safe=off&tbs=cdr:1%2Ccd_min:1%2F1%2F2010%2Ccd_max:12%2F31%2F2010&q=big+data

Google Books Ngram Viewer Results for "chief marketing officer" between 1900 and 2008: https://books.google.com/ngrams/graph?content=chief+marketing+officer&year_start=1900&year_end=2008&corpus=0&smoothing=3&share=&direct_url=t1%3B%2Cchief%20marketing%20officer%3B%2Cc0

Forbes, "Five Years From Now, CMOs Will Spend More on IT Than CIOs Do", by Lisa Arthur: http://www.forbes.com/sites/lisaarthur/2012/02/08/five-years-from-now-cmos-will-spend-more-on-it-than-cios-do

"By 2017 the CMO will Spend More on IT Than the CIO", hosted by Gartner Analyst Laura McLellan (Webinar): http://my.gartner.com/portal/server.pt?open=512&objID=202&mode=2&PageID=5553&resId=1871515&ref=Webinar-Calendar

"Yes, please meet the Chief Executive Customer", by Doc Searls: https://blogs.law.harvard.edu/doc/2012/06/19/yes-please-meet-the-chief-executive-customer

Emerging trends, tools and tech guidance for the data-driven CMO: http://www-935.ibm.com/services/c-suite/cmo

Big Data Vendor Revenue and Market Forecast 2013-2017 (Wikibon): http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017

"IBM targets $40 billion in cloud, other growth areas by 2018" (Reuters): http://www.reuters.com/article/2015/02/27/us-ibm-investors-idUSKBN0LU1LC20150227

"The Big Datastillery: Strategies to Accelerate the Return on Digital Data": http://www.ibmbigdatahub.com/blog/big-datastillery-strategies-accelerate-return-digital-data
Bullshit”, by Doc Searls, Linux Journal, September 2013: http://www.linuxjournal.com/content/linux-vs-bullshit T.Rob Wyatt: https://tdotrob.wordpress.com William Binney (U.S. intelligence official): https://en.wikipedia.org/ wiki/William_Binney_%28U.S._intelligence_official%29 ThinThread: https://en.wikipedia.org/wiki/ThinThread A Good American: http://www.imdb.com/title/tt4065414 Safari 9.0 Secure Extension Distribution (“Content Blocking”): https://developer.apple.com/library/prerelease/ios/releasenotes/ General/WhatsNewlnSafari/Articles/Safari_9.html Google Search Results for adtech adblock: https://www.google.com/search?q=adtech+adblock&gws_rd=ssl Google Trends results for “open source”, “big data”: https://www.google.com/trends/explore#q=open%20source,%20 big%20data&cmpt=q&tz=Etc/GMT%2B4 WWW.LINUXJOURNAL.COM / NOVEMBER 2015 / 93