What metrics matter: A guide for open source projects

“Without data, you’re just a person with an opinion.”

Those are the words of W. Edwards Deming, the champion of statistical process control, who was credited as one of the inspirations for what became known as the Japanese post-war economic miracle of 1950 to 1960. Ironically, Japanese manufacturers like Toyota were far more receptive to Deming’s ideas than General Motors and Ford were.

Community management is certainly an art. It’s about mentoring. It’s about having difficult conversations with people who are hurting the community. It’s about negotiation and compromise. It’s about interacting with other communities. It’s about making connections. In the words of Red Hat’s Diane Mueller, it’s about “nurturing conversations.”

However, it’s also about metrics and data.

Some metrics have much in common with those for software development projects more broadly. Others are more specific to the management of the community itself. I think of deciding what to measure, and how, as adhering to five principles.

1. Recognize that behaviors aren’t independent of the measurements you choose to highlight.

In 2008, Dan Ariely published Predictably Irrational, one of a number of books written around that time that introduced behavioral psychology and behavioral economics to the general public. One memorable quote from that book is the following: “Human beings adjust behavior based on the metrics they’re held against. Anything you measure will impel a person to optimize his score on that metric. What you measure is what you’ll get. Period.”

This shouldn’t be surprising. It’s a finding that’s been repeatedly confirmed by research. It should also be familiar to just about anyone with business experience. It’s certainly not news to anyone in sales management, for example. Base sales reps’ (or their managers’) bonuses solely on revenue, and they’ll try to discount whatever it takes to maximize revenue, even if it puts margin in the toilet. Conversely, want the sales force to push a new product line—which will probably take extra effort—but skip the spiffs? Probably not happening.

And lest you think I’m unfairly picking on sales, this behavior is pervasive, all the way up to the CEO, as Ariely describes in a 2010 Harvard Business Review article: “CEOs care about stock value because that’s how we measure them. If we want to change what they care about, we should change what we measure.”

Developers and other community members are not immune.

2. You need to choose relevant metrics.

There’s a lot of folk wisdom floating around about what’s relevant and important that’s not necessarily true. My colleague Dave Neary offers an example from baseball: “In the late ’90s, the key measurements that were used to measure batter skill were RBI (runs batted in) and batting average (how often a player got on base with a hit, divided by the number of at-bats). The Oakland A’s were the first major league team to recruit based on a different measurement of player performance: on-base percentage. This measures how often they get to first base, regardless of how it happens.”

Indeed, the whole revolution of sabermetrics in baseball and elsewhere, which was popularized in Michael Lewis’ Moneyball, often gets talked about in terms of introducing data in a field that historically was more about gut feel and personal experience. But it was also about taking a game that had actually always been fairly numbers-obsessed and coming up with new metrics based on mostly existing data to better measure player value. (The data revolution going on in sports today is more about collecting much more data through video and other means than was previously available.)

3. Quantity may not lead to quality.

As a corollary, collecting lots of tangential but easy-to-capture data isn’t better than just selecting a few measurements you’ve determined are genuinely useful. In a world where online behavior can be tracked with great granularity and displayed in colorful dashboards, it’s tempting to be distracted by sheer data volume, even when it doesn’t deliver any great insight into community health and trajectory.

This may seem like an obvious point: Why measure something that isn’t relevant? In practice, metrics often get chosen because they’re easy to measure, not because they’re particularly useful. They tend to be more about inputs than outputs: The number of developers. The number of forum posts. The number of commits. Collectively, measures like this often get called vanity metrics. They’re ubiquitous, but most people involved with community management don’t think much of them.

The number of downloads may be the worst of the bunch. It’s true that, at some level, downloads are an indication of interest in a project. That’s something. But they’re sufficiently distant from actively using the project, much less engaging with it deeply, that it’s hard to view downloads as a very useful number.

Is there any harm in these vanity metrics? Yes, to the degree that you start thinking that they’re something to base action on. Probably more seriously, stakeholders like company management or industry observers can come to see them as meaningful indicators of project health.

4. Understand what measurements really mean and how they relate to each other.

Neary makes this point to caution against myopia. “In one project I worked on,” he says, “some people were concerned about a recent spike in the number of bug reports coming in because it seemed like the project must have serious quality issues to resolve. However, when we looked at the numbers, it turned out that many of the bugs were coming in because a large company had recently started using the project. The increase in bug reports was actually a proxy for a big influx of new users, which was a good thing.”

In practice, you often have to measure through proxies. This isn’t an inherent problem, but the further you get between what you want to measure and what you’re actually measuring, the harder it is to connect the dots. It’s fine to track progress in closing bugs, writing code, and adding new features. However, those don’t necessarily correlate with how happy users are or whether the project is doing a good job of working towards its long-term objectives, whatever those may be.

5. Different measurements serve different purposes.

Some measurements may be non-obvious but useful for tracking the success of a project and community relative to internal goals. Others may be better suited for a press release or other external consumption. For example, as a community manager, you may really care about the number of meetups, mentoring sessions, and virtual briefings your community has held over the past three months. But it’s the number of contributions and contributors that are more likely to grab the headlines. You probably care about those too. But maybe not as much, depending upon your current priorities.

Still other measurements may relate to the goals of any sponsoring organizations. The measurements most relevant for projects tied to commercial products are likely to be different from those for pure community efforts.

Because communities differ and goals differ, it’s not possible to simply compile a metrics checklist, but here are some ideas to think about:

Consider qualitative metrics in addition to quantitative ones. Conducting surveys and other studies can be time-consuming, especially if they’re rigorous enough to yield better-than-anecdotal data. It also requires rigor to construct studies so that they can be used to track changes over time. In other words, it’s a lot easier to measure quantitative contributor activity than it is to suss out if the community members are happier about their participation today than they were a year ago. However, given the importance of culture to the health of a community, measuring it in a systematic way can be a worthwhile exercise.

Breadth of community, including how many contributors are unaffiliated with commercial entities, is important for many projects. The greater the breadth, the greater the potential leverage of the open source development process. It can also be instructive to see how companies and individuals are contributing. Projects can be explicitly designed to better accommodate casual contributors.

Are new contributors able to have an impact, or are they ignored? How long does it take for code contributions to get committed? How long does it take for a reported bug to be fixed or otherwise responded to? If they asked a question in a forum, did anyone answer them? In other words, are you letting contributors contribute?

Advancement within the project is also an important metric. Mikeal Rogers of the Node.js community explains: “The shift that we made was to create a support system and an education system to take a user and turn them into a contributor, first at a very low level, and educate them to bring them into the committer pool and eventually into the maintainer pool. The end result of this is that we have a wide range of skill sets. Rather than trying to attract phenomenal developers, we’re creating new phenomenal developers.”

Whatever metrics you choose, don’t forget why you made them metrics in the first place. I find a helpful question to ask is: “What am I going to do with this number?” If the answer is to just put it in a report or in a press release, that’s not a great answer. Metrics should be measurements that tell you either that you’re on the right path or that you need to take specific actions to course-correct.

For this reason, Stormy Peters, who handles community leads at Red Hat, argues for keeping it simple. She writes, “It’s much better to have one or two key metrics than to worry about all the possible metrics. You can capture all the possible metrics, but as a project, you should focus on moving one. It’s also better to have a simple metric that correlates directly to something in the real world than a metric that is a complicated formula or ratio between multiple things. As project members make decisions, you want them to be able to intuitively feel whether or not it will affect the project’s key metric in the right direction.”

Source

The developer of Smith and Winston made an interesting blog post about supporting multiple platforms

I recently talked about the Steam release of Smith and Winston, but I didn’t realise until late last night that the developer actually made a very interesting blog post about supporting multiple platforms.

Interesting enough that it warranted an extra post to talk a little about it. Why? Well, a bit of a situation started when game porter Ethan Lee made a certain Twitter post, a joke aimed at developers who see Linux as “Too niche” while practically falling over themselves to get their games on every other new platform that appears. This Twitter post was shared around (a lot), some developers (like this) ended up mentioning how Linux doesn’t sell a lot of games, and it continued spreading like wildfire.

There have been a lot of counter-arguments for Linux too, like this and this and this and this, and a nice one thrown our way too. Oh, and we even spoke to Tommy Refenes, who said the next SMB should come to Linux too, so that was awesome. Additionally, Ethan Lee also wrote up a post about packaging Linux games, worth a read if you’re new to packaging for Linux.

Where was I again? Right, the blog post from the developer of Smith and Winston about how they support Windows, Mac and Linux. They go over details about how they do so, from using SDL2 which they say “takes 90% of the pain away from platform support” to the cross-platform rendering library bgfx. It’s just a really interesting insight into how developing across multiple platforms doesn’t have to be overly difficult.

I especially liked these parts:

I’ve been writing games and engines for 30+ years so none of this is new, I have a lot of experience. But you only get the experience by doing it and not making excuses.

By forcing the game through different compilers (Visual C++, Clang and GCC) you find different code bugs (leave the warnings on max). By forcing the runtime to use different memory allocators, threading libraries and rendering technologies you find different runtime bugs. This makes your code way more stable on every platform. Even if you never deploy your code to a non-Windows platform, just running it on Linux or macOS will expose crashes instantly that rarely appear on Windows. So delivering a quality product on the dominant platform is easier if you support the minor platforms as well.
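
To make that point a little more concrete, here’s a minimal sketch of what pushing one translation unit through two compilers with warnings turned up might look like (the file name and flag choices are purely illustrative, not taken from their post):

g++ -Wall -Wextra -Wpedantic -c game.cpp -o /dev/null
clang++ -Wall -Wextra -Wpedantic -c game.cpp -o /dev/null

Each compiler tends to flag a slightly different set of issues, which is exactly the effect the developer describes.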

They also clearly mention that they might not even make their money back on the Linux port of Smith and Winston. However, they’re clear that the other reasons (code quality, easier porting to other platforms and so on) help make up for it. This is a similar point the people from Stardock also made on Reddit.

See the post here in full. If you wish to check out their game, Smith and Winston, it’s available on itch.io and Steam in Early Access.

Source

The Future of Linux – OSnews

Linux news is getting more and more exciting, and somehow managing to get less and less interesting. Why? Because the speed of development is getting so rapid that it’s hard to get excited for each upcoming release, keep your system completely up to date, and even remember what the current versions of your favorite distributions are. This breakneck pace of development means good and bad things, but I have a few ideas about how I expect it to turn out.

The opinions in this piece are those of the author and not necessarily those of osnews.com

There are literally hundreds, if not thousands of distributions out there. In fact, with Knoppix, almost anyone can make his own. Each season, it seems we watch some distributions fold and others form. It’s getting harder and harder to tell them apart. Think you’re an expert? Answer these questions quickly:

5) Name three source-based distros.

According to a recent post on Distrowatch.com, “It is time to face the facts: the number of Linux distributions is growing at an alarming rate. On average, around 3 – 4 new distributions are submitted to this site every week, a fact that makes maintaining the individual pages and monitoring new releases increasingly time consuming. The DistroWatch database now lists a total of 205 Linux distributions (of which 24 have been officially discontinued) with 75 more on the waiting list. It is no longer easy to keep up.” Distributions change often, as does the popularity of each. Keeping up is almost impossible. Many Linux users install new distributions every few days, weeks, or months. Sadly, many of these folks keep a Windows installation – not because they prefer Windows, but because it’s a “safe haven” for their data which can’t find a permanent home on any given Linux distribution. Can this pace continue? I say no.

Predicting the future is always risky for an author, especially one who contributes to internet sites, where your words are often instantly accessible to the curious. But I’m going to put my money on the table and take some guesses about the future of Linux. Here, in no particular order, are six theories that I believe are inevitabilities. Keep in mind that although I’ve been liberal in tone, nearly everything in this piece is speculation or opinion and is subject to debate. Not all of these theories are necessarily entirely original thought, but all arguments are.

1) Major Linux distributions will collapse into a small, powerful group.
“Major players” in the Linux market, until recently, included Red Hat, SuSE, Mandrake, Debian, and Slackware. Some would argue more or less, but now you have a number of popular distros making inroads into the community: Xandros, LindowsOS, and Gentoo, to name a few. Another fringe including Yoper, ELX, and TurboLinux are making plays for corporate desktops. I’m coining a new term for this era of Linux computing: distribution bloat. We have hundreds of groups offering us what are essentially minor tweaks and optimizations of a very similar base. This cannot continue at this pace. There will, from this point on, be a growing number of Linux installation packages as people become more skilled, but there will be fewer distributions on a mass scale as commercial Linux stabilizes.

I think we’ll see the commercial Linux market boil down to two or three players, and this has already begun. I expect it to be a Ximian-ized Novell/SUSE distribution, Red Hat, and some sort of Debian offshoot – whether it’s User Linux or not remains to be seen. Sun’s Linux offering, Java Desktop System, will be deployed in Solaris committed companies and not much more.

2) Neither KDE nor Gnome will “win;” a third desktop environment will emerge.
The KDE/Gnome debate is a troll’s dream come true. People are often passionate about their desktop environment. I believe they both have strengths and weaknesses. However, a third DE, with a clean and usable base, will emerge in time, its sole mission to unify the Linux GUI. Only when there is true consistency of the look and feel of the desktop, or close to it, will Linux become a viable home OS for an average user. Currently, we see this consistency forged by common Qt and GTK themes, and offerings like Ximian Desktop which attempts to mask the different nature of each application. This is not about lack of choice – it is, however, about not allowing choice to supersede usability of the whole product.

Features that a desktop must include are obvious by now: cut & paste must work the same way throughout the OS, menus must be the same in all file manager windows, the same shortcut keys must apply in all applications, and all applications must have the same window borders. Many seemingly basic tasks like these haven’t entirely matured, or in some cases, been accomplished at all yet.

In any event, the DE’s importance will lessen once greater platform neutrality exists. This will doubtlessly cause many to argue that I am wrong – admittedly, it’s a tall order especially with Gnome and KDE becoming established and accomplishing so much. I maintain that unless there is some sort of merging, not a set of standards like freedesktop.org, but rather, a common base for development, that there will be a fragmented feel to Linux that simply doesn’t exist in Windows today.

3) Distribution optimization will become more prevalent
Most distributions today can be used for anything – a desktop system, a web server, a file server, a firewall, DNS, etc. I am of the firm belief that Windows’ greatest downfall on the server is that it has been a glorified desktop for too long. The file extensions are still hidden by default, you’re forced to run a GUI, and you can still run all your applications on the system. I predict that we’ll start to see flavors within distributions tweaked at the source level for optimization. Systems made to run as a desktop will have many different default options from their server-optimized counterparts.

4) Integration will force the ultimate “killer app”
I predict an open, central authentication system will take the Linux world by storm. There still isn’t a Linux counterpart to NDS/eDirectory or Active Directory that makes user management across the network as simple as either of the two. While eDirectory will run on Linux, there is no open standard with a GUI management tool that automates this mass management. An authentication service whose job is only to watch resources including users, devices, applications, and files doesn’t exist and can’t be built without serious Linux know-how. This service, which I’ll casually refer to as LCAS (Linux Central Authentication System) for lack of a better term, will be as easy to establish as a new Microsoft domain’s Active Directory.

LCAS will operate using completely open standards (X.500/LDAP) and will be easily ported to the BSDs and to commercial Unixes. Unlike Active Directory, LCAS services will be portable, and stored in a variety of databases, including PostgreSQL, MySQL, and even Oracle and DB2. LCAS, like Linux, will be pluggable, so that as it matures, management of other objects, like routers and switches, your firewall, and even workstations and PDAs and eventually, general network and local policies, will be controllable from your network LCAS installation. Perhaps, in time, it will also manage objects on the internet and how they can act within your network. I envision the ability to block, say, a particularly annoying application’s HTTP traffic, the ability for certain users to use specified protocols, or installing internet printers via LCAS.

5) Releases will become less frequent, and updates more common
There is a competition for versioning in the Linux world, as though higher version numbers are somehow “better.” Version inflation is commonplace, with companies incrementing the major version for minor overall updates, and going from X.1 to (X+1) after a few application updates and a minor kernel increase. There is also a trend in which, eventually, the version number gets too high and is abandoned in favor of less harsh-sounding names. No one would upgrade Windows every six months, so why upgrade Linux every six months? Because the software gets better too quickly! And the best way to get new software that works is to upgrade the whole distro! This is backward. The software should be incidental to the distro, not the reason for its version stamp.

Gentoo Linux just changed their release engineering guide specs to include a year number with subsequent point releases. This, I think, is the right idea. I predict that we’ll start to see releases like DistroX 2004, DistroX 2005. As a counterpart, we’ll begin to see downloadable updates like service packs, like DistroX 2004 Update2. These updates will be easily installable and will update and patch not only the OS, but all components that came with the distro.

It is not unlikely that we’ll see a front end installer that launches, checks your system and the server, asks which pieces you want upgraded, and then processes it. There are systems like this in place today, however, they are constantly updated. Too often, people don’t patch or update, they just reinstall. We’re going to see only security updates for each distro, and approximately quarterly, we’ll see an official Update. Updates distributed in this fashion are much more likely to be applied by a common user than the slew of updates issued on an almost daily basis. Updates like this allow users to utilize a modern system much longer in between releases – for years in some cases. Unless OpenCarpet catches on, I see a service pack mentality prevailing for all commercial distributions.

6) Linux-approved hardware will become common
Part of the fight for usable Linux is with device drivers and hardware. Getting your video card to work properly, even with a binary driver available, is still way too hard. While this isn’t always the fault of the hardware, we will see, in time, Linux-approved hardware. The hardware will include Linux drivers on the accompanying disk. There will be a certification process that tests hardware against a certain set of standards. Soon, a Tux badge on a PC case will be as commonplace as the “Built for Windows XX” stickers on most cases today.

I don’t claim to be visionary by any means. I also don’t want to forcefully bring spirituality into the mix, but I believe all things exist in waves, with highs and lows. Linux started small, it’s gained an audience, and as it swells to a large point, we, the community, should anticipate the future refold of things. The eventual downswing shouldn’t be an implosion, but rather, an opportunity to organize and streamline the existence of free software. It doesn’t have to be a reduction in use, it can be a simple cooperation, reduction of market saturation, and convergence towards standards.

Within the next two years, we’ll likely see Linux kernel 2.8, Gnome 3, and KDE 4. We’ll see exciting new projects. We’ll see many new Linux distributions and many existing ones disappear. We’ll see the pre-Microsoft Longhorn media blitz. And I bet, not too much longer than that, we’ll see some of the above start to become a reality as well.

Adam Scheinberg is a regular contributor to osnews.

Source

Download TurnKey Django Live CD 15.1

TurnKey Django Live CD is a free and open source software appliance, a special Debian-based operating system that has been designed from the ground up to provide users with an easy-to-use solution for deploying dedicated Django servers with minimum effort.

Django is an open source high-level Python web framework that promotes rapid application development, as well as pragmatic, clean design. The appliance comes with a pre-configured Django example project, which is installed by default in /var/www/project.

This Django project is integrated with the Apache web server using the mod_wsgi module, as well as with the MySQL database server and the Postfix mail server. In addition, it includes an administration console that features embedded online documentation.

Among other interesting components of this TurnKey appliance, we can mention the IPython command shell for interactive computing, Webmin modules for configuring the MySQL and Apache servers, as well as SSL for secure connections.

When installing this appliance, users should keep in mind that the default username for the Webmin, SSH and MySQL components is root, and that the default Django admin console username is admin. After installation, users will be able to enter new passwords for these accounts.

In order to have a fully functional Django server, you’ll also have to add a valid email address for the Django ‘admin’ account. Optionally, you can initialize the TurnKey Hub services for securely storing your files, databases and package management information.

The appliance is distributed as Live CD ISO images, allowing users to try it without installing anything on their computers. However, their main purpose is to install the operating system on a local disk drive. In addition to the Live CDs, the appliance is also available for download as virtual machine images for Xen, OVF, OpenNode, OpenVZ and OpenStack virtualization technologies.

Source

Python Seaborn Tutorial – Linux Hint

In this lesson on the Python Seaborn library, we will look at various aspects of this data visualisation library, which we can use with Python to generate beautiful and intuitive graphs that present data in the form a business wants from a platform. To make this lesson complete, we will cover the following sections:

  • What is Python Seaborn?
  • Types of Plots we can construct with Seaborn
  • Working with Multiple plots
  • Some alternatives for Python Seaborn

This looks like a lot to cover. Let us get started now.

What is Python Seaborn library?

The Seaborn library is a Python package which allows us to make infographics based on statistical data. As it is built on top of matplotlib, it is inherently compatible with it. Additionally, it supports NumPy and Pandas data structures, so plotting can be done directly from those collections.

Visualising complex data is one of the most important things Seaborn takes care of. If we were to compare Matplotlib to Seaborn, Seaborn makes easy many things that are hard to achieve with Matplotlib. However, it is important to note that Seaborn is not an alternative to Matplotlib but a complement to it. Throughout this lesson, we will make use of Matplotlib functions in the code snippets as well. You might choose to work with Seaborn in the following use cases:

  • You have statistical time series data to be plotted with representation of uncertainty around the estimates
  • To visually establish the difference between two subsets of data
  • To visualise the univariate and bivariate distributions
  • Adding much more visual appeal to matplotlib plots with many built-in themes
  • To fit and visualise machine learning models through linear regression with independent and dependent variables

Just a note before starting: we use a virtual environment for this lesson, which we made with the following commands:

python -m virtualenv seaborn
source seaborn/bin/activate

Once the virtual environment is active, we can install Seaborn library within the virtual env so that examples we create next can be executed:

pip install seaborn

You can use Anaconda as well to run these examples, which is easier. If you want to install it on your machine, look at the lesson which describes “How to Install Anaconda Python on Ubuntu 18.04 LTS” and share your feedback. Now, let us move forward to the various types of plots which can be constructed with Python Seaborn.

To keep this lesson hands-on, we will use the Pokemon dataset, which can be downloaded from Kaggle. To import this dataset into our program, we will be using the Pandas library. Here are all the imports we perform in our program:

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

Now, we can import the dataset into our program and show some of the sample data with Pandas as:

df = pd.read_csv('Pokemon.csv', index_col=0)
df.head()

Note that to run the above code snippet, the CSV dataset should be present in the same directory as the program itself. Once we run the above code snippet, we will see the following output (in Anaconda Jupyter’s notebook):

Plotting Linear Regression curve

One of the best things about Seaborn is the intelligent plotting functions it provides, which not only visualise the dataset we provide but also construct regression models around it. For example, it is possible to construct a linear regression plot with a single line of code. Here is how to do this:

sns.lmplot(x='Attack', y='Defense', data=df)

Once we run the above code snippet, we will see the following output:

We notice a few important things in the above code snippet:

  • There is a dedicated plotting function available in Seaborn
  • We used Seaborn’s fitting and plotting function which provided us with a linear regression line which it modelled itself

Don’t be afraid if you thought we cannot have a plot without that regression line. We can! Let’s try a new code snippet now, similar to the last one:

sns.lmplot(x='Attack', y='Defense', data=df, fit_reg=False)

This time, we will not see the regression line in our plot:

Now this is much clearer (if we do not need the linear regression line). But we are not done yet. Seaborn allows us to vary this plot in different ways, and that is what we will be doing.

Constructing Box Plots

One of the greatest features of Seaborn is how readily it accepts the Pandas DataFrame structure to plot data. We can simply pass a DataFrame to the Seaborn library so that it can construct a boxplot out of it:

sns.boxplot(data=df)

Once we run the above code snippet, we will see the following output:

We can drop the Total column, as it looks a little awkward when we are actually plotting the individual stat columns here:

stats_df = df.drop(['Total'], axis=1)
# New boxplot using stats_df
sns.boxplot(data=stats_df)

Once we run the above code snippet, we will see the following output:

Swarm Plot with Seaborn

We can construct an intuitive swarm plot with Seaborn. We will again be using the DataFrame from Pandas which we loaded earlier, but this time we will be calling Matplotlib’s show function to display the plot we made. Here is the code snippet:

sns.set_context("paper")
sns.swarmplot(x="Attack", y="Defense", data=df)
plt.show()

Once we run the above code snippet, we will see the following output:

By using a Seaborn context, we allow Seaborn to add a personal touch and fluid design to the plot. It is possible to customise this plot even further, with a custom font size used for labels in the plot to make reading easier. To do this, we will be passing more parameters to the set_context function, which perform just as they sound. For example, to modify the font size of the labels, we will make use of the font.size parameter. Here is the code snippet to do the modification:

sns.set_context("paper", font_scale=3, rc={"font.size":8,"axes.labelsize":5})
sns.swarmplot(x="Attack", y="Defense", data=df)
plt.show()

Once we run the above code snippet, we will see the following output:

The font size for the labels was changed based on the parameters we provided and the value associated with the font.size parameter. One thing Seaborn excels at is making plots intuitive for practical usage, which means Seaborn is not just a Python package for practice but actually something we can use in our production deployments.

Adding a Title to plots

It is easy to add titles to our plots. We just need to follow the simple procedure of using the Axes-level functions, where we call the set_title() function as shown in the code snippet here:

sns.set_context("paper", font_scale=3, rc={"font.size":8,"axes.labelsize":5})
my_plot = sns.swarmplot(x="Attack", y="Defense", data=df)
my_plot.set_title("LH Swarm Plot")
plt.show()

Once we run the above code snippet, we will see the following output:

This way, we can add much more information to our plots.

Seaborn vs Matplotlib

As we looked at the examples in this lesson, we can see that Matplotlib and Seaborn cannot be directly compared, but they can be seen as complementing each other. One of the features which puts Seaborn one step ahead is the way Seaborn can visualise data statistically.

To make the best of Seaborn’s parameters, we highly recommend looking at the Seaborn documentation and finding out what parameters to use to make your plot as close to business needs as possible.

Conclusion

In this lesson, we looked at various aspects of this data visualisation library, which we can use with Python to generate beautiful and intuitive graphs that present data in the form a business wants. Seaborn is one of the most important visualisation libraries when it comes to data engineering and presenting data in visual form, and it is definitely a skill to have under our belt, as it even allows us to fit and visualise linear regression models.

Source

GCC vs. Clang Compiler Performance On NVIDIA Xavier’s Carmel ARMv8 Cores

Since receiving the powerful NVIDIA Jetson AGX Xavier with its ARMv8 Carmel cores on this Tegra194 SoC a while back, it’s been quite a fun developer board for benchmarking and various Linux tests. One of the areas I was curious about was whether GCC or Clang would generate faster code for this high performance ARM SoC, so here are some benchmarks.

This CPU compiler benchmarking was done with the NVIDIA Jetson AGX Xavier while running the Ubuntu 18.04 LTS default L4T file-system and comparing the default GCC 7.3.0 against LLVM Clang 6.0 compiler options as officially supported by Ubuntu LTS Bionic Beaver. These are also the compiler versions supported by NVIDIA with their Tegra software on this Linux 4 Tegra sample file-system. The NVIDIA Tegra Xavier (T194) SoC as a reminder has eight “Carmel” ARMv8 CPU cores that are custom designed by NVIDIA. Tests on other more common ARMv8 cores with these different compilers will be coming up in future Phoronix articles with Clang 8 and GCC 9 releasing later this quarter. Rounding out this powerful Jetson AGX Xavier is the Volta GPU with 512 CUDA cores, 16GB of LPDDR4 system memory, 32GB of eMMC storage, two NVDLA deep learning accelerators, and a 7-way vision processor, granted those aren’t the focus of today’s testing.

Via the Phoronix Test Suite, a wide range of C/C++ benchmarks were carried out on this platform to see how the GCC and Clang compilers compare on Ubuntu 18.04 LTS.
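
For anyone wanting to reproduce this kind of comparison, the general pattern with the Phoronix Test Suite is to point the usual CC/CXX environment variables at each compiler and run the same test profile under both. A rough sketch only, with the test profile chosen purely as an example:

CC=gcc CXX=g++ phoronix-test-suite benchmark pts/compress-7zip
CC=clang CXX=clang++ phoronix-test-suite benchmark pts/compress-7zip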

Source

Introduction to Linux Server Security Hardening

Securing your Linux server(s) is a difficult and time-consuming task for system administrators, but it’s necessary to harden the server’s security to keep it safe from attackers and black hat hackers. You can secure your server by configuring the system properly and installing as little software as possible. Here are some tips which can help you secure your server against network and privilege escalation attacks.

Upgrade your Kernel

An outdated kernel is always prone to several network and privilege escalation attacks, so keep your kernel updated using apt on Debian or yum/dnf on Fedora.

sudo apt-get update
sudo apt-get dist-upgrade
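
The commands above are for Debian/Ubuntu-based systems; on Fedora or RHEL/CentOS the equivalent would be along these lines:

sudo yum update -y     # RHEL/CentOS
sudo dnf upgrade -y    # newer Fedora releases

Reboot afterwards so the new kernel is actually loaded.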

Disabling Root Cron Jobs

Cron jobs run by root or another high-privilege account can be used by attackers as a way to gain elevated privileges. You can see scheduled cron jobs with:

ls /etc/cron*
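
Beyond those directories, it is also worth reviewing root’s own crontab and removing any job you don’t recognise; a minimal sketch:

sudo crontab -l -u root    # list root's cron jobs
sudo crontab -e -u root    # edit them and delete suspicious or unneeded entries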

Strict Firewall Rules

You should block any unnecessary inbound or outbound connections on uncommon ports. You can update your firewall rules using iptables. Iptables is a very flexible and easy-to-use utility for blocking or allowing incoming or outgoing traffic. To install it, write:

sudo apt-get install iptables

Here’s an example of blocking incoming traffic on the FTP port using iptables:

iptables -A INPUT -p tcp --dport ftp -j DROP
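
A common hardening pattern is to allow only the traffic you actually need and drop everything else by default. A minimal sketch (make sure the SSH rule is added before switching the default policy to DROP, otherwise you can lock yourself out of a remote server):

# keep established connections and SSH working
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# drop everything else by default
sudo iptables -P INPUT DROP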

Disable unnecessary Services

Stop any unwanted services and daemons running on your system. You can list running services using the following commands.

ubuntu@ubuntu:~$ service --status-all

[ + ]  acpid
[ - ]  alsa-utils
[ - ]  anacron
[ + ]  apache-htcacheclean
[ + ]  apache2
[ + ]  apparmor
[ + ]  apport
[ + ]  avahi-daemon
[ + ]  binfmt-support
[ + ]  bluetooth
[ - ]  cgroupfs-mount

…snip…

OR using the following command

chkconfig --list | grep '3:on'

To stop a service, type

sudo service [SERVICE_NAME] stop

OR

sudo systemctl stop [SERVICE_NAME]

Check for Backdoors and Rootkits

Utilities like rkhunter and chkrootkit can be used to detect known and unknown backdoors and rootkits. They verify installed packages and configurations to check the system’s security. To install rkhunter, write:

ubuntu@ubuntu:~$ sudo apt-get install rkhunter -y

To scan your system, type

ubuntu@ubuntu:~$ sudo rkhunter --check

[ Rootkit Hunter version 1.4.6 ]

Checking system commands…

Performing 'strings' command checks
Checking 'strings' command                           [ OK ]

Performing ‘shared libraries’ checks
Checking for preloading variables                    [ None found ]
Checking for preloaded libraries                     [ None found ]
Checking LD_LIBRARY_PATH variable                    [ Not found ]

Performing file properties checks
Checking for prerequisites                           [ OK ]
/usr/sbin/adduser                                    [ OK ]
/usr/sbin/chroot                                      [ OK ]

…snip…
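
chkrootkit, mentioned above, works in much the same way and is useful as a second opinion:

sudo apt-get install chkrootkit -y
sudo chkrootkit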

Check Listening Ports

You should check for listening ports that aren’t needed and disable the services behind them. To check for open ports, write:

azad@ubuntu:~$ sudo netstat -ulpnt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address      Foreign Address   State      PID/Program name

tcp        0    0 127.0.0.1:6379        0.0.0.0:*        LISTEN     2136/redis-server 1

tcp        0    0 0.0.0.0:111           0.0.0.0:*        LISTEN     1273/rpcbind

tcp        0    0 127.0.0.1:5939        0.0.0.0:*        LISTEN     2989/teamviewerd

tcp        0    0 127.0.0.53:53         0.0.0.0:*        LISTEN     1287/systemd-resolv

tcp        0    0 0.0.0.0:22            0.0.0.0:*        LISTEN     1939/sshd

tcp        0    0 127.0.0.1:631         0.0.0.0:*        LISTEN     20042/cupsd

tcp        0    0 127.0.0.1:5432        0.0.0.0:*        LISTEN     1887/postgres

tcp        0    0 0.0.0.0:25            0.0.0.0:*        LISTEN     31259/master

…snip…
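
If one of the listening services isn’t actually needed (for example, a print daemon on a headless server), stop and disable it so it no longer opens a port at boot. A sketch, assuming the cupsd entry above is unneeded:

sudo systemctl stop cups
sudo systemctl disable cups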

Use an IDS (Intrusion Detection System)

Use an IDS to check network logs and to prevent any malicious activities. The open source IDS Snort is available for Linux. You can install it with:

wget https://www.snort.org/downloads/snort/daq-2.0.6.tar.gz
wget https://www.snort.org/downloads/snort/snort-2.9.12.tar.gz
tar xvzf daq-2.0.6.tar.gz
cd daq-2.0.6
./configure && make && sudo make install
tar xvzf snort-2.9.12.tar.gz
cd snort-2.9.12
./configure --enable-sourcefire && make && sudo make install

To monitor network traffic, type

ubuntu@ubuntu:~$ sudo snort

Running in packet dump mode
--== Initializing Snort ==--

Initializing Output Plugins!
pcap DAQ configured to passive.

Acquiring network traffic from “tun0”.
Decoding Raw IP4

--== Initialization Complete ==--

…snip…

Disable Logging in as Root

Root is a user with full privileges; it has the power to do anything on the system. Instead of logging in as root, you should enforce using sudo to run administrative commands.
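
For SSH in particular, a minimal sketch of disabling direct root logins (back up the file first; on Debian/Ubuntu the service may be named ssh rather than sshd):

sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo systemctl restart sshd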

Remove Files with no Owner

Files owned by no user or group can be a security threat. You should search for these files and remove them or assign them a proper user and group. To search for such files, type:

find /dir -xdev \( -nouser -o -nogroup \) -print

Use SSH and sFTP

For file transfers and remote administration, use SSH and sFTP instead of telnet and other insecure, open and unencrypted protocols. To install the servers, type:

sudo apt-get install vsftpd -y
sudo apt-get install openssh-server -y

Monitor Logs

Install and set up a log analyzer utility to check system logs and event data regularly and catch any suspicious activity. Type:

sudo apt-get install -y loganalyzer

Uninstall unused Software

Install as little software as possible to maintain a small attack surface. The more software you have, the more opportunities for attack you have. So remove any unneeded software from your system. To see installed packages, write:

dpkg --list
dpkg --status [PACKAGE_NAME]
apt list --installed

To remove a package:

sudo apt-get remove [PACKAGE_NAME] -y
sudo apt-get clean

Conclusion

Linux server security hardening is very important for enterprises and businesses. It’s a difficult and tiresome task for system administrators. Some of the work can be automated by utilities like SELinux and other similar software. Also, keeping installed software to a minimum and disabling unused services and ports reduces the attack surface.

Source

OpenStack Deployment using Devstack on CentOS 7 and RHEL7

Devstack is a collection of scripts which deploy the latest version of an OpenStack environment on a virtual machine, personal laptop or desktop. As the name suggests, it is meant for development environments; it can be used for functional testing of OpenStack projects, and an environment deployed by devstack can also be used for demonstrations and basic PoCs.

In this article I will demonstrate how to install OpenStack on a CentOS 7 / RHEL 7 system using Devstack. Following are the minimum system requirements:

  • Dual Core Processor
  • Minimum 8 GB RAM
  • 60 GB Hard Disk
  • Internet Connection

Following are the details of my lab setup for OpenStack deployment using devstack:

  • Minimal Installed CentOS 7 / RHEL 7 (VM)
  • Hostname – devstack-linuxtechi
  • IP Address – 169.144.104.230
  • 10 vCPU
  • 14 GB RAM
  • 60 GB Hard disk

Let’s start with the deployment steps. Log in to your CentOS 7 or RHEL 7 system.

Step:1) Update Your System and Set Hostname

Run the following yum command to apply the latest updates to the system and then reboot. After the reboot, set the hostname:

~]# yum update -y && reboot
~]# hostnamectl set-hostname "devstack-linuxtechi"
~]# exec bash

Step:2) Create a Stack user and assign sudo rights to it

All the installation steps are to be carried out by a user named “stack”; refer to the commands below to create the user and assign it sudo rights.

[root@devstack-linuxtechi ~]# useradd -s /bin/bash -d /opt/stack -m stack
[root@devstack-linuxtechi ~]# echo "stack ALL=(ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/stack
stack ALL=(ALL) NOPASSWD: ALL
[root@devstack-linuxtechi ~]#

Step:3) Install git and download devstack

Switch to the stack user and install the git package using yum:

[root@devstack-linuxtechi ~]# su - stack
[stack@devstack-linuxtechi ~]$ sudo yum install git -y

Download devstack using the git command below:

[stack@devstack-linuxtechi ~]$ git clone https://git.openstack.org/openstack-dev/devstack
Cloning into 'devstack'...
remote: Counting objects: 42729, done.
remote: Compressing objects: 100% (21438/21438), done.
remote: Total 42729 (delta 30283), reused 32549 (delta 20625)
Receiving objects: 100% (42729/42729), 8.93 MiB | 3.77 MiB/s, done.
Resolving deltas: 100% (30283/30283), done.
[stack@devstack-linuxtechi ~]$

Step:4) Create local.conf file and start openstack installation

To start the OpenStack installation using the devstack script (stack.sh), we first need to prepare a local.conf file that suits our setup.

Change to the devstack folder and create a local.conf file with the contents below:

[stack@devstack-linuxtechi ~]$ cd devstack/
[stack@devstack-linuxtechi devstack]$ vi local.conf
[[local|localrc]]
#Specify the IP Address of your VM / Server in front of HOST_IP Parameter
HOST_IP=169.144.104.230

#Specify the name of interface of your Server or VM in front of FLAT_INTERFACE
FLAT_INTERFACE=eth0

#Specify the Tenants Private Network and its Size
FIXED_RANGE=10.4.128.0/20
FIXED_NETWORK_SIZE=4096

#Specify the range of external IPs that will be used in Openstack for floating IPs
FLOATING_RANGE=172.24.10.0/24

#Whether OpenStack will be deployed across multiple hosts
MULTI_HOST=1

#Installation Logs file
LOGFILE=/opt/stack/logs/stack.sh.log

#KeyStone Admin Password / Database / RabbitMQ / Service Password
ADMIN_PASSWORD=openstack
DATABASE_PASSWORD=db-secret
RABBIT_PASSWORD=rb-secret
SERVICE_PASSWORD=sr-secret

#Additionally installing Heat Service
enable_plugin heat https://git.openstack.org/openstack/heat master
enable_service h-eng h-api h-api-cfn h-api-cw

Save and exit the file.

Now start the deployment or installation by executing the script (stack.sh)

[stack@devstack-linuxtechi devstack]$ ./stack.sh

It will take between 30 and 45 minutes, depending upon your internet connection.

If you get the below errors while running the above command:

+functions-common:git_timed:607            timeout -s SIGINT 0 git clone git://git.openstack.org/openstack/requirements.git /opt/stack/requirements --branch master
fatal: unable to connect to git.openstack.org:
git.openstack.org[0: 104.130.246.85]: errno=Connection timed out
git.openstack.org[1: 2001:4800:7819:103:be76:4eff:fe04:77e6]: errno=Network is unreachable
Cloning into '/opt/stack/requirements'...
+functions-common:git_timed:610            [[ 128 -ne 124 ]]
+functions-common:git_timed:611            die 611 'git call failed: [git clone' git://git.openstack.org/openstack/requirements.git /opt/stack/requirements --branch 'master]'
+functions-common:die:195                  local exitcode=0
[Call Trace]
./stack.sh:758:git_clone
/opt/stack/devstack/functions-common:547:git_timed
/opt/stack/devstack/functions-common:611:die
[ERROR] /opt/stack/devstack/functions-common:611 git call failed: [git clone git://git.openstack.org/openstack/requirements.git /opt/stack/requirements --branch master]
Error on exit
/bin/sh: brctl: command not found
[stack@devstack-linuxtechi devstack]$

To resolve these errors, perform the following steps.

Install the bridge-utils package and change the parameter from “GIT_BASE=${GIT_BASE:-git://git.openstack.org}” to “GIT_BASE=${GIT_BASE:-https://www.github.com}” in the stackrc file:

[stack@devstack-linuxtechi devstack]$ sudo yum install bridge-utils -y
[stack@devstack-linuxtechi devstack]$ vi stackrc
……
#GIT_BASE=${GIT_BASE:-git://git.openstack.org}
GIT_BASE=${GIT_BASE:-https://www.github.com}
……

Now re-run the stack.sh script,

[stack@devstack-linuxtechi devstack]$ ./stack.sh

Once the script has executed successfully, we will get output something like the following:

Stack-Command-output-CentOS7

This confirms that openstack has been deployed successfully,

Step:5) Access OpenStack either via Openstack CLI or Horizon Dashboard

If you want to perform any task from the OpenStack CLI, then you first have to source the openrc file, which contains the admin credentials.

[stack@devstack-linuxtechi devstack]$ source openrc
WARNING: setting legacy OS_TENANT_NAME to support cli tools.
[stack@devstack-linuxtechi devstack]$ openstack network list
+--------------------------------------+---------+----------------------------------------------------------------------------+
| ID                                   | Name    | Subnets                                                                    |
+--------------------------------------+---------+----------------------------------------------------------------------------+
| 5ae5a9e3-01ac-4cd2-86e3-83d079753457 | private | 9caa54cc-f5a4-4763-a79e-6927999db1a1, a5028df6-4208-45f3-8044-a7476c6cf3e7 |
| f9354f80-4d38-42fc-a51e-d3e6386b0c4c | public  | 0202c2f3-f6fd-4eae-8aa6-9bd784f7b27d, 18050a8c-41e5-4bae-8ab8-b500bc694f0c |
+--------------------------------------+---------+----------------------------------------------------------------------------+
[stack@devstack-linuxtechi devstack]$ openstack image list
+--------------------------------------+--------------------------+--------+
| ID                                   | Name                     | Status |
+--------------------------------------+--------------------------+--------+
| 5197ed8e-39d2-4eca-b36a-d38381b57adc | cirros-0.3.6-x86_64-disk | active |
+--------------------------------------+--------------------------+--------+
[stack@devstack-linuxtechi devstack]$
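
As a quick smoke test from the CLI, you could try booting an instance from the cirros image listed above. A sketch only (the m1.tiny flavor name is the usual devstack default and may differ in your environment):

openstack flavor list
openstack server create --flavor m1.tiny --image cirros-0.3.6-x86_64-disk --network private test-vm
openstack server list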

Now try accessing the Horizon dashboard; the URL details and credentials are already there in the stack.sh command output.

http://{Your-Server-IP-Address}/dashboard

Login-OpenStack-Dashboard-DevStack-CentOS7

Horizon-Dashboard-DevStack-CentOS7

Remove/ Uninstall OpenStack using devstack scripts

If you are done with testing and demonstration and want to remove OpenStack from your system, then run the following scripts as the stack user:

[stack@devstack-linuxtechi ~]$ cd devstack
[stack@devstack-linuxtechi devstack]$ ./clean.sh
[stack@devstack-linuxtechi devstack]$ ./unstack.sh
[stack@devstack-linuxtechi devstack]$ rm -rf /opt/stack/
[stack@devstack-linuxtechi ~]$ sudo rm -rf devstack
[stack@devstack-linuxtechi ~]$ sudo rm -rf /usr/local/bin/

That’s all from this tutorial. If you like the steps, please do share your valuable feedback and comments.

Source

C++ in the Linux kernel? – OSnews

OOP doesn’t imply using function pointers.

The essence of OOP is polymorphism, which you can achieve in C through function pointers.

What about just having a plain structure and associating functions with it? This is OO. In C, it forces you to adopt conventions like prefixing all your function names with the class name,…

I assume you’re referring to encapsulation, and I don’t see that having to adopt a convention of prefixing your function with an object name (you’re not forced to) is a big deal unless you hunt and peck.

…and explicitly dereference all the member variables whenever you access them.

I assume you’re referring to having to pass in a pointer to structure and then using -> to dereference within a function. I don’t see how that’s a problem.

What about access rights? There’s no way in C to forbid access to certain members of a structure. You have to document them in some way, by adding a comment next to the members you consider private. And it’s not enforced by the compiler, thus errors can creep in.

Programmers should be using the public API. Having public, protected, and private is no silver bullet.

Yes, programming using interfaces is useful, but there are lots of cases where objects with zero runtime overhead, but that implicitly do various housekeeping stuff, are very useful. Or even classes that are there only to keep your code from getting full of long and messy functions, but amount to nothing at runtime.

Interfaces with virtual functions have a runtime cost, and these functions can’t be inlined. They’re not a silver bullet.

You don’t pay for what you don’t use in C also…unlike Java where you’re relying on the runtime to be smart about all methods being virtual.

And what about error management anyway? You have to think of freeing resources everywhere, and C has no mechanism whatsoever to help you do it. I don’t think it’s much better than an exception system, even if exceptions have pitfalls that are not so hard to avoid.

Exceptions are useful, even though it has been pointed out that C++ exceptions have some major warts.

I understand your point, but I don’t think it’s really C++’s fault. It’s more that since it allows you to do these things more easily, it’s tempting to over-complicate stuff. I myself prefer straightforward code where I know what each class actually does, and not to have a vast number of layers of abstraction if I don’t need them.

Yes, you can do everything in C that you do in C++, like object programming and stuff.

The purpose of C++ isn’t to let you do things that can’t be done in C, it’s to simplify your life. Indeed, it can complicate it instead, but not if used properly.

Pretty much agree on these points. Shallow hierarchies and favoring composition are the way to go.

I’ve written lots of C++ code in my lifetime (more than any other language). I think I somewhat grok proper OO. I’ve got Stroustrup’s 3rd edition, I’ve got the GoF Design Patterns book, and I think patterns are useful.

I guess my biggest problem with C++ is the syntax and the overcomplication of the language that seems to stem from its legacy of needing to be C-backward compatible; I can’t stand C++ streams, and I think much of the standard library API is bad.

Source

Linux Today – 5 Useful Ways to Do Arithmetic in Linux Terminal

In this article, we will show you various useful ways of doing arithmetic in the Linux terminal. By the end of this article, you will have learned several practical ways of doing mathematical calculations on the command line.

Let’s get started!

1. Using Bash Shell

The first and easiest way to do basic math on the Linux CLI is using double parentheses. Here are some examples where we use values stored in variables:

$ ADD=$(( 1 + 2 ))
$ echo $ADD
$ MUL=$(( $ADD * 5 ))
$ echo $MUL
$ SUB=$(( $MUL - 5 ))
$ echo $SUB
$ DIV=$(( $SUB / 2 ))
$ echo $DIV
$ MOD=$(( $DIV % 2 ))
$ echo $MOD
Arithmetic in Linux Bash Shell
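
For reference, here is the arithmetic being done above, so you can check the values the echo commands print:

ADD = 1 + 2  = 3
MUL = 3 * 5  = 15
SUB = 15 - 5 = 10
DIV = 10 / 2 = 5
MOD = 5 % 2  = 1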

2. Using expr Command

The expr command evaluates expressions and prints the value of the provided expression to standard output. We will look at different ways of using expr for doing simple math, making comparisons, incrementing the value of a variable and finding the length of a string.

The following are some examples of doing simple calculations using the expr command. Note that many operators need to be escaped or quoted for shells, for instance the * operator (we will look at more under comparison of expressions).

$ expr 3 + 5
$ expr 15 % 3
$ expr 5 \* 3
$ expr 5 - 3
$ expr 20 / 4
Basic Arithmetic Using expr Command in Linux

Next, we will cover how to make comparisons. When an expression evaluates to false, expr will print a value of 0, otherwise it prints 1.

Let’s look at some examples:

$ expr 5 = 3
$ expr 5 = 5
$ expr 8 != 5
$ expr 8 \> 5
$ expr 8 \< 5
$ expr 8 \<= 5
Comparing Arithmetic Expressions in Linux

You can also use the expr command to increment the value of a variable. Take a look at the following example (in the same way, you can also decrease the value of a variable).

$ NUM=$(( 1 + 2))
$ echo $NUM
$ NUM=$(expr $NUM + 2)
$ echo $NUM
Increment Value of a Variable

Let’s also look at how to find the length of a string using:

$ expr length "This is Tecmint.com"
Find Length of a String

For more information especially on the meaning of the above operators, see the expr man page:

$ man expr

3. Using bc Command

bc (Basic Calculator) is a command-line utility that provides all features you expect from a simple scientific or financial calculator. It is specifically useful for doing floating point math.

If the bc command is not installed, you can install it using:

$ sudo apt install bc   #Debian/Ubuntu
$ sudo yum install bc   #RHEL/CentOS
$ sudo dnf install bc   #Fedora 22+

Once installed, you can run it in interactive mode or non-interactively by passing arguments to it – we will look at both cases. To run it interactively, type the command bc at the command prompt and start doing some math, as shown.

$ bc 
Start bc in Interactive Mode
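
Since the screenshot isn’t reproduced here, an interactive session might look something like this: after the version banner, you type one expression per line and bc prints the result (scale sets the number of decimal digits, and quit exits):

$ bc
scale=2
15 / 2
7.50
quit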

The following examples show how to use bc non-interactively on the command-line.

$ echo '3+5' | bc
$ echo '15 % 2' | bc
$ echo '15 / 2' | bc
$ echo '(6 * 2) - 5' | bc
Do Math Using bc in Linux

The -l flag is used to set the default scale (digits after the decimal point) to 20, for example:

$ echo '12/5' | bc
$ echo '12/5' | bc -l
Do Math with Floating Numbers

4. Using Awk Command

Awk is one of the most prominent text-processing programs in GNU/Linux. It supports the addition, subtraction, multiplication, division, and modulus arithmetic operators. It is also useful for doing floating point math.

You can use it to do basic math as shown.

$ awk 'BEGIN { a = 6; b = 2; print "(a + b) = ", (a + b) }'
$ awk 'BEGIN { a = 6; b = 2; print "(a - b) = ", (a - b) }'
$ awk 'BEGIN { a = 6; b = 2; print "(a *  b) = ", (a * b) }'
$ awk 'BEGIN { a = 6; b = 2; print "(a / b) = ", (a / b) }'
$ awk 'BEGIN { a = 6; b = 2; print "(a % b) = ", (a % b) }'
Do Basic Math Using Awk Command

If you are new to Awk, we have a complete series of guides to get you started with learning it: Learn Awk Text Processing Tool.

5. Using factor Command

The factor command is used to decompose an integer into prime factors. For example:

$ factor 10
$ factor 127
$ factor 222
$ factor 110  
Factor a Number in Linux
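
For reference, the prime factorizations these commands print are:

10: 2 5
127: 127
222: 2 3 37
110: 2 5 11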

That’s all! In this article, we have explained various useful ways of doing arithmetic in the Linux terminal.

Source
