Systems Engineer Salary Rises Even Higher with Linux Experience | Linux.com

System administration is a very reactive role, with sysadmins constantly monitoring networks for issues. Systems engineers, on the other hand, can build a system that anticipates users’ needs (and potential problems). In certain cases, they must integrate existing technology stacks (e.g., following the merger of two companies), and prototype different aspects of the network before it goes “live.”

In other words, it’s a complex job, with a salary to match. If you want a truly impressive salary, though, consider specializing in Linux systems—that will translate into a $20,000 pay bump.

Source

IPtables – the Linux Firewall – ls /blog

Iptables is an extremely flexible firewall utility built for Linux operating systems. Whether you’re a novice Linux geek or a system administrator, there’s probably some way that iptables can be of great use to you. Read on as we show you how to configure the most versatile Linux firewall.

About iptables

iptables is a command-line firewall utility that uses policy chains to allow or block traffic. When a connection tries to establish itself on your system, iptables looks for a rule in its list to match it to. If it doesn’t find one, it resorts to the default action.

iptables almost always comes pre-installed on any Linux distribution. To update/install it, just retrieve the iptables package:

sudo apt-get install iptables

There are GUI alternatives to iptables like Firestarter, but iptables isn’t really that hard once you have a few commands down. You want to be extremely careful when configuring iptables rules, particularly if you’re SSH’d into a server, because one wrong command can lock you out until the rules are fixed at the physical machine.
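For example, before tightening anything on a remote server, a sensible precaution is to explicitly allow SSH in both directions first (a minimal sketch using options explained later in this guide; adjust the port if your SSH daemon doesn’t listen on 22):

# Allow inbound SSH before adding any restrictive rules
iptables -A INPUT -p tcp --dport 22 -m state --state NEW,ESTABLISHED -j ACCEPT
# Allow the corresponding replies back out
iptables -A OUTPUT -p tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT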

Types of Chains

iptables uses three different chains: input, forward, and output.

Input – This chain is used to control the behavior for incoming connections. For example, if a user attempts to SSH into your PC/server, iptables will attempt to match the IP address and port to a rule in the input chain.

Forward – This chain is used for incoming connections that aren’t actually being delivered locally. Think of a router – data is always being sent to it but rarely actually destined for the router itself; the data is just forwarded to its target. Unless you’re doing some kind of routing, NATing, or something else on your system that requires forwarding, you won’t even use this chain.

There’s one sure-fire way to check whether or not your system uses/needs the forward chain.

iptables -L -v
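The output will look something like this (the counters shown here are illustrative):

Chain INPUT (policy ACCEPT 1.2M packets, 11G bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 937K packets, 17G bytes)
 pkts bytes target     prot opt in     out     source               destination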

The output above is from a server that’s been running for a few weeks and has no restrictions on incoming or outgoing connections. As you can see, the input chain has processed 11GB of packets and the output chain has processed 17GB. The forward chain, on the other hand, has not needed to process a single packet. This is because the server isn’t doing any kind of forwarding or being used as a pass-through device.

Output – This chain is used for outgoing connections. For example, if you try to ping howtogeek.com, iptables will check its output chain to see what the rules are regarding ping and howtogeek.com before making a decision to allow or deny the connection attempt.

The caveat

Even though pinging an external host seems like something that would only need to traverse the output chain, keep in mind that to return the data, the input chain will be used as well. When using iptables to lock down your system, remember that a lot of protocols will require two-way communication, so both the input and output chains will need to be configured properly. SSH is a common protocol that people forget to allow on both chains.

Policy Chain Default Behavior

Before going in and configuring specific rules, you’ll want to decide what you want the default behavior of the three chains to be. In other words, what do you want iptables to do if the connection doesn’t match any existing rules?

To see what your policy chains are currently configured to do with unmatched traffic, run the iptables -L command.

For a cleaner view, you can pipe the output through the grep command (for example, iptables -L | grep policy). In this case, the chains are currently configured to accept traffic.

More often than not, you’ll want your system to accept connections by default. Unless you’ve changed the policy chain rules previously, this setting should already be configured. Either way, here’s the command to accept connections by default:

iptables --policy INPUT ACCEPT

iptables --policy OUTPUT ACCEPT

iptables --policy FORWARD ACCEPT

By defaulting to the accept rule, you can then use iptables to deny specific IP addresses or port numbers, while continuing to accept all other connections. We’ll get to those commands in a minute.

If you would rather deny all connections and manually specify which ones you want to allow to connect, you should change the default policy of your chains to drop. Doing this would probably only be useful for servers that contain sensitive information and only ever have the same IP addresses connect to them.

iptables --policy INPUT DROP

iptables --policy OUTPUT DROP

iptables --policy FORWARD DROP

Connection-specific Responses

With your default chain policies configured, you can start adding rules to iptables so it knows what to do when it encounters a connection from or to a particular IP address or port. In this guide, we’re going to go over the three most basic and commonly used “responses”.

Accept – Allow the connection.

Drop – Drop the connection, act like it never happened. This is best if you don’t want the source to realize your system exists.

Reject – Don’t allow the connection, but send back an error. This is best if you don’t want a particular source to connect to your system, but you want them to know that your firewall blocked them.

The best way to see the difference between these three rules is to look at what happens when a PC tries to ping a Linux machine with iptables configured for each one of these settings.

Allowing the connection: the ping completes normally, and each request receives an echo reply.

Dropping the connection: the ping gets no response at all and eventually times out, as if the target machine didn’t exist.

Rejecting the connection: the ping fails immediately with an error such as “Destination Port Unreachable”, making it clear that a firewall refused the request.

Allowing or Blocking Specific Connections

With your policy chains configured, you can now configure iptables to allow or block specific addresses, address ranges, and ports. In these examples, we’ll set the connections to DROP, but you can switch them to ACCEPT or REJECT, depending on your needs and how you configured your policy chains.

Note: In these examples, we’re going to use iptables -A to append rules to the existing chain. iptables starts at the top of its list and goes through each rule until it finds one that matches. If you need to insert a rule above another, you can use iptables -I [chain] [number] to specify the position it should take in the list.
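For example, to place a rule at the very top of the input chain instead of appending it to the bottom:

iptables -I INPUT 1 -s 10.10.10.10 -j DROP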

Connections from a single IP address

This example shows how to block all connections from the IP address 10.10.10.10.

iptables -A INPUT -s 10.10.10.10 -j DROP

Connections from a range of IP addresses

This example shows how to block all of the IP addresses in the 10.10.10.0/24 network range. You can use a netmask or standard slash notation to specify the range of IP addresses.

iptables -A INPUT -s 10.10.10.0/24 -j DROP

or

iptables -A INPUT -s 10.10.10.0/255.255.255.0 -j DROP

Connections to a specific port

This example shows how to block SSH connections from 10.10.10.10.

iptables -A INPUT -p tcp --dport ssh -s 10.10.10.10 -j DROP

You can replace “ssh” with any protocol name or port number. The -p tcp part of the command tells iptables which kind of transport protocol the connection uses. If you were blocking a protocol that uses UDP rather than TCP, then -p udp would be necessary instead.
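As a hypothetical illustration, blocking inbound DNS queries (which use UDP port 53) from the same address would look like this:

iptables -A INPUT -p udp --dport 53 -s 10.10.10.10 -j DROP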

This example shows how to block SSH connections from any IP address.

iptables -A INPUT -p tcp --dport ssh -j DROP

Connection States

As we mentioned earlier, a lot of protocols are going to require two-way communication. For example, if you want to allow SSH connections to your system, the input and output chains are going to need a rule added to them. But, what if you only want SSH coming into your system to be allowed? Won’t adding a rule to the output chain also allow outgoing SSH attempts?

That’s where connection states come in, giving you the ability to allow two-way communication while only letting connections be established in one direction. Take a look at this example, where SSH connections FROM 10.10.10.10 are permitted, but SSH connections TO 10.10.10.10 are not. However, the system is permitted to send back information over SSH as long as the session has already been established, which makes SSH communication possible between these two hosts.

iptables -A INPUT -p tcp --dport ssh -s 10.10.10.10 -m state --state NEW,ESTABLISHED -j ACCEPT

iptables -A OUTPUT -p tcp --sport 22 -d 10.10.10.10 -m state --state ESTABLISHED -j ACCEPT
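As a side note, the state match shown above is the legacy interface; on current systems it is implemented on top of the newer conntrack match, so you may also see these rules written in the equivalent form:

iptables -A INPUT -p tcp --dport ssh -s 10.10.10.10 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p tcp --sport 22 -d 10.10.10.10 -m conntrack --ctstate ESTABLISHED -j ACCEPT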

Saving Changes

The changes that you make to your iptables rules will be scrapped the next time that the iptables service gets restarted unless you execute a command to save the changes. This command can differ depending on your distribution:

Ubuntu:

sudo sh -c "iptables-save > /etc/iptables/rules.v4"

Note that iptables-save by itself only prints the current rules to standard output, so you need to redirect its output to a file. The iptables-persistent package will load the rules from /etc/iptables/rules.v4 automatically at boot.

Red Hat / CentOS:

/sbin/service iptables save

Or

/etc/init.d/iptables save
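To load saved rules back into the kernel (for example, from the file used in the Ubuntu example above), use iptables-restore:

sudo sh -c "iptables-restore < /etc/iptables/rules.v4"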

Other Commands

List the currently configured iptables rules:

iptables -L

Adding the -v option will give you packet and byte information, and adding -n will list everything numerically. In other words – hostnames, protocols, and networks are listed as numbers.
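Another useful option is --line-numbers, which numbers each rule in the listing; those are the numbers you reference with iptables -I (insert) and iptables -D (delete):

iptables -L --line-numbers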

To clear all the currently configured rules, you can issue the flush command.

iptables -F
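One caution before flushing: if your default policies are set to DROP, removing all the rules can cut off your own access. A safer sequence, sketched here, is to reset the policies to ACCEPT first:

iptables --policy INPUT ACCEPT
iptables --policy OUTPUT ACCEPT
iptables --policy FORWARD ACCEPT
iptables -F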

Source

Google Shows Off New Android Dev Tools | Developers

Nov 13, 2018 12:20 PM PT

Google announced support for a range of new Android tools for application developers, chief among them the creation of a new support category for foldable devices, at last week’s Developers Summit.

After years of teasing and speculation, it finally looks as though foldable screen smartphones are headed to market. Google’s dev announcement followed closely on the heels of Samsung’s announcement at its own developer conference of a folding phone/tablet prototype with Infinity Flex Display.

The Android tools will take advantage of the new display technology, which literally bends and folds, noted Stephanie Cuthbertson, director of product management at Google. The technology is based on two variations of screen design: two-screen devices and one-screen devices.

Either way, the new devices will look like phones when folded, so they will fit into a pocket or purse. When unfolded, they will display screen continuity seamlessly. For example, as the device unfolds with an active image already in use, the image will transfer to the bigger screen without flutters or distortions.

“Official support from the Android development team means that folding phones are being taken seriously as a new type of device,” said Brandon Ackroyd, head of customer insight at Tiger Mobiles.

Marketability Unknown

Consumer interest in owning foldable devices is still an unknown factor. It might turn out that if they build them, no one will come.

While Google’s new developer support augurs well for Samsung and other manufacturers working on foldable devices, Ackroyd does not think it means foldable phones are going to be the must-have product of the future.

“Right now, I don’t see the use case or any apparent advantages,” he told LinuxInsider. “What I do see, however, is a proof-of-concept of this type of foldable screen technology, and I think we’re going to see many different products make use of this soon. Perhaps [it will become] the next-generation type wearable that fully wraps around your wrist.”

Foldable mobile devices are not guaranteed to be successful as the technology transitions from concept to reality, observed Charles King, principal analyst at Pund-IT.

“In many ways, smartphone users are the world’s largest community of lab rats, in the sense that so many are willing to follow wherever handset makers lead in terms of new features and functions,” he told LinuxInsider.

It was not so long ago that oversized smartphones made by a few adventuresome vendors were dismissively called “phablets,” he recalled, but today, larger form factors dominate the high end of the market.

Foldables could catch on among consumers who are tired of lugging around phones that resemble paperback books. Those consumers still want large displays for media consumption.

“It will be interesting to see whether that happens, or if buyers simply stick with the designs they know,” said King.

Flexible Future

Given the resources that are available today, manufacturers have a good shot at making foldable screen devices appealing to consumers, suggested Rob Webber, CEO of MoneySavingPro.

Futuristic-looking foldable smartphones have always been a dream concept, he noted.

“Smartphones have evolved dramatically since they were first invented, especially their displays,” Webber told LinuxInsider, recalling the stylus input before Apple introduced touchscreens that were able to react to the electrical impulses generated by the user’s fingers.

“Now we’re at a stage where edge-to-edge displays are the norm. Some could say that foldable displays are just the next rung on the ladder,” he said.

However, it’s unlikely the new technology will be a simple transition, Webber added. It will face many obstacles before the technology advances. The key to the foldable mobile device being a success is hardware and software integration, which will take time for manufacturers to perfect.

Cost also might be an issue. With the continually rising prices of new smartphones, if foldable devices are going to cost significantly more, they may appeal only to a very niche market, Webber reasoned.

“Having said that, I think we are currently at an exciting stage that marks the beginning of an emerging battle between manufacturers and the start of a very flexible future,” he said.

Potential Exploit and Failure

Money may not be the only cost factor consumers will face if foldable screens catch on. Smaller yet expanding devices may be more appealing to hackers than to consumers, warned Mike Banic, vice president of marketing at Vectra.

The number of Web searches performed on a mobile device has been increasing steadily. New mobile technology that makes it easier to use mobile devices to create as well as consume information means that attackers will exploit the trend, he told LinuxInsider.

Additionally, the number of mobile vulnerabilities is highest on Android apps, largely due to its open source nature and the questionable security of third-party app stores, Banic said.

“Mechanicals could introduce a point of failure that may cause adoption to stumble,” he noted.

Google Dev Support Wrapup

Google’s plan to bring more Android tools to app developers is part of an ongoing program. The Dev Summit announcements suggest the company has decided to take an aggressive approach.

The new features will be rolled out first to a number of developers who are considered “partners.” Availability will then expand to more developers before the features eventually become accessible to all.

“It is a solid group of announcements that most Android developers will welcome,” said Pund-IT’s King. “At a time when smartphone market growth appears to have stalled, introducing support for new features and form factors could offer Android device makers what they need to make their products stand out from the crowd and attract customer interest.”

  • Updates to the Kotlin Programming Language: Kotlin is not a Google-developed language, but it is one that devs have favored. Last week, JetBrains released the latest version of Kotlin, 1.3, which brings new language features, APIs, bug fixes and performance improvements.

    It has become the fastest-growing language, in terms of the number of contributors on GitHub, and has been voted the second most-loved language on Stack Overflow.

    In Google’s surveys, the more developers use Kotlin, the higher their satisfaction, according to Google’s Cuthbertson.

  • Android Jetpack: Google announced new Jetpack libraries, the next generation of tools and Android APIs to accelerate Android application development. It contains two new Architecture Component libraries, Navigation and WorkManager, which will move to beta this month.

    The Navigation Architecture Component offers a simplified way to implement Android’s navigation principles in an application. Plus, the new Navigation Editor in Android Studio creates and edits navigation architecture.

    [Screenshot: Jetpack Navigation Editor]

    This eliminates navigation boilerplate while adding atomic navigation operations with easier animated transitions and more. WorkManager makes it easy to perform background tasks in the most efficient manner, choosing the most appropriate solution based on the application state and device API level.

  • Slices: Google is moving Android Slices to public Search experiments. Slices is a new way to bring users to an app. It works like a mini snippet of an app that surfaces content and actions.

    “Aside from the foldable announcement, the most interesting angle was the further development of Google Slices. This new concept allows Android apps to push relevant components and features of their apps into other places, such as the Google Search,” said Tiger Mobiles’ Ackroyd.

    By loading a small part of an app right from the search results, users can get a near-native app experience without actually fully opening the app, he said. For example, if you google ‘What is the Tesla stock price?’ and you have a stocks app installed, that app can use a ‘slice’ to give you the info.

    Google announced the concept of Slices with Android 9 Pie. This new paradigm allows apps to surface relevant components and features in natural places, like Google Search.

    This month, Google will make Slices available as part of a public Early Access Program with partners. Google also will begin experiments in the Search app to surface various Slices when relevant.

  • Android Studio: Android Studio, Google’s official IDE for Android development, has a new focus on productivity, build speed, quality and fundamentals.

    Google launched Android Studio 3.3 beta 3 last week. Upcoming releases will add a strong focus on quality and fundamentals. Google also announced that Android Studio will come to Chrome OS early next year.

  • Android App Bundles and Dynamic Features: The Android App Bundle is the new publishing format that serves only the code and resources users need to run an app on their specific devices. It reduces app sizes by an average of 35 to 40 percent compared to a universal APK.

    The app bundle now supports uncompressed native libraries. With no additional developer work needed, the app bundle now makes apps using native libraries an average of 8 percent smaller to download and 16 percent smaller on disk on M+ devices.

  • In-app Updates API: Google is giving devs more controls to ensure that users run the latest and greatest version of their apps. The In-app Updates API will give devs two options.

    The first is a full-screen experience for critical updates when you expect the user to want the update to be applied immediately.

    The second option is a flexible update. It lets users continue with the existing installed version while the update is downloaded. The developer can use it to ensure that users have the most up-to-date version, because the app can be pushed to install in the background using the automatic updates feature.

  • Instant Discovery: Google hopes to make instant apps easier to adopt. The company recently made the use of Web URLs optional, so devs can send their existing Play Store deep link traffic to their instant experience if it is available.

    Also, Google raised the instant app size limit to 10 MB for the Try Now button on the Play Store and Web banners for greater adoption ease.

Jack M. Germain has been an ECT News Network reporter since 2003. His main areas of focus are enterprise IT, Linux and open source technologies. He has written numerous reviews of Linux distros and other open source software.

Source

Download Neptune 5.6

Neptune is an open source distribution of Linux based on the world’s most popular operating system, Ubuntu, and built around the modern and productive KDE Plasma desktop environment, which uses a very attractive theme and layout.

Available for download as a dual-arch Live DVD

The user can download this Ubuntu-based operating system via Softpedia or directly from the project’s official website as a dual-arch Live DVD ISO image of approximately 2GB in size, designed from the ground up to support both 64-bit and 32-bit computers. The image must be written to either a DVD disc or a USB stick.

Boot options

From the boot screen, the user can start the live environment with default boot options or in safe mode, with support for the English or German languages, as well as to access a help menu. The Live DVD also includes support for many other languages.

The best KDE Plasma desktop experience

If you want to experience the true power of the KDE Plasma desktop environment, you should try the ZevenOS Neptune distribution, as it comes with a highly customized KDE session comprising a single panel located on the bottom edge of the screen, from which users can launch apps or interact with running programs.

Includes state-of-the-art applications

The operating system includes state-of-the-art applications, such as the Ardour professional audio editing suite, Encode media converting software, Recffmpeg screencasting tool, Dolphin superior file manager, LibreOffice office suite, Chromium web browser, Apper software center, Icedove email and news client, VLC Media Player, and many more.

Bottom line

All in all, ZevenOS Neptune is built on top of a recent Linux kernel and includes a large number of popular open-source applications. The distro uses the KDE Plasma desktop environment with a customized notification center and a superb theme on top of the latest upstream Ubuntu version.

Source

du Command on Linux | Linux Hint

Every Linux distro comes with a number of tools integrated into the system, each with its own purpose. “du” is one such tool, part of the standard Unix/Linux toolset, used for getting information about the disk usage of files and directories on a machine. There are a number of available parameters that you can use for getting results in different formats. Here are some of the most useful “du” commands.

  • Need to find out the disk usage summary of a directory? Run the following command –

du ~/Downloads/

In the output, the first column shows the number of disk blocks each file or directory occupies, and the second column lists the corresponding files and directories.

  • Need the output in a format humans can understand? Use the “-h” option. It tells “du” to show output in “Human Readable Format”.
  • Use the “-a” flag for displaying the disk usage of all the files and directories.

As you’ve noticed, you can use multiple flags together with “du”.

  • For identifying just how much disk space a directory is consuming, use the “-s” flag.

du -sh ~/Downloads/

  • You can also use the “-c” flag for getting the total size of the directory at the last line of the output.

du -ch ~/Downloads/

  • Need to check out the last modification time of files? Use the “--time” flag.

du -ha --time ~/Downloads/

  • Are you interested in excluding specific file types, for example, SVG images? Then use the “--exclude=PATTERN” parameter.

du -ha --exclude='*.svg' ~/Downloads/

More “du” commands

“du” offers a huge collection of features. You can find out all of them using the man page for “du”.

You don’t have to open up a terminal and run the command every time you need help. You can dump the guide into a text file. Run the following command –

man du > ~/Desktop/du.txt

Enjoy!

Source

New Part Day: A $6 Linux Computer You Might Be Able To Write Code For

The latest news from the world of cheap electronics is a single board computer running Linux. It costs six dollars, and you can buy it right now. You might even be able to compile code for it, too.

The C-Sky Linux development board is listed on Taobao as an “OrangePi NanoPi Raspberry Pi Linux Development Board” and despite some flagrant misappropriation of trademarks, this is indeed a computer running Linux, available for seven American dollars.

This board is based on a NationalChip GX6605S SoC, a unique chip with an ISA that isn’t ARM, x86, RISC-V, MIPS, or anything else that would be considered normal. The chip itself was designed for set-top boxes, but there are a surprising number of build tools that include buildroot, GCC and support for qemu. The company behind this chip is maintaining a kernel, and support for this chip has been added to the mainline kernel. Yes, unlike many other single board computers out there, you might actually be able to compile something for this chip.

The features for this board include 64 MB of DDR2 RAM, HDMI out (with a 1280 x 720 framebuffer, upscaled to 1080p, most likely), and a CPU running at just about 600 MHz. There are a few buttons connected to the GPIO pins, two USB host ports, a USB-TTL port for a serial console, and a few more pins for additional GPIOs. There does not appear to be any networking, and we have no idea what the onboard storage is.

If you want a challenge to get something compiled, this is the chip for you.

Source

Download FFmpeg Linux 4.1

FFmpeg is an open source utility that allows Linux, Windows and Mac OS X users to playback, convert, record and stream video and audio files. It is used in almost all Linux distributions. It is a command-line software that can encode, decode, demux, mux, transcode, stream, play and filter almost any media format available. FFmpeg uses libavcodec, the most advanced audio/video codec library for Linux and UNIX-like systems.

Features at a glance

The software comprises a multimedia streaming server for live broadcasts, a simple media player based on the powerful SDL library, a simple multimedia stream analyzer, a library that contains functions for simplifying programming, and another library that includes muxers and demuxers for multimedia container formats. Additionally, it comes with support for input and output devices, media filters, a library for performing highly optimized image scaling and color space/pixel format conversion operations, and a library for performing highly optimized audio rematrixing, resampling and sample format conversions.

Used by a wide range of applications to manipulate video files

These days, numerous audio/video conversion utilities, as well as video playback apps, are based on or use the FFmpeg project in one way or another. For example, Cinelerra is a very powerful application that uses FFmpeg for professional video editing operations. Among other popular FFmpeg-based projects, we can mention VLC Media Player, the Chromium and Google Chrome web browsers, Electric Sheep, ffdshow, HandBrake, Kdenlive, libquicktime, MPlayer, MythTV, OpenH323, QtAV, VeeJay, xine, XBMC, as well as the GStreamer framework that is used in many modern Linux-based operating systems.

Comes pre-installed on many Linux distributions

Experienced Linux users can learn to use FFmpeg directly from the command-line, as the project provides a comprehensive manual and online documentation. It has been created by the same team of developers that started the MPlayer project, a powerful audio/video player on which many applications are based. FFmpeg comes pre-installed on many Linux distributions. If not, it will be automatically added when you install one of the aforementioned FFmpeg-based applications.
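As a small taste of the command line, a basic conversion only needs an input and an output file; FFmpeg infers the target format from the extension (the file names here are placeholders):

# Convert an AVI file to MP4, letting FFmpeg choose sensible default codecs
ffmpeg -i input.avi output.mp4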

Source

Automate Sysadmin Tasks with Python’s os.walk Function

Using Python’s os.walk function to walk through a tree of files and
directories.

I’m a web guy; I put together my first site in early 1993. And
so, when I started to do Python training, I assumed that most of my
students also were going to be web developers or aspiring web
developers. Nothing could be further from the truth. Although some of my
students certainly are interested in web applications, the majority of them
are software engineers, testers, data scientists and system
administrators.

This last group, the system administrators, usually comes into my
course with the same story. The company they work for has been writing Bash
scripts for several years, but they want to move to a higher-level
language with greater expressiveness and a large number of third-party
add-ons. (No offense to Bash users is intended; you can do amazing
things with Bash, but I hope you’ll agree that the scripts can become
unwieldy and hard to maintain.)

It turns out that with a few simple tools and ideas, these system
administrators can use Python to do more with less code, as well as create
reports and maintain servers. So in this article, I describe
one particularly useful tool that’s often overlooked: os.walk, a
function that lets you walk through a tree of files and
directories.

os.walk Basics

Linux users are used to the ls command to get a list of files in a
directory. Python comes with two different functions that can return
the list of files. One is os.listdir, which means the “listdir”
function in the “os” package. If you want, you can pass the name of a
directory to os.listdir. If you don’t do that, you’ll get the names
of files in the current directory. So, you can say:

In [10]: import os

When I do that on my computer, in the current directory, I get the following:

In [11]: os.listdir('.')
Out[11]:
['.git',
 '.gitignore',
 '.ipynb_checkpoints',
 '.mypy_cache',
 'Archive',
 'Files']

As you can see, os.listdir returns a list of strings, with each
string being a filename. Of course, in UNIX-type systems, directories
are files too—so along with files, you’ll also see subdirectories
without any obvious indication of which is which.
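If you do need to tell files and subdirectories apart, os.path.isdir can filter the listing; with the directory shown above, a sketch like this might return something like:

In [12]: [name for name in os.listdir('.') if os.path.isdir(name)]
Out[12]: ['.git', '.ipynb_checkpoints', '.mypy_cache', 'Archive', 'Files']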

I gave up on os.listdir long ago, in favor of
glob.glob, which means
the “glob” function in the “glob” module. Command-line users are used
to using “globbing”, although they often don’t know its name. Globbing
means using the * and ? characters, among others, for more flexible
matching of filenames. Although os.listdir can return the list of
files in a directory, it cannot filter them. You can though with
glob.glob:

In [13]: import glob

In [14]: glob.glob('Files/*.zip')
Out[14]:
['Files/advanced-exercise-files.zip',
 'Files/exercise-files.zip',
 'Files/names.zip',
 'Files/words.zip']

In either case, you get the names of the files (and subdirectories) as
strings. You then can use a for loop or a list comprehension to iterate
over them and perform an action. Also note that in contrast with
os.listdir, which returns the list of filenames without any path,
glob.glob returns the full pathname of each file, something I’ve
often found to be useful.

But what if you want to go through each file, including every file in
every subdirectory? Then you have a bit more of a problem. Sure, you could
use a for loop to iterate over each filename and then use
os.path.isdir to figure out whether it’s a subdirectory—and if so,
then you could get the list of files in that subdirectory and add them
to the list over which you’re iterating.
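A recursive version of that idea might look like the following sketch (walk_manually is a hypothetical helper, not part of the standard library):

import os

def walk_manually(path):
    # Yield every non-directory file under path, by hand
    for name in os.listdir(path):
        fullname = os.path.join(path, name)
        if os.path.isdir(fullname):
            # Descend into the subdirectory and yield its files, too
            yield from walk_manually(fullname)
        else:
            yield fullname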

Or, you can use the os.walk function, which does all of this and
more. Although os.walk looks and acts like a function, it’s actually a
“generator function”—a function that, when executed, returns a
“generator” object that implements the iteration protocol. If you’re
not used to working with generators, running the function can be
a bit surprising:

In [15]: os.walk('.')
Out[15]: <generator object walk at 0x1035be5e8>

The idea is that you’ll put the output from os.walk in a for loop. Let’s do that:

In [17]: for item in os.walk('.'):
    ...:     print(item)

The result, at least on my computer, is a huge amount of output,
scrolling by so fast that I can’t read it easily. Whether that
happens to you depends on where you run this for loop on your
system and how many files (and subdirectories) exist.

In each iteration, os.walk returns a tuple containing three
elements:

  • The current path (that is, directory name) as a string.
  • A list of subdirectory names (as strings).
  • A list of non-directory filenames (as strings).

So, it’s typical to invoke os.walk such that each of these three
elements is assigned to a separate variable in the for loop:

In [19]: for currentdir, dirnames, filenames in os.walk('.'):
    ...:     print(currentdir)

The iterations continue until each of the subdirectories under the
argument to os.walk has been returned. This allows you to perform
all sorts of reports and interesting tasks. For example, the above
code will print all of the subdirectories under the current directory,
“.”.

Counting Files

Let’s say you want to count the number of files (not subdirectories)
under the current directory. You can say:

In [19]: file_count = 0

In [20]: for currentdir, dirnames, filenames in os.walk('.'):
    ...:     file_count += len(filenames)
    ...:

In [21]: file_count
Out[21]: 3657

You also can do something a bit more sophisticated, counting how many
files there are of each type, using the extension as a classifier. You
can get the extension with os.path.splitext, which returns two
items—the filename without the extension and the extension itself:

In [23]: os.path.splitext('abc/def/ghi.jkl')
Out[23]: ('abc/def/ghi', '.jkl')

You can count the items using one of my favorite Python data structures,
Counter. For example:

In [24]: from collections import Counter

In [25]: counts = Counter()

In [26]: for currentdir, dirnames, filenames in os.walk('.'):
    ...:     for one_filename in filenames:
    ...:         first_part, ext = os.path.splitext(one_filename)
    ...:         counts[ext] += 1

This goes through each directory under “.”, getting the
filenames. It then iterates through the list of filenames, splitting
the name so that you can get the extension. You then add 1 to the counter
for that extension.

Once this code has run, you can ask counts for a report. Because it’s
a dict, you can use the items method and print the keys and values
(that is, extensions and counts). You can print them as follows:

In [30]: for extension, count in counts.items():
    ...:     print(f"{extension:8}{count}")

In the above code, the f-string displays the extension (in a field of eight characters) and the count.

Wouldn’t it be nice though to show only the ten most common
extensions? Yes, but then you’d have to sort through the counts
object. It’s much easier just to use the most_common method that
the Counter object provides, which returns not only the keys and
values, but also sorts them in descending order:

In [31]: for extension, count in counts.most_common(10):
    ...:     print(f"{extension:8}{count}")
    ...:
.py     1149
        867
.zip    466
.ipynb  410
.pyc    372
.txt    151
.json   76
.so     37
.conf   19
.py~    12

In other words—not surprisingly—this example shows that the most common file extension
in the directory I use for teaching Python courses is .py. Files
without any extension are next, followed by .zip, .ipynb (Jupyter
notebooks) and .pyc (byte-compiled Python).

File Sizes

You can ask more interesting questions as well. For example, perhaps
you want to know how much disk space is used by each of these file
types. Now you don’t add 1 for each time you encounter a file
extension, but rather the size of the file. Fortunately, this turns
out to be trivially easy, thanks to the os.path.getsize
function (this returns the same value that you would get from
os.stat):

for currentdir, dirnames, filenames in os.walk('.'):
    for one_filename in filenames:
        first_part, ext = os.path.splitext(one_filename)
        try:
            counts[ext] += os.path.getsize(os.path.join(currentdir, one_filename))
        except FileNotFoundError:
            pass

The above code includes three changes from the previous version:

  1. As indicated, this no longer adds 1 to the count for each extension,
    but rather the size of the file, which comes from
    os.path.getsize.
  2. os.path.join puts the path and filename together
    and (as a
    bonus) uses the current operating system’s path separation character.
    What are the odds of a program being used on a Windows system and,
    thus, needing a backslash rather than a slash? Pretty slim, but it
    doesn’t hurt to use this sort of built-in operation.
  3. os.walk doesn’t normally look at symbolic links, which means
    you potentially can get yourself into some trouble trying to
    measure the sizes of files that don’t exist. For this reason, here
    the counting is wrapped in a try/except block.

Once this is done, you can identify the file types consuming
the greatest amount of space in the directory:

In [46]: for extension, count in counts.most_common(10):
    ...:     print(f"{extension:8}{count}")
    ...:
.pack   669153001
.zip    486110102
.ipynb  223155683
.sql    125443333
        46296632
.json   14224651
.txt    10921226
.pdf    7557943
.py     5253208
.pyc    4948851

Now things seem a bit different! In my case, it looks like I’ve got a lot of
stuff in .pack
files, indicating that my Git repository (where I store all of my
old training examples, exercises and Jupyter notebooks) is quite
large. I have a lot in zipfiles, in which I store my daily updates.
And of course, lots in Jupyter notebooks, which are written in JSON
format and can become quite large. The surprise to me is the .sql
extension, which I honestly had forgotten that I had.

Files per Year

What if you want to know how many files of each type were modified in
each year? This could be useful for removing logfiles or (if you’re
like me) identifying what large, unnecessary files are taking up
space.

In order to do that, you’ll need to get the modification time
(mtime,
in UNIX parlance) for each file. You’ll then need to convert that
mtime
from a UNIX time (that is, the number of seconds since January 1st, 1970)
to something you can parse and use.

Instead of using a Counter object to keep track of things, you can
just use a dictionary. However, this dict’s values will be Counter
objects, with the years serving as keys and the counts as values.
Since you know that all of the main dict’s values will be Counter
objects, you can just use a defaultdict, which will require you to
write less code.

Here’s how you can do all of this:

from collections import defaultdict, Counter
from datetime import datetime

counts = defaultdict(Counter)

for currentdir, dirnames, filenames in os.walk('.'):
    for one_filename in filenames:
        first_part, ext = os.path.splitext(one_filename)
        try:
            full_filename = os.path.join(currentdir, one_filename)
            mtime = datetime.fromtimestamp(os.path.getmtime(full_filename))
            counts[ext][mtime.year] += 1
        except FileNotFoundError:
            pass

First, this creates counts as an instance of
defaultdict with a
Counter. This means if you ask for a key that doesn’t yet exist,
the key will be created, with its value being a new Counter
that allows you to say something like this:

counts['.zip'][2018] += 1

without having to initialize either the zip key (for counts) or the
2018 key (for the Counter object). You can just add one to the count,
and know that it’s working.
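A quick interactive demonstration of that convenience (the In/Out numbering here is illustrative):

In [32]: from collections import defaultdict, Counter

In [33]: d = defaultdict(Counter)

In [34]: d['.zip'][2018] += 1

In [35]: d
Out[35]: defaultdict(collections.Counter, {'.zip': Counter({2018: 1})})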

Then, when you iterate over the filesystem, you grab the mtime
from the
filename (using os.path.getmtime). That is turned into a
datetime
object with datetime.fromtimestamp, a great function that lets
you
move from UNIX timestamps to human-style dates and times. Finally, you
then add 1 to your counts.

Once again, you can display the results:

for extension, year_counts in counts.items():
    print(extension)
    for year, file_count in sorted(year_counts.items()):
        print(f"\t{year}\t{file_count}")

The counts variable is now a defaultdict, but that means it behaves
just like a dictionary in most respects. So, you can iterate over its
keys and values with items, which is shown here, getting each file
extension and the Counter object for each.

Next the extension is printed, and then it iterates over the years and
their counts, sorting them by year and printing them indented with a
tab (\t) character. In this way, you can see precisely how many files
of each extension have been modified per year—and perhaps understand
which files are truly important and which you easily can get rid of.

Conclusion

Python can’t and shouldn’t replace Bash for simple scripting, but in
many cases, if you’re working with large numbers of files and/or
creating reports, Python’s standard library can make it easy to do
such tasks with a minimum of code.

Source
