Virtualizing the Clock – Linux Journal

Dmitry Safonov wanted to implement a namespace for time information. The
twisted and bizarre thing about virtual machines is that they get more
virtual all the time. There’s always some new element of the host system
that can be given its own namespace and enter the realm of the virtual
machine. But until that happens for a given element, virtual systems have to
share it with other virtual systems and with the host system itself. The date
and time are one example.

Dmitry’s idea is that users should be able to set the day and time on their
virtual systems, without worrying about other systems being given the same
day and time. This is actually useful, beyond the desire to live in the past
or future. Being able to set the time in a container is apparently one of
the crucial elements of being able to migrate containers from one physical
host to another, as Dmitry pointed out in his post.

As he put it:

The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. Last two clocks are monotonous, but the
start points for them are not defined and are different for each running
system. When a container is migrated from one node to another, all clocks
have to be restored into consistent states; in other words, they have to
continue running from the same points where they have been dumped.
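
To make the migration requirement concrete, here is a minimal sketch in Python. It is not taken from Dmitry's patch. It reads the monotonic clocks through the standard `time.clock_gettime()` call and models "restoring" them on a new host as a per-clock offset, so that each clock carries on from its dumped value. The helper names `dump_clocks`, `restore_offsets` and `virtual_now` are hypothetical, chosen only for illustration, and the sketch assumes a Linux host.

```python
import time

# The monotonic clocks named in the post. CLOCK_BOOTTIME is Linux-only,
# so it is looked up defensively.
CLOCKS = {"CLOCK_MONOTONIC": time.CLOCK_MONOTONIC}
if hasattr(time, "CLOCK_BOOTTIME"):
    CLOCKS["CLOCK_BOOTTIME"] = time.CLOCK_BOOTTIME


def dump_clocks():
    """Record the current value of each clock (hypothetical 'dump' step)."""
    return {name: time.clock_gettime(cid) for name, cid in CLOCKS.items()}


def restore_offsets(dumped):
    """On the destination host, compute the offset that makes each clock
    continue from the value it had when it was dumped."""
    return {
        name: dumped[name] - time.clock_gettime(cid)
        for name, cid in CLOCKS.items()
    }


def virtual_now(name, offsets):
    """Clock value as the migrated container would see it."""
    return time.clock_gettime(CLOCKS[name]) + offsets[name]
```

In this model the destination host's own clock values never change. Only the container's view is shifted, which is the kind of behavior a time namespace would have to provide.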

Dmitry’s patch wasn’t feature-complete. There were various questions still
to consider. For example, how should a virtual machine interpret the time
changing on the host hardware? Should the virtual time change by the same
offset? Or continue unchanged? Should file creation and modification times
reflect the virtual machine’s time or the host machine’s time?

Eric W. Biederman supported this project overall and liked the code in the
patch, but he did feel that the patch could do more. He thought it was a little
too lightweight. He wanted users to be able to set up new time namespaces at
the drop of a hat, so they could test things like leap seconds before
they actually occurred and see how their own projects’ code worked under
those various conditions.

To do that, he felt there should be a whole “struct timekeeper” data
structure for each namespace. Then pointers to those structures could be
passed around, and the times of virtual machines would be just as
manipulable and useful as times on the host system.

In terms of timestamps for filesystems, however, Eric felt that it might
be best to limit the feature set a little bit. If users could create files
with timestamps in the past, it could introduce some nasty security
problems. He felt it would be sufficient simply to “do what distributed
filesystems do when dealing with hosts with different clocks”.

The two went back and forth on the technical implementation details. At one
point, Eric remarked, in defense of his preference:

My experience with namespaces is that if we don’t get the advanced features
working there is
little to no interest from the core developers of the code, and the
namespaces don’t solve additional problems. Which makes the namespace a
hard sell. Especially when it does not solve problems the developers of the
subsystem have.

Thomas Gleixner also joined the conversation to remind Eric that
the time code needed to stay fast. Virtualization was good, he said, but
“timekeeping_update() is already heavy and walking through a gazillion of
namespaces will just make it horrible.”

He reminded Eric and Dmitry that:

It’s not only timekeeping, i.e. reading time, this is also affecting all
timers which are armed from a namespace.

That gets really ugly because when you do settimeofday() or adjtimex() for a
particular namespace, then you have to search for all armed timers of that
namespace and adjust them.

The original posix timer code had the same issue because it mapped the clock
realtime timers to the timer wheel so any setting of the clock caused a full
walk of all armed timers, disarming, adjusting and requeing them. That’s
horrible not only performance wise, it’s also a locking nightmare of all
sorts.

Add time skew via NTP/PTP into the picture and you might have to adjust
timers as well, because you need to guarantee that they are not expiring
early.
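
The cost Thomas describes can be illustrated with a toy model in Python. This is a sketch under simplifying assumptions, not kernel code: the class `NamespaceTimers` and its methods are hypothetical, timers are kept in a plain heap keyed by host-clock expiry, and a namespace's time is modeled as host time plus an offset. Setting the namespace clock then forces a visit to every armed timer.

```python
import heapq


class NamespaceTimers:
    """Toy model: timers are armed in namespace time but queued by host time."""

    def __init__(self, offset=0.0):
        self.offset = offset  # namespace time = host time + offset
        self.queue = []       # min-heap of (host_expiry, timer_id)

    def arm(self, timer_id, ns_expiry):
        """Arm an absolute timer that expires at ns_expiry in namespace time."""
        heapq.heappush(self.queue, (ns_expiry - self.offset, timer_id))

    def set_offset(self, new_offset):
        """Model of setting the clock inside the namespace.

        Returns the number of timers that had to be adjusted.
        """
        delta = new_offset - self.offset
        self.offset = new_offset
        # A uniform shift keeps the heap order intact, but every armed
        # timer is still visited: O(n) work for each clock change.
        self.queue = [(expiry - delta, tid) for expiry, tid in self.queue]
        return len(self.queue)
```

With a thousand armed timers, a single clock change in this model touches all thousand entries. The real situation would add locking on top of that, which this sketch leaves out entirely.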

So, there clearly are many nuances to consider. The discussion ended there,
but this is a good example of the trouble with extending Linux to create
virtual machines. It’s almost never the case that a whole feature can be
fully virtualized and isolated from the host system. Security concerns,
speed concerns, and even code complexity and maintainability come into the
picture. Even really elegant solutions can be shot down by, for example, the
possibility of hostile users creating files with unnaturally old timestamps.

Note: if you’re mentioned above and want to post a response above the comment section, send a message with your response text to ljeditor@linuxjournal.com.
