Abstract
This paper describes a flexible program for recording the desktop screen to make tutorial videos. The program provides a cross-platform recording system with options such as exporting to QuickTime video and using speech synthesis to provide the audio.
Introduction
This paper describes a flexible system for recording the desktop screen to make tutorial videos. This system was developed to fit the needs we had for creating training material for our applications. In particular, we wanted a system that would meet the following criteria:
- Cross-platform. Our main application is written in Java, and we needed to record and create demos on whatever platform we were running. Learning and working with different screen recording software for each system is too cumbersome.
- Easy and fast to use. We wanted to minimize the time spent in creating training material so we could focus on the other applications.
- Video can be edited. We wanted to be able to edit the video, especially deleting some frames, so that the video does not have to be perfect when recorded. Also, we needed the ability to add annotations to the frames.
- Audio can be added. We wanted to be able to add audio as it helps to keep the user's attention and some users will not bother reading annotations.
- Cheap. We did not want to pay a lot of money for the software.
As we did not find anything that met all of our goals, we came up with our own system. This paper describes the file formats and interfaces for our system.
Related Work
There are many other programs for recording the contents of the screen. Most of the ones we found are fairly easy to use for quickly generating a video, allow a user to speak while recording so that audio is available, and are reasonably priced (under $100). Most of them target a particular operating system. The primary exception was the free vnc2swf program, which can record the data from a VNC server to create a Flash movie. Its drawbacks are that it does not support audio very well and that we would need additional software for editing Flash movies. More generally, most of the options we found did not support editing, which means that a mistake made while recording would force us to redo the video.
File Formats
This section describes the file formats used by the software. The types used in this section correspond to the Java types, which means that all of the fields use big-endian byte order, shorts are 2 bytes, integers are 4 bytes, and longs are 8 bytes. There is also an Image type, which is just Width x Height short values, one written out for each pixel. The pixels are written row by row, from row 0 to row Height - 1, and within each row from column 0 to column Width - 1.
As the pixel colors are written using 16 bits and we capture the image with 32-bit colors, we need to reduce the number of colors used. The 32 bits consist of four one-byte channels, including an alpha channel for the opacity level. The other three channels are for red, green, and blue respectively. Of the 16 bits we allocate 6 bits to red, 5 bits to green, and 5 bits to blue. The opacity can be ignored for our purposes. This provides a reasonable-looking image with half the storage space.
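The color reduction can be illustrated with a minimal sketch. The class name `PixelPacker`, the ARGB channel order of the input (as returned by `BufferedImage.getRGB`), and the placement of red in the highest bits of the 16-bit value are assumptions made for illustration; the text only fixes the 6/5/5 bit allocation.

```java
/** Sketch of the 32-bit to 16-bit color reduction (6 red, 5 green, 5 blue bits). */
final class PixelPacker {
    private PixelPacker() {}

    /** Packs a 32-bit ARGB pixel into 16 bits; the layout RRRRRRGGGGGBBBBB is an assumption. */
    static short pack(int argb) {
        int red = (argb >> 16) & 0xFF;
        int green = (argb >> 8) & 0xFF;
        int blue = argb & 0xFF;
        // Keep the most significant bits of each channel; the alpha byte is dropped.
        int packed = ((red >> 2) << 10) | ((green >> 3) << 5) | (blue >> 3);
        return (short) packed;
    }

    /** Expands a packed pixel back to an opaque 32-bit ARGB value (the low bits are lost). */
    static int unpack(short packed) {
        int value = packed & 0xFFFF;
        int red = ((value >> 10) & 0x3F) << 2;
        int green = ((value >> 5) & 0x1F) << 3;
        int blue = (value & 0x1F) << 3;
        return 0xFF000000 | (red << 16) | (green << 8) | blue;
    }
}
```

Because the low bits of each channel are discarded, a round trip through `pack` and `unpack` is lossy: pure red `0xFFFF0000` comes back as `0xFFFC0000`.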
Raw Recording Format
The raw recording format is used while the file is being recorded. This format was chosen to be simple and quick for the recorder software to write. The table below shows the basic format.
| Number | Field Name | Type |
|---|---|---|
| 1 | Width | Int |
| 2 | Height | Int |
| 3 | Time | Long |
| 4 | Mouse X | Int |
| 5 | Mouse Y | Int |
| 6 | Image | Image |
| 7 | Frame Count | Int |
The header consists of the width and height of the images that are captured. Fields 3-6 are repeated for each image. The final field is the number of images in the file. Storing the frame count last is inconvenient for a program reading the file, but it is convenient for the recording software, which does not know the final count until recording stops. Detailed field descriptions are given below:
- Width: The width of each image.
- Height: The height of each image.
- Time: A time stamp for when the image was taken. This is important so that we can play back the video at the correct rate if it is not captured at a consistent rate.
- Mouse X: The X coordinate of the mouse relative to the image. The pixel at the top left of the image is (0, 0) and the bottom right pixel is at (Width-1, Height-1).
- Mouse Y: The Y coordinate of the mouse relative to the image.
- Image: The actual image data.
- Frame Count: The number of frames in the file.
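A reader for this layout might look like the following sketch. The class `RawRecording` and its members are hypothetical names, not the recorder's actual code, and the sketch assumes the whole file is available as a `ByteBuffer` positioned at offset 0. It reads the trailing frame count first and then walks the repeated fields 3-6.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory view of a raw recording; an illustration only. */
final class RawRecording {
    /** One captured frame: time stamp, mouse position, and 16-bit pixels. */
    static final class Frame {
        final long time;
        final int mouseX;
        final int mouseY;
        final short[] pixels; // row-major, Width x Height values

        Frame(long time, int mouseX, int mouseY, short[] pixels) {
            this.time = time;
            this.mouseX = mouseX;
            this.mouseY = mouseY;
            this.pixels = pixels;
        }
    }

    final int width;
    final int height;
    final List<Frame> frames;

    private RawRecording(int width, int height, List<Frame> frames) {
        this.width = width;
        this.height = height;
        this.frames = frames;
    }

    /** Parses a buffer that holds an entire recording and is positioned at offset 0. */
    static RawRecording parse(ByteBuffer data) {
        // ByteBuffer is big-endian by default, matching the Java types of the format.
        // The frame count is the last field, so read it with an absolute get.
        int frameCount = data.getInt(data.limit() - 4);
        int width = data.getInt();
        int height = data.getInt();
        List<Frame> frames = new ArrayList<Frame>();
        for (int i = 0; i < frameCount; i++) {
            long time = data.getLong();
            int mouseX = data.getInt();
            int mouseY = data.getInt();
            short[] pixels = new short[width * height];
            for (int p = 0; p < pixels.length; p++) {
                pixels[p] = data.getShort();
            }
            frames.add(new Frame(time, mouseX, mouseY, pixels));
        }
        return new RawRecording(width, height, frames);
    }
}
```

For large recordings a reader would more likely seek to the end of the file for the frame count and stream the frames, rather than hold everything in memory as this sketch does.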
Editing Format
The editing format is similar to the recording format but it is organized in a way that is more convenient for editing as opposed to recording. A directory is created that contains a contents file, image files, and annotation files. The tables below show the formats for these files.
Contents File

| Number | Field Name | Type |
|---|---|---|
| 1 | Width | Int |
| 2 | Height | Int |
| 3 | Frame Count | Int |
Image File

| Number | Field Name | Type |
|---|---|---|
| 4 | Type | Byte |
| 5 | Time | Long |
| 6 | Image | Image |
Annotation File

| Number | Field Name | Type |
|---|---|---|
| 4 | Type | Byte |
| 7 | X | Int |
| 8 | Y | Int |
| 9 | Size | Int |
| 10 | Delay | Int |
| 11 | Foreground | Int |
| 12 | Background | Int |
| 13 | Message | String |
The contents file must be named contents. Image files are named frame# where # is the number of the frame from 0 to FrameCount - 1. Annotation files are named annotation# where # is the number of the frame the annotation is for. Detailed field descriptions are given below:
- Width: The width of each image.
- Height: The height of each image.
- Frame Count: The number of frames in the recording.
- Type: The type of the file, 0 for images and 1 for annotations.
- Time: A time stamp for when the image was taken. This is important so that we can play back the video at the correct rate if it is not captured at a consistent rate.
- Image: The actual image data.
- X: The X coordinate of the top left corner of the annotation box.
- Y: The Y coordinate of the top left corner of the annotation box.
- Size: The font size for the annotation.
- Delay: How long to show the annotation when playing the video. For example, a long message should be shown longer than a short one.
- Foreground: The color of the text for the annotation.
- Background: The color of the background for the annotation.
- Message: The actual text to show. Note that line feeds are used to determine where to wrap the message when it is displayed.
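The naming rules and the fixed-size parts of these files can be sketched as follows. `EditingFormat` and its method names are hypothetical, and the sketch works on byte arrays rather than files. It deliberately leaves out the annotation Message field, because the on-disk encoding of the String type is not specified here.

```java
import java.nio.ByteBuffer;

/** Sketch of the editing directory conventions: file names and fixed-size fields. */
final class EditingFormat {
    private EditingFormat() {}

    /** Name of the image file for a zero-based frame number, for example "frame0". */
    static String frameFileName(int frame) {
        return "frame" + frame;
    }

    /** Name of the annotation file that belongs to a zero-based frame number. */
    static String annotationFileName(int frame) {
        return "annotation" + frame;
    }

    /** Encodes the contents file: width, height, and frame count as big-endian ints. */
    static byte[] encodeContents(int width, int height, int frameCount) {
        return ByteBuffer.allocate(12).putInt(width).putInt(height).putInt(frameCount).array();
    }

    /** Decodes a contents file into {width, height, frameCount}. */
    static int[] decodeContents(byte[] contents) {
        ByteBuffer in = ByteBuffer.wrap(contents);
        return new int[] { in.getInt(), in.getInt(), in.getInt() };
    }

    /** Encodes the start of an image file: type 0 followed by the time stamp. */
    static byte[] encodeImageHeader(long time) {
        return ByteBuffer.allocate(9).put((byte) 0).putLong(time).array();
    }
}
```

With these helpers, a recording of three 640 x 480 frames would consist of a 12-byte contents file plus frame0, frame1, and frame2, with an annotation# file only for the frames that carry an annotation.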
Note that we no longer store the location of the mouse cursor. When the conversion from the recording format takes place, a cursor image is drawn onto the screen capture at that location, so the image will have the cursor in it. You can change the image that is used for the cursor in different videos to get the effect you want. For example, a completely transparent image could be used so that the cursor is not visible, or a large cursor could be used so that it is clearly visible. This also means that you do not have to change the system cursor to change the video cursor: it is difficult to get custom cursors on some systems, and a large cursor that is good for a training video would not be desirable for normal use.
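Drawing the cursor into a frame could be done along these lines. This is a sketch using the standard `java.awt` imaging classes; `CursorOverlay` is a hypothetical name, and placing the top-left corner of the cursor image at the mouse position (that is, assuming the hot spot is the top-left pixel) is an assumption.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

/** Sketch of compositing a cursor image onto a captured frame. */
final class CursorOverlay {
    private CursorOverlay() {}

    /** Returns a copy of the capture with the cursor drawn at (mouseX, mouseY). */
    static BufferedImage drawCursor(BufferedImage capture, BufferedImage cursor,
                                    int mouseX, int mouseY) {
        BufferedImage result = new BufferedImage(
                capture.getWidth(), capture.getHeight(), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = result.createGraphics();
        try {
            g.drawImage(capture, 0, 0, null);
            // Assumption: the cursor hot spot is the top-left pixel of the cursor image.
            // Transparent cursor pixels leave the capture visible underneath.
            g.drawImage(cursor, mouseX, mouseY, null);
        } finally {
            g.dispose();
        }
        return result;
    }
}
```

Passing a fully transparent cursor image leaves the frame unchanged, which corresponds to the hidden-cursor case described above.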
Playable Formats
Both the recording and editing formats use a lot of file space. For the playable formats, one of the primary goals is to compress the data so that it can be sent to someone for viewing at a reasonable size. Our system has two playable formats: our proprietary format and the QuickTime movie format.
QuickTime
QuickTime is a good format because it is a standard and players are available for nearly every platform. Since the Java Media Framework (JMF) can play QuickTime, a player is available on every platform that our software runs on. The QuickTime video is generated by converting the image files to JPEG images and assembling these images into a video. If a file named audio.wav is placed in the editing directory, it is also encoded into the video. Note that this file has to be recorded using some other audio recording software.
Proprietary
We also have our own proprietary format, which has a number of advantages over the QuickTime format. The format is essentially a zip file of the editing directory. This allows us to get the editing files back if we want to change the video later. Furthermore, this format contains the actual text of the messages, so our player can use a text-to-speech engine to read the annotations to the user during playback. This saves us a lot of work because we do not have to record a separate audio track to go with the video. Also, if the video is edited by removing frames or changing the text of annotations, there is no need to re-record the audio. The primary disadvantages are that you must have our player, which is not as likely to be on a user's system, and that the video cannot be streamed over the web like the QuickTime format.
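Since the archive is essentially a zip of the editing directory, creating and reopening one can be sketched with the standard `java.util.zip` classes. `VideoArchive` is a hypothetical name, and the sketch works on an in-memory map from file name to file contents instead of a real directory so that it stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch of packing editing files into a zip archive and getting them back. */
final class VideoArchive {
    private VideoArchive() {}

    /** Zips the given files, keyed by name such as "contents" or "frame0". */
    static byte[] pack(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Recovers the editing files from an archive, in their stored order. */
    static Map<String, byte[]> unpack(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<String, byte[]>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buffer = new byte[4096];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                files.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }
}
```

Unpacking an archive yields the same files that were packed, which is what makes it possible to resume editing a published video.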
Interface
This section describes the interface that we have for the system. We have three separate programs: a recorder, an editor, and a player, each of which provides a simple Swing interface.
Recorder
Figure 1 shows the basic interface for the recorder application. First an output file must be selected. By default the software will create a file called recording1.srec in the current working directory. The srec file extension indicates that the file is a screen recording. You can also choose to record either the entire desktop or a particular selection. When you are ready to record, press the record button. If record selection was chosen, a dialog appears after you press record and asks you to select the region you wish to record. When you are done recording, press the stop button.
Figure 1: Recorder interface.
Figure 2: Screen selection interface.
Figure 2 shows the recorder screen selection dialog. It shows a current screen capture and allows you to select a particular region. A rectangle marks the currently selected region. To the left of the buttons, a zoomed-in view of the area under the mouse is shown, with a red box on the pixel where the mouse tip is pointing. Pressing cancel closes the dialog without starting a recording. Pressing OK starts recording the selected region.
Editor
Figure 3 shows the editor interface. Note that the area in the top part of the figure is the selection that was chosen in Figure 2. You can see the large mouse cursor that was drawn into the image by the conversion process, which runs automatically when a raw recording file is opened. A slider allows you to change the frame that is being viewed. The bottom of the window gives information such as the current frame number, the frame count, the frame dimensions, and the time stamp for the current frame.
Figure 3: Editor interface.
Figure 4 shows the editor operations menu. Open recording allows you to open a raw recording which will be converted to the editing format or a directory that is already in the editing format. When you have selected a particular frame you can insert or edit the annotation for that frame. Figure 5 shows the editor annotation dialog. It allows you to set the coordinates of the upper left corner, font size, delay, foreground color, background color, and the actual message to be shown. The message area will use the selected colors. You can also delete the annotation for the frame. A frame can be deleted by using the menu or by pressing the DELETE key on the keyboard.
Figure 4: Editor operations menu.
Figure 5: Editor annotation interface.
When you are done editing, you can export the video as a video archive or as a QuickTime video. Note that there is no save function. All operations are written to disk as you perform them. Once you delete a frame it is gone and the deletion cannot be undone, so it is wise to work on a backup copy. The original raw recording is not altered, however, so if you make a mistake you can re-open that file and start over.
Player
Figure 6 shows the player interface. You can open a video archive that was created by the editor and play it as in most other video player software. You can also pause or stop the video at any time. Note that this player only plays the proprietary format. It does not play QuickTime videos, as there are many other, much better players for that format.
Figure 6: Player interface.
If an audio.wav file was in the editing directory when the video was created, the player plays that file as the audio track. If no audio file was present, it uses the FreeTTS software to synthesize the annotation messages and uses that for the audio. Note that the delays for the annotations are ignored in this case, and each frame is shown until the speech synthesis engine has finished reading its annotation.
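Apart from annotations, the pacing of playback comes from the time stamp stored with each frame. A minimal sketch of turning time stamps into per-frame display durations is shown below. `PlaybackTiming` is a hypothetical name, the unit of the time stamps is not stated in this paper (milliseconds would be a natural assumption), and clamping negative gaps to zero is a defensive choice of this sketch rather than documented player behavior.

```java
/** Sketch of deriving per-frame display durations from frame time stamps. */
final class PlaybackTiming {
    private PlaybackTiming() {}

    /** Returns how long each frame stays on screen, in the unit of the time stamps. */
    static long[] frameDelays(long[] timeStamps) {
        long[] delays = new long[timeStamps.length];
        for (int i = 0; i + 1 < timeStamps.length; i++) {
            // Clamp to zero so an out-of-order stamp never yields a negative delay.
            delays[i] = Math.max(0L, timeStamps[i + 1] - timeStamps[i]);
        }
        // The last frame has no successor, so its delay is left at 0.
        return delays;
    }
}
```

A player loop would show frame i, wait `delays[i]`, and then advance, so frames captured at an uneven rate are replayed with their original spacing.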
Future Work
Future work is needed in a number of areas. For audio, we are looking for a way to use the speech synthesis engine to generate an audio file. This would let us automatically generate the audio file used with the QuickTime format, which would make updating a video much more convenient. We are also looking at ways to improve the editor application to make it easier and faster to use, or to add new features that will make the videos more effective for training. In particular, some sort of picture-in-picture option would be nice so that we could blend several videos. As we want to generate training for collaborative applications, this would let viewers see several screens and the effect an action on one system has on another. Multiple annotations per frame would also be useful in some cases and should be fairly easy to implement. A recording interface would also be useful for audio files that are recorded separately by the user. Finally, we are looking at ways to improve the screen capture rate so that we can get higher quality video.
Conclusion
This software meets the goals we set. It can run on any system with Java 5.0 or higher, which matches the needs of our application. It is fairly easy to use, and we can edit the video as well as add recorded audio or use speech synthesis for the audio. Furthermore, it is free, as we wrote the software ourselves. A number of areas could use improvement, and we hope to address these in the near future.