
Node.js FS Module — Read Streams


Manipulating files and directories is a basic operation for any program. Since Node.js is a server-side platform and can interact directly with the computer that it's running on, being able to manipulate files is a basic feature.

Fortunately, Node.js has an fs module built into its library. It has many functions that help with manipulating files and folders, including basic operations such as creating, opening, reading, and writing them.

It can do these both synchronously and asynchronously, and the asynchronous API also comes in a form that supports promises.

Also, it can show statistics for a file. Almost all the file operations that we can think of can be done with the built-in fs module. In this article, we will create read streams to read a file's data sequentially and listen to events from a read stream. Since Node.js ReadStreams are descendants of the Readable object, we will also listen to the events that Readable streams emit.

Streams are collections of data that may not be available all at once and don't have to fit in memory. This makes streams handy for processing large amounts of data.

Streams are handy for files because files can be big, and a stream lets us get a small amount of data at a time. In the fs module, there are 2 kinds of streams: the ReadStream and the WriteStream.

ReadStream

ReadStreams are for reading in data from a file and then outputting it a small part at a time. A ReadStream can read a small part of a file or it can read in the whole file.

To create a ReadStream, we can use the fs.createReadStream function. The function takes in 2 arguments. The first argument is the path of the file.

The path can be in the form of a string, a Buffer object, or a URL object that uses the file: protocol.
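As a rough sketch of the three path forms, the code below reads the same file through each of them. The temporary directory, the sample.txt file, and the readAll helper are assumptions made up for this illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical helper: read a whole file through a ReadStream
// and resolve with its text.
function readAll(target) {
  return new Promise((resolve, reject) => {
    let text = "";
    fs.createReadStream(target, { encoding: "utf8" })
      .on("data", chunk => { text += chunk; })
      .on("error", reject)
      .on("end", () => resolve(text));
  });
}

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "paths-demo-"));
const samplePath = path.join(dir, "sample.txt");
fs.writeFileSync(samplePath, "hello");

// The same file addressed as a string, a Buffer, and a file: URL.
const forms = [samplePath, Buffer.from(samplePath), pathToFileURL(samplePath)];
const results = Promise.all(forms.map(readAll));
results.then(texts => console.log(texts)); // each entry is "hello"
```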

The second argument is an object that can have a variety of options as properties. The flags option is the file system flag that sets the mode for opening the file. The default flag is 'r'. The list of flags is below:

  • 'a' – Opens a file for appending, which means adding data to the existing file. The file is created if it does not exist.
  • 'ax' – Like 'a' but an exception is thrown if the path exists.
  • 'a+' – Open file for reading and appending. The file is created if it doesn’t exist.
  • 'ax+' – Like 'a+' but an exception is thrown if the path exists.
  • 'as' – Opens a file for appending in synchronous mode. The file is created if it does not exist.
  • 'as+' – Opens a file for reading and appending in synchronous mode. The file is created if it does not exist.
  • 'r' – Opens a file for reading. An exception is thrown if the file doesn’t exist.
  • 'r+' – Opens a file for reading and writing. An exception is thrown if the file doesn’t exist.
  • 'rs+' – Opens a file for reading and writing in synchronous mode.
  • 'w' – Opens a file for writing. The file is created (if it does not exist) or overwritten (if it exists).
  • 'wx' – Like 'w' but fails if the path exists.
  • 'w+' – Opens a file for reading and writing. The file is created (if it does not exist) or overwritten (if it exists).
  • 'wx+' – Like 'w+' but an exception is thrown if the path exists.

The encoding option is a string that sets the character encoding used to decode the data, such as 'utf8'. The default value is null, in which case each chunk is delivered as a Buffer object.

The fd option is the integer file descriptor, which can be obtained with the fs.open function and its variants. If the fd option is set, then the path argument is ignored and no open event is emitted. The default value is null.
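To illustrate the fd option, here is a minimal sketch that opens a file with fs.openSync and hands the descriptor to createReadStream. The sample file, the readFromDescriptor helper, and the "ignored.txt" placeholder path are assumptions for this example only.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "fd-demo-"));
const samplePath = path.join(dir, "sample.txt");
fs.writeFileSync(samplePath, "read via descriptor");

// Hypothetical helper: read everything from an already opened descriptor.
function readFromDescriptor(fd) {
  return new Promise((resolve, reject) => {
    let text = "";
    // The first argument is only a placeholder: it is ignored because fd is set.
    fs.createReadStream("ignored.txt", { fd, encoding: "utf8" })
      .on("data", chunk => { text += chunk; })
      .on("error", reject)
      .on("end", () => resolve(text));
  });
}

const fd = fs.openSync(samplePath, "r");
const fdText = readFromDescriptor(fd);
fdText.then(text => console.log(text)); // prints "read via descriptor"
```

Since autoClose is left at its default here, the stream closes the descriptor for us once the file has been read.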

The mode option sets the file permission and sticky bits of the file. It's an octal number in the same format as Unix or Linux file permissions, and it's only applied if the file is created. The default value is 0o666.

The autoClose option specifies whether the file descriptor will be closed automatically. The default value is true. If it's false, then the file descriptor won't be closed even if there's an error. It's completely up to us to close it if autoClose is set to false, to make sure there's no file descriptor leak. Otherwise, the file descriptor will be closed automatically when there's an error or end event emitted.
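A minimal sketch of handling the descriptor ourselves when autoClose is false might look like this. The sample file, the helper name, and the placeholder path are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "autoclose-demo-"));
const samplePath = path.join(dir, "sample.txt");
fs.writeFileSync(samplePath, "twelve bytes");

// Hypothetical helper: count the bytes in a file without letting the
// stream close the descriptor for us.
function countBytesKeepingDescriptor(fd) {
  return new Promise((resolve, reject) => {
    let bytes = 0;
    const stream = fs.createReadStream("ignored.txt", { fd, autoClose: false });
    stream.on("data", chunk => { bytes += chunk.length; });
    stream.on("error", err => {
      fs.closeSync(fd); // still our job to close it on failure
      reject(err);
    });
    stream.on("end", () => {
      fs.closeSync(fd); // the descriptor is still open here, so close it manually
      resolve(bytes);
    });
  });
}

const fd = fs.openSync(samplePath, "r");
const byteCount = countBytesKeepingDescriptor(fd);
byteCount.then(total => console.log(total)); // prints 12
```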

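The emitClose option controls whether the close event is emitted after the stream has been destroyed. The default value was false in older releases such as Node.js 12 and has been true since Node.js 14.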

The start and end options specify the range of bytes to read from the file instead of the whole file. Both are numbers, both are inclusive, and counting starts at 0, so everything in between is read along with the bytes at start and end themselves.
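For instance, the following sketch reads only bytes 2 through 5 of a ten-byte file. The sample file and the readRange helper are assumptions made for this illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "range-demo-"));
const samplePath = path.join(dir, "digits.txt");
fs.writeFileSync(samplePath, "0123456789");

// Hypothetical helper: read the inclusive byte range [start, end] as text.
function readRange(file, start, end) {
  return new Promise((resolve, reject) => {
    let text = "";
    fs.createReadStream(file, { encoding: "utf8", start, end })
      .on("data", chunk => { text += chunk; })
      .on("error", reject)
      .on("end", () => resolve(text));
  });
}

const range = readRange(samplePath, 2, 5);
range.then(text => console.log(text)); // prints "2345"
```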

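The highWaterMark option is the maximum number of bytes that the stream stores in its internal buffer before it stops reading from the file, which also caps the size of each chunk. For fs read streams, the default is 64 KiB (64 * 1024 bytes). It's a threshold rather than a hard memory limit. Data only piles up in memory if we keep passing it on faster than it can be consumed, for example by ignoring backpressure when writing it out. In that case the memory usage will be high and the garbage collection performance will be poor, or it can crash your program with the Allocation failed - JavaScript heap out of memory error.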
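To see the effect of highWaterMark on chunk size, we can read a ten-byte file with a deliberately tiny value of 4. The sample file and the chunkSizes helper are hypothetical, and a value this small is only sensible for a demonstration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hwm-demo-"));
const samplePath = path.join(dir, "digits.txt");
fs.writeFileSync(samplePath, "0123456789");

// Hypothetical helper: collect the length of every chunk the stream delivers.
function chunkSizes(file, highWaterMark) {
  return new Promise((resolve, reject) => {
    const sizes = [];
    fs.createReadStream(file, { highWaterMark })
      .on("data", chunk => { sizes.push(chunk.length); })
      .on("error", reject)
      .on("end", () => resolve(sizes));
  });
}

const sizes = chunkSizes(samplePath, 4);
sizes.then(list => console.log(list)); // prints [ 4, 4, 2 ]
```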

The createReadStream function returns a ReadStream object to which you can attach event handlers.

To create a ReadStream, we can use the createReadStream function as in the following code:

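const fs = require("fs");
const file = "./files/file.txt";

// Open the file for reading and decode chunks as UTF-8 strings.
const stream = fs.createReadStream(file, {
  flags: "r",
  encoding: "utf8",
  mode: 0o666,
  autoClose: true,
  emitClose: true,
  start: 0
});

stream.on("open", () => {
  console.log("Stream opened");
});

stream.on("ready", () => {
  console.log("Stream ready");
});

// Fired for every chunk, including chunks returned by stream.read() below.
stream.on("data", data => {
  console.log(data);
});

// With a readable listener attached, data is pulled by calling stream.read().
stream.on("readable", () => {
  let chunk;
  while ((chunk = stream.read()) !== null) {
    console.log(chunk);
  }
});

// Report failures such as a missing file instead of crashing.
stream.on("error", err => {
  console.error(err.message);
});

stream.on("close", () => {
  console.log("Stream closed");
});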

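When we run the code above, we should get something like the following outputted to the screen, assuming that you have 'datadatadatadata' written in your ./files/file.txt file. The text is printed twice because the data handler and the loop in the readable handler each log the same chunk: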

Stream opened
Stream ready
datadatadatadata
datadatadatadata
Stream closed

ReadStream Events

With a ReadStream, we can listen to the following events. The close event is emitted when the stream's underlying file descriptor has been closed, which by default happens after the file has been read.

The open event is emitted when the stream is opened. The file descriptor number fd will be passed with the event when it’s emitted. The ready event is emitted when the ReadStream is ready to be used. It’s fired immediately after the open event is fired.

The ReadStream extends the stream Readable object, which emits events of its own. The data event is emitted whenever the stream passes a chunk of data to the consumer. This happens once the stream is switched into flowing mode by calling readable.pipe() or readable.resume(), or by attaching a listener callback to the data event.

The data event will also be emitted when the readable.read() function is called and a chunk of data is available to be returned. The end event is emitted when there’s no more data to be consumed from the stream. It won’t be emitted until the data is completely consumed.

This can be done by switching the stream to flowing mode or by calling stream.read() repeatedly until all the data is consumed.

The error event is emitted whenever an error occurs during the streaming or consumption of the stream. It can be because the stream can't generate data due to an internal failure or because the stream attempts to push an invalid chunk of data. The pause event is emitted whenever ReadStream.pause() is called and readableFlowing isn't false.
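As one example of the error event, the sketch below tries to read a file that doesn't exist and reports the error code instead of letting the program crash. The missing.txt path and the readOrReport helper are made up for this illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: consume a file and report whether it worked.
function readOrReport(file) {
  return new Promise(resolve => {
    const stream = fs.createReadStream(file);
    // Without an error listener, a failed open would throw an uncaught exception.
    stream.on("error", err => resolve({ ok: false, code: err.code }));
    stream.on("end", () => resolve({ ok: true, code: null }));
    stream.resume(); // switch to flowing mode and discard the data
  });
}

// A path inside a fresh temporary directory, so the file cannot exist.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "error-demo-"));
const missingPath = path.join(dir, "missing.txt");

const outcome = readOrReport(missingPath);
outcome.then(result => console.log(result)); // prints { ok: false, code: 'ENOENT' }
```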

readableFlowing can have one of 3 states: null, false, or true. When it's null, this means that no mechanism for consuming the stream's data is provided and therefore the stream won't generate data.

When readableFlowing is null, attaching a listener for the 'data' event, calling the readable.pipe() method, or calling the readable.resume() method will switch readable.readableFlowing to true, causing the ReadStream to start emitting events as data is generated.

Calling readable.pause() or readable.unpipe(), or receiving backpressure, which is the situation where data fills the buffer faster than it's consumed, causes readable.readableFlowing to be set to false, temporarily halting the flow of events but not halting the generation of data.

Attaching a listener for the 'data' event will not switch readable.readableFlowing to true when readable.readableFlowing is set as false .
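The state changes described above can be observed directly. In this sketch, the sample file and the states array are assumptions made for the example, and the stream is destroyed at the end because we only care about the state transitions.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "flowing-demo-"));
const samplePath = path.join(dir, "sample.txt");
fs.writeFileSync(samplePath, "hello");

const stream = fs.createReadStream(samplePath);
const states = [];

states.push(stream.readableFlowing); // null: nothing consumes the data yet
stream.on("data", () => {});
states.push(stream.readableFlowing); // true: a data listener starts the flow
stream.pause();
states.push(stream.readableFlowing); // false: explicitly paused
stream.on("data", () => {});
states.push(stream.readableFlowing); // false: another listener doesn't resume it

stream.destroy(); // release the file, we are done inspecting the state
console.log(states); // prints [ null, true, false, false ]
```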

The readable event is emitted when there’s data available to be read from the stream or the end of the stream has been reached. Attaching an event listener for the readable event may cause some amount of data to be read into an internal buffer.

It will also be emitted when the end of the stream is reached but before the end event is emitted. The resume event is emitted when ReadStream.resume() is called and the readableFlowing isn’t true .
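To watch the pause and resume events, the following sketch pauses a stream on its first chunk and resumes it a little later. The sample file, the 10 ms delay, and the pauseThenResume helper are arbitrary choices for this illustration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pause-demo-"));
const samplePath = path.join(dir, "sample.txt");
fs.writeFileSync(samplePath, "twelve bytes");

// Hypothetical helper: pause on the first chunk, resume shortly after,
// and resolve with the pause and resume events seen along the way.
function pauseThenResume(file) {
  return new Promise((resolve, reject) => {
    const events = [];
    let pausedOnce = false;
    const stream = fs.createReadStream(file, { highWaterMark: 4 });
    stream.on("pause", () => events.push("pause"));
    stream.on("resume", () => events.push("resume"));
    stream.on("data", () => {
      if (!pausedOnce) {
        pausedOnce = true;
        stream.pause(); // stop the flow of data events
        setTimeout(() => stream.resume(), 10); // and restart it a bit later
      }
    });
    stream.on("error", reject);
    stream.on("end", () => resolve(events));
  });
}

const pauseEvents = pauseThenResume(samplePath);
pauseEvents.then(events => console.log(events));
```

Attaching the data listener also counts as a resume, so we should expect a resume entry before the pause as well as after it.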

A ReadStream object also has the following properties. The bytesRead property lets us get the number of bytes read so far.

The path property is a string or a buffer that gets us the reference to the file. It’s the same as the first argument of createReadStream() .

The data type will also be the same as what we pass in as the first argument. The pending property is a boolean which is true if the underlying file hasn’t been opened yet, or before the ready event is emitted.
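These properties can be inspected at different points in the stream's life. The sketch below records them in a plain object; the sample file and the inspectStream helper are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical sample file, created only for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "props-demo-"));
const samplePath = path.join(dir, "sample.txt");
fs.writeFileSync(samplePath, "hello");

// Hypothetical helper: record path, pending, and bytesRead as the stream runs.
function inspectStream(file) {
  return new Promise((resolve, reject) => {
    const stream = fs.createReadStream(file);
    const info = {
      path: stream.path, // same value and type as the first argument
      pendingBeforeOpen: stream.pending // true: the file isn't open yet
    };
    stream.on("ready", () => { info.pendingAtReady = stream.pending; });
    stream.on("error", reject);
    stream.on("end", () => {
      info.bytesRead = stream.bytesRead; // total bytes read from the file
      resolve(info);
    });
    stream.resume(); // consume the data so that end is emitted
  });
}

const streamInfo = inspectStream(samplePath);
streamInfo.then(info => console.log(info));
```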

By using the fs.createReadStream function, we created read streams to read a file's data sequentially and listened to events from a read stream. Since Node.js ReadStreams are descendants of the Readable object, we also listened to the events that Readable streams emit.

We have lots of control over how the read stream is created. We can set the path or file descriptor of the file. Also, we can set the flags for opening the file and the permission and sticky bits that apply if the file is created.

Also, we can choose whether to close the stream automatically and whether to emit the close event. We can also set the highWaterMark option, which sets the maximum buffer size for storing the read data.

Also, we can call pipe to move data to a writable stream, and pause the streaming of data with the pause function, and resume streaming with the resume function.
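As a final sketch, pipe can be used to copy one file to another through a write stream. The file names and the copyFile helper are assumptions made for this example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical source and destination files for this demonstration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "pipe-demo-"));
const sourcePath = path.join(dir, "source.txt");
const copyPath = path.join(dir, "copy.txt");
fs.writeFileSync(sourcePath, "piped content");

// Hypothetical helper: copy a file by piping a ReadStream into a WriteStream.
function copyFile(source, destination) {
  return new Promise((resolve, reject) => {
    const reader = fs.createReadStream(source);
    const writer = fs.createWriteStream(destination);
    reader.on("error", reject);
    writer.on("error", reject);
    // finish is emitted once all data has been handed to the file system.
    writer.on("finish", resolve);
    // pipe pauses and resumes the reader as the writer's buffer fills and drains.
    reader.pipe(writer);
  });
}

const copied = copyFile(sourcePath, copyPath);
copied.then(() => console.log(fs.readFileSync(copyPath, "utf8"))); // prints "piped content"
```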

By John Au-Yeung

Web developer specializing in React, Vue, and front end development.
