Q1. What is AOP (aspect-oriented programming), and why is it needed?
Ans. AOP is a technique for separating crosscutting concerns into single units called aspects. An aspect encapsulates behaviour that affects multiple classes in one reusable module. Crosscutting concerns are parts of the program logic that cut across many classes and are therefore hard to modularize with object-oriented programming alone. Because these units are termed aspects, the approach is called aspect-oriented programming.
For ex: Logging. It crosscuts all logged classes and methods. Suppose we do logging at both the beginning and the end of each function body. This will result in crosscutting all classes that have at least one function. Other typical crosscutting concerns include context-sensitive error handling, performance optimization, and design patterns.
With AOP, we start by implementing our project using our OO language (for example, Java), and then we deal separately with crosscutting concerns in our code by implementing aspects. Finally, both the code and aspects are combined into a final executable form using an aspect weaver.
The figure above illustrates the weaving process. You should note that the original code doesn't need to know about any functionality the aspect has added; it needs only to be recompiled without the aspect to regain the original functionality. In that way, AOP complements object-oriented programming and doesn't replace it.
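The idea of keeping logging out of the business code can be sketched in plain Java. This is only an illustration, not how an aspect weaver such as AspectJ works: it uses a JDK dynamic proxy as a stand-in for weaving, and the names AccountService, SimpleAccountService, withLogging and LOG are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class LoggingAspectSketch {

    /** Business interface (hypothetical): knows nothing about logging. */
    public interface AccountService {
        int deposit(int balance, int amount);
    }

    /** Plain business logic with no logging code in it. */
    public static class SimpleAccountService implements AccountService {
        @Override
        public int deposit(int balance, int amount) {
            return balance + amount;
        }
    }

    /** Log lines are kept in memory so the effect is easy to inspect. */
    public static final List<String> LOG = new ArrayList<>();

    /** Wraps a target so that every interface call is logged on entry and on exit. */
    @SuppressWarnings("unchecked")
    public static <T> T withLogging(T target, Class<T> iface) {
        InvocationHandler advice = (proxy, method, args) -> {
            LOG.add("enter " + method.getName());
            try {
                return method.invoke(target, args);
            } finally {
                LOG.add("exit " + method.getName());
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, advice);
    }

    public static void main(String[] args) {
        AccountService service =
                withLogging(new SimpleAccountService(), AccountService.class);
        System.out.println(service.deposit(100, 50)); // prints 150
        System.out.println(LOG); // prints [enter deposit, exit deposit]
    }
}
```

SimpleAccountService stays free of logging code, and using it without the proxy restores the original behaviour, which mirrors the point made above about recompiling without the aspect.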
Q2. What are the differences between OOP and AOP?
Ans:
| OOP | AOP |
| --- | --- |
| OOP looks at an application as a set of collaborating objects. OOP code scatters system level code like logging, security etc with the business logic code. | AOP looks at the complex software system as combined implementation of multiple concerns like business logic, data persistence, logging, security, and so on. Separates business logic code from the system level code. In fact one concern remains unaware of other concerns. |
| OOP nomenclature has classes, objects, interfaces etc. | AOP nomenclature has join points, point cuts, advice, and aspects. |
| Provides benefits such as code reuse, flexibility, improved maintainability, modular architecture, reduced development time etc with the help of polymorphism, inheritance and encapsulation. | AOP implementation coexists with the OOP by choosing OOP as the base language. Ex: AspectJ uses Java as the base language. AOP provides benefits provided by OOP plus some additional benefits. |
Ans. There are four JDBC driver types.
Type 1: JDBC-ODBC Bridge plus ODBC Driver: The first type of JDBC driver is the JDBC-ODBC Bridge. It is a driver that provides JDBC access to databases through ODBC drivers. The ODBC driver must be configured on the client for the bridge to work. This driver type was commonly used for prototyping or when no JDBC driver was available for a particular DBMS. The bridge was removed from the JDK in Java 8, so it is relevant only to older runtimes.
Type 2: Native-API partly-Java Driver: The Native-API driver converts JDBC commands to DBMS-specific native calls. It shares a restriction with Type 1 drivers: the client must have some binary code loaded on its machine. These drivers do have an advantage over Type 1 drivers because they call the database's native client library directly instead of going through ODBC.
Type 3: JDBC-Net Pure Java Driver: The JDBC-Net drivers are a three-tier solution. This type of driver translates JDBC calls into a database-independent network protocol that is sent to a middleware server. This server then translates this DBMS-independent protocol into a DBMS-specific protocol, which is sent to a particular database. The results are then routed back through the middleware server and sent back to the client. This type of solution makes it possible to implement a pure Java client. It also makes it possible to swap databases without affecting the client.
Type 4: Native-Protocol Pure Java Driver: These are pure Java drivers that communicate directly with the vendor’s database. They do this by converting JDBC commands directly into the database engine’s native protocol. This driver has no additional translation or middleware layer, so it usually performs better than the other types.
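Whichever driver type is used, application code talks to it through the same java.sql API, and the JDBC URL decides which registered driver handles a connection. The sketch below illustrates that lookup only; the URL with the "hypothetical" subprotocol and the class name DriverLookupSketch are made up for the example, and no real database is contacted.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;

public class DriverLookupSketch {

    /** Returns the class name of the registered driver that accepts the URL, or "none". */
    public static String driverFor(String jdbcUrl) {
        try {
            Driver driver = DriverManager.getDriver(jdbcUrl);
            return driver.getClass().getName();
        } catch (SQLException e) {
            // Thrown when no registered driver understands the URL.
            return "none";
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL: no driver is expected to accept this subprotocol.
        System.out.println(driverFor("jdbc:hypothetical://localhost/testdb"));
    }
}
```

With a real driver JAR on the classpath and a URL in that vendor's format, the same call would return the vendor's driver class, and the rest of the application would not need to change.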
Q4. What is Ajax?
Ans: Ajax (Asynchronous JavaScript And XML) is a Web development technique for making Web pages more responsive and interactive. The page exchanges small amounts of data with the server behind the scenes, so that the entire Web page does not have to be reloaded each time the user makes a change. Ajax uses a combination of JavaScript, XHTML (or HTML) and the XMLHttpRequest object.
Q5. Web services
Ans. A Web service is an implementation technology and one of the ways to implement SOA (Service Oriented Architecture). You can build SOA-based applications without using Web services, for example by using other traditional technologies like Java RMI, EJB, JMS-based messaging, etc. What Web services offer is a standards-based and platform-independent service via HTTP, XML, SOAP, WSDL and UDDI, thus allowing interoperability between heterogeneous technologies such as J2EE and .NET.
Web services are language and platform independent. A Web service uses language-neutral protocols such as HTTP, and disparate applications communicate by passing XML messages to each other via a Web API (messages are XML, optionally with binary data attachments). Interfaces must be based on Internet protocols such as HTTP, FTP and SMTP. There are two main styles of Web services: SOAP and REST.
A service is an application that exposes its functionality through an API (Application Programming Interface). A service is a component that can be used remotely through a remote interface, either synchronously or asynchronously. The term service also implies something special about the application design, which is called a service-oriented architecture (SOA). One of the most important features of SOA is the separation of interface from implementation. A service exposes its functionality through an interface, and the interface hides the inner workings of the implementation.
For ex: Google also provides a Web service interface through the Google API to query their search engine from an application rather than a browser.
Q6. SOAP
Ans: SOAP stands for Simple Object Access Protocol. It is a lightweight XML-based protocol that allows software components and applications to communicate, mostly over HTTP (other transports such as SMTP can also be used). A SOAP message is simply an XML document with a pre-defined format. SOAP is designed to communicate via the Internet in a platform- and language-neutral manner, and because it usually travels over HTTP it typically passes through firewalls as well. Let’s look at the syntax rules for a SOAP message:
- A SOAP message MUST be encoded using XML
- A SOAP message MUST use the SOAP Envelope namespace
- A SOAP message MUST use the SOAP Encoding namespace
- A SOAP message must NOT contain a DTD reference
- A SOAP message must NOT contain XML Processing Instructions
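A message that follows the first two rules can be sketched as below. This is a minimal illustration, assuming the SOAP 1.2 envelope namespace; the payload element GetStockPrice, its namespace http://example.org/stock and the class name are hypothetical, the encodingStyle attribute is left out for brevity, and the stock name is assumed to be plain text that needs no XML escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SoapEnvelopeSketch {

    // SOAP 1.2 envelope namespace.
    public static final String SOAP_ENV_NS = "http://www.w3.org/2003/05/soap-envelope";

    /** Builds a minimal SOAP message; the payload element and its namespace are hypothetical. */
    public static String buildMessage(String stockName) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<soap:Envelope xmlns:soap=\"" + SOAP_ENV_NS + "\">"
                + "<soap:Header/>"
                + "<soap:Body>"
                + "<m:GetStockPrice xmlns:m=\"http://example.org/stock\">"
                + "<m:StockName>" + stockName + "</m:StockName>"
                + "</m:GetStockPrice>"
                + "</soap:Body>"
                + "</soap:Envelope>";
    }

    /** Checks the basics: well-formed XML whose root is Envelope in the SOAP namespace. */
    public static boolean looksLikeSoap(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        return "Envelope".equals(root.getLocalName())
                && SOAP_ENV_NS.equals(root.getNamespaceURI());
    }

    public static void main(String[] args) throws Exception {
        String message = buildMessage("IBM");
        System.out.println(message);
        System.out.println(looksLikeSoap(message)); // prints true
    }
}
```

In practice a SOAP toolkit generates and validates these messages; building the string by hand is only meant to make the Envelope, Header and Body structure visible.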
Q7. WSDL
Ans: WSDL stands for Web Services Description Language. A WSDL document is an XML document that describes a Web service: the operations it offers and how the messages are exchanged. Let’s say we have created a Web service. Who is going to use it, and how does the client know which method to invoke and what parameters to pass? There are tools that can generate WSDL from the Web service. There are also tools that can read a WSDL document and create the necessary code to invoke the Web service. So WSDL is the Interface Definition Language (IDL) for Web services.
Q8. UDDI
Ans: UDDI stands for Universal Description Discovery and Integration. UDDI provides a way to publish and discover information about Web services. UDDI is a registry rather than a repository: like JNDI, a registry contains only reference information.
So far we have looked at some open standards/protocols relating to Web services, which enable interoperability between disparate systems (e.g. between .NET and J2EE). These standards provide a common and interoperable approach for defining (WSDL), publishing (UDDI) and using (SOAP) Web services.
Q9. SAX Vs DOM Parser
Ans. The main differences between SAX (Simple API for XML) and DOM (Document Object Model), which are the two most popular APIs for processing XML documents in Java, are:
- Read v/s Read/Write: SAX can be used only for reading XML documents and not for the manipulation of the underlying XML data whereas DOM can be used for both read and write of the data in an XML document.
- Sequential Access v/s Random Access: SAX can be used only for sequential processing of an XML document whereas DOM can be used for random access to XML docs. So what do you do if you want random access to the underlying XML data while using SAX? You have to store and manage that information yourself so that you can retrieve it when you need it.
- Call back v/s Tree: SAX uses a call back mechanism and event streams to read chunks of XML data into memory in a sequential manner. A SAX parser does not create any internal structure. Instead, it treats the occurrences of components of an input document as events (i.e., it is event driven), and tells the client what it reads as it reads through the input document. A DOM parser creates a tree structure in memory from an input document, then waits for requests from the client and facilitates random access/manipulation of the underlying XML data.
- API: From a functionality point of view, SAX provides fewer functions, which means that users themselves have to take care of more, such as creating their own data structures. A DOM parser is rich in functionality. It creates a DOM tree in memory, allows you to access any part of the document repeatedly, and allows you to modify the DOM tree.
- XML-Dev mailing list v/s W3C: SAX was developed by the XML-Dev mailing list whereas DOM was developed by W3C (World Wide Web Consortium).
- Information Set: SAX doesn't retain all the info of the underlying XML document such as comments whereas DOM retains almost all the info. New versions of SAX are trying to extend their coverage of information.
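The call back v/s tree difference can be sketched with the parsers that ship with the JDK. This is a small illustration under assumptions: the sample document, the element name book and the class name SaxVsDomSketch are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVsDomSketch {

    // Hypothetical sample document used by both parsers.
    public static final String XML =
            "<library><book>Dune</book><book>Emma</book></library>";

    private static ByteArrayInputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    /** SAX: the parser pushes events; we keep only the state we need (a counter). */
    public static int countBooksWithSax() throws Exception {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("book".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(input(), handler);
        return count[0];
    }

    /** DOM: the whole tree is in memory, so we can jump to any node and change it. */
    public static String renameSecondBookWithDom(String newTitle) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(input());
        Element second = (Element) doc.getElementsByTagName("book").item(1);
        second.setTextContent(newTitle);
        return doc.getElementsByTagName("book").item(1).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBooksWithSax());                // prints 2
        System.out.println(renameSecondBookWithDom("Ulysses")); // prints Ulysses
    }
}
```

The SAX version never holds more than a counter, while the DOM version can address the second book directly and modify it, at the cost of building the tree first.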
Usual Misconceptions
SAX is always faster: This is a very common misunderstanding. SAX avoids building a tree in memory, but every event goes through a call back, and depending on how SAX is being used the cost of those call backs can outweigh the storage-size advantage.
DOM always keeps the whole XML doc in memory: This is not always true. DOM implementations vary not only in their code size and performance but also in their memory requirements, and a few of them don't keep the entire XML doc in memory all the time. Otherwise, processing very large XML docs with DOM would be virtually impossible, which is not the case.
Q10. How to choose one between SAX & DOM?
Ans. It primarily depends upon the requirement. If the underlying XML data requires manipulation, DOM will almost always be used, as SAX doesn't allow that. Similarly, if the nature of access is random (for example, if you need contextual info at every stage), then DOM will be the way to go in most cases. But if the XML document only needs to be read, and read sequentially, then SAX will probably be the better alternative in most cases. SAX was developed mainly for parsing XML documents and it's certainly good at it. Use DOM when your application has to access various parts of the document and using your own structure would be just as complicated as the DOM tree, or when your application has to change the tree very frequently and the data has to be kept for a significant amount of time.
Q11. What is a socket? How do you facilitate inter process communication in Java?
Ans: A socket is a communication channel, which facilitates inter-process communication (ex: communicating between two JVMs). A socket is an endpoint for communication. There are two kinds of sockets, depending on whether one wishes to use a connectionless or a connection-oriented protocol.
1. The connectionless communication protocol of the Internet is called UDP
2. The connection-oriented communication protocol of the Internet is called TCP.
UDP sockets are also called datagram sockets. Each socket is identified on the Internet by two numbers. The first is the IP address, a 32-bit integer in IPv4 or a 128-bit integer in IPv6, which gives the location of the machine you are trying to connect to. The second is the port, a 16-bit integer that identifies the port on which the server you are trying to connect to is running. Port numbers 0 to 1023 are reserved for standard services such as e-mail, FTP, HTTP etc.
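A connection-oriented (TCP) exchange between two endpoints can be sketched with java.net. This is a minimal illustration, not a production server: both endpoints run in one JVM on the loopback address for convenience, the OS picks a free port, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class TcpEchoSketch {

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // The second argument enables auto-flush on println.
        return new PrintWriter(
                new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Starts a one-shot echo server on a free port and sends it a single line. */
    public static String echoOnce(String message) throws Exception {
        // Port 0 asks the OS for any free port, which avoids the reserved range 0-1023.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket client = server.accept();
                     BufferedReader in = reader(client);
                     PrintWriter out = writer(client)) {
                    out.println("echo: " + in.readLine());
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();

            // The client identifies the server endpoint by IP address and port.
            try (Socket socket = new Socket(InetAddress.getLoopbackAddress(),
                                            server.getLocalPort());
                 PrintWriter out = writer(socket);
                 BufferedReader in = reader(socket)) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echoOnce("hello")); // prints echo: hello
    }
}
```

For the connectionless case, java.net.DatagramSocket and DatagramPacket play the corresponding roles over UDP.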
Q12. What is a memory-mapped file?
Ans. Memory-mapping a file uses the OS virtual memory to access the data on the file system directly, instead of using normal I/O functions. Most modern OSes that support virtual memory also run each process in its own dedicated address space, allowing a program to be designed as though it has sole access to the virtual memory. On POSIX systems, mmap makes the connection between your address space and the file on the disk. Memory-mapped files are loaded into memory one entire page at a time, and the page size is selected by the operating system for maximum performance. While memory-mapped files offer a way to read and write directly to a file at specific locations, the actual reading from and writing to the disk is handled at a lower level. Consequently, data is not necessarily transferred at the moment the program reads or writes the mapped memory. Instead, much of the file input/output (I/O) is cached to improve general system performance. You can override this behavior and force the system to write to disk immediately, for example with the Windows memory-mapped file function FlushViewOfFile or with msync on POSIX systems.
Note: jmap prints shared object memory maps or heap memory details of a given process or core file or a remote debug server.
Benefits:
- Increased I/O Performance: Especially when used on large files. For small files, memory-mapped files can waste slack space, because memory maps are always aligned to the page size, which is mostly 4 KB. A 5 KB file will therefore allocate 8 KB, and 3 KB are wasted. Accessing memory-mapped files is faster for two reasons. Firstly, it does not involve a separate system call for each access. Secondly, in most OSes the mapped memory region actually is the kernel's page cache (file cache), so no copies need to be created in user space and no data is copied between buffers; the memory is accessed directly.
- Faster read/write operations: Applications can access and update data in the file directly and in place, as opposed to seeking from the start of the file or rewriting the entire edited contents to a temporary location. Since the memory-mapped file is handled internally in pages, linear file access requires disk access only when a new page boundary is crossed, and larger sections of the file can be written to disk in a single operation.
- Lazy Loading: Pages are loaded only when they are accessed, so a mapping uses small amounts of RAM even for a very large file. In contrast, trying to load the entire contents of a file that is significantly larger than the available memory can cause severe thrashing.
Drawbacks:
- The memory-mapped approach has its cost in minor page faults, which occur when a block of data is loaded in the page cache but is not yet mapped into the process's virtual memory space. In some circumstances, memory-mapped file I/O can be substantially slower than standard file I/O.
- Another drawback relates to a given architecture's address space: a file larger than the addressable space can have only portions mapped at a time, which complicates reading it. For ex: a 32-bit architecture such as Intel's IA-32 can only directly address 4 GB or smaller portions of files.
Common uses:
- The most common use is the process loader in most modern OSes (including Windows & Unix). When a process is started, the OS uses a memory-mapped file to bring the executable file, along with any loadable modules, into memory for execution.
- Another common use is to share memory between multiple processes. In modern OSes, processes are generally not permitted to access memory space that is allocated for use by another process. There are a number of techniques available to safely share memory, and memory-mapped file I/O is one of the most popular. Two or more applications can simultaneously map a single physical file into memory and access this memory.
Most modern OSes and runtime environments support some form of memory-mapped file access. The function mmap(), which creates a mapping of a file given a file descriptor, a starting location in the file, and a length, is part of the POSIX specification. So POSIX-compliant systems, such as Unix, Linux and macOS, support a common mechanism for memory-mapping files. The mmap() function establishes a mapping between a process's address space and a file.
The Microsoft Windows operating systems also support a group of API functions for this purpose, such as CreateFileMapping(). Java provides classes and methods to access memory mapped files, such as FileChannel.
Q13. Memory-mapped file usage example.
Ans. We use the FileChannel class along with the ByteBuffer class to perform memory-mapped I/O on data of type byte. The bytes are then retrieved with the get() method of the ByteBuffer class.
FileChannel: An abstract class used for reading, writing, mapping, and manipulating a file.
ByteBuffer: An abstract class which provides methods for reading and writing values of all primitive types except boolean.
map() method: This method maps the region of the channel's file directly into memory.
size() method: This method returns the current size of this channel's file.
Usage Modes:
FileChannel.MapMode.PRIVATE: Mode for a private (copy-on-write) mapping.
FileChannel.MapMode.READ_ONLY: Mode for a read-only mapping.
FileChannel.MapMode.READ_WRITE: Mode for a read/write mapping.
Usage example:
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

File file = new File("filename");
// Create a read-only memory-mapped file.
// map() takes a long size, but a single mapping cannot exceed Integer.MAX_VALUE bytes.
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, roChannel.size());
// Create a read-write memory-mapped file
FileChannel rwChannel = new RandomAccessFile(file, "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, rwChannel.size());
// Create a private (copy-on-write) memory-mapped file.
// Any write to this buffer results in a private copy of the data; the file is not changed.
FileChannel pvChannel = new RandomAccessFile(file, "rw").getChannel();
ByteBuffer pvBuf = pvChannel.map(FileChannel.MapMode.PRIVATE, 0, pvChannel.size());
Although the return value from map() is assigned to a ByteBuffer variable, it's actually a MappedByteBuffer. Most of the time there's no reason to differentiate, but the latter class has two methods that some programs may find useful - load() and force().
The load() method will attempt to load all of the file's data into RAM, trading an increase in startup time for a potential decrease in page faults later. This is a form of premature optimization. Unless your program constantly accesses those pages, the OS may choose to use them for something else, meaning that you'll have to fault them in again. To flush dirty pages to disk, call the buffer's force() method.
buf.putInt(0, 0x87654321);
buf.force();
The two lines of code above are actually an anti-pattern: you don't want to flush dirty pages after every write. Take a lesson from database developers, and group your changes into atomic units.
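Grouping writes and flushing once can be sketched as follows. This is only an illustration under assumptions: it works on a temporary scratch file, the class and method names are hypothetical, and a real program would choose its batch boundaries according to its own consistency needs.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWriteSketch {

    /** Writes a batch of ints through a mapping, flushes once, then reads them back. */
    public static int[] writeBatchAndReadBack(int[] values) throws IOException {
        Path file = Files.createTempFile("mapped-sketch", ".bin"); // scratch file
        file.toFile().deleteOnExit();
        // Pre-size the file so that the mapping lies entirely within it.
        Files.write(file, new byte[values.length * 4]);

        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, values.length * 4L);
            for (int i = 0; i < values.length; i++) {
                buf.putInt(i * 4, values[i]); // absolute write, no flush yet
            }
            buf.force(); // one flush for the whole batch
        }

        // Read the file back with ordinary I/O to confirm the data reached it.
        ByteBuffer onDisk = ByteBuffer.wrap(Files.readAllBytes(file));
        int[] result = new int[values.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = onDisk.getInt(i * 4);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        int[] back = writeBatchAndReadBack(new int[] {1, 2, 0x87654321});
        System.out.println(Integer.toHexString(back[2])); // prints 87654321
    }
}
```

Both the mapped buffer and ByteBuffer.wrap use big-endian byte order by default, which is why the values read back unchanged.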
Q14. Mapping Files Bigger than 2 GB
Ans. Depending on your filesystem, you can create files larger than 2 GB. But ByteBuffer uses an int for all indexes, so a single buffer is limited to 2 GB, and you need to create multiple buffers to work with larger files.
Sol1: Create those buffers as needed. The same underlying FileChannel can support as many buffers as you can create, limited only by the OS and available virtual memory; simply pass a different starting offset each time. The problem with this approach is that creating a mapping is expensive, because it's a kernel call (and you're using mapped files to avoid kernel calls). In addition, a page table full of mappings will mean more expensive context switches. As a result, as-needed buffers aren't a good approach unless you can divide the file into large chunks that are processed as a unit.
Sol2: Create a “super buffer” that maps the entire file and presents an API that uses long offsets. Internally, it maintains an array of mappings with a known size, so that you can easily translate the original index into a buffer and an offset within that buffer:
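// Reads the int stored at the given absolute offset in the file.
public int getInt(long index) {
    return buffer(index).getInt();
}

// Selects the segment that contains the offset and positions it there.
private ByteBuffer buffer(long index) {
    ByteBuffer buf = _buffers[(int)(index / _segmentSize)];
    buf.position((int)(index % _segmentSize));
    return buf;
}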
What's a good value for _segmentSize? Your first thought might be Integer.MAX_VALUE, since this is the maximum index value for a buffer. While that would result in the fewest number of buffers to cover the file, it has one big flaw - you won't be able to access multi-byte values at segment boundaries. Instead, you should overlap buffers, with the size of the overlap being the maximum sub-buffer that you need to access.
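The overlapping layout described above can be sketched as follows. This is a minimal read-only illustration, not a complete implementation: the class name, the constructor parameters and the tiny segment size used in demo() are assumptions chosen to make the boundary case easy to see, and absolute reads are used instead of setting the buffer position.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SegmentedBufferSketch {
    private final MappedByteBuffer[] segments;
    private final long segmentSize;

    /**
     * Maps the file read-only as segments that each extend `overlap` bytes
     * into the next one. Assumes segmentSize + overlap <= Integer.MAX_VALUE.
     */
    public SegmentedBufferSketch(Path file, long segmentSize, int overlap)
            throws IOException {
        this.segmentSize = segmentSize;
        MappedByteBuffer[] mapped;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            int count = (int) ((fileSize + segmentSize - 1) / segmentSize);
            mapped = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = i * segmentSize;
                long length = Math.min(segmentSize + overlap, fileSize - start);
                mapped[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
            }
        }
        this.segments = mapped;
    }

    /** Reads an int at an absolute file offset, even if it straddles a boundary. */
    public int getInt(long index) {
        MappedByteBuffer segment = segments[(int) (index / segmentSize)];
        return segment.getInt((int) (index % segmentSize)); // absolute read
    }

    /** Demo with a deliberately tiny segment size so the boundary case is visible. */
    public static int demo() throws IOException {
        Path file = Files.createTempFile("segmented-sketch", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[64];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i; // byte at offset i has value i
        }
        Files.write(file, data);
        SegmentedBufferSketch buffer = new SegmentedBufferSketch(file, 10, 4);
        return buffer.getInt(8); // bytes 8..11 cross the boundary at offset 10
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Integer.toHexString(demo())); // prints 8090a0b
    }
}
```

Without the overlap, the first segment would end at offset 10 and the read at offset 8 would run past the end of the buffer.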
NOTE: A mapped buffer persists after its channel is closed; it is removed only by the garbage collector (which explains why MappedByteBuffer doesn't have its own close() method).
Q15. Garbage Collection of Direct/Mapped Buffers
Ans. How does the non-heap memory, i.e. the virtual memory used for direct buffers and mapped files, get released? After all, there's no method to explicitly close or release them. The answer is that they get garbage collected like any other object, but with one twist: if you don't have enough virtual memory space, that will trigger a full collection even if there's plenty of heap memory available. Normally, this won't be an issue: you probably won't be allocating and releasing direct buffers more often than heap-resident objects. If, however, you see full GCs appearing when you don't think they should, take a look at your program's use of buffers.
The Microsoft Windows operating systems also support a group of API functions for this purpose, such as CreateFileMapping(). Java provides classes and methods to access memory mapped files, such as FileChannel.
Q13. MemoryMapFile Usage example.
Ans. We used the FileChannel class along with the ByteBuffer class to perform memory-mapped IO for data of type byte. These byte is then retrieved by using get() method of ByteBuffer class.
FileChannel: An abstract class used for reading, writing, mapping, and manipulating a file.
ByteBuffer: An abstract class which provides methods for reading and writing values of all primitive types except Boolean.
map() method: This method maps the region of the channel's file directly into memory.
size() method: This method returns the current size of this channel's file.
Usage Modes:
FileChannel.MapMode.PRIVATE: Mode for a private (copy-on-write) mapping.
FileChannel.MapMode.READ_ONLY: Mode for a read-only mapping.
FileChannel.MapMode.READ_WRITE: Mode for a read/write mapping.
Usage example:
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
File file = new File("filename");
// Create a read-only memory-mapped file
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, roChannel.size());
// Create a read-write memory-mapped file
FileChannel rwChannel = new RandomAccessFile(file, "rw").getChannel();
ByteBuffer wrBuf = rwChannel.map(FileChannel.MapMode.READ_WRITE, 0, rwChannel.size());
// Create a private (copy-on-write) memory-mapped file.
// Any write to this buffer results in a private copy of the data.
FileChannel pvChannel = new RandomAccessFile(file, "rw").getChannel();
ByteBuffer pvBuf = pvChannel.map(FileChannel.MapMode.PRIVATE, 0, pvChannel.size());
Although the return value from map() is assigned to a ByteBuffer variable, it's actually a MappedByteBuffer. Most of the time there's no reason to differentiate, but the latter class has two methods that some programs may find useful - load() and force().
The load() method will attempt to load all of the file's data into RAM, trading an increase in startup time for a potential decrease in page faults later. This is a form of premature optimization: unless your program constantly accesses those pages, the OS may choose to use the memory for something else, meaning that you'll have to fault them in again anyway. To flush dirty pages to disk, call the buffer's force() method.
buf.putInt(0, 0x87654321);
buf.force();
The two lines of code above are actually an anti-pattern: you don't want to flush dirty pages after every write. Take a lesson from database developers, and group your changes into atomic units.
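One way to group changes is to apply a whole batch of writes to the mapping and call force() once at the end. The sketch below illustrates the idea only: the class name BatchedForce and the method writeBatch are hypothetical, and it assumes the file already exists and is at least as large as the batch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BatchedForce {

    // Hypothetical helper: stores every value in consecutive 4-byte slots,
    // then flushes the dirty pages a single time.
    static void writeBatch(Path path, int[] values) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long needed = (long) values.length * Integer.BYTES;
            if (channel.size() < needed) {
                throw new IllegalArgumentException("file is too small for the batch");
            }
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, needed);
            for (int i = 0; i < values.length; i++) {
                buf.putInt(i * Integer.BYTES, values[i]); // absolute write, no flush yet
            }
            buf.force(); // one flush for the whole unit of work
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("batch", ".bin");
        path.toFile().deleteOnExit();
        Files.write(path, new byte[12]); // room for three ints

        writeBatch(path, new int[] {10, 20, 0x87654321});

        // Read the file back through ordinary I/O to check the result.
        ByteBuffer check = ByteBuffer.wrap(Files.readAllBytes(path));
        System.out.println(Integer.toHexString(check.getInt(8)));
    }
}
```

What counts as one unit of work depends on the application; the point is only that force() sits outside the write loop.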
Q14. Mapping Files Bigger than 2 GB
Ans. Depending on your filesystem, you can create files larger than 2 GB. But ByteBuffer uses an int for all indexes, so a single buffer is limited to 2 GB, and you need to create multiple buffers to work with larger files.
Sol1: Create those buffers as needed. The same underlying FileChannel can support as many buffers as you can create, limited only by the OS and available virtual memory; simply pass a different starting offset each time. The problem with this approach is that creating a mapping is expensive, because it's a kernel call (and you're using mapped files to avoid kernel calls). In addition, a page table full of mappings will mean more expensive context switches. As a result, as-needed buffers aren't a good approach unless you can divide the file into large chunks that are processed as a unit.
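A minimal sketch of the as-needed approach, assuming the file can be processed in independent chunks. The class name ChunkedSum, the method sumBytes and the chunk size are illustrative only. Each chunk is mapped from the same FileChannel with a different starting offset, and the offset is a long, so the file as a whole may be larger than 2 GB even though each buffer is not.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedSum {

    // Hypothetical helper: sums all bytes (as unsigned values) of a file,
    // mapping one chunk at a time. chunkSize must be between 1 and Integer.MAX_VALUE.
    static long sumBytes(Path path, long chunkSize) throws IOException {
        long total = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long fileSize = channel.size();
            for (long offset = 0; offset < fileSize; offset += chunkSize) {
                long length = Math.min(chunkSize, fileSize - offset);
                // One kernel call per chunk: keep chunks large in real use.
                MappedByteBuffer buf =
                        channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
                while (buf.hasRemaining()) {
                    total += buf.get() & 0xFF;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("chunks", ".bin");
        path.toFile().deleteOnExit();
        Files.write(path, new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
        // A tiny chunk size is used here only to force several mappings.
        System.out.println(sumBytes(path, 4));
    }
}
```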
Sol2: Create a “super buffer” that maps the entire file and presents an API that uses long offsets. Internally, it maintains an array of mappings with a known size, so that you can easily translate the original index into a buffer and an offset within that buffer:
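public int getInt(long index) {
    // buffer() has already positioned the segment, so a relative get works
    return buffer(index).getInt();
}
private ByteBuffer buffer(long index) {
    // pick the segment that contains the index, then seek within it
    ByteBuffer buf = _buffers[(int)(index / _segmentSize)];
    buf.position((int)(index % _segmentSize));
    return buf;
}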
What's a good value for _segmentSize? Your first thought might be Integer.MAX_VALUE, since this is the maximum index value for a buffer. While that would result in the fewest number of buffers to cover the file, it has one big flaw - you won't be able to access multi-byte values at segment boundaries. Instead, you should overlap buffers, with the size of the overlap being the maximum sub-buffer that you need to access.
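The overlap idea can be sketched as follows. This is an illustration under assumptions, not a finished class: the name SegmentedFile, the constructor parameters and the read-only mode are all made up here, and segmentSize plus overlap must not exceed Integer.MAX_VALUE. Segments start every segmentSize bytes, but each mapping extends overlap bytes past its nominal end, so a multi-byte value that starts near the end of one segment can still be read from that segment alone.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SegmentedFile {
    private final MappedByteBuffer[] buffers;
    private final long segmentSize;

    // segmentSize: distance between segment starts.
    // overlap: extra bytes mapped past each segment's end, e.g. Long.BYTES - 1
    // if the largest value ever read is a long.
    public SegmentedFile(FileChannel channel, long segmentSize, int overlap)
            throws IOException {
        long fileSize = channel.size();
        int count = (int) ((fileSize + segmentSize - 1) / segmentSize);
        this.segmentSize = segmentSize;
        this.buffers = new MappedByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long start = i * segmentSize;
            // The last segment is clipped to the end of the file.
            long length = Math.min(segmentSize + overlap, fileSize - start);
            buffers[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
        }
    }

    public int getInt(long index) {
        MappedByteBuffer buf = buffers[(int) (index / segmentSize)];
        // Absolute get: leaves the buffer's position untouched.
        return buf.getInt((int) (index % segmentSize));
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("segments", ".bin");
        path.toFile().deleteOnExit();
        byte[] data = new byte[16];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(path, data);

        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Tiny segments so that index 6 straddles the boundary at byte 8.
            SegmentedFile file = new SegmentedFile(channel, 8, Integer.BYTES - 1);
            System.out.println(Integer.toHexString(file.getInt(6)));
        }
    }
}
```

The sketch uses the absolute getInt(int) rather than position() followed by a relative get, so that reads do not change the state of a shared buffer.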
NOTE: The buffer persists after the channel is closed; the mapping is released only when the buffer is garbage collected (which explains why MappedByteBuffer doesn't have its own close() method).
Q15. Garbage Collection of Direct/Mapped Buffers
Ans. How does the non-heap i.e. virtual memory for direct buffers and mapped files get released? After all, there's no method to explicitly close or release them. The answer is that they get garbage collected like any other object, but with one twist: if you don't have enough virtual memory space, that will trigger a full collection even if there's plenty of heap memory available. Normally, this won't be an issue: you probably won't be allocating and releasing direct buffers more often than heap-resident objects. If, however, you see full GC's appearing when you don't think they should, take a look at your program's use of buffers.