1. Introduction

This study is grounded in the theoretical frameworks of Multimodal Systemic Functional Linguistics (MSFL) and Forensic Linguistics (FL). Systemic Functional Linguistics (SFL) is the study of texts in their contexts, while multimodal analysis extends that study to visual text. FL, according to Olsson (2008: 4), is the application of linguistic knowledge and techniques to the language facts implicated in legal cases or in private disputes between parties that may at a later stage result in legal action of some kind.

MSFL and FL are used in this article to analyze the functional features of forensic visual multimodal texts. The analysis is expected to yield an analytical framework that contributes to the development of a research framework for forensic-multimodal studies. Forensic linguistics limits its discussion to the use of language in the field of law, that is, how language is used throughout the judicial process, from the police examination of defendants and witnesses to the language used by judges, prosecutors, and legal counsel in the courtroom.

In relation to the parameters of multimodal and forensic linguistics, the functional features of interaction are the source of data for this research. It focuses on moving elements such as gestures, gaze, head movements, and other elements that can have a profound effect on verbal communication in a forensic case. The interaction took place in the Indonesian Court for Corruption Crimes. The research problem is formulated as: “What are the multimodal functional features of law enforcement and witnesses in the proceedings of forensic corruption cases in Indonesia?”

Multimodal and forensic linguistic studies are still searching for their identity as a science despite their rapid development in Indonesia since the 2000s. SFL theory has spread throughout the world since the 1980s but has become better known in Indonesia since 2013 as the foundation of K-13, the 2013 Curriculum. Corruption cases are currently among the leading national problems faced by the Republic of Indonesia, which makes the corruption courts a relevant site of study. Other researchers have carried out forensic linguistic research at the theoretical level, putting forward theories and models of analysis, yet this work requires further development that links the theoretical-conceptual level to functional linguistic theory, which is believed to be reliable in analyzing verbal and visual discourse. Previous research by Susanto (2017) established a theoretical-conceptual basis for decoding and interpreting verbal language in trials to determine whether speech recordings were spoken by the suspects or not. This task is called Forensic Speaker Verification (FSV).

2. Literature Review

The scope of forensic linguistics, both practical and theoretical, has been set out in guidance published by the Center for Strategy Development and Linguistic Diplomacy (2016), as illustrated in Figure 1.

Figure 1

The Scope of Forensic Linguistic Studies (adapted from the source: Guidance of Forensic Linguistic Studies; Center for Strategy Development and Linguistic Diplomacy: 2016).


In the scope of forensic linguistic studies displayed in Figure 1, the parameters used in forensic linguistic research in Indonesia involve three aspects: (1) language in legal products, (2) language in courts, and (3) language as evidence, covering both written and oral language. The linguistic aspects studied in forensic linguistic research are phonetics and phonology (forensic phonetics), morphology, syntax, semantics, pragmatics and socio-pragmatics, language styles, discourse analysis, linguistic proficiency, dialectology, language honesty, analysis of language structure, and authorship.

Forensic phonetic analysis examines the acoustic quality of a speaker's voice in order to identify previously heard sounds, profile speakers, authenticate voice recordings, and analyze speech coding in argumentation [3]. Morphological examination checks (a) whether the morphological processes of words in legal products conform to grammatical rules and (b) whether a tendency to use certain morphemes distinguishes one person's language style from another's, so that it can be used in author identification. Syntax is used to examine (a) the conformity of sentences to grammatical rules, (b) the original author of a work, (c) transitivity in a critical systemic discourse analysis, and (d) the simplification of complex sentences. Semantic analysis examines the linguistic meaning of a legal product to investigate ambiguities that can lead to multiple interpretations of that product, while the analysis of discourse meaning investigates word choices that carry a particular literal or figurative meaning and imply a specific purpose of the speaker.

Pragmatics and socio-pragmatics are used in the analysis of oral discourse, such as conversation between perpetrators, conversation in the investigation process, or conversation in the proceedings, as well as written discourse, such as social media texts that may give rise to legal action. Language styles are used to analyze sound, translation and interpreting, dialect identification, and discourse (McMenamin, 2010 in Coulthard and Johnson, 2010) and to identify the true author of anonymous writings such as anonymous letters, threatening letters, and so forth [14].

Linguistic proficiency analysis helps to identify whether a suspect is silent intentionally or because he/she lacks the language skills needed to grasp the intended meaning of the investigator's question or to express his/her intention clearly. Dialectology analyzes language data, particularly speech, in order to recognize the dialect of an unknown speaker, to determine his/her social accent [14], and to identify the origin and authenticity of the language of unknown speakers.

Discourse analysis focuses on the cognitive system affecting language use and social interaction, and on the structure of both oral and written discourse. It applies linguistic criteria such as morphology, syntax, semantics, and pragmatics, and uses discourse markers to establish the unity and meaningfulness of the discourse [3].

Language honesty analysis identifies whether a suspect is telling the truth, fabricating, or concealing the actual event by examining the sentence structure or word choice of the suspect's testimony [14]; combined with a lie detector, it effectively supports the investigation. Analysis of language structure examines whether the structure of language in legal products accords with linguistic rules. Authorship analysis is used in cases of plagiarism or in the investigation of a text whose actual writer is unknown [11].

Multimodal analysis

This study focuses on discourse analysis to examine the multimodal use of language affecting social interaction. Multimodal analysis refers to a particular approach to studying social interaction that seeks to analyse all the modes through which people act. A multimodal analysis examines written language and graphemes as well as spoken language, together with sounds, gesture, posture shifts, gaze shifts, head movements, and so forth. It may also include the wider environment, such as the placement of furniture, the layout of a cafe, and the other people present [10].

Based on Halliday's (1985, 1994) metafunctions of language, Kress and Van Leeuwen (1996) developed terminology for discussing the meaning of images in visual communication: representational for ideational meaning, interactive for interpersonal meaning, and compositional for textual meaning.

Representational meanings

A narrative structure realizes a narrative representation, in which participants are connected by a vector. By contrast, connecting lines without an indicator of directionality form a particular kind of analytical structure and mean something like `is connected to', `is conjoined to', `is related to'. The `Actor' is the participant from whom or which the vector departs, and which may be fused with the vector to different degrees. Narrative processes are realized by visual elements, which may be an image, a number, or an equation.

Processes that take a whole visual (or verbal) proposition as their `object' are projective; the others are non-projective. A transactional action is either unidirectional (a vector connects two participants, an Actor and a Goal) or bidirectional (a vector connects two Interactors). A non-transactional action emanates from a participant, the Actor, but does not point at any other participant. Projective processes are divided into non-transactional reaction, mental process (a conventional device connects two participants, the Senser and the Phenomenon), and verbal process (a similar device connects two participants, a Sayer and an utterance). Conversion is a process in which a participant is the Goal of one action and the Actor of another; this involves a change of state in the participant.

The second element is Circumstances, which are divided into setting, means, and accompaniment. Setting: a setting is recognizable because the participants in the foreground overlap and hence partially obscure it; because it is often drawn or painted in less detail, or, in the case of photography, has a softer focus; and because of contrasts in color saturation and overall darkness or lightness between foreground and background. Means: the tool with which the action is executed; it usually also forms the vector. Accompaniment: a participant in a narrative structure which has no vectorial relation with other participants and cannot be interpreted as a Symbolic Attribute.

Interactive meaning

Interactive meanings concern the relation between the represented participants and the viewer. Producers of images that depict human or quasi-human participants must choose whether to make them look at the viewer or not, and at the same time must choose to depict them as close to or far away from the viewer; this applies to the depiction of objects also. The choice of distance can suggest different relations between represented participants and viewers.

Contact is divided into demand and offer. Demand: a subject who demands makes, for instance, eye contact with the viewer. Such subjects gaze or gesture as though they want something. They might smile, demanding social affinity; they could stare coldly, demanding that we relate to them as to a superior; or they could pout, demanding that we desire them. A subject who demands commands a sense of respect, connection, and engagement from the audience. Demand images are common in photographs of authority figures such as celebrities and role models. Offer: conversely, subjects who offer do not engage with the viewer at all. They are passive. Their image is offered to the viewer as an object of dispassionate reflection.

Social distance concerns the range at which an image is taken. It is divided into intimate/personal, social, and impersonal. Intimate/personal: an image taken as a close shot; intimate shots can either humanize the subject or distort them unflatteringly. Social: an image taken as a medium shot, in which the subject is shown down to just below the waist or mid-thigh, with space around the figure. Impersonal: an image taken as a long shot, in which the whole body of the subject is visible.

Attitude is subjective when it concerns the angle from which the image is taken: the frontal angle signals involvement and the oblique angle detachment, while the vertical angle (high angle or eye level) represents power.
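The three interactive systems described above can be read as a simple coding scheme for a single image. The following Python sketch is only an illustration of that reading, not the procedure used in this study. It assumes that a long shot corresponds to impersonal distance and that frontal and oblique angles correspond to involvement and detachment; the function name, class name, and input labels are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical lookup tables: the category labels follow the framework
# summarized in the text, but the dictionary keys are illustrative choices.
SOCIAL_DISTANCE = {
    "close": "intimate/personal",
    "medium": "social",
    "long": "impersonal",  # assumption: long shot = impersonal distance
}
SUBJECTIVE_ATTITUDE = {
    "frontal": "involvement",
    "oblique": "detachment",  # assumption: oblique angle = detachment
}


@dataclass(frozen=True)
class InteractiveMeaning:
    contact: str
    social_distance: str
    attitude: str


def code_interactive_meaning(gazes_at_viewer: bool, shot: str,
                             horizontal_angle: str) -> InteractiveMeaning:
    """Code one image for contact, social distance and subjective attitude."""
    # A gaze at the viewer realizes 'demand'; its absence realizes 'offer'.
    contact = "demand" if gazes_at_viewer else "offer"
    return InteractiveMeaning(
        contact=contact,
        social_distance=SOCIAL_DISTANCE[shot],
        attitude=SUBJECTIVE_ATTITUDE[horizontal_angle],
    )


# Example: a medium shot from the front in which nobody looks at the viewer.
example = code_interactive_meaning(False, "medium", "frontal")
print(example)
```

Keeping the mappings in plain dictionaries makes the assumptions visible and easy to revise if a different reading of the framework is adopted.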


Compositional meaning

Composition relates the representational and interactive meanings of the image to each other through three interrelated systems: information value, salience and framing. Information value means that the placement of elements (the participants and what relates them to each other and to the viewer) endows them with the specific informational values attached to the various `zones' of the image: left and right, top and bottom, centre and margin.

Salience refers to how elements are made to attract the viewer's attention to different degrees, as realized by such factors as placement in the foreground or background, relative size, contrasts in tonal value (or colour), differences in sharpness, etc. Framing refers to the presence or absence of framing devices (realized by elements which create dividing lines, or by actual frame lines), which disconnect or connect elements of the image, signifying that they belong or do not belong together in some sense.

These three principles of composition apply not just to single pictures; they also apply to composite visuals, that is, visuals which combine text and image and, perhaps, other graphic elements, be it on a page or on a television or computer screen, which means they can also be applied to video game visuals.

3. Research Method

This research used a descriptive qualitative method to collect data on actual information. The method is used to identify problems, make comparisons or evaluations, and learn from experience in order to decide on and conclude a plan. The data analysis technique followed Miles, Huberman and Saldana (2014) and consists of three interconnected sub-processes: data condensation, data display, and conclusion drawing/verification.

The data were analyzed using the interactive model proposed by Miles, Huberman and Saldana (2014). Condensation consisted of five steps. In selecting the data, the researcher chose forensic visual texts that convey a hidden message. In focusing, the researcher concentrated the analysis on five forensic visual texts. In simplifying, the researcher noted the linguistic and visual components of the selected data separately, based on multimodal metafunction theory. In abstracting, the data were summarized according to their verbal and visual metafunction components. In transforming, the analysed data were related across the variables of verbal text and visual multimodal text to derive connections among them.

The researcher displayed the visual components in tables and verified them. The researcher then drew conclusions based on the research problem, namely how visual components are interrelated in conveying meaning.

4. Analysis and Discussion

Representation and narrative structure

Five courts were identified, each portraying its characteristic representational, interactive and compositional meanings. Their narrative structure consists of human participants in the court as Sayers and Phenomena; the circumstances of setting, means and accompaniment; and mental and verbal processes. All data display narrative structures. Agentive and non-agentive conversion processes occurred when interaction took place between two or more actors in the court: the Chief Judge (CJ), judges, prosecutors and witnesses. There are around 20 to 30 human participants in the court, acting as Sayers, Phenomena, and Reacters. The Sayers are the CJ and the judges.

In this court, the witnesses as participants wear batik shirts and white shirts. Each holds a microphone in his or her hand, ready to speak. The witnesses, as participants, act as the Phenomenon; the eye lines of all the human participants are directed to CJ and the judge (J). In the images, the eye lines are directed to the left. There are also other participants who form the object of the Reacters' look but are seen as passive participants in the transactional reaction.

The setting circumstance focuses on the CJ in the centre. The circumstance of means for the CJ is represented by the hammer that he holds and the microphone set in front of him, facing him, in the middle position. The linguistic emblem is the name written on the name plate, “HAKIM KETUA” (Chief Judge), while the visual emblem is his robe, which marks his identity as CJ. The accompaniment is the document held by the CJ, among other items. The Reacter is the man sitting on the right, who has less focus since he has no vectorial relation to the other participants.

The setting is the courtroom. The means is the top of the camera flash in front of the judges' position, and the accompaniments, as part of the setting circumstances, are the documents and papers on the table and the name emblem with PROSECUTORS written on it.

Mental processes are predominantly formed by vectors that connect two participants, the Senser and the Phenomenon. Here, the vector is the witness's eye line toward the Phenomenon. Verbal processes are likewise formed by devices that connect two participants, a Sayer and its utterance, the Verbiage. Most movements of the witness's (Sayer's) mouth, hands and head realize the Sayer's process of giving and describing information.

The data also display the process `agentive: conversion', because of the interaction between two or more actors: a directional transaction occurred when one of two witnesses reached out a hand to the other (they are spouses).

Interactive meaning

The interactive meanings comprise contact (`offer' and `demand'), social distance, and attitude. Contact occurs when the Chief Judge (CJ) and the witness maintain eye contact with each other during the question-answer and listening process. Demand is portrayed when a participant gazes at the viewer.

The attitude in the court derives from the subjective involvement signalled by the frontal angle and from the objective action orientation. The contact is `offer', since the Sayers and the Phenomenon have no eye contact with the viewers. When a medium shot is present, the social distance is social. The impersonal distance displays the whole body of the participants from head to feet, as in images 1 to 28. In this case, the researcher focuses on the Sayer. Because of these long shots, many facial expressions cannot be identified clearly. For example, the movements of the Sayer's eyes cannot be identified clearly enough to judge whether he is telling a lie or the truth.

Social distance corresponds to the medium shot. The social elements are depicted through the Sayer's body, with the figure captured from the head to the hip or stomach. This social distance is displayed in images 29 to 50. In this shot, the figure can be identified more clearly than in the long shot. Involvement is realized by vanishing lines formed by the arrangement of all participants in the courtroom, which meet at a 90-degree angle. Images 1 to 28 show involvement between the Sayer and the hidden judges in terms of giving information.
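The distribution of shots reported above (images 1 to 28 as long shots, images 29 to 50 as medium shots) can be tabulated mechanically. The short Python sketch below is an illustration only; the image ranges are taken from the text, while the dictionary layout and the function name are hypothetical.

```python
from collections import Counter

# Image ranges as reported in the text; range() excludes its upper bound,
# so range(1, 29) covers images 1 to 28 and range(29, 51) covers 29 to 50.
SHOT_RANGES = {"long": range(1, 29), "medium": range(29, 51)}
DISTANCE_BY_SHOT = {"long": "impersonal", "medium": "social"}


def social_distance_of(image_no: int) -> str:
    """Return the social distance coded for a given image number."""
    for shot, numbers in SHOT_RANGES.items():
        if image_no in numbers:
            return DISTANCE_BY_SHOT[shot]
    raise ValueError(f"image {image_no} is outside the coded ranges")


# Tally how many of the 50 images fall into each social distance.
tally = Counter(social_distance_of(n) for n in range(1, 51))
print(tally)
```

Under these ranges, 28 of the 50 images are coded as impersonal and 22 as social.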

The contact is `offer' since there is no eye contact with the viewer. The shot is medium, so the social distance is `social'. The frontal angle makes the subjective attitude `involvement', and because the image was taken from the front the objective attitude is `Action Orientation'.

Detachment is realized by an oblique angle, in which the position of the viewer is not parallel to the frontal plane of the represented participants. The vanishing lines of the frontal plane of the represented participants are drawn in parallel to help the researchers determine and show from where the viewer views the image. In terms of equality, the eye-level point of view is used to explain that there is a power difference between the judges and the witness. It considers the level at which the witness's eye line is directed; his eyes are always directed at the Phenomenon. An impersonal connection arises at this distance, which tends to compress facial features and look quite flattering.

The evidence is that the witness gave minimal responses, saying yes, no, or hmmmm, to demands for information and demands for confirmation. Authority, control, and status were also not evenly distributed. As a result, there was little change of expression.

Compositional meaning

The situation shows the witnesses entering the courtroom. Witness 1 came as a witness in the PC case. PC, the suspect, had already arrived before G and taken a seat beside the judge. Witness 1 was offering a seat to one of the witnesses. PC looks sometimes at the judge and sometimes at the witness. His head is lowered before the judge opens the case. Once the session has started and the witnesses are giving their testimony, PC mostly looks at the witness with his palm covering his mouth. This position indicates high stress. A person who clasps his hands and holds them around the stomach is trying to hold back that feeling (Pease, 1995: 41-42).

Composition is characterized by information value, salience and framing. The data show that salience is at its maximum for the Chief Judge (CJ), because his position in the layout is in the centre. The plate with “HAKIM KETUA” written on it has minimum salience but full framing. The CJ is Centred, as the participant's position is in the centre. The second centred position is that of the witnesses, who are more in focus. The three witnesses are salient since they become the main object in the context. Minimum salience belongs to the Prosecutor sign inside the green framing.

Another situation portrays a witness and his spouse, who acted as the second witness. A smile is seen on the witness's face. His head tilts to the right, opposite to his standing pose, which inclines to the left. His spouse barely smiles, raising the corners of her lips only slightly. Her palm is shown open, waving at the viewers in the court. Their hands are clasped together at the level of the upper part of her spouse's stomach. The witness frequently looks down, while once in a while looking at the reporters.

The clearest expression on his face is the raising of his eyebrows and the appearance of wrinkle lines while he is speaking and smiling. His eyes stay mostly at the same level, but what the movement of the cornea expresses cannot be clearly identified. The clearest messages are conveyed through the position and movements of his hands. His left hand always touches the bench handle, which suggests nervousness: he needs something to hold to support himself. Sometimes he moves his hand with the palm open and facing up, which suggests confidence and an attempt to convince others that what he is saying is true. When he points with his palm down, it suggests that he considers himself to be of higher status. Lastly, when he places one hand under the other, it suggests lower self-confidence [1].

5. Conclusion

Theoretically, multimodal texts are combinations of transcripts of recorded oral interaction, writing, and visuals that have functional features and are used as research data. In this research, however, the analysis was limited to developing the visual analysis of texts from the examination of witnesses, which served as the source of data for information on representation, interaction, and composition.

It is concluded that functional features of representational, interactive and compositional meaning are present in images in the forensic context, realized through three systems. In general, the analysis of meaning shows that the human participants in the court as Sayers and Phenomena, the circumstances of setting, means and accompaniment, and the mental and verbal processes result in an asymmetric relation among judge, lawyer, and witness, to which all the participants in the courtroom are oriented and which they all understand.

In information value, the placement of the elements (the participants and what relates them to each other and to the viewer) endows them with specific informational values according to where they appear in the image: left and right, top and bottom, centre and margin. Within information value, Centred is a central element placed in the centre of a composition. Centred compositions are of two kinds: Triptych, in which non-central elements are placed either on the right and left sides or above and below the Centre, and Circular, in which non-central elements are placed both above and below and to the sides of the Centre, with further elements placed between these polarized positions.

Salience is how elements are made to draw the viewer's attention to different degrees, through placement in the foreground or background, relative size, contrasts in tonal value, differences in sharpness, etc. Framing is realized by elements that create dividing lines, or by frame lines, that disconnect or connect elements of the image, indicating that they belong or do not belong together.

The forensic texts proved to differ from other social interaction texts, which are richer in representational, interactive, and compositional meanings. The appropriate theory and method for analyzing the forensic data was multimodal analysis, which relates these contexts.

This research was supported by the Research Project 2018 (PDUPT 2018) of the Ministry of Research and Higher Education, granted to the University of Sumatera Utara.



