Topic modelling of legal documents using NLP and bidirectional encoder representations from transformers

Amar Jeet Rawat, Sunil Ghildiyal, Anil Kumar Dixit

Abstract


Modeling legal text is difficult because of its distinctive features: lengthy documents, complex language structures, and technical terminology. Over the last decade, the number of legislative documents has risen sharply, making it hard for legal professionals to keep up with tasks such as analyzing judgements and implementing acts. In some contexts, the relevance of the extracted topics depends heavily on how the legal documents are processed and presented. The objective of this work is to understand the corpus of legal judgements in cases under the Hindu Marriage Act of India. The study examines several methods for generating sentence embeddings from the judgements and uses the BERTopic algorithm to generate significant topics.
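
As a minimal sketch of how a BERTopic-style pipeline turns clustered documents into topic words, the Python below implements only the class-based TF-IDF (c-TF-IDF) weighting that BERTopic's documentation describes for its topic-representation step. It is not the authors' code: the toy tokens, the cluster labels (assumed to come from an earlier embedding and clustering step), and the function names `c_tf_idf` and `top_terms` are all hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs, labels):
    """Score terms per cluster with class-based TF-IDF.

    docs   -- list of token lists (hypothetical, already tokenized)
    labels -- one cluster id per document (assumed to be given)
    """
    # Merge all documents of a cluster into a single bag of words.
    class_counts = {}
    for tokens, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(tokens)

    # Frequency of each term across all clusters.
    total = Counter()
    for counts in class_counts.values():
        total.update(counts)

    # Average number of words per cluster.
    avg_words = sum(total.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        # Term frequency is normalized by cluster length (a choice of this sketch).
        scores[label] = {
            term: (freq / n_words) * math.log(1 + avg_words / total[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        label: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, s in scores.items()
    }


# Toy corpus: invented tokens, not taken from the judgement corpus.
docs = [
    ["divorce", "cruelty", "petition"],
    ["divorce", "desertion", "petition"],
    ["maintenance", "alimony", "petition"],
    ["maintenance", "custody", "petition"],
]
labels = [0, 0, 1, 1]

scores = c_tf_idf(docs, labels)
print(top_terms(scores, k=1))
```

A term such as "petition", which appears in every cluster, is weighted down relative to terms concentrated in one cluster, which is what lets the top-ranked words act as topic labels.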

Keywords


BERTopic; Document clustering; Latent Dirichlet allocation; Latent semantic analysis; Natural language processing; Topic modeling


DOI: http://doi.org/10.11591/ijeecs.v28.i3.pp1749-1755


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Indonesian Journal of Electrical Engineering and Computer Science (IJEECS)
p-ISSN: 2502-4752, e-ISSN: 2502-4760
This journal is published by the Institute of Advanced Engineering and Science (IAES) in collaboration with Intelektual Pustaka Media Utama (IPMU).