initial commit

Ken Yasue
2025-03-23 20:32:57 +01:00
parent c2e18462be
commit 4b42f7bf3a
14 changed files with 1884 additions and 132 deletions

README.md

@@ -1,2 +1,146 @@
# tripadviser_scraper
# TypeScript Selenium Project
A TypeScript project with Selenium WebDriver and VSCode debugging configured. This project demonstrates how to use Selenium to automate browser interactions, specifically opening TripAdvisor.com.
The project is tracked with Git, so you can review changes and revert to previous states if needed.
## Project Structure
```
.
├── .vscode/ # VSCode configuration
│ ├── launch.json # Debug configuration
│ └── tasks.json # Build tasks
├── src/ # Source files
│ └── index.ts # Selenium script to open TripAdvisor.com
├── dist/ # Compiled JavaScript files
├── package.json # Project dependencies and scripts
├── tsconfig.json # TypeScript configuration
└── README.md # This file
```
## Dependencies
This project uses the following dependencies:
- **TypeScript**: JavaScript with syntax for types
- **Selenium WebDriver**: Browser automation framework
- **ChromeDriver**: WebDriver for Chrome browser
## Available Scripts
- `npm run build` - Compiles TypeScript to JavaScript
- `npm run start` - Runs the compiled JavaScript (opens TripAdvisor.com in Chrome)
- `npm run dev` - Runs the TypeScript code directly using ts-node (opens TripAdvisor.com in Chrome)
## Selenium WebDriver
The main script (`src/index.ts`) demonstrates:
1. Setting up a Chrome WebDriver instance with session management
2. Navigating to TripAdvisor.com
3. Waiting for the page to load
4. Closing the browser after a delay
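
Steps 2 to 4 above could be sketched as follows. This is a minimal illustration, not the project's actual `src/index.ts`: `BrowserLike`, `visitAndClose`, the 5-second default delay, and the "non-empty title means loaded" check are all assumptions made for the example. The interface lists only methods that Selenium's `WebDriver` also provides (`get`, `getTitle`, `sleep`, `quit`), so a real driver should be usable in its place.

```typescript
// Minimal subset of a browser driver used by this sketch (hypothetical name).
interface BrowserLike {
  get(url: string): Promise<void>;
  getTitle(): Promise<string>;
  sleep(ms: number): Promise<void>;
  quit(): Promise<void>;
}

// Open a URL, poll until the page reports a non-empty title, pause, then close.
// Returns the last title that was read.
async function visitAndClose(
  driver: BrowserLike,
  url: string,
  delayMs: number = 5000, // assumed delay before closing
  maxPolls: number = 20   // assumed upper bound on title checks
): Promise<string> {
  try {
    // Step 2: navigate to the page.
    await driver.get(url);

    // Step 3: treat a non-empty title as "loaded" (an assumption of this sketch).
    let title = await driver.getTitle();
    for (let i = 0; title === "" && i < maxPolls; i++) {
      await driver.sleep(250);
      title = await driver.getTitle();
    }

    // Step 4: keep the window open briefly before closing.
    await driver.sleep(delayMs);
    return title;
  } finally {
    // Always release the browser, even if navigation failed.
    await driver.quit();
  }
}
```

For step 1, a real driver would typically come from `new Builder().forBrowser("chrome").build()` in `selenium-webdriver` and be passed in as the first argument. Keeping the function dependent on a small interface rather than on the library makes it straightforward to exercise with a stand-in object.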
### Session Management
This project includes session management capabilities:
- Browser sessions are stored in the `sessions/` directory
- Chrome is configured to use a persistent user data directory
- Sessions persist between runs, allowing for:
  - Preserved cookies and login states
  - Cached resources for faster loading
  - A consistent testing environment
To clear session data and start fresh, delete everything in the `sessions/` directory except its `README.md`.
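
The persistent profile and the "clear everything except README.md" rule could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: `sessionArguments`, `entriesToClear`, and the `default` profile name are hypothetical. The `--user-data-dir` switch itself is a standard Chrome command-line flag.

```typescript
// Build the Chrome command-line arguments for a persistent profile.
// `sessionsDir` and `profile` are hypothetical parameters for this sketch;
// forward slashes are assumed as the path separator.
function sessionArguments(sessionsDir: string, profile: string = "default"): string[] {
  // Strip trailing slashes so the joined path has exactly one separator.
  const base = sessionsDir.replace(/\/+$/, "");
  return [`--user-data-dir=${base}/${profile}`];
}

// Given the entry names found in sessions/, return those that would be
// deleted when starting fresh. README.md is kept, as described above.
function entriesToClear(entryNames: string[]): string[] {
  return entryNames.filter((entry) => entry !== "README.md");
}
```

With `selenium-webdriver`, the array returned by `sessionArguments("sessions/")` could be spread into `addArguments(...)` on the Chrome `Options` object before building the driver. Separating "which entries to remove" from the actual deletion keeps the README-preserving rule easy to check before anything is deleted.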
## Debugging in VSCode
This project includes two debug configurations:
1. **Launch Program** - Builds the TypeScript code and then debugs the compiled JavaScript
2. **Debug TS with ts-node** - Directly debugs the TypeScript code using ts-node
Both configurations can be used to debug the Selenium WebDriver script.
To start debugging:
1. Open the Debug view in VSCode (Ctrl+Shift+D or Cmd+Shift+D on macOS)
2. Select the debug configuration you want to use from the dropdown
3. Press F5 or click the green play button
## Adding Breakpoints
1. Click in the gutter next to the line number where you want to add a breakpoint
2. When debugging, execution will pause at the breakpoint
3. You can inspect variables, the call stack, and step through code
## Git Version Control
This project is set up with Git for version control. Here's how to use Git to track and revert changes:
### Viewing Changes
```bash
# See what files have been modified
git status
# See detailed changes in files
git diff
```
### Committing Changes
```bash
# Stage changes for commit
git add .
# Commit changes with a descriptive message
git commit -m "Description of changes"
```
### Reverting Changes
```bash
# Discard changes in working directory for a specific file
git checkout -- <file>
# Discard all changes in working directory
git checkout -- .
# Reset the current branch to a specific commit (discards all uncommitted changes; later commits drop off the branch)
git reset --hard <commit-hash>
# Undo the last commit but keep its changes staged
git reset --soft HEAD~1
# Create a new commit that undoes changes from a previous commit
git revert <commit-hash>
```
### Viewing History
```bash
# View commit history
git log
# View commit history with a graph
git log --graph --oneline --all
```
### Branching
```bash
# Create a new branch
git branch <branch-name>
# Switch to a branch
git checkout <branch-name>
# Create and switch to a new branch
git checkout -b <branch-name>
# Merge a branch into the current branch
git merge <branch-name>
```